Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
My work on page-level freezing for PostgreSQL 16 has some remaining
loose ends to tie up with the documentation. The "Routine Vacuuming"
section of the docs has no mention of page-level freezing. It also
doesn't mention the FPI optimization added by commit 1de58df4. This
isn't a small thing to leave out; I fully expect that the FPI
optimization will very significantly alter when and how VACUUM
freezes. The cadence will look quite a lot different.
It seemed almost impossible to fit in discussion of page-level
freezing to the existing structure. In part this is because the
existing documentation emphasizes the worst case scenario, rather than
talking about freezing as a maintenance task that affects physical
heap pages in roughly the same way as pruning does. There isn't a
clean separation of things that would allow me to just add a paragraph
about the FPI thing.
Obviously it's important that the system never enters xidStopLimit
mode -- not being able to allocate new XIDs is a huge problem. But it
seems unhelpful to define that as the only goal of freezing, or even
the main goal. To me this seems similar to defining the goal of
cleaning up bloat as avoiding completely running out of disk space;
while it may be "the single most important thing" in some general
sense, it isn't all that important in most individual cases. There are
many very bad things that will happen before that extreme worst case
is hit, which are far more likely to be the real source of pain.
There are also very big structural problems with "Routine Vacuuming",
that I also propose to do something about. Honestly, it's a huge mess
at this point. It's nobody's fault in particular; there has been
accretion after accretion added, over many years. It is time to
finally bite the bullet and do some serious restructuring. I'm hoping
that I don't get too much push back on this, because it's already very
difficult work.
Attached patch series shows what I consider to be a much better
overall structure. To make this convenient to take a quick look at, I
also attach a prebuilt version of routine-vacuuming.html (not the only
page that I've changed, but the most important set of changes by far).
This initial version is still quite lacking in overall polish, but I
believe that it gets the general structure right. That's what I'd like
to get feedback on right now: can I get agreement with me about the
general nature of the problem? Does this high level direction seem
like the right one?
The following list is a summary of the major changes that I propose:
1. Restructures the order of items to match the actual processing
order within VACUUM (and ANALYZE), rather than jumping from VACUUM to
ANALYZE and then back to VACUUM.
This flows a lot better, which helps with later items that deal with
freezing/wraparound.
2. Renamed "Preventing Transaction ID Wraparound Failures" to
"Freezing to manage the transaction ID space". Now we talk about
wraparound as a subtopic of freezing, not vice-versa. (This is a
complete rewrite, as described by later items in this list).
3. All of the stuff about modulo-2^32 arithmetic is moved to the
storage chapter, where we describe the heap tuple header format.
It seems crazy to me that the second sentence in our discussion of
wraparound/freezing is still:
"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future"
Here we start the whole discussion of wraparound (a particularly
delicate topic) by describing how VACUUM used to work 20 years ago,
before the invention of freezing. That was the last time that a
PostgreSQL cluster could run for 4 billion XIDs without freezing. The
invariant is that we activate xidStopLimit mode protections to avoid a
"distance" between any two unfrozen XIDs that exceeds about 2 billion
XIDs. So why on earth are we talking about 4 billion XIDs? This is the
most confusing, least useful way of describing freezing that I can
think of.
4. No more separate section for MultiXactID freezing -- that's
discussed as part of the discussion of page-level freezing.
Page-level freezing takes place without regard to the trigger
condition for freezing. So the new approach to freezing has a fixed
idea of what it means to freeze a given page (what physical
modifications it entails). This means that having a separate sect3
subsection for MultiXactIds now makes no sense (if it ever did).
5. The top-level list of maintenance tasks has a new addition: "To
truncate obsolescent transaction status information, when possible".
It makes a lot of sense to talk about this as something that happens
last (or last among those steps that take place during VACUUM). It's
far less important than avoiding xidStopLimit outages, obviously
(using some extra disk space is almost certainly the least of your
worries when you're near to xidStopLimit). The current documentation
seems to take precisely the opposite view, when it says the following:
"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more
space"
This sentence is dangerously bad advice. It is precisely backwards. At
the same time, we'd better say something about the need to truncate
pg_xact/clog here. Besides all this, the new section for this is a far
more accurate reflection of what's really going on: most individual
VACUUMs (even most aggressive VACUUMs) won't ever truncate
pg_xact/clog (or the other relevant SLRUs). Truncation only happens
after a VACUUM that advances the relfrozenxid of the table which
previously had the oldest relfrozenxid among all tables in the entire
cluster -- so we need to talk about it as an issue with the high
watermark storage for pg_xact.
6. Rename the whole "Routine Vacuuming" section to "Autovacuum
Maintenance Tasks".
This is what we should be emphasizing over manually run VACUUMs.
Besides, the current title just seems wrong -- we're talking about
ANALYZE just as much as VACUUM.
Thoughts?
--
Peter Geoghegan
Attachments:
v1-0009-Overhaul-Recovering-Disk-Space-vacuuming-docs.patchapplication/octet-stream; name=v1-0009-Overhaul-Recovering-Disk-Space-vacuuming-docs.patchDownload
From bdefa0e5bf8616e9d9afa18ef44e38bd5f77a84e Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 12:33:42 -0700
Subject: [PATCH v1 9/9] Overhaul "Recovering Disk Space" vacuuming docs.
Say a lot more about the possible impact of long-running transactions on
VACUUM. Remove all talk of administrators getting by without
autovacuum; at most administrators might want to schedule manual VACUUM
operations to supplement autovacuum (this documentation was written at a
time when the visibility map didn't exist, even in its most basic form).
Also describe VACUUM FULL as an entirely different kind of operation to
conventional lazy vacuum.
---
doc/src/sgml/maintenance.sgml | 173 ++++++++++++++++++----------------
1 file changed, 93 insertions(+), 80 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 169c6f41a..bb21474e1 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -342,100 +342,113 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
This approach is necessary to gain the benefits of multiversion
concurrency control (<acronym>MVCC</acronym>, see <xref linkend="mvcc"/>): the row version
must not be deleted while it is still potentially visible to other
- transactions. But eventually, an outdated or deleted row version is no
- longer of interest to any transaction. The space it occupies must then be
- reclaimed for reuse by new rows, to avoid unbounded growth of disk
- space requirements. This is done by running <command>VACUUM</command>.
+ transactions. A deleted row version (whether from an
+ <command>UPDATE</command> or <command>DELETE</command>) will
+ usually cease to be of interest to any still running transaction
+ shortly after the original deleting transaction commits.
</para>
<para>
- The standard form of <command>VACUUM</command> removes dead row
- versions in tables and indexes and marks the space available for
- future reuse. However, it will not return the space to the operating
- system, except in the special case where one or more pages at the
- end of a table become entirely free and an exclusive table lock can be
- easily obtained. In contrast, <command>VACUUM FULL</command> actively compacts
- tables by writing a complete new version of the table file with no dead
- space. This minimizes the size of the table, but can take a long time.
- It also requires extra disk space for the new copy of the table, until
- the operation completes.
+ The space dead tuples occupy must eventually be reclaimed for
+ reuse by new rows, to avoid unbounded growth of disk space
+ requirements. Reclaiming space from dead rows is
+ <command>VACUUM</command>'s main responsibility.
</para>
<para>
- The usual goal of routine vacuuming is to do standard <command>VACUUM</command>s
- often enough to avoid needing <command>VACUUM FULL</command>. The
- autovacuum daemon attempts to work this way, and in fact will
- never issue <command>VACUUM FULL</command>. In this approach, the idea
- is not to keep tables at their minimum size, but to maintain steady-state
- usage of disk space: each table occupies space equivalent to its
- minimum size plus however much space gets used up between vacuum runs.
- Although <command>VACUUM FULL</command> can be used to shrink a table back
- to its minimum size and return the disk space to the operating system,
- there is not much point in this if the table will just grow again in the
- future. Thus, moderately-frequent standard <command>VACUUM</command> runs are a
- better approach than infrequent <command>VACUUM FULL</command> runs for
- maintaining heavily-updated tables.
- </para>
-
- <para>
- Some administrators prefer to schedule vacuuming themselves, for example
- doing all the work at night when load is low.
- The difficulty with doing vacuuming according to a fixed schedule
- is that if a table has an unexpected spike in update activity, it may
- get bloated to the point that <command>VACUUM FULL</command> is really necessary
- to reclaim space. Using the autovacuum daemon alleviates this problem,
- since the daemon schedules vacuuming dynamically in response to update
- activity. It is unwise to disable the daemon completely unless you
- have an extremely predictable workload. One possible compromise is
- to set the daemon's parameters so that it will only react to unusually
- heavy update activity, thus keeping things from getting out of hand,
- while scheduled <command>VACUUM</command>s are expected to do the bulk of the
- work when the load is typical.
- </para>
-
- <para>
- For those not using autovacuum, a typical approach is to schedule a
- database-wide <command>VACUUM</command> once a day during a low-usage period,
- supplemented by more frequent vacuuming of heavily-updated tables as
- necessary. (Some installations with extremely high update rates vacuum
- their busiest tables as often as once every few minutes.) If you have
- multiple databases in a cluster, don't forget to
- <command>VACUUM</command> each one; the program <xref
- linkend="app-vacuumdb"/> might be helpful.
+ The XID cutoff point that <command>VACUUM</command> uses to
+ determine whether or not a deleted tuple is safe to physically
+ remove is reported under <literal>removable cutoff</literal> in
+ the server log when autovacuum logging (controlled by <xref
+ linkend="guc-log-autovacuum-min-duration"/>) reports on a
+ <command>VACUUM</command> operation executed by autovacuum.
+ Tuples that are not yet safe to remove are counted as
+ <literal>dead but not yet removable</literal> tuples in the log
+ report. <command>VACUUM</command> establishes its
+ <literal>removable cutoff</literal> once, at the start of the
+ operation. Any older snapshot (or transaction that allocates an
+ XID) that's still running when the cutoff is established may hold
+ it back.
</para>
<tip>
- <para>
- Plain <command>VACUUM</command> may not be satisfactory when
- a table contains large numbers of dead row versions as a result of
- massive update or delete activity. If you have such a table and
- you need to reclaim the excess disk space it occupies, you will need
- to use <command>VACUUM FULL</command>, or alternatively
- <link linkend="sql-cluster"><command>CLUSTER</command></link>
- or one of the table-rewriting variants of
- <link linkend="sql-altertable"><command>ALTER TABLE</command></link>.
- These commands rewrite an entire new copy of the table and build
- new indexes for it. All these options require an
- <literal>ACCESS EXCLUSIVE</literal> lock. Note that
- they also temporarily use extra disk space approximately equal to the size
- of the table, since the old copies of the table and indexes can't be
- released until the new ones are complete.
- </para>
+ <para>
+ It's important that no long running transactions ever be allowed
+ to hold back every <command>VACUUM</command> operation's cutoff
+ for an extended period. You may wish to monitor this.
+ </para>
</tip>
- <tip>
+ <note>
+ <para>
+ Tuples inserted by aborted transactions can be removed by
+ <command>VACUUM</command> immediately
+ </para>
+ </note>
+
<para>
- If you have a table whose entire contents are deleted on a periodic
- basis, consider doing it with
- <link linkend="sql-truncate"><command>TRUNCATE</command></link> rather
- than using <command>DELETE</command> followed by
- <command>VACUUM</command>. <command>TRUNCATE</command> removes the
- entire content of the table immediately, without requiring a
- subsequent <command>VACUUM</command> or <command>VACUUM
- FULL</command> to reclaim the now-unused disk space.
- The disadvantage is that strict MVCC semantics are violated.
+ <command>VACUUM</command> will not return space to the operating
+ system, except in the special case where a group of contiguous
+ pages at the end of a table become entirely free and an exclusive
+ table lock can be easily obtained. This relation truncation
+ behavior can be disabled in tables where the exclusive lock is
+ disruptive by setting the table's <varname>vacuum_truncate</varname>
+ storage parameter to <literal>off</literal>.
</para>
+
+ <tip>
+ <para>
+ If you have a table whose entire contents are deleted on a
+ periodic basis, consider doing it with <link
+ linkend="sql-truncate"><command>TRUNCATE</command></link> rather
+ than relying on <command>VACUUM</command>.
+ <command>TRUNCATE</command> removes the entire contents of the
+ table immediately, avoiding the need to set
+ <structfield>xmax</structfield> to the deleting transaction's XID.
+ One disadvantage is that strict MVCC semantics are violated.
+ </para>
</tip>
+ <tip>
+ <para>
+ <command>VACUUM FULL</command> or <command>CLUSTER</command> can
+ be useful when dealing with extreme amounts of dead tuples. It
+ can reclaim more disk space, but runs much more slowly. It
+ rewrites an entire new copy of the table and rebuilds all indexes.
+ This typically has much higher overhead than
+ <command>VACUUM</command>. Generally, therefore, administrators
+ should avoid using <command>VACUUM FULL</command> except in the
+ most extreme cases.
+ </para>
+ </tip>
+ <note>
+ <para>
+ Although <command>VACUUM FULL</command> is technically an option
+ of the <command>VACUUM</command> command, <command>VACUUM
+ FULL</command> uses a completely different implementation.
+ <command>VACUUM FULL</command> is essentially a variant of
+ <command>CLUSTER</command>. (The name <command>VACUUM
+ FULL</command> is historical; the original implementation was
+ somewhat closer to standard <command>VACUUM</command>.)
+ </para>
+ </note>
+ <warning>
+ <para>
+ <command>TRUNCATE</command>, <command>VACUUM FULL</command>, and
+ <command>CLUSTER</command> all require an <literal>ACCESS
+ EXCLUSIVE</literal> lock, which can be highly disruptive
+ (<command>SELECT</command>, <command>INSERT</command>,
+ <command>UPDATE</command>, and <command>DELETE</command> commands
+ will all be blocked).
+ </para>
+ </warning>
+ <warning>
+ <para>
+ <command>VACUUM FULL</command> (and <command>CLUSTER</command>)
+ temporarily uses extra disk space approximately equal to the size
+ of the table, since the old copies of the table and indexes can't
+ be released until the new ones are complete.
+ </para>
+ </warning>
</sect2>
<sect2 id="freezing-xid-space">
--
2.40.0
v1-0008-Overhaul-freezing-and-wraparound-docs.patchapplication/octet-stream; name=v1-0008-Overhaul-freezing-and-wraparound-docs.patchDownload
From b1af9724b6553a17ba42bab4df5f595428bcdbf8 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 13:04:13 -0700
Subject: [PATCH v1 8/9] Overhaul freezing and wraparound docs.
This is almost a complete rewrite. "Preventing Transaction ID
Wraparound Failures" becomes "Freezing to manage the transaction ID
space". This is follow-up work to commit 1de58df4, which added
page-level freezing to VACUUM.
The emphasis is now on the physical work of freezing pages. This flows
a little better than it otherwise would due to recent structural
cleanups to maintenance.sgml; discussion about freezing now immediately
follows discussion of cleanup of dead tuples. We still talk about the
problem of the system activating xidStopLimit protections in the same
section, but we use much less alarmist language about data corruption,
and are no longer overly concerned about the very worst case. We don't
rescind the recommendation that users recover from an xidStopLimit
outage by using single user mode, though that seems like something we
should aim to do in the near future.
There is no longer a separate sect3 to discuss MultiXactId related
issues. VACUUM now performs exactly the same processing steps when it
freezes a page, independent of the trigger condition.
Also describe the page-level freezing FPI optimization added by commit
1de58df4. This is expected to trigger the majority of all freezing with
many types of workloads.
---
doc/src/sgml/config.sgml | 24 +-
doc/src/sgml/logicaldecoding.sgml | 2 +-
doc/src/sgml/maintenance.sgml | 602 ++++++++++++++--------
doc/src/sgml/ref/create_table.sgml | 2 +-
doc/src/sgml/ref/prepare_transaction.sgml | 2 +-
doc/src/sgml/ref/vacuum.sgml | 6 +-
doc/src/sgml/ref/vacuumdb.sgml | 4 +-
doc/src/sgml/xact.sgml | 4 +-
8 files changed, 399 insertions(+), 247 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 091a79d4f..4c1cc0e8e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8394,7 +8394,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
Note that even when this parameter is disabled, the system
will launch autovacuum processes if necessary to
prevent transaction ID wraparound. See <xref
- linkend="vacuum-for-wraparound"/> for more information.
+ linkend="freezing-xid-space"/> for more information.
</para>
</listitem>
</varlistentry>
@@ -8583,7 +8583,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
This parameter can only be set at server start, but the setting
can be reduced for individual tables by
changing table storage parameters.
- For more information see <xref linkend="vacuum-for-wraparound"/>.
+ For more information see <xref linkend="freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -8612,7 +8612,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
400 million multixacts.
This parameter can only be set at server start, but the setting can
be reduced for individual tables by changing table storage parameters.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ For more information see <xref linkend="aggressive-strategy"/>.
</para>
</listitem>
</varlistentry>
@@ -9319,7 +9319,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
periodic manual <command>VACUUM</command> has a chance to run before an
anti-wraparound autovacuum is launched for the table. For more
information see
- <xref linkend="vacuum-for-wraparound"/>.
+ <xref linkend="freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -9341,7 +9341,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
the value of <xref linkend="guc-autovacuum-freeze-max-age"/>, so
that there is not an unreasonably short time between forced
autovacuums. For more information see <xref
- linkend="vacuum-for-wraparound"/>.
+ linkend="freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -9357,8 +9357,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
Specifies the maximum age (in transactions) that a table's
<structname>pg_class</structname>.<structfield>relfrozenxid</structfield>
field can attain before <command>VACUUM</command> takes
- extraordinary measures to avoid system-wide transaction ID
- wraparound failure. This is <command>VACUUM</command>'s
+ extraordinary measures to avoid
+ <link linkend="xid-stop-limit">having the system refuse to
+ allocate new transaction IDs</link>. This is <command>VACUUM</command>'s
strategy of last resort. The failsafe typically triggers
when an autovacuum to prevent transaction ID wraparound has
already been running for some time, though it's possible for
@@ -9402,7 +9403,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<xref linkend="guc-autovacuum-multixact-freeze-max-age"/>, so that a
periodic manual <command>VACUUM</command> has a chance to run before an
anti-wraparound is launched for the table.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ For more information see <xref linkend="aggressive-strategy"/>.
</para>
</listitem>
</varlistentry>
@@ -9423,7 +9424,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
the value of <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>,
so that there is not an unreasonably short time between forced
autovacuums.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ For more information see <xref linkend="freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -9439,8 +9440,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
Specifies the maximum age (in multixacts) that a table's
<structname>pg_class</structname>.<structfield>relminmxid</structfield>
field can attain before <command>VACUUM</command> takes
- extraordinary measures to avoid system-wide multixact ID
- wraparound failure. This is <command>VACUUM</command>'s
+ extraordinary measures to avoid
+ <link linkend="xid-stop-limit">having the system refuse to
+ allocate new MultiXactIDs</link>. This is <command>VACUUM</command>'s
strategy of last resort. The failsafe typically triggers when
an autovacuum to prevent transaction ID wraparound has already
been running for some time, though it's possible for the
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index cbd3aa804..80dade3be 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -353,7 +353,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
because neither required WAL nor required rows from the system catalogs
can be removed by <command>VACUUM</command> as long as they are required by a replication
slot. In extreme cases this could cause the database to shut down to prevent
- transaction ID wraparound (see <xref linkend="vacuum-for-wraparound"/>).
+ transaction ID wraparound (see <xref linkend="freezing-xid-space"/>).
So if a slot is no longer required it should be dropped.
</para>
</caution>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 7476e5922..169c6f41a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -275,15 +275,21 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ <simpara>To maintain the system's ability to allocated new
+ transaction IDs (and new multixact IDs) through freezing.</simpara>
</listitem>
<listitem>
<simpara>To update the visibility map, which speeds
up <link linkend="indexes-index-only-scans">index-only
- scans</link>.</simpara>
+ scans</link>, and helps the next <command>VACUUM</command>
+ operation avoid needlessly scanning pages that are already
+ frozen</simpara>
+ </listitem>
+
+ <listitem>
+ <simpara>To truncate obsolescent transaction status information,
+ when possible</simpara>
</listitem>
<listitem>
@@ -432,10 +438,10 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</tip>
</sect2>
- <sect2 id="vacuum-for-wraparound">
- <title>Preventing Transaction ID Wraparound Failures</title>
+ <sect2 id="freezing-xid-space">
+ <title>Freezing to manage the transaction ID space</title>
- <indexterm zone="vacuum-for-wraparound">
+ <indexterm zone="freezing-xid-space">
<primary>transaction ID</primary>
<secondary>wraparound</secondary>
</indexterm>
@@ -461,274 +467,364 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</para>
<para>
- <xref linkend="guc-vacuum-freeze-min-age"/>
- controls how old an XID value has to be before rows bearing that XID will be
- frozen. Increasing this setting may avoid unnecessary work if the
- rows that would otherwise be frozen will soon be modified again,
- but decreasing this setting increases
- the number of transactions that can elapse before the table must be
- vacuumed again.
+ <command>VACUUM</command> marks pages as
+ <emphasis>frozen</emphasis>, indicating that all eligible rows on
+ the page were inserted by a transaction that committed
+ sufficiently far in the past that the effects of the inserting
+ transaction are certain to be visible to all current and future
+ transactions. Frozen rows are treated as if the inserting
+ transaction (the heap tuple's <structfield>xmin</structfield> XID)
+ committed <quote>infinitely far in the past</quote>, making the
+ effects of the transaction visible to all current and future MVCC
+ snapshots, forever. Once frozen, pages are <quote>self
+ contained</quote> in the sense that all of its rows can be read
+ without ever having to consult externally stored transaction status
+ metadata (for example, transaction commit status information from
+ <filename>pg_xact</filename> is not required).
</para>
<para>
- <command>VACUUM</command> uses the <link linkend="storage-vm">visibility map</link>
- to determine which pages of a table must be scanned. Normally, it
- will skip pages that don't have any dead row versions even if those pages
- might still have row versions with old XID values. Therefore, normal
- <command>VACUUM</command>s won't always freeze every old row version in the table.
- When that happens, <command>VACUUM</command> will eventually need to perform an
- <firstterm>aggressive vacuum</firstterm>, which will freeze all eligible unfrozen
- XID and MXID values, including those from all-visible but not all-frozen pages.
- In practice most tables require periodic aggressive vacuuming.
- <xref linkend="guc-vacuum-freeze-table-age"/>
- controls when <command>VACUUM</command> does that: all-visible but not all-frozen
- pages are scanned if the number of transactions that have passed since the
- last such scan is greater than <varname>vacuum_freeze_table_age</varname> minus
- <varname>vacuum_freeze_min_age</varname>. Setting
- <varname>vacuum_freeze_table_age</varname> to 0 forces <command>VACUUM</command> to
- always use its aggressive strategy.
+ Without freezing, the system will eventually <link
+ linkend="xid-stop-limit">refuse to allocate new transaction
+ IDs</link>, making <acronym>DML</acronym> statements throw errors
+ until such time as <command>VACUUM</command> can restore the
+ system's ability to allocate new transaction IDs. The system is
+ unable to recognize distances of more than about 2.1 billion
+ transactions between any two unfrozen XIDs. The only safe option
+ that remains is to disallow allocating new XIDs until
+ <command>VACUUM</command> has performed the steps required to make
+ sure that the <quote>distance</quote> invariant is never violated.
</para>
<para>
- The maximum time that a table can go unvacuumed is two billion
- transactions minus the <varname>vacuum_freeze_min_age</varname> value at
- the time of the last aggressive vacuum. If it were to go
- unvacuumed for longer than
- that, data loss could result. To ensure that this does not happen,
- autovacuum is invoked on any table that might contain unfrozen rows with
- XIDs older than the age specified by the configuration parameter <xref
- linkend="guc-autovacuum-freeze-max-age"/>. (This will happen even if
- autovacuum is disabled.)
+ <xref linkend="guc-vacuum-freeze-min-age"/> can be used to control
+ when freezing takes place. When <command>VACUUM</command> scans a
+ heap page containing even one XID that already attained an age
+ exceeding this value, the page is frozen. All tuples from the page
+ are frozen, making the page suitable for long-term storage (their
+ is no longer any possible need to interpret the contents of the
+ page using external transaction status information). Increasing
+ this setting may avoid unnecessary work if the pages that would
+ otherwise be frozen will soon be modified again, but decreasing
+ this setting makes it more likely that some future
+ <command>VACUUM</command> operation will need to perform an
+ excessive amount of <quote>catch-up freezing</quote>, all in one go.
+ </para>
+
+ <indexterm>
+ <primary>MultiXactId</primary>
+ </indexterm>
+
+ <indexterm>
+ <primary>freezing</primary>
+ <secondary>of multixact IDs</secondary>
+ </indexterm>
+
+ <para>
+ <firstterm>Multixact IDs</firstterm> are used to support row
+ locking by multiple transactions. Since there is only limited
+ space in a tuple header to store lock information, that
+ information is encoded as a <quote>multiple transaction
+ ID</quote>, or multixact ID for short, whenever there is more
+ than one transaction concurrently locking a row. Information
+ about which transaction IDs are included in any particular
+ multixact ID is stored separately, and only the multixact ID
+ appears in the <structfield>xmax</structfield> field in the tuple
+ header. Like transaction IDs, multixact IDs are implemented as a
+ 32-bit counter and corresponding storage.
+ <quote>Freezing</quote> of multixact IDs is also required, though
+ this actually means setting <structfield>xmax</structfield> to
+ <literal>InvalidTransactionId</literal>.
+ <!-- The following parenthetical statement isn't strictly true,
+ but it is true in spirit. VACUUM practically always processes
+ xmax fields this way in practice. -->
+ (Actually, <emphasis>any</emphasis> <structfield>xmax</structfield>
+ field is processed in the same way when its page is frozen,
+ including <structfield>xmax</structfield> fields that contain
+ simple XIDs.)
</para>
<para>
- This implies that if a table is not otherwise vacuumed,
- autovacuum will be invoked on it approximately once every
- <varname>autovacuum_freeze_max_age</varname> minus
- <varname>vacuum_freeze_min_age</varname> transactions.
- For tables that are regularly vacuumed for space reclamation purposes,
- this is of little importance. However, for static tables
- (including tables that receive inserts, but no updates or deletes),
- there is no need to vacuum for space reclamation, so it can
- be useful to try to maximize the interval between forced autovacuums
- on very large static tables. Obviously one can do this either by
- increasing <varname>autovacuum_freeze_max_age</varname> or decreasing
- <varname>vacuum_freeze_min_age</varname>.
+ <xref linkend="guc-vacuum-multixact-freeze-min-age"/> can also
+ influence which pages <command>VACUUM</command> freezes. Like
+ <varname>vacuum_freeze_min_age</varname>, the setting triggers
+ page-level freezing. However,
+ <varname>vacuum_multixact_freeze_min_age</varname> uses
+ MultiXactIds (not XIDs) to measure <quote>age</quote>.
+ </para>
+
+ <note>
+ <para>
+ In <productname>PostgreSQL</productname> versions before 16,
+ freezing was triggered at the level of individual
+ <structfield>xmin</structfield> and
+ <structfield>xmax</structfield> fields. The decision to freeze
+ (or not freeze) is now made at the level of whole heap pages, at
+ the point that they are scanned by <command>VACUUM</command>.
+ </para>
+ </note>
+
+ <para>
+ In general, the major cost of freezing is the additional
+ <acronym>WAL</acronym> volume. That's why
+ <command>VACUUM</command> doesn't just freeze every eligible tuple
+ at the earliest opportunity: freezing will go to waste in cases
+ where a recently frozen tuple soon goes on to be deleted anyway.
+ Managing the added <acronym>WAL</acronym> volume from freezing
+ over time is an important consideration for
+ <command>VACUUM</command>.
</para>
<para>
- The effective maximum for <varname>vacuum_freeze_table_age</varname> is 0.95 *
- <varname>autovacuum_freeze_max_age</varname>; a setting higher than that will be
- capped to the maximum. A value higher than
- <varname>autovacuum_freeze_max_age</varname> wouldn't make sense because an
- anti-wraparound autovacuum would be triggered at that point anyway, and
- the 0.95 multiplier leaves some breathing room to run a manual
- <command>VACUUM</command> before that happens. As a rule of thumb,
- <command>vacuum_freeze_table_age</command> should be set to a value somewhat
- below <varname>autovacuum_freeze_max_age</varname>, leaving enough gap so that
- a regularly scheduled <command>VACUUM</command> or an autovacuum triggered by
- normal delete and update activity is run in that window. Setting it too
- close could lead to anti-wraparound autovacuums, even though the table
- was recently vacuumed to reclaim space, whereas lower values lead to more
- frequent aggressive vacuuming.
+ <command>VACUUM</command> also triggers freezing of pages in cases
+ where it already proved necessary to write out an
+ <acronym>FPI</acronym> to the <acronym>WAL</acronym> as torn page
+ protection (as part of removing dead tuples). The extra
+ <acronym>WAL</acronym> volume from proactive freezing is
+ insignificant compared to the cost of the <acronym>FPI</acronym>.
+ It is very likely (though not quite certain) that the overall
+ volume of <acronym>WAL</acronym> will be lower in the long term
+ with tables that have most freezing triggered by the
+ <acronym>FPI</acronym> mechanism, since (at least on average)
+ future <command>VACUUM</command>s shouldn't have to write a second
+ <acronym>FPI</acronym> out much later on, when freezing becomes
+ strictly necessary. The <acronym>FPI</acronym> freezing mechanism
+ is just an alternative trigger criteria for freezing all eligible
+ tuples on the page. In general, <command>VACUUM</command> freezes
+ pages without regard to the condition that triggered freezing.
+ The physical modifications to the page (how tuples are processed)
+ is decoupled from the mechanism that decides if those
+ modifications should happen at all.
</para>
<para>
- The sole disadvantage of increasing <varname>autovacuum_freeze_max_age</varname>
- (and <varname>vacuum_freeze_table_age</varname> along with it) is that
- the <filename>pg_xact</filename> and <filename>pg_commit_ts</filename>
- subdirectories of the database cluster will take more space, because it
- must store the commit status and (if <varname>track_commit_timestamp</varname> is
- enabled) timestamp of all transactions back to
- the <varname>autovacuum_freeze_max_age</varname> horizon. The commit status uses
- two bits per transaction, so if
- <varname>autovacuum_freeze_max_age</varname> is set to its maximum allowed value
- of two billion, <filename>pg_xact</filename> can be expected to grow to about half
- a gigabyte and <filename>pg_commit_ts</filename> to about 20GB. If this
- is trivial compared to your total database size,
- setting <varname>autovacuum_freeze_max_age</varname> to its maximum allowed value
- is recommended. Otherwise, set it depending on what you are willing to
- allow for <filename>pg_xact</filename> and <filename>pg_commit_ts</filename> storage.
- (The default, 200 million transactions, translates to about 50MB
- of <filename>pg_xact</filename> storage and about 2GB of <filename>pg_commit_ts</filename>
- storage.)
+ <command>VACUUM</command> may not be able to freeze every tuple's
+ <structfield>xmin</structfield> in relatively rare cases. The
+ criteria that determines eligibility for freezing is exactly the
+ same as the one that determines if a deleted tuple should be
+ considered <literal>removable</literal> or merely <literal>dead
+ but not yet removable</literal> (namely, the XID-based
+ <literal>removable cutoff</literal>).
</para>
- <para>
- One disadvantage of decreasing <varname>vacuum_freeze_min_age</varname> is that
- it might cause <command>VACUUM</command> to do useless work: freezing a row
- version is a waste of time if the row is modified
- soon thereafter (causing it to acquire a new XID). So the setting should
- be large enough that rows are not frozen until they are unlikely to change
- any more.
- </para>
+ <sect3 id="aggressive-strategy">
+ <title><command>VACUUM</command>'s aggressive strategy</title>
+ <para>
+ To track the age of the oldest unfrozen XIDs in a database,
+ <command>VACUUM</command> stores XID statistics in the system
+ tables <structname>pg_class</structname> and
+ <structname>pg_database</structname>. In particular, the
+ <structfield>relfrozenxid</structfield> column of a table's
+ <structname>pg_class</structname> row contains the oldest
+ remaining unfrozen XID at the end of the most recent
+ <command>VACUUM</command> that successfully advanced
+ <structfield>relfrozenxid</structfield>. Similarly, the
+ <structfield>datfrozenxid</structfield> column of a database's
+ <structname>pg_database</structname> row is a lower bound on the
+ unfrozen XIDs appearing in that database — it is just the
+ minimum of the per-table <structfield>relfrozenxid</structfield>
+ values within the database.
+ </para>
- <para>
- To track the age of the oldest unfrozen XIDs in a database,
- <command>VACUUM</command> stores XID
- statistics in the system tables <structname>pg_class</structname> and
- <structname>pg_database</structname>. In particular,
- the <structfield>relfrozenxid</structfield> column of a table's
- <structname>pg_class</structname> row contains the oldest remaining unfrozen
- XID at the end of the most recent <command>VACUUM</command> that successfully
- advanced <structfield>relfrozenxid</structfield> (typically the most recent
- aggressive VACUUM). Similarly, the
- <structfield>datfrozenxid</structfield> column of a database's
- <structname>pg_database</structname> row is a lower bound on the unfrozen XIDs
- appearing in that database — it is just the minimum of the
- per-table <structfield>relfrozenxid</structfield> values within the database.
- A convenient way to
- examine this information is to execute queries such as:
+ <para>
+ <command>VACUUM</command> uses the <link
+ linkend="storage-vm">visibility map</link> to determine which
+ pages of a table must be scanned. Normally, it will skip pages
+ that don't have any dead row versions even if those pages might
+ still have row versions with old XID values. Therefore, normal
+ <command>VACUUM</command>s usually won't freeze
+ <emphasis>every</emphasis> page with an old row version in the
+ table. Most individual tables will eventually need an
+ <firstterm>aggressive vacuum</firstterm>, which will reliably
+ freeze all pages with XID and MXID values older than
+ <varname>vacuum_freeze_min_age</varname>, including those from
+ all-visible but not all-frozen pages. <xref
+ linkend="guc-vacuum-freeze-table-age"/> controls when
+ <command>VACUUM</command> must use its aggressive strategy.
+ Since the setting is applied against
+ <literal>age(relfrozenxid)</literal>, settings like
+ <varname>vacuum_freeze_min_age</varname> may influence the exact
+ cadence of aggressive vacuuming. Setting
+ <varname>vacuum_freeze_table_age</varname> to 0 forces
+ <command>VACUUM</command> to always use its aggressive strategy.
+ </para>
+
+ <note>
+ <para>
+ In practice most tables require periodic aggressive vacuuming.
+ However, some individual non-aggressive
+ <command>VACUUM</command> operations may be able to advance
+ <structfield>relfrozenxid</structfield> and/or
+ <structfield>relminmxid</structfield>. This is most common in
+ small, frequently modified tables, where
+ <command>VACUUM</command> happens to scan all pages (or at least
+ all pages not marked all-frozen) in the course of removing dead
+ tuples.
+ </para>
+ </note>
+
+ <para>
+ The maximum time that a table can go unvacuumed is about two
+ billion transactions. If it were to go unvacuumed for longer than
+ that, the system will <link linkend="xid-stop-limit">refuse to
+ allocate new transaction IDs</link>, temporarily rendering the
+ database read-only. To ensure that this never happens,
+ autovacuum is invoked on any table that might contain unfrozen
+ rows with XIDs older than the age specified by the configuration
+ parameter <xref linkend="guc-autovacuum-freeze-max-age"/>. (This
+ will happen even if autovacuum is disabled.)
+ </para>
+
+ <para>
+ <!-- This isn't strictly true, since anti-wraparound autovacuuming
+ merely implies aggressive mode -->
+ In practice all anti-wraparound autovacuums will use
+ <command>VACUUM</command>'s aggressive strategy. This is assured
+ because the effective value of
+ <varname>vacuum_freeze_table_age</varname> is
+ <quote>clamped</quote> to a value no greater than 95% of the
+ current value of <varname>autovacuum_freeze_max_age</varname>.
+ As a rule of thumb, <command>vacuum_freeze_table_age</command>
+ should be set to a value somewhat below
+ <varname>autovacuum_freeze_max_age</varname>, leaving enough gap
+ so that a regularly scheduled <command>VACUUM</command> or an
+ autovacuum triggered by inserts, updates and deletes is run in
+ that window. Anti-wraparound autovacuums can be avoided
+ altogether in tables that reliably receive
+ <emphasis>some</emphasis> <command>VACUUM</command>s that use the
+ aggressive strategy.
+ </para>
+
+ <note>
+ <para>
+ There is no fundamental difference between a
+ <command>VACUUM</command> run during anti-wraparound
+ autovacuuming and a <command>VACUUM</command> that happens to
+ use the aggressive strategy (whether run by autovacuum or
+ manually issued).
+ </para>
+ </note>
+
+ <para>
+ A convenient way to examine information about
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> is to execute queries such as:
<programlisting>
SELECT c.oid::regclass as table_name,
- greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
+greatest(age(c.relfrozenxid),
+ age(t.relfrozenxid)) as xid_age,
+mxid_age(c.relminmxid)
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm');
-SELECT datname, age(datfrozenxid) FROM pg_database;
+SELECT datname,
+age(datfrozenxid) as xid_age,
+mxid_age(datminmxid)
+FROM pg_database;
</programlisting>
- The <literal>age</literal> column measures the number of transactions from the
- cutoff XID to the current transaction's XID.
- </para>
-
- <tip>
- <para>
- When the <command>VACUUM</command> command's <literal>VERBOSE</literal>
- parameter is specified, <command>VACUUM</command> prints various
- statistics about the table. This includes information about how
- <structfield>relfrozenxid</structfield> and
- <structfield>relminmxid</structfield> advanced, and the number of
- newly frozen pages. The same details appear in the server log when
- autovacuum logging (controlled by <xref
- linkend="guc-log-autovacuum-min-duration"/>) reports on a
- <command>VACUUM</command> operation executed by autovacuum.
+ The <literal>age</literal> column measures the number of
+ transactions from the cutoff XID to the next unallocated
+ transactions XID. The <literal>mxid_age</literal> column
+ measures the number of MultiXactIds from the cutoff MultiXactId
+ to the next unallocated MultiXactId.
</para>
- </tip>
- <para>
- <command>VACUUM</command> normally only scans pages that have been modified
- since the last vacuum, but <structfield>relfrozenxid</structfield> can only be
- advanced when every page of the table
- that might contain unfrozen XIDs is scanned. This happens when
- <structfield>relfrozenxid</structfield> is more than
- <varname>vacuum_freeze_table_age</varname> transactions old, when
- <command>VACUUM</command>'s <literal>FREEZE</literal> option is used, or when all
- pages that are not already all-frozen happen to
- require vacuuming to remove dead row versions. When <command>VACUUM</command>
- scans every page in the table that is not already all-frozen, it should
- set <literal>age(relfrozenxid)</literal> to a value just a little more than the
- <varname>vacuum_freeze_min_age</varname> setting
- that was used (more by the number of transactions started since the
- <command>VACUUM</command> started). <command>VACUUM</command>
- will set <structfield>relfrozenxid</structfield> to the oldest XID
- that remains in the table, so it's possible that the final value
- will be much more recent than strictly required.
- If no <structfield>relfrozenxid</structfield>-advancing
- <command>VACUUM</command> is issued on the table until
- <varname>autovacuum_freeze_max_age</varname> is reached, an autovacuum will soon
- be forced for the table.
- </para>
+ <tip>
+ <para>
+ When the <command>VACUUM</command> command's
+ <literal>VERBOSE</literal> parameter is specified,
+ <command>VACUUM</command> prints various statistics about the
+ table. This includes information about how
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> advanced, and the number
+ of newly frozen pages. The same details appear in the server
+ log when autovacuum logging (controlled by <xref
+ linkend="guc-log-autovacuum-min-duration"/>) reports on a
+ <command>VACUUM</command> operation executed by autovacuum.
+ </para>
+ </tip>
+ </sect3>
- <para>
- If for some reason autovacuum fails to clear old XIDs from a table, the
- system will begin to emit warning messages like this when the database's
- oldest XIDs reach forty million transactions from the wraparound point:
+ <sect3 id="xid-stop-limit">
+ <title><literal>xidStopLimit</literal> mode</title>
+ <para>
+ If for some reason autovacuum utterly fails to advance any
+ table's <structfield>relfrozenxid</structfield> or
+ <structfield>relminmxid</structfield>, the system will begin to
+ emit warning messages like this when the database's oldest XIDs
+ reach forty million transactions from the wraparound point:
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
</programlisting>
- (A manual <command>VACUUM</command> should fix the problem, as suggested by the
- hint; but note that the <command>VACUUM</command> must be performed by a
- superuser, else it will fail to process system catalogs and thus not
- be able to advance the database's <structfield>datfrozenxid</structfield>.)
- If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ (A manual <command>VACUUM</command> should fix the problem, as suggested by the
+ hint; but note that the <command>VACUUM</command> must be performed by a
+ superuser, else it will fail to process system catalogs and thus not
+ be able to advance the database's <structfield>datfrozenxid</structfield>.)
+ If these warnings are ignored, the system will eventually refuse
+ to start any new transactions. This happens at the point that
+ there are fewer than three million transactions left:
<programlisting>
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
- The three-million-transaction safety margin exists to let the
- administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
- </para>
-
- <sect3 id="vacuum-for-multixact-wraparound">
- <title>Multixacts and Wraparound</title>
-
- <indexterm>
- <primary>MultiXactId</primary>
- </indexterm>
-
- <indexterm>
- <primary>wraparound</primary>
- <secondary>of multixact IDs</secondary>
- </indexterm>
-
- <para>
- <firstterm>Multixact IDs</firstterm> are used to support row locking by
- multiple transactions. Since there is only limited space in a tuple
- header to store lock information, that information is encoded as
- a <quote>multiple transaction ID</quote>, or multixact ID for short,
- whenever there is more than one transaction concurrently locking a
- row. Information about which transaction IDs are included in any
- particular multixact ID is stored separately in
- the <filename>pg_multixact</filename> subdirectory, and only the multixact ID
- appears in the <structfield>xmax</structfield> field in the tuple header.
- Like transaction IDs, multixact IDs are implemented as a
- 32-bit counter and corresponding storage, all of which requires
- careful aging management, storage cleanup, and wraparound handling.
- There is a separate storage area which holds the list of members in
- each multixact, which also uses a 32-bit counter and which must also
- be managed.
+ The three-million-transaction safety margin exists to let the
+ administrator recover without data loss, by manually executing the
+ required <command>VACUUM</command> commands. However, since the system will not
+ execute commands once it has gone into the safety shutdown mode,
+ the only way to do this is to stop the server and start the server in single-user
+ mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
+ in single-user mode. See the <xref linkend="app-postgres"/> reference
+ page for details about using single-user mode.
</para>
<para>
- Whenever <command>VACUUM</command> scans any part of a table, it will replace
- any multixact ID it encounters which is older than
- <xref linkend="guc-vacuum-multixact-freeze-min-age"/>
- by a different value, which can be the zero value, a single
- transaction ID, or a newer multixact ID. For each table,
- <structname>pg_class</structname>.<structfield>relminmxid</structfield> stores the oldest
- possible multixact ID still appearing in any tuple of that table.
- If this value is older than
- <xref linkend="guc-vacuum-multixact-freeze-table-age"/>, an aggressive
- vacuum is forced. As discussed in the previous section, an aggressive
- vacuum means that only those pages which are known to be all-frozen will
- be skipped. <function>mxid_age()</function> can be used on
- <structname>pg_class</structname>.<structfield>relminmxid</structfield> to find its age.
+ Anything that influences when and how
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> advance will also directly
+ affect the high watermark storage overhead from storing a great
+ deal of historical transaction status information. The
+ additional <link linkend="vacuum-truncate-pg-xact">space
+ overhead</link> is usually of fairly minimal concern. It is
+ noted as an additional downside of allowing the system to get
+ close to <literal>xidStopLimit</literal> for the sake of
+ completeness.
</para>
- <para>
- Aggressive <command>VACUUM</command>s, regardless of what causes
- them, are <emphasis>guaranteed</emphasis> to be able to advance
- the table's <structfield>relminmxid</structfield>.
- Eventually, as all tables in all databases are scanned and their
- oldest multixact values are advanced, on-disk storage for older
- multixacts can be removed.
- </para>
+ <note>
+ <title>Historical Note</title>
+ <para>
+ The term <quote>wraparound</quote> is inaccurate. Also, there
+ is no <quote>data loss</quote> here — the message is
+ simply wrong.
+ </para>
+ <para>
+ XXX: We really need to fix the situation with single user mode
+ to put things on a good footing.
+ </para>
+ </note>
<para>
- As a safety device, an aggressive vacuum scan will
- occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds 2GB, aggressive vacuum
- scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled.
+ In emergencies, <command>VACUUM</command> will take extraordinary
+ measures to avoid <literal>xidStopLimit</literal> mode. A
+ failsafe mechanism is triggered when thresholds controlled by
+ <xref linkend="guc-vacuum-failsafe-age"/> and <xref
+ linkend="guc-vacuum-multixact-failsafe-age"/> are reached. The
+ failsafe prioritizes advancing
+ <structfield>relfrozenxid</structfield> and/or
+ <structfield>relminmxid</structfield> as quickly as possible.
+ Once the failsafe triggers, <command>VACUUM</command> bypasses
+ all remaining non-essential maintenance tasks, and stops applying
+ any cost-based delay that was in effect. Any <glossterm
+ linkend="glossary-buffer-access-strategy">Buffer Access
+ Strategy</glossterm> in use will also be disabled.
</para>
</sect3>
</sect2>
@@ -766,6 +862,58 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect2>
+ <sect2 id="vacuum-truncate-pg-xact">
+ <title>Truncating transaction status information</title>
+ <para>
+ As noted in <xref linkend="xid-stop-limit"/>, anything that
+ influences when and how <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> advance will also directly
+ affect the high watermark storage overhead needed to store
+ historical transaction status information. For example,
+ increasing <varname>autovacuum_freeze_max_age</varname> (and
+ <varname>vacuum_freeze_table_age</varname> along with it) will
+ make the <filename>pg_xact</filename> and
+ <filename>pg_commit_ts</filename> subdirectories of the database
+ cluster take more space, because they store the commit status and
+ (if <varname>track_commit_timestamp</varname> is enabled)
+ timestamp of all transactions back to the
+ <varname>datfrozenxid</varname> horizon (the earliest
+ <varname>datfrozenxid</varname> in the entire cluster).
+ </para>
+ <para>
+ The commit status uses two bits per transaction. The default
+ <varname>autovacuum_freeze_max_age</varname> setting of 200
+ million transactions translates to about 50MB of
+ <filename>pg_xact</filename> storage. When
+ <varname>track_commit_timestamp</varname> is enabled, about 2GB of
+ <filename>pg_commit_ts</filename> storage will also be required.
+ </para>
+ <para>
+ MultiXactId status information is implemented as two separate
+ <acronym>SLRU</acronym> storage areas:
+ <filename>pg_multixact/members</filename>, and
+ <filename>pg_multixact/offsets</filename>. There is no simple
+ formula to determine the storage overhead per MultiXactId, since
+ MultiXactIds have a variable number of member XIDs.
+ </para>
+ <para>
+ Truncating of transaction status information is only possible at
+ the end of <command>VACUUM</command>s that advance
+ <structfield>relfrozenxid</structfield> (in the case of
+ <filename>pg_xact</filename> and
+ <filename>pg_commit_ts</filename>) or
+ <structfield>relminmxid</structfield> (in the case of
+ (<filename>pg_multixact/members</filename> and
+ <filename>pg_multixact/offsets</filename>) of whatever table
+ happened to have the oldest value in the cluster when the
+ <command>VACUUM</command> began. This typically happens very
+ infrequently, often during <link
+ linkend="aggressive-strategy">aggressive strategy</link>
+ <command>VACUUM</command>s of one of the database's largest
+ tables.
+ </para>
+ </sect2>
+
<sect2 id="vacuum-for-statistics">
<title>Updating Planner Statistics</title>
@@ -881,7 +1029,7 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</tip>
</sect2>
-</sect1>
+ </sect1>
<sect1 id="routine-reindex">
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10ef699fa..8aa332fcf 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1515,7 +1515,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
and/or <command>ANALYZE</command> operations on this table following the rules
discussed in <xref linkend="autovacuum"/>.
If false, this table will not be autovacuumed, except to prevent
- transaction ID wraparound. See <xref linkend="vacuum-for-wraparound"/> for
+ transaction ID wraparound. See <xref linkend="freezing-xid-space"/> for
more about wraparound prevention.
Note that the autovacuum daemon does not run at all (except to prevent
transaction ID wraparound) if the <xref linkend="guc-autovacuum"/>
diff --git a/doc/src/sgml/ref/prepare_transaction.sgml b/doc/src/sgml/ref/prepare_transaction.sgml
index f4f6118ac..ede50d6f7 100644
--- a/doc/src/sgml/ref/prepare_transaction.sgml
+++ b/doc/src/sgml/ref/prepare_transaction.sgml
@@ -128,7 +128,7 @@ PREPARE TRANSACTION <replaceable class="parameter">transaction_id</replaceable>
This will interfere with the ability of <command>VACUUM</command> to reclaim
storage, and in extreme cases could cause the database to shut down
to prevent transaction ID wraparound (see <xref
- linkend="vacuum-for-wraparound"/>). Keep in mind also that the transaction
+ linkend="freezing-xid-space"/>). Keep in mind also that the transaction
continues to hold whatever locks it held. The intended usage of the
feature is that a prepared transaction will normally be committed or
rolled back as soon as an external transaction manager has verified that
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 57bc4c23e..0c28604a6 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -123,7 +123,9 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
<term><literal>FREEZE</literal></term>
<listitem>
<para>
- Selects aggressive <quote>freezing</quote> of tuples.
+ Makes <quote>freezing</quote> <emphasis>maximally</emphasis>
+ aggressive, and forces <command>VACUUM</command> to use its
+ <link linkend="aggressive-strategy">aggressive strategy</link>.
Specifying <literal>FREEZE</literal> is equivalent to performing
<command>VACUUM</command> with the
<xref linkend="guc-vacuum-freeze-min-age"/> and
@@ -219,7 +221,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
there are many dead tuples in the table. This may be useful
when it is necessary to make <command>VACUUM</command> run as
quickly as possible to avoid imminent transaction ID wraparound
- (see <xref linkend="vacuum-for-wraparound"/>). However, the
+ (see <xref linkend="freezing-xid-space"/>). However, the
wraparound failsafe mechanism controlled by <xref
linkend="guc-vacuum-failsafe-age"/> will generally trigger
automatically to avoid transaction ID wraparound failure, and
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index da2393783..b61d523c2 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -233,7 +233,7 @@ PostgreSQL documentation
ID age of at least <replaceable class="parameter">mxid_age</replaceable>.
This setting is useful for prioritizing tables to process to prevent
multixact ID wraparound (see
- <xref linkend="vacuum-for-multixact-wraparound"/>).
+ <xref linkend="freezing-xid-space"/>).
</para>
<para>
For the purposes of this option, the multixact ID age of a relation is
@@ -254,7 +254,7 @@ PostgreSQL documentation
transaction ID age of at least
<replaceable class="parameter">xid_age</replaceable>. This setting
is useful for prioritizing tables to process to prevent transaction
- ID wraparound (see <xref linkend="vacuum-for-wraparound"/>).
+ ID wraparound (see <xref linkend="freezing-xid-space"/>).
</para>
<para>
For the purposes of this option, the transaction ID age of a relation
diff --git a/doc/src/sgml/xact.sgml b/doc/src/sgml/xact.sgml
index b467660ee..e18ad8fd3 100644
--- a/doc/src/sgml/xact.sgml
+++ b/doc/src/sgml/xact.sgml
@@ -49,7 +49,7 @@
<para>
The internal transaction ID type <type>xid</type> is 32 bits wide
- and <link linkend="vacuum-for-wraparound">wraps around</link> every
+ and <link linkend="freezing-xid-space">wraps around</link> every
4 billion transactions. A 32-bit epoch is incremented during each
wraparound. There is also a 64-bit type <type>xid8</type> which
includes this epoch and therefore does not wrap around during the
@@ -100,7 +100,7 @@
rows and can be inspected using the <xref linkend="pgrowlocks"/>
extension. Row-level read locks might also require the assignment
of multixact IDs (<literal>mxid</literal>; see <xref
- linkend="vacuum-for-multixact-wraparound"/>).
+ linkend="freezing-xid-space"/>).
</para>
</sect1>
--
2.40.0
v1-0006-Merge-basic-vacuuming-sect2-into-sect1-introducti.patchapplication/octet-stream; name=v1-0006-Merge-basic-vacuuming-sect2-into-sect1-introducti.patchDownload
From 564d28dfd8d9347affd9205f9f7fd367c9b186a4 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 11:44:45 -0700
Subject: [PATCH v1 6/9] Merge "basic vacuuming" sect2 into sect1 introduction.
This doesn't change any of the content itself. It just merges the
original text into the sect1 text that immediately preceded it.
This is preparation for the next commit, which will remove most of the
text "relocated" in this commit. This structure should make things a
little easier for doc translators.
This commit is the last one that could be considered mechanical
restructuring/refactoring of existing text.
---
doc/src/sgml/maintenance.sgml | 106 ++++++++++++++++------------------
1 file changed, 51 insertions(+), 55 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index f554e12bf..2e18a078a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -266,68 +266,64 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
to skim this material to help them understand and adjust autovacuuming.
</para>
- <sect2 id="vacuum-basics">
- <title>Vacuuming Basics</title>
+ <para>
+ <productname>PostgreSQL</productname>'s
+ <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
+ process each table on a regular basis for several reasons:
- <para>
- <productname>PostgreSQL</productname>'s
- <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
- process each table on a regular basis for several reasons:
+ <orderedlist>
+ <listitem>
+ <simpara>To recover or reuse disk space occupied by updated or deleted
+ rows.</simpara>
+ </listitem>
- <orderedlist>
- <listitem>
- <simpara>To recover or reuse disk space occupied by updated or deleted
- rows.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To protect against loss of very old data due to
+ <firstterm>transaction ID wraparound</firstterm> or
+ <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ </listitem>
- <listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To update the visibility map, which speeds
+ up <link linkend="indexes-index-only-scans">index-only
+ scans</link>.</simpara>
+ </listitem>
- <listitem>
- <simpara>To update the visibility map, which speeds
- up <link linkend="indexes-index-only-scans">index-only
- scans</link>.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To update data statistics used by the
+ <productname>PostgreSQL</productname> query planner.</simpara>
+ </listitem>
+ </orderedlist>
- <listitem>
- <simpara>To update data statistics used by the
- <productname>PostgreSQL</productname> query planner.</simpara>
- </listitem>
- </orderedlist>
+ Each of these reasons dictates performing <command>VACUUM</command> operations
+ of varying frequency and scope, as explained in the following subsections.
+ </para>
- Each of these reasons dictates performing <command>VACUUM</command> operations
- of varying frequency and scope, as explained in the following subsections.
- </para>
+ <para>
+ There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
+ and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
+ disk space but runs much more slowly. Also,
+ the standard form of <command>VACUUM</command> can run in parallel with production
+ database operations. (Commands such as <command>SELECT</command>,
+ <command>INSERT</command>, <command>UPDATE</command>, and
+ <command>DELETE</command> will continue to function normally, though you
+ will not be able to modify the definition of a table with commands such as
+ <command>ALTER TABLE</command> while it is being vacuumed.)
+ <command>VACUUM FULL</command> requires an
+ <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
+ working on, and therefore cannot be done in parallel with other use
+ of the table. Generally, therefore,
+ administrators should strive to use standard <command>VACUUM</command> and
+ avoid <command>VACUUM FULL</command>.
+ </para>
- <para>
- There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
- and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
- disk space but runs much more slowly. Also,
- the standard form of <command>VACUUM</command> can run in parallel with production
- database operations. (Commands such as <command>SELECT</command>,
- <command>INSERT</command>, <command>UPDATE</command>, and
- <command>DELETE</command> will continue to function normally, though you
- will not be able to modify the definition of a table with commands such as
- <command>ALTER TABLE</command> while it is being vacuumed.)
- <command>VACUUM FULL</command> requires an
- <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
- working on, and therefore cannot be done in parallel with other use
- of the table. Generally, therefore,
- administrators should strive to use standard <command>VACUUM</command> and
- avoid <command>VACUUM FULL</command>.
- </para>
-
- <para>
- <command>VACUUM</command> creates a substantial amount of I/O
- traffic, which can cause poor performance for other active sessions.
- There are configuration parameters that can be adjusted to reduce the
- performance impact of background vacuuming — see
- <xref linkend="runtime-config-resource-vacuum-cost"/>.
- </para>
- </sect2>
+ <para>
+ <command>VACUUM</command> creates a substantial amount of I/O
+ traffic, which can cause poor performance for other active sessions.
+ There are configuration parameters that can be adjusted to reduce the
+ performance impact of background vacuuming — see
+ <xref linkend="runtime-config-resource-vacuum-cost"/>.
+ </para>
<sect2 id="vacuum-for-space-recovery">
<title>Recovering Disk Space</title>
--
2.40.0
v1-0007-Make-maintenance.sgml-more-autovacuum-orientated.patchapplication/octet-stream; name=v1-0007-Make-maintenance.sgml-more-autovacuum-orientated.patchDownload
From 9c0a390d63d262fc7b72f32f1be0d088496b8168 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 12:11:10 -0700
Subject: [PATCH v1 7/9] Make maintenance.sgml more autovacuum-orientated.
Now that it's no longer in its own sect2, shorten the "Vacuuming basics"
content, and make it more autovacuum-orientated. This gives much less
prominence to VACUUM FULL, which has little place in a section about
autovacuum. We no longer define avoiding the need to run VACUUM FULL as
the purpose of vacuuming.
A later commit that overhauls "Recovering Disk Space" will add back a
passing mention of things like VACUUM FULL and TRUNCATE, but only as
something that might be relevant in extreme cases. (Use of these
commands is hopefully neither "Routine" nor "Basic" to most users).
---
doc/src/sgml/maintenance.sgml | 91 +++++++++++++++++------------------
1 file changed, 44 insertions(+), 47 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 2e18a078a..7476e5922 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -32,11 +32,12 @@
</para>
<para>
- The other main category of maintenance task is periodic <quote>vacuuming</quote>
- of the database. This activity is discussed in
- <xref linkend="routine-vacuuming"/>. Closely related to this is updating
- the statistics that will be used by the query planner, as discussed in
- <xref linkend="vacuum-for-statistics"/>.
+ The other main category of maintenance task is periodic
+ <quote><link linkend="routine-vacuuming">vacuuming</link></quote> of
+ the database by autovacuum. Configuring autovacuum scheduling is
+ discussed in <xref linkend="autovacuum"/>. Autovacuum also updates
+ the statistics that will be used by the query planner, as discussed
+ in <xref linkend="vacuum-for-statistics"/>.
</para>
<para>
@@ -244,7 +245,7 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</sect1>
<sect1 id="routine-vacuuming">
- <title>Routine Vacuuming</title>
+ <title>Autovacuum Maintenance Tasks</title>
<indexterm zone="routine-vacuuming">
<primary>vacuum</primary>
@@ -252,24 +253,20 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
<para>
<productname>PostgreSQL</productname> databases require periodic
- maintenance known as <firstterm>vacuuming</firstterm>. For many installations, it
- is sufficient to let vacuuming be performed by the <firstterm>autovacuum
- daemon</firstterm>, which is described in <xref linkend="autovacuum"/>. You might
- need to adjust the autovacuuming parameters described there to obtain best
- results for your situation. Some database administrators will want to
- supplement or replace the daemon's activities with manually-managed
- <command>VACUUM</command> commands, which typically are executed according to a
- schedule by <application>cron</application> or <application>Task
- Scheduler</application> scripts. To set up manually-managed vacuuming properly,
- it is essential to understand the issues discussed in the next few
- subsections. Administrators who rely on autovacuuming may still wish
- to skim this material to help them understand and adjust autovacuuming.
+ maintenance known as <firstterm>vacuuming</firstterm>, and require
+ periodic updates to the statistics used by the
+ <productname>PostgreSQL</productname> query planner. These
+ maintenance tasks are performed by the <link
+ linkend="sql-vacuum"><command>VACUUM</command></link> and <link
+ linkend="sql-analyze"><command>ANALYZE</command></link> commands
+ respectively. For most installations, it is sufficient to let the
+ <firstterm>autovacuum daemon</firstterm> determine when to perform
+ these maintenance tasks (which is partly determined by configurable
+ table-level thresholds; see <xref linkend="autovacuum"/>).
</para>
-
<para>
- <productname>PostgreSQL</productname>'s
- <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
- process each table on a regular basis for several reasons:
+ The autovacuum daemon has to process each table on a regular basis
+ for several reasons:
<orderedlist>
<listitem>
@@ -295,35 +292,35 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
</orderedlist>
- Each of these reasons dictates performing <command>VACUUM</command> operations
- of varying frequency and scope, as explained in the following subsections.
+ Maintenance work within the scope of items 1, 2, 3, and 4 is
+ performed by the <command>VACUUM</command> command internally.
+ Item 5 (maintenance of planner statistics) is handled by the
+ <command>ANALYZE</command> command internally. Although this
+ section presents information about autovacuum, there is no
+ difference between manually-issued <command>VACUUM</command> and
+ <command>ANALYZE</command> commands and those run by the autovacuum
+ daemon (though there are autovacuum-specific variants of a small
+ number of settings that control <command>VACUUM</command>).
</para>
-
<para>
- There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
- and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
- disk space but runs much more slowly. Also,
- the standard form of <command>VACUUM</command> can run in parallel with production
- database operations. (Commands such as <command>SELECT</command>,
- <command>INSERT</command>, <command>UPDATE</command>, and
- <command>DELETE</command> will continue to function normally, though you
- will not be able to modify the definition of a table with commands such as
- <command>ALTER TABLE</command> while it is being vacuumed.)
- <command>VACUUM FULL</command> requires an
- <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
- working on, and therefore cannot be done in parallel with other use
- of the table. Generally, therefore,
- administrators should strive to use standard <command>VACUUM</command> and
- avoid <command>VACUUM FULL</command>.
- </para>
-
- <para>
- <command>VACUUM</command> creates a substantial amount of I/O
- traffic, which can cause poor performance for other active sessions.
- There are configuration parameters that can be adjusted to reduce the
- performance impact of background vacuuming — see
+ Autovacuum creates a substantial amount of I/O traffic, which can
+ cause poor performance for other active sessions. There are
+ configuration parameters that can be adjusted to reduce the
+ performance impact of background vacuuming. See the
+ autovacuum-specific cost delay settings described in
+ <xref linkend="runtime-config-autovacuum"/>, and additional cost
+ delay settings described in
<xref linkend="runtime-config-resource-vacuum-cost"/>.
</para>
+ <para>
+ Some database administrators will want to supplement the daemon's
+ activities with manually-managed <command>VACUUM</command>
+ commands, which typically are executed according to a schedule by
+ <application>cron</application> or <application>Task
+ Scheduler</application> scripts. It can be useful to perform
+ off-hours <command>VACUUM</command> commands during periods where
+ reduced load is expected.
+ </para>
<sect2 id="vacuum-for-space-recovery">
<title>Recovering Disk Space</title>
--
2.40.0
v1-0003-Normalize-maintenance.sgml-indentation.patchapplication/octet-stream; name=v1-0003-Normalize-maintenance.sgml-indentation.patchDownload
From 0bcfd65226d42f844455eb3e5ca7a3b5b6f61b5e Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 15:20:13 -0700
Subject: [PATCH v1 3/9] Normalize maintenance.sgml indentation.
---
doc/src/sgml/maintenance.sgml | 82 +++++++++++++++++------------------
1 file changed, 41 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 6a7ec7c1d..e8c8647cd 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -11,53 +11,53 @@
<primary>routine maintenance</primary>
</indexterm>
- <para>
- <productname>PostgreSQL</productname>, like any database software, requires that certain tasks
- be performed regularly to achieve optimum performance. The tasks
- discussed here are <emphasis>required</emphasis>, but they
- are repetitive in nature and can easily be automated using standard
- tools such as <application>cron</application> scripts or
- Windows' <application>Task Scheduler</application>. It is the database
- administrator's responsibility to set up appropriate scripts, and to
- check that they execute successfully.
- </para>
+ <para>
+ <productname>PostgreSQL</productname>, like any database software, requires that certain tasks
+ be performed regularly to achieve optimum performance. The tasks
+ discussed here are <emphasis>required</emphasis>, but they
+ are repetitive in nature and can easily be automated using standard
+ tools such as <application>cron</application> scripts or
+ Windows' <application>Task Scheduler</application>. It is the database
+ administrator's responsibility to set up appropriate scripts, and to
+ check that they execute successfully.
+ </para>
- <para>
- One obvious maintenance task is the creation of backup copies of the data on a
- regular schedule. Without a recent backup, you have no chance of recovery
- after a catastrophe (disk failure, fire, mistakenly dropping a critical
- table, etc.). The backup and recovery mechanisms available in
- <productname>PostgreSQL</productname> are discussed at length in
- <xref linkend="backup"/>.
- </para>
+ <para>
+ One obvious maintenance task is the creation of backup copies of the data on a
+ regular schedule. Without a recent backup, you have no chance of recovery
+ after a catastrophe (disk failure, fire, mistakenly dropping a critical
+ table, etc.). The backup and recovery mechanisms available in
+ <productname>PostgreSQL</productname> are discussed at length in
+ <xref linkend="backup"/>.
+ </para>
- <para>
- The other main category of maintenance task is periodic <quote>vacuuming</quote>
- of the database. This activity is discussed in
- <xref linkend="routine-vacuuming"/>. Closely related to this is updating
- the statistics that will be used by the query planner, as discussed in
- <xref linkend="vacuum-for-statistics"/>.
- </para>
+ <para>
+ The other main category of maintenance task is periodic <quote>vacuuming</quote>
+ of the database. This activity is discussed in
+ <xref linkend="routine-vacuuming"/>. Closely related to this is updating
+ the statistics that will be used by the query planner, as discussed in
+ <xref linkend="vacuum-for-statistics"/>.
+ </para>
- <para>
- Another task that might need periodic attention is log file management.
- This is discussed in <xref linkend="logfile-maintenance"/>.
- </para>
+ <para>
+ Another task that might need periodic attention is log file management.
+ This is discussed in <xref linkend="logfile-maintenance"/>.
+ </para>
- <para>
- <ulink
+ <para>
+ <ulink
url="https://bucardo.org/check_postgres/"><application>check_postgres</application></ulink>
- is available for monitoring database health and reporting unusual
- conditions. <application>check_postgres</application> integrates with
- Nagios and MRTG, but can be run standalone too.
- </para>
+ is available for monitoring database health and reporting unusual
+ conditions. <application>check_postgres</application> integrates with
+ Nagios and MRTG, but can be run standalone too.
+ </para>
- <para>
- <productname>PostgreSQL</productname> is low-maintenance compared
- to some other database management systems. Nonetheless,
- appropriate attention to these tasks will go far towards ensuring a
- pleasant and productive experience with the system.
- </para>
+ <para>
+ <productname>PostgreSQL</productname> is low-maintenance compared
+ to some other database management systems. Nonetheless,
+ appropriate attention to these tasks will go far towards ensuring a
+ pleasant and productive experience with the system.
+ </para>
<sect1 id="autovacuum">
<title>The Autovacuum Daemon</title>
--
2.40.0
v1-0004-Reorder-routine-vacuuming-sections.patchapplication/octet-stream; name=v1-0004-Reorder-routine-vacuuming-sections.patchDownload
From 6522640d8378b1c8d37631e0cdf44ce4ad394f1f Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 11:19:50 -0700
Subject: [PATCH v1 4/9] Reorder routine vacuuming sections.
This doesn't change any of the content itself. It is a mechanical
change. The new order flows better because it talks about freezing
directly after talking about space recovery tasks.
Old order:
<sect2 id="vacuum-basics">
<sect2 id="vacuum-for-space-recovery">
<sect2 id="vacuum-for-statistics">
<sect2 id="vacuum-for-visibility-map">
<sect2 id="vacuum-for-wraparound">
New order:
<sect2 id="vacuum-basics">
<sect2 id="vacuum-for-space-recovery">
<sect2 id="vacuum-for-wraparound">
<sect2 id="vacuum-for-visibility-map">
<sect2 id="vacuum-for-statistics">
The new order matches processing order inside vacuumlazy.c. This order
will be easier to work with in two later commits that more or less
rewrite "vacuum-for-wraparound" and "vacuum-for-space-recovery".
(Though it doesn't seem to make the existing content any less meaningful
without the later rewrite commits.)
---
doc/src/sgml/maintenance.sgml | 306 +++++++++++++++++-----------------
1 file changed, 155 insertions(+), 151 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e8c8647cd..62e22d861 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -281,8 +281,9 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To update data statistics used by the
- <productname>PostgreSQL</productname> query planner.</simpara>
+ <simpara>To protect against loss of very old data due to
+ <firstterm>transaction ID wraparound</firstterm> or
+ <firstterm>multixact ID wraparound</firstterm>.</simpara>
</listitem>
<listitem>
@@ -292,9 +293,8 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ <simpara>To update data statistics used by the
+ <productname>PostgreSQL</productname> query planner.</simpara>
</listitem>
</orderedlist>
@@ -439,151 +439,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</tip>
</sect2>
- <sect2 id="vacuum-for-statistics">
- <title>Updating Planner Statistics</title>
-
- <indexterm zone="vacuum-for-statistics">
- <primary>statistics</primary>
- <secondary>of the planner</secondary>
- </indexterm>
-
- <indexterm zone="vacuum-for-statistics">
- <primary>ANALYZE</primary>
- </indexterm>
-
- <para>
- The <productname>PostgreSQL</productname> query planner relies on
- statistical information about the contents of tables in order to
- generate good plans for queries. These statistics are gathered by
- the <link linkend="sql-analyze"><command>ANALYZE</command></link> command,
- which can be invoked by itself or
- as an optional step in <command>VACUUM</command>. It is important to have
- reasonably accurate statistics, otherwise poor choices of plans might
- degrade database performance.
- </para>
-
- <para>
- The autovacuum daemon, if enabled, will automatically issue
- <command>ANALYZE</command> commands whenever the content of a table has
- changed sufficiently. However, administrators might prefer to rely
- on manually-scheduled <command>ANALYZE</command> operations, particularly
- if it is known that update activity on a table will not affect the
- statistics of <quote>interesting</quote> columns. The daemon schedules
- <command>ANALYZE</command> strictly as a function of the number of rows
- inserted or updated; it has no knowledge of whether that will lead
- to meaningful statistical changes.
- </para>
-
- <para>
- Tuples changed in partitions and inheritance children do not trigger
- analyze on the parent table. If the parent table is empty or rarely
- changed, it may never be processed by autovacuum, and the statistics for
- the inheritance tree as a whole won't be collected. It is necessary to
- run <command>ANALYZE</command> on the parent table manually in order to
- keep the statistics up to date.
- </para>
-
- <para>
- As with vacuuming for space recovery, frequent updates of statistics
- are more useful for heavily-updated tables than for seldom-updated
- ones. But even for a heavily-updated table, there might be no need for
- statistics updates if the statistical distribution of the data is
- not changing much. A simple rule of thumb is to think about how much
- the minimum and maximum values of the columns in the table change.
- For example, a <type>timestamp</type> column that contains the time
- of row update will have a constantly-increasing maximum value as
- rows are added and updated; such a column will probably need more
- frequent statistics updates than, say, a column containing URLs for
- pages accessed on a website. The URL column might receive changes just
- as often, but the statistical distribution of its values probably
- changes relatively slowly.
- </para>
-
- <para>
- It is possible to run <command>ANALYZE</command> on specific tables and even
- just specific columns of a table, so the flexibility exists to update some
- statistics more frequently than others if your application requires it.
- In practice, however, it is usually best to just analyze the entire
- database, because it is a fast operation. <command>ANALYZE</command> uses a
- statistically random sampling of the rows of a table rather than reading
- every single row.
- </para>
-
- <tip>
- <para>
- Although per-column tweaking of <command>ANALYZE</command> frequency might not be
- very productive, you might find it worthwhile to do per-column
- adjustment of the level of detail of the statistics collected by
- <command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal>
- clauses and have highly irregular data distributions might require a
- finer-grain data histogram than other columns. See <command>ALTER TABLE
- SET STATISTICS</command>, or change the database-wide default using the <xref
- linkend="guc-default-statistics-target"/> configuration parameter.
- </para>
-
- <para>
- Also, by default there is limited information available about
- the selectivity of functions. However, if you create a statistics
- object or an expression
- index that uses a function call, useful statistics will be
- gathered about the function, which can greatly improve query
- plans that use the expression index.
- </para>
- </tip>
-
- <tip>
- <para>
- The autovacuum daemon does not issue <command>ANALYZE</command> commands for
- foreign tables, since it has no means of determining how often that
- might be useful. If your queries require statistics on foreign tables
- for proper planning, it's a good idea to run manually-managed
- <command>ANALYZE</command> commands on those tables on a suitable schedule.
- </para>
- </tip>
-
- <tip>
- <para>
- The autovacuum daemon does not issue <command>ANALYZE</command> commands
- for partitioned tables. Inheritance parents will only be analyzed if the
- parent itself is changed - changes to child tables do not trigger
- autoanalyze on the parent table. If your queries require statistics on
- parent tables for proper planning, it is necessary to periodically run
- a manual <command>ANALYZE</command> on those tables to keep the statistics
- up to date.
- </para>
- </tip>
-
- </sect2>
-
- <sect2 id="vacuum-for-visibility-map">
- <title>Updating the Visibility Map</title>
-
- <para>
- Vacuum maintains a <link linkend="storage-vm">visibility map</link> for each
- table to keep track of which pages contain only tuples that are known to be
- visible to all active transactions (and all future transactions, until the
- page is again modified). This has two purposes. First, vacuum
- itself can skip such pages on the next run, since there is nothing to
- clean up.
- </para>
-
- <para>
- Second, it allows <productname>PostgreSQL</productname> to answer some
- queries using only the index, without reference to the underlying table.
- Since <productname>PostgreSQL</productname> indexes don't contain tuple
- visibility information, a normal index scan fetches the heap tuple for each
- matching index entry, to check whether it should be seen by the current
- transaction.
- An <link linkend="indexes-index-only-scans"><firstterm>index-only
- scan</firstterm></link>, on the other hand, checks the visibility map first.
- If it's known that all tuples on the page are
- visible, the heap fetch can be skipped. This is most useful on
- large data sets where the visibility map can prevent disk accesses.
- The visibility map is vastly smaller than the heap, so it can easily be
- cached even when the heap is very large.
- </para>
- </sect2>
-
<sect2 id="vacuum-for-wraparound">
<title>Preventing Transaction ID Wraparound Failures</title>
@@ -933,7 +788,156 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect3>
</sect2>
- </sect1>
+
+ <sect2 id="vacuum-for-visibility-map">
+ <title>Updating the Visibility Map</title>
+
+ <para>
+ Vacuum maintains a <link linkend="storage-vm">visibility
+ map</link> for each table to keep track of which pages contain
+ only tuples that are known to be visible to all active
+ transactions (and all future transactions, until the page is again
+ modified). This has two purposes. First, vacuum itself can skip
+ such pages on the next run, since there is nothing to clean up.
+ Even <command>VACUUM</command>s that use the <link
+ linkend="aggressive-strategy">aggressive strategy</link> can skip
+ pages that are both all-visible and all-frozen (the visibility map
+ keeps track of which pages are all-frozen separately).
+ </para>
+
+ <para>
+ Second, it allows <productname>PostgreSQL</productname> to answer
+ some queries using only the index, without reference to the
+ underlying table. Since <productname>PostgreSQL</productname>
+ indexes don't contain tuple visibility information, a normal index
+ scan fetches the heap tuple for each matching index entry, to
+ check whether it should be seen by the current transaction. An
+ <link linkend="indexes-index-only-scans"><firstterm>index-only
+ scan</firstterm></link>, on the other hand, checks the
+ visibility map first. If it's known that all tuples on the page
+ are visible, the heap fetch can be skipped. This is most useful
+ on large data sets where the visibility map can prevent disk
+ accesses. The visibility map is vastly smaller than the heap, so
+ it can easily be cached even when the heap is very large.
+ </para>
+ </sect2>
+
+ <sect2 id="vacuum-for-statistics">
+ <title>Updating Planner Statistics</title>
+
+ <indexterm zone="vacuum-for-statistics">
+ <primary>statistics</primary>
+ <secondary>of the planner</secondary>
+ </indexterm>
+
+ <indexterm zone="vacuum-for-statistics">
+ <primary>ANALYZE</primary>
+ </indexterm>
+
+ <para>
+ The <productname>PostgreSQL</productname> query planner relies on
+ statistical information about the contents of tables in order to
+ generate good plans for queries. These statistics are gathered by
+ the <link linkend="sql-analyze"><command>ANALYZE</command></link> command,
+ which can be invoked by itself or
+ as an optional step in <command>VACUUM</command>. It is important to have
+ reasonably accurate statistics, otherwise poor choices of plans might
+ degrade database performance.
+ </para>
+
+ <para>
+ The autovacuum daemon, if enabled, will automatically issue
+ <command>ANALYZE</command> commands whenever the content of a table has
+ changed sufficiently. However, administrators might prefer to rely
+ on manually-scheduled <command>ANALYZE</command> operations, particularly
+ if it is known that update activity on a table will not affect the
+ statistics of <quote>interesting</quote> columns. The daemon schedules
+ <command>ANALYZE</command> strictly as a function of the number of rows
+ inserted or updated; it has no knowledge of whether that will lead
+ to meaningful statistical changes.
+ </para>
+
+ <para>
+ Tuples changed in partitions and inheritance children do not trigger
+ analyze on the parent table. If the parent table is empty or rarely
+ changed, it may never be processed by autovacuum, and the statistics for
+ the inheritance tree as a whole won't be collected. It is necessary to
+ run <command>ANALYZE</command> on the parent table manually in order to
+ keep the statistics up to date.
+ </para>
+
+ <para>
+ As with vacuuming for space recovery, frequent updates of statistics
+ are more useful for heavily-updated tables than for seldom-updated
+ ones. But even for a heavily-updated table, there might be no need for
+ statistics updates if the statistical distribution of the data is
+ not changing much. A simple rule of thumb is to think about how much
+ the minimum and maximum values of the columns in the table change.
+ For example, a <type>timestamp</type> column that contains the time
+ of row update will have a constantly-increasing maximum value as
+ rows are added and updated; such a column will probably need more
+ frequent statistics updates than, say, a column containing URLs for
+ pages accessed on a website. The URL column might receive changes just
+ as often, but the statistical distribution of its values probably
+ changes relatively slowly.
+ </para>
+
+ <para>
+ It is possible to run <command>ANALYZE</command> on specific tables and even
+ just specific columns of a table, so the flexibility exists to update some
+ statistics more frequently than others if your application requires it.
+ In practice, however, it is usually best to just analyze the entire
+ database, because it is a fast operation. <command>ANALYZE</command> uses a
+ statistically random sampling of the rows of a table rather than reading
+ every single row.
+ </para>
+
+ <tip>
+ <para>
+ Although per-column tweaking of <command>ANALYZE</command> frequency might not be
+ very productive, you might find it worthwhile to do per-column
+ adjustment of the level of detail of the statistics collected by
+ <command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal>
+ clauses and have highly irregular data distributions might require a
+ finer-grain data histogram than other columns. See <command>ALTER TABLE
+ SET STATISTICS</command>, or change the database-wide default using the <xref
+ linkend="guc-default-statistics-target"/> configuration parameter.
+ </para>
+
+ <para>
+ Also, by default there is limited information available about
+ the selectivity of functions. However, if you create a statistics
+ object or an expression
+ index that uses a function call, useful statistics will be
+ gathered about the function, which can greatly improve query
+ plans that use the expression index.
+ </para>
+ </tip>
+
+ <tip>
+ <para>
+ The autovacuum daemon does not issue <command>ANALYZE</command> commands for
+ foreign tables, since it has no means of determining how often that
+ might be useful. If your queries require statistics on foreign tables
+ for proper planning, it's a good idea to run manually-managed
+ <command>ANALYZE</command> commands on those tables on a suitable schedule.
+ </para>
+ </tip>
+
+ <tip>
+ <para>
+ The autovacuum daemon does not issue <command>ANALYZE</command> commands
+ for partitioned tables. Inheritance parents will only be analyzed if the
+ parent itself is changed - changes to child tables do not trigger
+ autoanalyze on the parent table. If your queries require statistics on
+ parent tables for proper planning, it is necessary to periodically run
+ a manual <command>ANALYZE</command> on those tables to keep the statistics
+ up to date.
+ </para>
+ </tip>
+
+ </sect2>
+</sect1>
<sect1 id="routine-reindex">
--
2.40.0
v1-0002-Restructure-autuovacuum-daemon-section.patchapplication/octet-stream; name=v1-0002-Restructure-autuovacuum-daemon-section.patchDownload
From 7ee4d3ab59559c3b31f9408623e9ad5d59514aaa Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Mon, 24 Apr 2023 09:21:01 -0700
Subject: [PATCH v1 2/9] Restructure autuovacuum daemon section.
Add sect2/sect3 subsections to autovacuum sect1. Also reorder the
content slightly for clarity.
TODO Add some basic explanations of vacuuming and relfrozenxid
advancement, since that now appears later on in the chapter.
Alternatively, move the autovacuum daemon sect1 after the routine
vacuuming sect1.
---
doc/src/sgml/maintenance.sgml | 66 ++++++++++++++++++++++-------------
1 file changed, 42 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index a6295c399..6a7ec7c1d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -100,6 +100,8 @@
autovacuum workers' activity.
</para>
+ <sect2 id="autovacuum-scheduling">
+ <title>Autovacuum Scheduling</title>
<para>
If several large tables all become eligible for vacuuming in a short
amount of time, all autovacuum workers might become occupied with
@@ -112,6 +114,8 @@
<xref linkend="guc-superuser-reserved-connections"/> limits.
</para>
+ <sect3 id="autovacuum-vacuum-thresholds">
+ <title>Configurable thresholds for vacuuming</title>
<para>
Tables whose <structfield>relfrozenxid</structfield> value is more than
<xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
@@ -159,7 +163,10 @@ vacuum insert threshold = vacuum base insert threshold + vacuum insert scale fac
<structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
since the last vacuum are scanned.
</para>
+ </sect3>
+ <sect3 id="autovacuum-analyze-thresholds">
+ <title>Configurable thresholds for <command>ANALYZE</command></title>
<para>
For analyze, a similar condition is used: the threshold, defined as:
<programlisting>
@@ -168,20 +175,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
is compared to the total number of tuples inserted, updated, or deleted
since the last <command>ANALYZE</command>.
</para>
-
- <para>
- Partitioned tables are not processed by autovacuum. Statistics
- should be collected by running a manual <command>ANALYZE</command> when it is
- first populated, and again whenever the distribution of data in its
- partitions changes significantly.
- </para>
-
- <para>
- Temporary tables cannot be accessed by autovacuum. Therefore,
- appropriate vacuum and analyze operations should be performed via
- session SQL commands.
- </para>
-
<para>
The default thresholds and scale factors are taken from
<filename>postgresql.conf</filename>, but it is possible to override them
@@ -192,18 +185,25 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
used. See <xref linkend="runtime-config-autovacuum"/> for more details on
the global settings.
</para>
+ </sect3>
+ </sect2>
- <para>
- When multiple workers are running, the autovacuum cost delay parameters
- (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
- <quote>balanced</quote> among all the running workers, so that the
- total I/O impact on the system is the same regardless of the number
- of workers actually running. However, any workers processing tables whose
- per-table <literal>autovacuum_vacuum_cost_delay</literal> or
- <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
- are not considered in the balancing algorithm.
- </para>
+ <sect2 id="autovacuum-cost-delays">
+ <title>Autovacuum Cost-based Delays</title>
+ <para>
+ When multiple workers are running, the autovacuum cost delay parameters
+ (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
+ <quote>balanced</quote> among all the running workers, so that the
+ total I/O impact on the system is the same regardless of the number
+ of workers actually running. However, any workers processing tables whose
+ per-table <literal>autovacuum_vacuum_cost_delay</literal> or
+ <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
+ are not considered in the balancing algorithm.
+ </para>
+ </sect2>
+ <sect2 id="autovacuum-lock-conflicts">
+ <title>Autovacuum and Lock Conflicts</title>
<para>
Autovacuum workers generally don't block other commands. If a process
attempts to acquire a lock that conflicts with the
@@ -223,6 +223,24 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
effectively prevent autovacuums from ever completing.
</para>
</warning>
+ </sect2>
+
+ <sect2 id="autovacuum-limitations">
+ <title>Limitations</title>
+ <para>
+ Partitioned tables are not processed by autovacuum. Statistics
+ should be collected by running a manual <command>ANALYZE</command> when it is
+ first populated, and again whenever the distribution of data in its
+ partitions changes significantly.
+ </para>
+
+ <para>
+ Temporary tables cannot be accessed by autovacuum. Therefore,
+ appropriate vacuum and analyze operations should be performed via
+ session SQL commands.
+ </para>
+ </sect2>
+
</sect1>
<sect1 id="routine-vacuuming">
--
2.40.0
v1-0005-Move-Interpreting-XID-stamps-from-tuple-headers.patchapplication/octet-stream; name=v1-0005-Move-Interpreting-XID-stamps-from-tuple-headers.patchDownload
From a66f93e6bc90bd95e77b2fa923d8c6151e834bb0 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 12:41:00 -0700
Subject: [PATCH v1 5/9] Move Interpreting XID stamps from tuple headers.
This is intended to be fairly close to a mechanical change. It isn't
entirely mechanical, though, since the original wording has been
slightly modified for it to work in context.
Structuring things this way should make life a little easier for doc
translators.
---
doc/src/sgml/maintenance.sgml | 81 +++++++----------------------------
doc/src/sgml/storage.sgml | 62 +++++++++++++++++++++++++++
2 files changed, 78 insertions(+), 65 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 62e22d861..f554e12bf 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -447,75 +447,26 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
<secondary>wraparound</secondary>
</indexterm>
- <indexterm>
- <primary>wraparound</primary>
- <secondary>of transaction IDs</secondary>
- </indexterm>
+ <indexterm>
+ <primary>wraparound</primary>
+ <secondary>of transaction IDs</secondary>
+ </indexterm>
<para>
- <productname>PostgreSQL</productname>'s
- <link linkend="mvcc-intro">MVCC</link> transaction semantics
- depend on being able to compare transaction ID (<acronym>XID</acronym>)
- numbers: a row version with an insertion XID greater than the current
- transaction's XID is <quote>in the future</quote> and should not be visible
- to the current transaction. But since transaction IDs have limited size
- (32 bits) a cluster that runs for a long time (more
- than 4 billion transactions) would suffer <firstterm>transaction ID
- wraparound</firstterm>: the XID counter wraps around to zero, and all of a sudden
- transactions that were in the past appear to be in the future — which
- means their output become invisible. In short, catastrophic data loss.
- (Actually the data is still there, but that's cold comfort if you cannot
- get at it.) To avoid this, it is necessary to vacuum every table
- in every database at least once every two billion transactions.
+ <productname>PostgreSQL</productname>'s <link
+ linkend="mvcc-intro">MVCC</link> transaction semantics depend on
+ being able to compare <glossterm linkend="glossary-xid">transaction
+ ID numbers (<acronym>XID</acronym>)</glossterm> to determine
+ whether or not the row is visible to each query's MVCC snapshot
+ (see <link linkend="interpreting-xid-stamps">
+ interpreting XID stamps from tuple headers</link>). But since
+ on-disk storage of transaction IDs in heap pages uses a truncated
+ 32-bit representation to save space (rather than the full 64-bit
+ representation), it is necessary to vacuum every table in every
+ database <emphasis>at least</emphasis> once every two billion
+ transactions (though far more frequent vacuuming is typical).
</para>
- <para>
- The reason that periodic vacuuming solves the problem is that
- <command>VACUUM</command> will mark rows as <emphasis>frozen</emphasis>, indicating that
- they were inserted by a transaction that committed sufficiently far in
- the past that the effects of the inserting transaction are certain to be
- visible to all current and future transactions.
- Normal XIDs are
- compared using modulo-2<superscript>32</superscript> arithmetic. This means
- that for every normal XID, there are two billion XIDs that are
- <quote>older</quote> and two billion that are <quote>newer</quote>; another
- way to say it is that the normal XID space is circular with no
- endpoint. Therefore, once a row version has been created with a particular
- normal XID, the row version will appear to be <quote>in the past</quote> for
- the next two billion transactions, no matter which normal XID we are
- talking about. If the row version still exists after more than two billion
- transactions, it will suddenly appear to be in the future. To
- prevent this, <productname>PostgreSQL</productname> reserves a special XID,
- <literal>FrozenTransactionId</literal>, which does not follow the normal XID
- comparison rules and is always considered older
- than every normal XID.
- Frozen row versions are treated as if the inserting XID were
- <literal>FrozenTransactionId</literal>, so that they will appear to be
- <quote>in the past</quote> to all normal transactions regardless of wraparound
- issues, and so such row versions will be valid until deleted, no matter
- how long that is.
- </para>
-
- <note>
- <para>
- In <productname>PostgreSQL</productname> versions before 9.4, freezing was
- implemented by actually replacing a row's insertion XID
- with <literal>FrozenTransactionId</literal>, which was visible in the
- row's <structname>xmin</structname> system column. Newer versions just set a flag
- bit, preserving the row's original <structname>xmin</structname> for possible
- forensic use. However, rows with <structname>xmin</structname> equal
- to <literal>FrozenTransactionId</literal> (2) may still be found
- in databases <application>pg_upgrade</application>'d from pre-9.4 versions.
- </para>
- <para>
- Also, system catalogs may contain rows with <structname>xmin</structname> equal
- to <literal>BootstrapTransactionId</literal> (1), indicating that they were
- inserted during the first phase of <application>initdb</application>.
- Like <literal>FrozenTransactionId</literal>, this special XID is treated as
- older than every normal XID.
- </para>
- </note>
-
<para>
<xref linkend="guc-vacuum-freeze-min-age"/>
controls how old an XID value has to be before rows bearing that XID will be
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index e5b9f3f1f..f31a002fc 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -1072,6 +1072,68 @@ data. Empty in ordinary tables.</entry>
it might be compressed, too (see <xref linkend="storage-toast"/>).
</para>
+
+ <sect3 id="interpreting-xid-stamps">
+ <title>Interpreting XID stamps from tuple headers</title>
+
+ <para>
+ The on-disk representation of transaction IDs (the representation
+ used in <structfield>t_xmin</structfield> and
+ <structfield>t_xmax</structfield> fields) use a truncated 32-bit
+ representation of transaction IDs, not the full 64-bit
+ representation. This is not suitable for long term storage without
+ special processing by <command>VACUUM</command>.
+ </para>
+
+ <para>
+ <command>VACUUM</command> <link linkend="routine-vacuuming">will
+ mark tuple headers <emphasis>frozen</emphasis></link>, indicating
+ that all eligible rows on the page were inserted by a transaction
+ that committed sufficiently far in the past that the effects of the
+ inserting transaction are certain to be visible to all current and
+ future transactions. Normal XIDs are compared using
+ modulo-2<superscript>32</superscript> arithmetic. This means that
+ for every normal XID, there are two billion XIDs that are
+ <quote>older</quote> and two billion that are <quote>newer</quote>;
+ another way to say it is that the normal XID space is circular with
+ no endpoint. Therefore, once a row version has been created with a
+ particular normal XID, the row version will appear to be <quote>in
+ the past</quote> for the next two billion transactions, no matter
+ which normal XID we are talking about. If the row version still
+ exists after more than two billion transactions, it will suddenly
+ appear to be in the future. To prevent this,
+ <productname>PostgreSQL</productname> reserves a special XID,
+ <literal>FrozenTransactionId</literal>, which does not follow the
+ normal XID comparison rules and is always considered older than
+ every normal XID. Frozen row versions are treated as if the
+ inserting XID were <literal>FrozenTransactionId</literal>, so that
+ they will appear to be <quote>in the past</quote> to all normal
+ transactions regardless of wraparound issues, and so such row
+ versions will be valid until deleted, no matter how long that is.
+ </para>
+
+ <note>
+ <para>
+ In <productname>PostgreSQL</productname> versions before 9.4, freezing was
+ implemented by actually replacing a row's insertion XID
+ with <literal>FrozenTransactionId</literal>, which was visible in the
+ row's <structname>xmin</structname> system column. Newer versions just set a flag
+ bit, preserving the row's original <structname>xmin</structname> for possible
+ forensic use. However, rows with <structname>xmin</structname> equal
+ to <literal>FrozenTransactionId</literal> (2) may still be found
+ in databases <application>pg_upgrade</application>'d from pre-9.4 versions.
+ </para>
+ <para>
+ Also, system catalogs may contain rows with <structname>xmin</structname> equal
+ to <literal>BootstrapTransactionId</literal> (1), indicating that they were
+ inserted during the first phase of <application>initdb</application>.
+ Like <literal>FrozenTransactionId</literal>, this special XID is treated as
+ older than every normal XID.
+ </para>
+ </note>
+
+</sect3>
+
</sect2>
</sect1>
--
2.40.0
v1-0001-Make-autovacuum-docs-into-a-sect1-of-its-own.patchapplication/octet-stream; name=v1-0001-Make-autovacuum-docs-into-a-sect1-of-its-own.patchDownload
From 19f968d5302d25af67517a8d1bad8e961e11af6a Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 12 Apr 2023 14:42:06 -0700
Subject: [PATCH v1 1/9] Make autovacuum docs into a sect1 of its own.
This doesn't change any of the content itself.
---
doc/src/sgml/maintenance.sgml | 332 +++++++++++++++++-----------------
1 file changed, 166 insertions(+), 166 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a..a6295c399 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -59,6 +59,172 @@
pleasant and productive experience with the system.
</para>
+ <sect1 id="autovacuum">
+ <title>The Autovacuum Daemon</title>
+
+ <indexterm>
+ <primary>autovacuum</primary>
+ <secondary>general information</secondary>
+ </indexterm>
+ <para>
+ <productname>PostgreSQL</productname> has an optional but highly
+ recommended feature called <firstterm>autovacuum</firstterm>,
+ whose purpose is to automate the execution of
+ <command>VACUUM</command> and <command>ANALYZE</command> commands.
+ When enabled, autovacuum checks for
+ tables that have had a large number of inserted, updated or deleted
+ tuples. These checks use the statistics collection facility;
+ therefore, autovacuum cannot be used unless <xref
+ linkend="guc-track-counts"/> is set to <literal>true</literal>.
+ In the default configuration, autovacuuming is enabled and the related
+ configuration parameters are appropriately set.
+ </para>
+
+ <para>
+ The <quote>autovacuum daemon</quote> actually consists of multiple processes.
+ There is a persistent daemon process, called the
+ <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
+ <firstterm>autovacuum worker</firstterm> processes for all databases. The
+ launcher will distribute the work across time, attempting to start one
+ worker within each database every <xref linkend="guc-autovacuum-naptime"/>
+ seconds. (Therefore, if the installation has <replaceable>N</replaceable> databases,
+ a new worker will be launched every
+ <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.)
+ A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes
+ are allowed to run at the same time. If there are more than
+ <varname>autovacuum_max_workers</varname> databases to be processed,
+ the next database will be processed as soon as the first worker finishes.
+ Each worker process will check each table within its database and
+ execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed.
+ <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor
+ autovacuum workers' activity.
+ </para>
+
+ <para>
+ If several large tables all become eligible for vacuuming in a short
+ amount of time, all autovacuum workers might become occupied with
+ vacuuming those tables for a long period. This would result
+ in other tables and databases not being vacuumed until a worker becomes
+ available. There is no limit on how many workers might be in a
+ single database, but workers do try to avoid repeating work that has
+ already been done by other workers. Note that the number of running
+ workers does not count towards <xref linkend="guc-max-connections"/> or
+ <xref linkend="guc-superuser-reserved-connections"/> limits.
+ </para>
+
+ <para>
+ Tables whose <structfield>relfrozenxid</structfield> value is more than
+ <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
+ vacuumed (this also applies to those tables whose freeze max age has
+ been modified via storage parameters; see below). Otherwise, if the
+ number of tuples obsoleted since the last
+ <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
+ table is vacuumed. The vacuum threshold is defined as:
+<programlisting>
+vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
+</programlisting>
+ where the vacuum base threshold is
+ <xref linkend="guc-autovacuum-vacuum-threshold"/>,
+ the vacuum scale factor is
+ <xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
+ and the number of tuples is
+ <structname>pg_class</structname>.<structfield>reltuples</structfield>.
+ </para>
+
+ <para>
+ The table is also vacuumed if the number of tuples inserted since the last
+ vacuum has exceeded the defined insert threshold, which is defined as:
+<programlisting>
+vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
+</programlisting>
+ where the vacuum insert base threshold is
+ <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>,
+ and vacuum insert scale factor is
+ <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
+ Such vacuums may allow portions of the table to be marked as
+ <firstterm>all visible</firstterm> and also allow tuples to be frozen, which
+ can reduce the work required in subsequent vacuums.
+ For tables which receive <command>INSERT</command> operations but no or
+ almost no <command>UPDATE</command>/<command>DELETE</command> operations,
+ it may be beneficial to lower the table's
+ <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
+ tuples to be frozen by earlier vacuums. The number of obsolete tuples and
+ the number of inserted tuples are obtained from the cumulative statistics system;
+ it is a semi-accurate count updated by each <command>UPDATE</command>,
+ <command>DELETE</command> and <command>INSERT</command> operation. (It is
+ only semi-accurate because some information might be lost under heavy
+ load.) If the <structfield>relfrozenxid</structfield> value of the table
+ is more than <varname>vacuum_freeze_table_age</varname> transactions old,
+ an aggressive vacuum is performed to freeze old tuples and advance
+ <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
+ since the last vacuum are scanned.
+ </para>
+
+ <para>
+ For analyze, a similar condition is used: the threshold, defined as:
+<programlisting>
+analyze threshold = analyze base threshold + analyze scale factor * number of tuples
+</programlisting>
+ is compared to the total number of tuples inserted, updated, or deleted
+ since the last <command>ANALYZE</command>.
+ </para>
+
+ <para>
+ Partitioned tables are not processed by autovacuum. Statistics
+ should be collected by running a manual <command>ANALYZE</command> when it is
+ first populated, and again whenever the distribution of data in its
+ partitions changes significantly.
+ </para>
+
+ <para>
+ Temporary tables cannot be accessed by autovacuum. Therefore,
+ appropriate vacuum and analyze operations should be performed via
+ session SQL commands.
+ </para>
+
+ <para>
+ The default thresholds and scale factors are taken from
+ <filename>postgresql.conf</filename>, but it is possible to override them
+ (and many other autovacuum control parameters) on a per-table basis; see
+ <xref linkend="sql-createtable-storage-parameters"/> for more information.
+ If a setting has been changed via a table's storage parameters, that value
+ is used when processing that table; otherwise the global settings are
+ used. See <xref linkend="runtime-config-autovacuum"/> for more details on
+ the global settings.
+ </para>
+
+ <para>
+ When multiple workers are running, the autovacuum cost delay parameters
+ (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
+ <quote>balanced</quote> among all the running workers, so that the
+ total I/O impact on the system is the same regardless of the number
+ of workers actually running. However, any workers processing tables whose
+ per-table <literal>autovacuum_vacuum_cost_delay</literal> or
+ <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
+ are not considered in the balancing algorithm.
+ </para>
+
+ <para>
+ Autovacuum workers generally don't block other commands. If a process
+ attempts to acquire a lock that conflicts with the
+ <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
+ acquisition will interrupt the autovacuum. For conflicting lock modes,
+ see <xref linkend="table-lock-compatibility"/>. However, if the autovacuum
+ is running to prevent transaction ID wraparound (i.e., the autovacuum query
+ name in the <structname>pg_stat_activity</structname> view ends with
+ <literal>(to prevent wraparound)</literal>), the autovacuum is not
+ automatically interrupted.
+ </para>
+
+ <warning>
+ <para>
+ Regularly running commands that acquire locks conflicting with a
+ <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can
+ effectively prevent autovacuums from ever completing.
+ </para>
+ </warning>
+ </sect1>
+
<sect1 id="routine-vacuuming">
<title>Routine Vacuuming</title>
@@ -749,172 +915,6 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect3>
</sect2>
-
- <sect2 id="autovacuum">
- <title>The Autovacuum Daemon</title>
-
- <indexterm>
- <primary>autovacuum</primary>
- <secondary>general information</secondary>
- </indexterm>
- <para>
- <productname>PostgreSQL</productname> has an optional but highly
- recommended feature called <firstterm>autovacuum</firstterm>,
- whose purpose is to automate the execution of
- <command>VACUUM</command> and <command>ANALYZE</command> commands.
- When enabled, autovacuum checks for
- tables that have had a large number of inserted, updated or deleted
- tuples. These checks use the statistics collection facility;
- therefore, autovacuum cannot be used unless <xref
- linkend="guc-track-counts"/> is set to <literal>true</literal>.
- In the default configuration, autovacuuming is enabled and the related
- configuration parameters are appropriately set.
- </para>
-
- <para>
- The <quote>autovacuum daemon</quote> actually consists of multiple processes.
- There is a persistent daemon process, called the
- <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
- <firstterm>autovacuum worker</firstterm> processes for all databases. The
- launcher will distribute the work across time, attempting to start one
- worker within each database every <xref linkend="guc-autovacuum-naptime"/>
- seconds. (Therefore, if the installation has <replaceable>N</replaceable> databases,
- a new worker will be launched every
- <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.)
- A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes
- are allowed to run at the same time. If there are more than
- <varname>autovacuum_max_workers</varname> databases to be processed,
- the next database will be processed as soon as the first worker finishes.
- Each worker process will check each table within its database and
- execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed.
- <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor
- autovacuum workers' activity.
- </para>
-
- <para>
- If several large tables all become eligible for vacuuming in a short
- amount of time, all autovacuum workers might become occupied with
- vacuuming those tables for a long period. This would result
- in other tables and databases not being vacuumed until a worker becomes
- available. There is no limit on how many workers might be in a
- single database, but workers do try to avoid repeating work that has
- already been done by other workers. Note that the number of running
- workers does not count towards <xref linkend="guc-max-connections"/> or
- <xref linkend="guc-superuser-reserved-connections"/> limits.
- </para>
-
- <para>
- Tables whose <structfield>relfrozenxid</structfield> value is more than
- <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
- vacuumed (this also applies to those tables whose freeze max age has
- been modified via storage parameters; see below). Otherwise, if the
- number of tuples obsoleted since the last
- <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
- table is vacuumed. The vacuum threshold is defined as:
-<programlisting>
-vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
-</programlisting>
- where the vacuum base threshold is
- <xref linkend="guc-autovacuum-vacuum-threshold"/>,
- the vacuum scale factor is
- <xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
- and the number of tuples is
- <structname>pg_class</structname>.<structfield>reltuples</structfield>.
- </para>
-
- <para>
- The table is also vacuumed if the number of tuples inserted since the last
- vacuum has exceeded the defined insert threshold, which is defined as:
-<programlisting>
-vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
-</programlisting>
- where the vacuum insert base threshold is
- <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>,
- and vacuum insert scale factor is
- <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
- Such vacuums may allow portions of the table to be marked as
- <firstterm>all visible</firstterm> and also allow tuples to be frozen, which
- can reduce the work required in subsequent vacuums.
- For tables which receive <command>INSERT</command> operations but no or
- almost no <command>UPDATE</command>/<command>DELETE</command> operations,
- it may be beneficial to lower the table's
- <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
- tuples to be frozen by earlier vacuums. The number of obsolete tuples and
- the number of inserted tuples are obtained from the cumulative statistics system;
- it is a semi-accurate count updated by each <command>UPDATE</command>,
- <command>DELETE</command> and <command>INSERT</command> operation. (It is
- only semi-accurate because some information might be lost under heavy
- load.) If the <structfield>relfrozenxid</structfield> value of the table
- is more than <varname>vacuum_freeze_table_age</varname> transactions old,
- an aggressive vacuum is performed to freeze old tuples and advance
- <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
- since the last vacuum are scanned.
- </para>
-
- <para>
- For analyze, a similar condition is used: the threshold, defined as:
-<programlisting>
-analyze threshold = analyze base threshold + analyze scale factor * number of tuples
-</programlisting>
- is compared to the total number of tuples inserted, updated, or deleted
- since the last <command>ANALYZE</command>.
- </para>
-
- <para>
- Partitioned tables are not processed by autovacuum. Statistics
- should be collected by running a manual <command>ANALYZE</command> when it is
- first populated, and again whenever the distribution of data in its
- partitions changes significantly.
- </para>
-
- <para>
- Temporary tables cannot be accessed by autovacuum. Therefore,
- appropriate vacuum and analyze operations should be performed via
- session SQL commands.
- </para>
-
- <para>
- The default thresholds and scale factors are taken from
- <filename>postgresql.conf</filename>, but it is possible to override them
- (and many other autovacuum control parameters) on a per-table basis; see
- <xref linkend="sql-createtable-storage-parameters"/> for more information.
- If a setting has been changed via a table's storage parameters, that value
- is used when processing that table; otherwise the global settings are
- used. See <xref linkend="runtime-config-autovacuum"/> for more details on
- the global settings.
- </para>
-
- <para>
- When multiple workers are running, the autovacuum cost delay parameters
- (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
- <quote>balanced</quote> among all the running workers, so that the
- total I/O impact on the system is the same regardless of the number
- of workers actually running. However, any workers processing tables whose
- per-table <literal>autovacuum_vacuum_cost_delay</literal> or
- <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
- are not considered in the balancing algorithm.
- </para>
-
- <para>
- Autovacuum workers generally don't block other commands. If a process
- attempts to acquire a lock that conflicts with the
- <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
- acquisition will interrupt the autovacuum. For conflicting lock modes,
- see <xref linkend="table-lock-compatibility"/>. However, if the autovacuum
- is running to prevent transaction ID wraparound (i.e., the autovacuum query
- name in the <structname>pg_stat_activity</structname> view ends with
- <literal>(to prevent wraparound)</literal>), the autovacuum is not
- automatically interrupted.
- </para>
-
- <warning>
- <para>
- Regularly running commands that acquire locks conflicting with a
- <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can
- effectively prevent autovacuums from ever completing.
- </para>
- </warning>
- </sect2>
</sect1>
--
2.40.0
On Tue, Apr 25, 2023 at 4:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
There are also very big structural problems with "Routine Vacuuming",
that I also propose to do something about. Honestly, it's a huge mess
at this point. It's nobody's fault in particular; there has been
accretion after accretion added, over many years. It is time to
finally bite the bullet and do some serious restructuring. I'm hoping
that I don't get too much push back on this, because it's already very
difficult work.
Now is a great time to revise this section, in my view. (I myself am about
ready to get back to testing and writing for the task of removing that
"obnoxious hint".)
Attached patch series shows what I consider to be a much better
overall structure. To make this convenient to take a quick look at, I
also attach a prebuilt version of routine-vacuuming.html (not the only
page that I've changed, but the most important set of changes by far).This initial version is still quite lacking in overall polish, but I
believe that it gets the general structure right. That's what I'd like
to get feedback on right now: can I get agreement with me about the
general nature of the problem? Does this high level direction seem
like the right one?
I believe the high-level direction is sound, and some details have been
discussed before.
The following list is a summary of the major changes that I propose:
1. Restructures the order of items to match the actual processing
order within VACUUM (and ANALYZE), rather than jumping from VACUUM to
ANALYZE and then back to VACUUM.This flows a lot better, which helps with later items that deal with
freezing/wraparound.
Seems logical.
2. Renamed "Preventing Transaction ID Wraparound Failures" to
"Freezing to manage the transaction ID space". Now we talk about
wraparound as a subtopic of freezing, not vice-versa. (This is a
complete rewrite, as described by later items in this list).
+1
3. All of the stuff about modulo-2^32 arithmetic is moved to the
storage chapter, where we describe the heap tuple header format.
It does seem to be an excessive level of detail for this chapter, so +1.
Speaking of excessive detail, however...(skipping ahead)
+ <note>
+ <para>
+ There is no fundamental difference between a
+ <command>VACUUM</command> run during anti-wraparound
+ autovacuuming and a <command>VACUUM</command> that happens to
+ use the aggressive strategy (whether run by autovacuum or
+ manually issued).
+ </para>
+ </note>
I don't see the value of this, from the user's perspective, of mentioning
this at all, much less for it to be called out as a Note. Imagine a user
who has been burnt by non-cancellable vacuums. How would they interpret
this statement?
It seems crazy to me that the second sentence in our discussion of
wraparound/freezing is still:"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future"
Hah!
4. No more separate section for MultiXactID freezing -- that's
discussed as part of the discussion of page-level freezing.Page-level freezing takes place without regard to the trigger
condition for freezing. So the new approach to freezing has a fixed
idea of what it means to freeze a given page (what physical
modifications it entails). This means that having a separate sect3
subsection for MultiXactIds now makes no sense (if it ever did).
I have no strong opinion on that.
5. The top-level list of maintenance tasks has a new addition: "To
truncate obsolescent transaction status information, when possible".
+1
6. Rename the whole "Routine Vacuuming" section to "Autovacuum
Maintenance Tasks".This is what we should be emphasizing over manually run VACUUMs.
Besides, the current title just seems wrong -- we're talking about
ANALYZE just as much as VACUUM.
Seems more accurate. On top of that, "Routine vacuuming" slightly implies
manual vacuums.
I've only taken a cursory look, but will look more closely as time permits.
(Side note: My personal preference for rough doc patches would be to leave
out spurious whitespace changes. That not only includes indentation, but
also paragraphs where many of the words haven't changed at all, but every
line has changed to keep the paragraph tidy. Seems like more work for both
the author and the reviewer.)
--
John Naylor
EDB: http://www.enterprisedb.com
On Wed, Apr 26, 2023 at 12:16 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
Now is a great time to revise this section, in my view. (I myself am about ready to get back to testing and writing for the task of removing that "obnoxious hint".)
Although I didn't mention the issue with single user mode in my
introductory email (the situation there is just appalling IMV), it
seems like I might not be able to ignore that problem while I'm
working on this patch. Declaring that as out of scope for this doc
patch series (on pragmatic grounds) feels awkward. I have to work
around something that is just wrong. For now, the doc patch just has
an "XXX" item about it. (Hopefully I'll think of a more natural way of
not fixing it.)
This initial version is still quite lacking in overall polish, but I
believe that it gets the general structure right. That's what I'd like
to get feedback on right now: can I get agreement with me about the
general nature of the problem? Does this high level direction seem
like the right one?I believe the high-level direction is sound, and some details have been discussed before.
I'm relieved that you think so. I was a bit worried that I'd get
bogged down, having already invested a lot of time in this.
Attached is v2. It has the same high level direction as v1, but is a
lot more polished. Still not committable, to be sure. But better than
v1.
I'm also attaching a prebuilt copy of routine-vacuuming.html, as with
v1 -- hopefully that's helpful.
3. All of the stuff about modulo-2^32 arithmetic is moved to the
storage chapter, where we describe the heap tuple header format.It does seem to be an excessive level of detail for this chapter, so +1. Speaking of excessive detail, however...(skipping ahead)
My primary objection to talking about modulo-2^32 stuff first is not
that it's an excessive amount of detail (though it definitely is). My
objection is that it places emphasis on exactly the thing that *isn't*
supposed to matter, under the design of freezing -- greatly confusing
the reader (even sophisticated readers). Discussion of so-called
wraparound should start with logical concepts, such as xmin XIDs being
treated as "infinitely far in the past" once frozen. The physical data
structures do matter too, but even there the emphasis should be on
heap pages being "self-contained", in the sense that SQL queries won't
need to access pg_xact to read the rows from the pages going forward
(even on standbys).
Why do we call wraparound wraparound, anyway? The 32-bit XID space is
circular! The whole point of the design is that unsigned integer
wraparound is meaningless -- there isn't really a point in "the
circle" that you should think of as the start point or end point.
(We're probably stuck with the term "wraparound" for now, so I'm not
proposing that it be changed here, purely on pragmatic grounds.)
+ <note> + <para> + There is no fundamental difference between a + <command>VACUUM</command> run during anti-wraparound + autovacuuming and a <command>VACUUM</command> that happens to + use the aggressive strategy (whether run by autovacuum or + manually issued). + </para> + </note>I don't see the value of this, from the user's perspective, of mentioning this at all, much less for it to be called out as a Note. Imagine a user who has been burnt by non-cancellable vacuums. How would they interpret this statement?
I meant that it isn't special from the point of view of vacuumlazy.c.
I do see your point, though. I've taken that out in v2.
(I happen to believe that the antiwraparound autocancellation behavior
is very unhelpful as currently implemented, which biased my view of
this.)
4. No more separate section for MultiXactID freezing -- that's
discussed as part of the discussion of page-level freezing.Page-level freezing takes place without regard to the trigger
condition for freezing. So the new approach to freezing has a fixed
idea of what it means to freeze a given page (what physical
modifications it entails). This means that having a separate sect3
subsection for MultiXactIds now makes no sense (if it ever did).I have no strong opinion on that.
Most of the time, when antiwraparound autovacuums are triggered by
autovacuum_multixact_freeze_max_age, in a way that is noticeable (say
a large table), VACUUM will in all likelihood end up processing
exactly 0 multis. What you'll get is pretty much an "early" aggressive
VACUUM, which isn't such a big deal (especially with page-level
freezing). You can already get an "early" aggressive VACUUM due to
hitting vacuum_freeze_table_age before autovacuum_freeze_max_age is
ever reached (in fact it's the common case, now that we have
insert-driven autovacuums).
So I'm trying to suggest that an aggressive VACUUM is the same
regardless of the trigger condition. To a lesser extent, I'm trying to
make the user aware that the mechanical difference between aggressive
and non-aggressive is fairly minor, even if the consequences of that
difference are quite noticeable. (Though maybe they're less noticeable
with the v16 work in place.)
I've only taken a cursory look, but will look more closely as time permits.
I would really appreciate that. This is not easy work.
I suspect that the docs talk about wraparound using extremely alarming
language possible because at one point it really was necessary to
scare users into running VACUUM to avoid data loss. This was before
autovacuum, and before the invention of vxids, and even before the
invention of freezing. It was up to you as a user to VACUUM your
database using cron, and if you didn't then eventually data loss could
result.
Obviously these docs were updated many times over the years, but I
maintain that the basic structure from 20 years ago is still present
in a way that it really shouldn't be.
(Side note: My personal preference for rough doc patches would be to leave out spurious whitespace changes.
I've tried to keep them out (or at least break the noisy whitespace
changes out into their own commit). I might have missed a few of them
in v1, which are fixed in v2.
Thanks
--
Peter Geoghegan
Attachments:
v2-0006-Merge-basic-vacuuming-sect2-into-sect1-introducti.patchapplication/octet-stream; name=v2-0006-Merge-basic-vacuuming-sect2-into-sect1-introducti.patchDownload
From 151b4d6b39880c5265184dcc20f52fb8a2e8b6e1 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 11:44:45 -0700
Subject: [PATCH v2 6/9] Merge "basic vacuuming" sect2 into sect1 introduction.
This doesn't change any of the content itself. It just merges the
original text into the sect1 text that immediately preceded it.
This is preparation for the next commit, which will remove most of the
text "relocated" in this commit. This structure should make things a
little easier for doc translators.
This commit is the last one that could be considered mechanical
restructuring/refactoring of existing text.
---
doc/src/sgml/maintenance.sgml | 106 ++++++++++++++++------------------
1 file changed, 51 insertions(+), 55 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index f554e12bf..2e18a078a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -266,68 +266,64 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
to skim this material to help them understand and adjust autovacuuming.
</para>
- <sect2 id="vacuum-basics">
- <title>Vacuuming Basics</title>
+ <para>
+ <productname>PostgreSQL</productname>'s
+ <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
+ process each table on a regular basis for several reasons:
- <para>
- <productname>PostgreSQL</productname>'s
- <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
- process each table on a regular basis for several reasons:
+ <orderedlist>
+ <listitem>
+ <simpara>To recover or reuse disk space occupied by updated or deleted
+ rows.</simpara>
+ </listitem>
- <orderedlist>
- <listitem>
- <simpara>To recover or reuse disk space occupied by updated or deleted
- rows.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To protect against loss of very old data due to
+ <firstterm>transaction ID wraparound</firstterm> or
+ <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ </listitem>
- <listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To update the visibility map, which speeds
+ up <link linkend="indexes-index-only-scans">index-only
+ scans</link>.</simpara>
+ </listitem>
- <listitem>
- <simpara>To update the visibility map, which speeds
- up <link linkend="indexes-index-only-scans">index-only
- scans</link>.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To update data statistics used by the
+ <productname>PostgreSQL</productname> query planner.</simpara>
+ </listitem>
+ </orderedlist>
- <listitem>
- <simpara>To update data statistics used by the
- <productname>PostgreSQL</productname> query planner.</simpara>
- </listitem>
- </orderedlist>
+ Each of these reasons dictates performing <command>VACUUM</command> operations
+ of varying frequency and scope, as explained in the following subsections.
+ </para>
- Each of these reasons dictates performing <command>VACUUM</command> operations
- of varying frequency and scope, as explained in the following subsections.
- </para>
+ <para>
+ There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
+ and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
+ disk space but runs much more slowly. Also,
+ the standard form of <command>VACUUM</command> can run in parallel with production
+ database operations. (Commands such as <command>SELECT</command>,
+ <command>INSERT</command>, <command>UPDATE</command>, and
+ <command>DELETE</command> will continue to function normally, though you
+ will not be able to modify the definition of a table with commands such as
+ <command>ALTER TABLE</command> while it is being vacuumed.)
+ <command>VACUUM FULL</command> requires an
+ <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
+ working on, and therefore cannot be done in parallel with other use
+ of the table. Generally, therefore,
+ administrators should strive to use standard <command>VACUUM</command> and
+ avoid <command>VACUUM FULL</command>.
+ </para>
- <para>
- There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
- and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
- disk space but runs much more slowly. Also,
- the standard form of <command>VACUUM</command> can run in parallel with production
- database operations. (Commands such as <command>SELECT</command>,
- <command>INSERT</command>, <command>UPDATE</command>, and
- <command>DELETE</command> will continue to function normally, though you
- will not be able to modify the definition of a table with commands such as
- <command>ALTER TABLE</command> while it is being vacuumed.)
- <command>VACUUM FULL</command> requires an
- <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
- working on, and therefore cannot be done in parallel with other use
- of the table. Generally, therefore,
- administrators should strive to use standard <command>VACUUM</command> and
- avoid <command>VACUUM FULL</command>.
- </para>
-
- <para>
- <command>VACUUM</command> creates a substantial amount of I/O
- traffic, which can cause poor performance for other active sessions.
- There are configuration parameters that can be adjusted to reduce the
- performance impact of background vacuuming — see
- <xref linkend="runtime-config-resource-vacuum-cost"/>.
- </para>
- </sect2>
+ <para>
+ <command>VACUUM</command> creates a substantial amount of I/O
+ traffic, which can cause poor performance for other active sessions.
+ There are configuration parameters that can be adjusted to reduce the
+ performance impact of background vacuuming — see
+ <xref linkend="runtime-config-resource-vacuum-cost"/>.
+ </para>
<sect2 id="vacuum-for-space-recovery">
<title>Recovering Disk Space</title>
--
2.40.0
v2-0008-Overhaul-freezing-and-wraparound-docs.patchapplication/octet-stream; name=v2-0008-Overhaul-freezing-and-wraparound-docs.patchDownload
From 8f497df592b6d0123c4a17e0e5ccb9cbd403cb0c Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 13:04:13 -0700
Subject: [PATCH v2 8/9] Overhaul freezing and wraparound docs.
This is almost a complete rewrite. "Preventing Transaction ID
Wraparound Failures" becomes "Freezing to manage the transaction ID
space". This is follow-up work to commit 1de58df4, which added
page-level freezing to VACUUM.
The emphasis is now on the physical work of freezing pages. This flows
a little better than it otherwise would due to recent structural
cleanups to maintenance.sgml; discussion about freezing now immediately
follows discussion of cleanup of dead tuples. We still talk about the
problem of the system activating xidStopLimit protections in the same
section, but we use much less alarmist language about data corruption,
and are no longer overly concerned about the very worst case. We don't
rescind the recommendation that users recover from an xidStopLimit
outage by using single user mode, though that seems like something we
should aim to do in the near future.
There is no longer a separate sect3 to discuss MultiXactId related
issues. VACUUM now performs exactly the same processing steps when it
freezes a page, independent of the trigger condition.
Also describe the page-level freezing FPI optimization added by commit
1de58df4. This is expected to trigger the majority of all freezing with
many types of workloads.
---
doc/src/sgml/config.sgml | 20 +-
doc/src/sgml/logicaldecoding.sgml | 2 +-
doc/src/sgml/maintenance.sgml | 738 ++++++++++++++--------
doc/src/sgml/ref/create_table.sgml | 2 +-
doc/src/sgml/ref/prepare_transaction.sgml | 2 +-
doc/src/sgml/ref/vacuum.sgml | 6 +-
doc/src/sgml/ref/vacuumdb.sgml | 4 +-
doc/src/sgml/xact.sgml | 4 +-
8 files changed, 514 insertions(+), 264 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b56f073a9..fa825b5f1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8359,7 +8359,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
Note that even when this parameter is disabled, the system
will launch autovacuum processes if necessary to
prevent transaction ID wraparound. See <xref
- linkend="vacuum-for-wraparound"/> for more information.
+ linkend="freezing-xid-space"/> for more information.
</para>
</listitem>
</varlistentry>
@@ -8548,7 +8548,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
This parameter can only be set at server start, but the setting
can be reduced for individual tables by
changing table storage parameters.
- For more information see <xref linkend="vacuum-for-wraparound"/>.
+ For more information see <xref linkend="freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -8577,7 +8577,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
400 million multixacts.
This parameter can only be set at server start, but the setting can
be reduced for individual tables by changing table storage parameters.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ For more information see <xref linkend="aggressive-strategy"/>.
</para>
</listitem>
</varlistentry>
@@ -9284,7 +9284,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
periodic manual <command>VACUUM</command> has a chance to run before an
anti-wraparound autovacuum is launched for the table. For more
information see
- <xref linkend="vacuum-for-wraparound"/>.
+ <xref linkend="aggressive-strategy"/>.
</para>
</listitem>
</varlistentry>
@@ -9306,7 +9306,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
the value of <xref linkend="guc-autovacuum-freeze-max-age"/>, so
that there is not an unreasonably short time between forced
autovacuums. For more information see <xref
- linkend="vacuum-for-wraparound"/>.
+ linkend="freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -9343,7 +9343,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
set this value anywhere from zero to 2.1 billion,
<command>VACUUM</command> will silently adjust the effective
value to no less than 105% of <xref
- linkend="guc-autovacuum-freeze-max-age"/>.
+ linkend="guc-autovacuum-freeze-max-age"/>. For more
+ information see <xref linkend="xid-stop-limit"/>.
</para>
</listitem>
</varlistentry>
@@ -9367,7 +9368,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<xref linkend="guc-autovacuum-multixact-freeze-max-age"/>, so that a
periodic manual <command>VACUUM</command> has a chance to run before an
anti-wraparound is launched for the table.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ For more information see <xref linkend="aggressive-strategy"/>.
</para>
</listitem>
</varlistentry>
@@ -9388,7 +9389,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
the value of <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>,
so that there is not an unreasonably short time between forced
autovacuums.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ For more information see <xref linkend="freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -9421,7 +9422,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
this value anywhere from zero to 2.1 billion,
<command>VACUUM</command> will silently adjust the effective
value to no less than 105% of <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. For more
+ information see <xref linkend="xid-stop-limit"/>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index cbd3aa804..80dade3be 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -353,7 +353,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
because neither required WAL nor required rows from the system catalogs
can be removed by <command>VACUUM</command> as long as they are required by a replication
slot. In extreme cases this could cause the database to shut down to prevent
- transaction ID wraparound (see <xref linkend="vacuum-for-wraparound"/>).
+ transaction ID wraparound (see <xref linkend="freezing-xid-space"/>).
So if a slot is no longer required it should be dropped.
</para>
</caution>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 7476e5922..675f6945d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -275,15 +275,21 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ <simpara>To maintain the system's ability to allocated new
+ transaction IDs (and new multixact IDs) through freezing.</simpara>
</listitem>
<listitem>
<simpara>To update the visibility map, which speeds
up <link linkend="indexes-index-only-scans">index-only
- scans</link>.</simpara>
+ scans</link>, and helps the next <command>VACUUM</command>
+ operation avoid needlessly scanning pages that are already
+ frozen</simpara>
+ </listitem>
+
+ <listitem>
+ <simpara>To truncate obsolescent transaction status information,
+ when possible</simpara>
</listitem>
<listitem>
@@ -432,303 +438,491 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</tip>
</sect2>
- <sect2 id="vacuum-for-wraparound">
- <title>Preventing Transaction ID Wraparound Failures</title>
-
- <indexterm zone="vacuum-for-wraparound">
- <primary>transaction ID</primary>
- <secondary>wraparound</secondary>
- </indexterm>
+ <sect2 id="freezing-xid-space">
+ <title>Freezing to manage the transaction ID space</title>
<indexterm>
- <primary>wraparound</primary>
- <secondary>of transaction IDs</secondary>
+ <primary>Freezing</primary>
+ <secondary>of transaction IDs and multixact IDs</secondary>
</indexterm>
<para>
- <productname>PostgreSQL</productname>'s <link
- linkend="mvcc-intro">MVCC</link> transaction semantics depend on
- being able to compare <glossterm linkend="glossary-xid">transaction
- ID numbers (<acronym>XID</acronym>)</glossterm> to determine
- whether or not the row is visible to each query's MVCC snapshot
- (see <link linkend="interpreting-xid-stamps">
- interpreting XID stamps from tuple headers</link>). But since
- on-disk storage of transaction IDs in heap pages uses a truncated
- 32-bit representation to save space (rather than the full 64-bit
- representation), it is necessary to vacuum every table in every
- database <emphasis>at least</emphasis> once every two billion
- transactions (though far more frequent vacuuming is typical).
+ <command>VACUUM</command> often marks certain pages
+ <emphasis>frozen</emphasis>, indicating that all eligible rows on
+ the page were inserted by a transaction that committed
+ sufficiently far in the past that the effects of the inserting
+ transaction are certain to be visible to all current and future
+ transactions. The specific <glossterm
+ linkend="glossary-xid">transaction ID number
+ (<acronym>XID</acronym>)</glossterm> stored in a frozen heap
+ row's <structfield>xmin</structfield> field is no longer needed to
+ determine anything about the row's visibility.
+ Furthermore, when a row undergoing freezing happens to have an XID
+ set in its <structfield>xmax</structfield> field (possibly an XID
+ left behind by an earlier <command>SELECT FOR UPDATE</command> row
+ locker), the <structfield>xmax</structfield> field's XID is
+ typically also removed (actually, <structfield>xmax</structfield>
+ is set to the special XID value <literal>0</literal>, also known
+ as <literal>InvalidTransactionId</literal>). See <xref
+ linkend="interpreting-xid-stamps"/> for further background
+ information.
</para>
<para>
- <xref linkend="guc-vacuum-freeze-min-age"/>
- controls how old an XID value has to be before rows bearing that XID will be
- frozen. Increasing this setting may avoid unnecessary work if the
- rows that would otherwise be frozen will soon be modified again,
- but decreasing this setting increases
- the number of transactions that can elapse before the table must be
- vacuumed again.
+ Once frozen, heap pages are <quote>self-contained</quote>. All of
+ the page's rows can be read by every transaction, without any
+ transaction ever needing to consult externally stored transaction
+ status metadata (most notably, transaction commit/abort status
+ information from <filename>pg_xact</filename> won't ever be
+ required).
</para>
<para>
- <command>VACUUM</command> uses the <link linkend="storage-vm">visibility map</link>
- to determine which pages of a table must be scanned. Normally, it
- will skip pages that don't have any dead row versions even if those pages
- might still have row versions with old XID values. Therefore, normal
- <command>VACUUM</command>s won't always freeze every old row version in the table.
- When that happens, <command>VACUUM</command> will eventually need to perform an
- <firstterm>aggressive vacuum</firstterm>, which will freeze all eligible unfrozen
- XID and MXID values, including those from all-visible but not all-frozen pages.
- In practice most tables require periodic aggressive vacuuming.
- <xref linkend="guc-vacuum-freeze-table-age"/>
- controls when <command>VACUUM</command> does that: all-visible but not all-frozen
- pages are scanned if the number of transactions that have passed since the
- last such scan is greater than <varname>vacuum_freeze_table_age</varname> minus
- <varname>vacuum_freeze_min_age</varname>. Setting
- <varname>vacuum_freeze_table_age</varname> to 0 forces <command>VACUUM</command> to
- always use its aggressive strategy.
+ It can be useful for <command>VACUUM</command> to put off some of
+ the work of freezing, but freezing cannot be put off indefinitely.
+ Since on-disk storage of transaction IDs in heap row headers uses
+ a truncated 32-bit representation to save space (rather than the
+ full 64-bit representation), freezing plays a crucial role in
+ enabling <link linkend="aggressive-strategy">management of the XID
+ address space</link> by <command>VACUUM</command>. If freezing
+ by <command>VACUUM</command> is somehow impeded (in a database
+ that continues to allocate new transaction IDs), the system will
+ eventually <link linkend="xid-stop-limit">refuse to allocate new
+ transaction IDs</link>. This generally only happens in extreme
+ cases where the system has been misconfigured.
</para>
<para>
- The maximum time that a table can go unvacuumed is two billion
- transactions minus the <varname>vacuum_freeze_min_age</varname> value at
- the time of the last aggressive vacuum. If it were to go
- unvacuumed for longer than
- that, data loss could result. To ensure that this does not happen,
- autovacuum is invoked on any table that might contain unfrozen rows with
- XIDs older than the age specified by the configuration parameter <xref
- linkend="guc-autovacuum-freeze-max-age"/>. (This will happen even if
- autovacuum is disabled.)
+ <xref linkend="guc-vacuum-freeze-min-age"/> can be used to control
+ when freezing takes place. When <command>VACUUM</command> scans a
+ heap page containing even one XID that already attained an age
+ exceeding this value, the page is frozen.
+ </para>
+
+ <indexterm>
+ <primary>MultiXactId</primary>
+ <secondary>Freezing of</secondary>
+ </indexterm>
+
+ <para>
+ <firstterm>Multixact IDs</firstterm> are used to support row
+ locking by multiple transactions. Since there is only limited
+ space in a tuple header to store lock information, that
+ information is encoded as a <quote>multiple transaction
+ ID</quote>, or multixact ID for short, whenever there is more
+ than one transaction concurrently locking a row. Information
+ about which transaction IDs are included in any particular
+ multixact ID is stored separately, and only the multixact ID
+ appears in the <structfield>xmax</structfield> field in the tuple
+ header. Like transaction IDs, multixact IDs are implemented as a
+ 32-bit counter and corresponding storage. Since MultiXact IDs are
+ stored in the <structfield>xmax</structfield> field of heap rows
+ (and have an analogous dependency on external transaction status
+ information), they may also need to be removed during freezing.
</para>
<para>
- This implies that if a table is not otherwise vacuumed,
- autovacuum will be invoked on it approximately once every
- <varname>autovacuum_freeze_max_age</varname> minus
- <varname>vacuum_freeze_min_age</varname> transactions.
- For tables that are regularly vacuumed for space reclamation purposes,
- this is of little importance. However, for static tables
- (including tables that receive inserts, but no updates or deletes),
- there is no need to vacuum for space reclamation, so it can
- be useful to try to maximize the interval between forced autovacuums
- on very large static tables. Obviously one can do this either by
- increasing <varname>autovacuum_freeze_max_age</varname> or decreasing
- <varname>vacuum_freeze_min_age</varname>.
+ <xref linkend="guc-vacuum-multixact-freeze-min-age"/> also
+ controls when freezing takes place. It is analogous to
+ <varname>vacuum_freeze_min_age</varname>, but <quote>age</quote>
+ is expressed in units of Multixact ID (not in units of XID).
+ <varname>vacuum_multixact_freeze_min_age</varname> typically has
+ only a minimal impact on how many pages are frozen, partly because
+ <command>VACUUM</command> usually prefers to remove MultiXact IDs
+ proactively based on low-level considerations around the cost of
+ freezing. <varname>vacuum_multixact_freeze_min_age</varname>
+ <emphasis>forces</emphasis> <command>VACUUM</command> to process
+ MultiXact IDs in certain rare cases where the implementation would
+ <emphasis>not</emphasis> ordinarily do so.
</para>
<para>
- The effective maximum for <varname>vacuum_freeze_table_age</varname> is 0.95 *
- <varname>autovacuum_freeze_max_age</varname>; a setting higher than that will be
- capped to the maximum. A value higher than
- <varname>autovacuum_freeze_max_age</varname> wouldn't make sense because an
- anti-wraparound autovacuum would be triggered at that point anyway, and
- the 0.95 multiplier leaves some breathing room to run a manual
- <command>VACUUM</command> before that happens. As a rule of thumb,
- <command>vacuum_freeze_table_age</command> should be set to a value somewhat
- below <varname>autovacuum_freeze_max_age</varname>, leaving enough gap so that
- a regularly scheduled <command>VACUUM</command> or an autovacuum triggered by
- normal delete and update activity is run in that window. Setting it too
- close could lead to anti-wraparound autovacuums, even though the table
- was recently vacuumed to reclaim space, whereas lower values lead to more
- frequent aggressive vacuuming.
+ Managing the added <acronym>WAL</acronym> volume from freezing
+ over time is an important consideration for
+ <command>VACUUM</command>. This is why <command>VACUUM</command>
+ doesn't just freeze every eligible tuple at the earliest
+ opportunity: the <acronym>WAL</acronym> written to freeze a page's
+ tuples <quote>goes to waste</quote> in cases where the resulting
+ frozen tuples are soon deleted or updated anyway. It's also why
+ <command>VACUUM</command> <emphasis>will</emphasis> freeze all
+ eligible tuples from a heap page once the decision to freeze at
+ least one tuple is taken: at that point the added cost to freeze
+ all eligible tuples eagerly (measured in <quote>extra bytes of
+ <acronym>WAL</acronym> written</quote>) is far lower than the
+ probable cost of deferring freezing until a future
+ <command>VACUUM</command> operation against the same table.
+ Furthermore, once the page is frozen it can be marked all-frozen
+ in the visibility map right away.
+ </para>
+
+ <note>
+ <para>
+ In <productname>PostgreSQL</productname> versions before 16,
+ freezing was triggered at the level of individual
+ <structfield>xmin</structfield> and
+ <structfield>xmax</structfield> fields. Freezing only affected
+ the exact XIDs that had already attained an age at or exceeding
+ <varname>vacuum_freeze_min_age</varname>, regardless of costs.
+ </para>
+ </note>
+
+ <para>
+ <command>VACUUM</command> also triggers freezing of pages in cases
+ where it already proved necessary to write out an
+ <acronym>FPI</acronym> (full page image) alongside a
+ <acronym>WAL</acronym> record generated while removing dead tuples
+ (see <xref linkend="wal-reliability"/> for background information
+ about how <acronym>FPI</acronym>s provide torn page protection).
+ This <quote>freeze on an <acronym>FPI</acronym> write</quote>
+ mechanism is designed to lower the absolute volume of
+ <acronym>WAL</acronym> written over time by
+ <command>VACUUM</command>, across multiple
+ <command>VACUUM</command> operations against the same table. The
+ mechanism often prevents future <command>VACUUM</command>
+ operations from having to write a second <acronym>FPI</acronym>
+ for the same page much later on. In effect,
+ <command>VACUUM</command> writes slightly more
+ <acronym>WAL</acronym> in the short term with the aim of
+ ultimately needing to write much less <acronym>WAL</acronym> in
+ the long term.
</para>
<para>
- The sole disadvantage of increasing <varname>autovacuum_freeze_max_age</varname>
- (and <varname>vacuum_freeze_table_age</varname> along with it) is that
- the <filename>pg_xact</filename> and <filename>pg_commit_ts</filename>
- subdirectories of the database cluster will take more space, because it
- must store the commit status and (if <varname>track_commit_timestamp</varname> is
- enabled) timestamp of all transactions back to
- the <varname>autovacuum_freeze_max_age</varname> horizon. The commit status uses
- two bits per transaction, so if
- <varname>autovacuum_freeze_max_age</varname> is set to its maximum allowed value
- of two billion, <filename>pg_xact</filename> can be expected to grow to about half
- a gigabyte and <filename>pg_commit_ts</filename> to about 20GB. If this
- is trivial compared to your total database size,
- setting <varname>autovacuum_freeze_max_age</varname> to its maximum allowed value
- is recommended. Otherwise, set it depending on what you are willing to
- allow for <filename>pg_xact</filename> and <filename>pg_commit_ts</filename> storage.
- (The default, 200 million transactions, translates to about 50MB
- of <filename>pg_xact</filename> storage and about 2GB of <filename>pg_commit_ts</filename>
- storage.)
+ <command>VACUUM</command> may not be able to freeze every tuple's
+ <structfield>xmin</structfield> in relatively rare cases. The
+ criteria that determines basic eligibility for freezing is exactly
+ the same as the one that determines if a deleted tuple should be
+ considered <literal>removable</literal> or merely <literal>dead
+ but not yet removable</literal> (namely, the XID-based
+ <literal>removable cutoff</literal>). In extreme cases a
+ long-running transaction can hold back every
+ <command>VACUUM</command>'s removable cutoff for so long that the
+ system is eventually forced to activate <link
+ linkend="xid-stop-limit"><literal>xidStopLimit</literal> mode
+ protections</link>.
</para>
- <para>
- One disadvantage of decreasing <varname>vacuum_freeze_min_age</varname> is that
- it might cause <command>VACUUM</command> to do useless work: freezing a row
- version is a waste of time if the row is modified
- soon thereafter (causing it to acquire a new XID). So the setting should
- be large enough that rows are not frozen until they are unlikely to change
- any more.
- </para>
+ <sect3 id="aggressive-strategy">
+ <title><command>VACUUM</command>'s aggressive strategy</title>
- <para>
- To track the age of the oldest unfrozen XIDs in a database,
- <command>VACUUM</command> stores XID
- statistics in the system tables <structname>pg_class</structname> and
- <structname>pg_database</structname>. In particular,
- the <structfield>relfrozenxid</structfield> column of a table's
- <structname>pg_class</structname> row contains the oldest remaining unfrozen
- XID at the end of the most recent <command>VACUUM</command> that successfully
- advanced <structfield>relfrozenxid</structfield> (typically the most recent
- aggressive VACUUM). Similarly, the
- <structfield>datfrozenxid</structfield> column of a database's
- <structname>pg_database</structname> row is a lower bound on the unfrozen XIDs
- appearing in that database — it is just the minimum of the
- per-table <structfield>relfrozenxid</structfield> values within the database.
- A convenient way to
- examine this information is to execute queries such as:
+ <indexterm zone="aggressive-strategy">
+ <primary>transaction ID</primary>
+ <secondary>wraparound</secondary>
+ </indexterm>
+
+ <indexterm>
+ <primary>wraparound</primary>
+ <secondary>of transaction IDs and multixact IDs</secondary>
+ </indexterm>
+
+ <para>
+ As already noted briefly in the introductory section, freezing
+ doesn't just allow queries to avoid lookups of subsidiary
+ transaction status information in structures such as
+ <filename>pg_xact</filename>. Freezing also plays a crucial role
+ in enabling management of the XID address space by
+ <command>VACUUM</command>.
+ </para>
+
+ <para>
+ <command>VACUUM</command> maintains information about the oldest
+ unfrozen XID that remains in the table when it uses its
+ <firstterm>aggressive strategy</firstterm>. This information is
+ stored in the <structname>pg_class</structname> system table at
+ the end of each aggressive <command>VACUUM</command>: the table
+ processed by aggressive <command>VACUUM</command> has its
+ <structname>pg_class</structname>.<structfield>relfrozenxid</structfield>
+ updated (<structfield>relfrozenxid</structfield>
+ <quote>advances</quote> by a certain number of XIDs). Similarly,
+ the <structfield>datfrozenxid</structfield> column of a
+ database's <structname>pg_database</structname> row is a lower
+ bound on the unfrozen XIDs appearing in that database — it
+ is just the minimum of the per-table
+ <structfield>relfrozenxid</structfield> values within the
+ database. The system also maintains
+ <structname>pg_class</structname>.<structfield>relminmxid</structfield> and
+ <structname>pg_database</structname>.<structfield>datminmxid</structfield>
+ fields to track the oldest MultiXact ID, while following
+ analogous rules.
+ </para>
+
+ <tip>
+ <para>
+ When the <command>VACUUM</command> command's
+ <literal>VERBOSE</literal> parameter is specified,
+ <command>VACUUM</command> prints various statistics about the
+ table. This includes information about how
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> advanced, and the number
+ of newly frozen pages. The same details appear in the server
+ log when autovacuum logging (controlled by <xref
+ linkend="guc-log-autovacuum-min-duration"/>) reports on a
+ <command>VACUUM</command> operation executed by autovacuum.
+ </para>
+ </tip>
+
+ <para>
+ This process is intended to reliably prevent the entire database
+ from ever having a transaction ID that is excessively far in the
+ past. The maximum <quote>distance</quote> that the system can
+ tolerate between the oldest unfrozen transaction ID and the next
+ (unallocated) transaction ID is about 2.1 billion transaction
+ IDs. That is an upper limit; the greatest
+ <literal>age(relfrozenxid)</literal>/<literal>age(datfrozenxid)</literal>
+ in the system should ideally never exceed a fraction of that
+ upper limit. If that upper limit is ever reached, then the
+ system will activate <link
+ linkend="xid-stop-limit"><literal>xidStopLimit</literal> mode
+ protections</link>. These protections will remain in force
+ until <command>VACUUM</command> (typically autovacuum) manages to
+ advance the oldest <literal>datfrozenxid</literal> in the cluster
+ (by advancing that database's oldest relfrozenxid via an
+ aggressive <command>VACUUM</command>).
+ </para>
+
+ <para>
+ The 2.1 billion XIDs <quote>distance</quote> invariant is a
+ consequence of the fact that on-disk storage of transaction IDs
+ in heap row headers uses a truncated 32-bit representation to
+ save space (rather than the full 64-bit representation). Since
+ all unfrozen transaction IDs from heap tuple headers
+ <emphasis>must</emphasis> be from the same transaction ID epoch
+ (which is what the invariant actually assures), there isn't any
+ need to store a separate epoch field in each tuple header. The
+ downside is that the system depends on freezing (and
+ <structfield>relfrozenxid</structfield> advancement during
+ aggressive <command>VACUUM</command>s) to make sure that the
+ <quote>available supply</quote> of transaction IDs never exceeds
+ the <quote>demand</quote>.
+ </para>
+
+ <note>
+ <para>
+ In practice most tables require periodic aggressive vacuuming.
+ However, some individual non-aggressive
+ <command>VACUUM</command> operations may be able to advance
+ <structfield>relfrozenxid</structfield> and/or
+ <structfield>relminmxid</structfield>. This is most common in
+ small, frequently modified tables, where
+ <command>VACUUM</command> happens to scan all pages (or at least
+ all pages not marked all-frozen in the visibility map) in the
+ course of removing dead tuples.
+ </para>
+ </note>
+
+ <para>
+ <command>VACUUM</command>/autovacuum also use <xref
+ linkend="guc-vacuum-multixact-freeze-table-age"/> and <xref
+ linkend="guc-autovacuum-multixact-freeze-max-age"/> settings as
+ independent Multixact ID orientated controls for aggressive mode
+ <command>VACUUM</command> and anti-wraparound autovacuum.
+ These work analogously to the XID-based
+ <varname>vacuum_freeze_table_age</varname> and
+ <varname>autovacuum_freeze_max_age</varname>, respectively.
+ Note, however, that if the <link
+ linkend="vacuum-truncate-pg-xact">multixacts members storage
+ area</link> exceeds 2GB, then the effective value of
+ <varname>autovacuum_multixact_freeze_max_age</varname> will be
+ lower, resulting in more frequent aggressive mode VACUUMs.
+ </para>
+
+ <para>
+ There is only one major runtime behavioral differences between
+ aggressive mode <command>VACUUM</command> and non-aggressive
+ (standard) <command>VACUUM</command>. Both kinds of
+ <command>VACUUM</command> use the <link
+ linkend="storage-vm">visibility map</link> to determine which
+ pages of a table must be scanned, and which can be skipped.
+ However, only non-aggressive <command>VACUUM</command> will skip
+ pages that don't have any dead row versions even if those pages
+ might still have row versions with old XID values; aggressive
+ <command>VACUUM</command>s are limited to skipping pages already
+ marked all-frozen (and all-visible).
+ </para>
+
+ <para>
+ As a consequence of all this, non-aggressive
+ <command>VACUUM</command>s usually won't freeze
+ <emphasis>every</emphasis> page with an old row version in the
+ table. Most individual tables will eventually need an aggressive
+ <command>VACUUM</command>, which will reliably freeze all pages
+ with XID and MXID values older than
+ <varname>vacuum_freeze_min_age</varname>, including those from
+ all-visible but not all-frozen pages (and then update
+ <structname>pg_class</structname>). <xref
+ linkend="guc-vacuum-freeze-table-age"/> controls when
+ <command>VACUUM</command> must use its aggressive strategy.
+ Since the setting is applied against
+ <literal>age(relfrozenxid)</literal>, settings like
+ <varname>vacuum_freeze_min_age</varname> may influence the exact
+ cadence of aggressive vacuuming. Setting
+ <varname>vacuum_freeze_table_age</varname> to 0 forces
+ <command>VACUUM</command> to always use its aggressive strategy.
+ </para>
+
+ <note>
+ <para>
+ Aggressive <command>VACUUM</command>s apply the same rules for
+ freezing as non-aggressive <command>VACUUM</command>s. You may
+ nevertheless notice that aggressive <command>VACUUM</command>s
+ perform a disproportionately large amount of the total required
+ freezing in larger tables.
+ </para>
+ <para>
+ This is an indirect consequence of the fact that non-aggressive
+ <command>VACUUM</command>s won't scan pages that are marked
+ all-visible but not also marked all-frozen in the visibility
+ map. <command>VACUUM</command> can only consider freezing those
+ pages that it actually gets to scan.
+ </para>
+ <para>
+ Note in particular that <varname>vacuum_freeze_min_age</varname>
+ isn't very likely to trigger freezing in non-aggressive
+ <command>VACUUM</command>s, at least with default settings. The
+ <quote>freeze on an <acronym>FPI</acronym> write</quote>
+ mechanism is somewhat more likely to trigger in non-aggressive
+ <command>VACUUM</command>s in practice, though. Much depends on
+ workload characteristics.
+ </para>
+ </note>
+
+ <para>
+ To ensure that every table has its
+ <structfield>relfrozenxid</structfield> advanced at somewhat
+ regular intervals, including totally static tables, autovacuum is
+ invoked on any table that might contain unfrozen rows with XIDs
+ older than the age specified by the configuration parameter <xref
+ linkend="guc-autovacuum-freeze-max-age"/>. This will happen
+ even if autovacuum is disabled.
+ </para>
+
+ <para>
+ <!-- This isn't strictly true, since anti-wraparound autovacuuming
+ merely implies aggressive mode -->
+ In practice all anti-wraparound autovacuums will use
+ <command>VACUUM</command>'s aggressive strategy. This is assured
+ because the effective value of
+ <varname>vacuum_freeze_table_age</varname> is
+ <quote>clamped</quote> to a value no greater than 95% of the
+ current value of <varname>autovacuum_freeze_max_age</varname>.
+ As a rule of thumb, <command>vacuum_freeze_table_age</command>
+ should be set to a value somewhat below
+ <varname>autovacuum_freeze_max_age</varname>, leaving enough gap
+ so that a regularly scheduled <command>VACUUM</command> or an
+ autovacuum triggered by inserts, updates and deletes is run in
+ that window. Anti-wraparound autovacuums can be avoided
+ altogether in tables that reliably receive
+ <emphasis>some</emphasis> <command>VACUUM</command>s that use the
+ aggressive strategy.
+ </para>
+
+ <para>
+ A convenient way to examine information about
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> is to execute queries such as:
<programlisting>
SELECT c.oid::regclass as table_name,
- greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
+greatest(age(c.relfrozenxid),
+ age(t.relfrozenxid)) as xid_age,
+mxid_age(c.relminmxid)
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm');
-SELECT datname, age(datfrozenxid) FROM pg_database;
+SELECT datname,
+age(datfrozenxid) as xid_age,
+mxid_age(datminmxid)
+FROM pg_database;
</programlisting>
- The <literal>age</literal> column measures the number of transactions from the
- cutoff XID to the current transaction's XID.
- </para>
-
- <tip>
- <para>
- When the <command>VACUUM</command> command's <literal>VERBOSE</literal>
- parameter is specified, <command>VACUUM</command> prints various
- statistics about the table. This includes information about how
- <structfield>relfrozenxid</structfield> and
- <structfield>relminmxid</structfield> advanced, and the number of
- newly frozen pages. The same details appear in the server log when
- autovacuum logging (controlled by <xref
- linkend="guc-log-autovacuum-min-duration"/>) reports on a
- <command>VACUUM</command> operation executed by autovacuum.
+ The <literal>age</literal> column measures the number of
+ transactions from the cutoff XID to the next unallocated
+ transactions ID. The <literal>mxid_age</literal> column
+ measures the number of MultiXactIds from the cutoff MultiXactId
+ to the next unallocated multixact ID.
</para>
- </tip>
+ </sect3>
- <para>
- <command>VACUUM</command> normally only scans pages that have been modified
- since the last vacuum, but <structfield>relfrozenxid</structfield> can only be
- advanced when every page of the table
- that might contain unfrozen XIDs is scanned. This happens when
- <structfield>relfrozenxid</structfield> is more than
- <varname>vacuum_freeze_table_age</varname> transactions old, when
- <command>VACUUM</command>'s <literal>FREEZE</literal> option is used, or when all
- pages that are not already all-frozen happen to
- require vacuuming to remove dead row versions. When <command>VACUUM</command>
- scans every page in the table that is not already all-frozen, it should
- set <literal>age(relfrozenxid)</literal> to a value just a little more than the
- <varname>vacuum_freeze_min_age</varname> setting
- that was used (more by the number of transactions started since the
- <command>VACUUM</command> started). <command>VACUUM</command>
- will set <structfield>relfrozenxid</structfield> to the oldest XID
- that remains in the table, so it's possible that the final value
- will be much more recent than strictly required.
- If no <structfield>relfrozenxid</structfield>-advancing
- <command>VACUUM</command> is issued on the table until
- <varname>autovacuum_freeze_max_age</varname> is reached, an autovacuum will soon
- be forced for the table.
- </para>
-
- <para>
- If for some reason autovacuum fails to clear old XIDs from a table, the
- system will begin to emit warning messages like this when the database's
- oldest XIDs reach forty million transactions from the wraparound point:
+ <sect3 id="xid-stop-limit">
+ <title><literal>xidStopLimit</literal> mode</title>
+ <para>
+ If for some reason autovacuum utterly fails to advance any
+ table's <structfield>relfrozenxid</structfield> or
+ <structfield>relminmxid</structfield> for an extended period, and
+ if XIDs and/or MultiXactIds continue to be allocated, the system
+ will begin to emit warning messages like this when the database's
+ oldest XIDs reach forty million transactions from the wraparound
+ point:
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
</programlisting>
- (A manual <command>VACUUM</command> should fix the problem, as suggested by the
- hint; but note that the <command>VACUUM</command> must be performed by a
- superuser, else it will fail to process system catalogs and thus not
- be able to advance the database's <structfield>datfrozenxid</structfield>.)
- If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ (A manual <command>VACUUM</command> should fix the problem, as suggested by the
+ hint; but note that the <command>VACUUM</command> must be performed by a
+ superuser, else it will fail to process system catalogs and thus not
+ be able to advance the database's <structfield>datfrozenxid</structfield>.)
+ If these warnings are ignored, the system will eventually refuse
+ to start any new transactions. This happens at the point that
+ there are fewer than three million transactions left:
<programlisting>
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
- The three-million-transaction safety margin exists to let the
- administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
- </para>
-
- <sect3 id="vacuum-for-multixact-wraparound">
- <title>Multixacts and Wraparound</title>
-
- <indexterm>
- <primary>MultiXactId</primary>
- </indexterm>
-
- <indexterm>
- <primary>wraparound</primary>
- <secondary>of multixact IDs</secondary>
- </indexterm>
-
- <para>
- <firstterm>Multixact IDs</firstterm> are used to support row locking by
- multiple transactions. Since there is only limited space in a tuple
- header to store lock information, that information is encoded as
- a <quote>multiple transaction ID</quote>, or multixact ID for short,
- whenever there is more than one transaction concurrently locking a
- row. Information about which transaction IDs are included in any
- particular multixact ID is stored separately in
- the <filename>pg_multixact</filename> subdirectory, and only the multixact ID
- appears in the <structfield>xmax</structfield> field in the tuple header.
- Like transaction IDs, multixact IDs are implemented as a
- 32-bit counter and corresponding storage, all of which requires
- careful aging management, storage cleanup, and wraparound handling.
- There is a separate storage area which holds the list of members in
- each multixact, which also uses a 32-bit counter and which must also
- be managed.
+ The three-million-transaction safety margin exists to let the
+ administrator recover without data loss, by manually executing the
+ required <command>VACUUM</command> commands. However, since the system will not
+ execute commands once it has gone into the safety shutdown mode,
+ the only way to do this is to stop the server and start the server in single-user
+ mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
+ in single-user mode. See the <xref linkend="app-postgres"/> reference
+ page for details about using single-user mode.
</para>
<para>
- Whenever <command>VACUUM</command> scans any part of a table, it will replace
- any multixact ID it encounters which is older than
- <xref linkend="guc-vacuum-multixact-freeze-min-age"/>
- by a different value, which can be the zero value, a single
- transaction ID, or a newer multixact ID. For each table,
- <structname>pg_class</structname>.<structfield>relminmxid</structfield> stores the oldest
- possible multixact ID still appearing in any tuple of that table.
- If this value is older than
- <xref linkend="guc-vacuum-multixact-freeze-table-age"/>, an aggressive
- vacuum is forced. As discussed in the previous section, an aggressive
- vacuum means that only those pages which are known to be all-frozen will
- be skipped. <function>mxid_age()</function> can be used on
- <structname>pg_class</structname>.<structfield>relminmxid</structfield> to find its age.
+ Anything that influences when and how
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> advance will also directly
+ affect the high watermark storage overhead from storing a great
+ deal of historical transaction status information. The
+ additional <link linkend="vacuum-truncate-pg-xact">space
+ overhead</link> is usually of fairly minimal concern. It is
+ noted as an additional downside of allowing the system to get
+ close to <literal>xidStopLimit</literal> for the sake of
+ completeness.
</para>
- <para>
- Aggressive <command>VACUUM</command>s, regardless of what causes
- them, are <emphasis>guaranteed</emphasis> to be able to advance
- the table's <structfield>relminmxid</structfield>.
- Eventually, as all tables in all databases are scanned and their
- oldest multixact values are advanced, on-disk storage for older
- multixacts can be removed.
- </para>
+ <note>
+ <title>Historical Note</title>
+ <para>
+ The term <quote>wraparound</quote> is inaccurate. Also, there
+ is no <quote>data loss</quote> here — the message is
+ simply wrong.
+ </para>
+ <para>
+ XXX: We really need to fix the situation with single user mode
+ to put things on a good footing.
+ </para>
+ </note>
<para>
- As a safety device, an aggressive vacuum scan will
- occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds 2GB, aggressive vacuum
- scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled.
+ In emergencies, <command>VACUUM</command> will take extraordinary
+ measures to avoid <literal>xidStopLimit</literal> mode. A
+ failsafe mechanism is triggered when thresholds controlled by
+ <xref linkend="guc-vacuum-failsafe-age"/> and <xref
+ linkend="guc-vacuum-multixact-failsafe-age"/> are reached. The
+ failsafe prioritizes advancing
+ <structfield>relfrozenxid</structfield> and/or
+ <structfield>relminmxid</structfield> as quickly as possible.
+ Once the failsafe triggers, <command>VACUUM</command> bypasses
+ all remaining non-essential maintenance tasks, and stops applying
+ any cost-based delay that was in effect. Any <glossterm
+ linkend="glossary-buffer-access-strategy">Buffer Access
+ Strategy</glossterm> in use will also be disabled.
</para>
</sect3>
</sect2>
@@ -766,6 +960,58 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect2>
+ <sect2 id="vacuum-truncate-pg-xact">
+ <title>Truncating transaction status information</title>
+ <para>
+ As noted in <xref linkend="xid-stop-limit"/>, anything that
+ influences when and how <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> advance will also directly
+ affect the high watermark storage overhead needed to store
+ historical transaction status information. For example,
+ increasing <varname>autovacuum_freeze_max_age</varname> (and
+ <varname>vacuum_freeze_table_age</varname> along with it) will
+ make the <filename>pg_xact</filename> and
+ <filename>pg_commit_ts</filename> subdirectories of the database
+ cluster take more space, because they store the commit status and
+ (if <varname>track_commit_timestamp</varname> is enabled)
+ timestamp of all transactions back to the
+ <varname>datfrozenxid</varname> horizon (the earliest
+ <varname>datfrozenxid</varname> in the entire cluster).
+ </para>
+ <para>
+ The commit status uses two bits per transaction. The default
+ <varname>autovacuum_freeze_max_age</varname> setting of 200
+ million transactions translates to about 50MB of
+ <filename>pg_xact</filename> storage. When
+ <varname>track_commit_timestamp</varname> is enabled, about 2GB of
+ <filename>pg_commit_ts</filename> storage will also be required.
+ </para>
+ <para>
+ MultiXactId status information is implemented as two separate
+ <acronym>SLRU</acronym> storage areas:
+ <filename>pg_multixact/members</filename>, and
+ <filename>pg_multixact/offsets</filename>. There is no simple
+ formula to determine the storage overhead per MultiXactId, since
+ MultiXactIds have a variable number of member XIDs.
+ </para>
+ <para>
+ Truncating of transaction status information is only possible at
+ the end of <command>VACUUM</command>s that advance
+ <structfield>relfrozenxid</structfield> (in the case of
+ <filename>pg_xact</filename> and
+ <filename>pg_commit_ts</filename>) or
+ <structfield>relminmxid</structfield> (in the case of
+ (<filename>pg_multixact/members</filename> and
+ <filename>pg_multixact/offsets</filename>) of whatever table
+ happened to have the oldest value in the cluster when the
+ <command>VACUUM</command> began. This typically happens very
+ infrequently, often during <link
+ linkend="aggressive-strategy">aggressive strategy</link>
+ <command>VACUUM</command>s of one of the database's largest
+ tables.
+ </para>
+ </sect2>
+
<sect2 id="vacuum-for-statistics">
<title>Updating Planner Statistics</title>
@@ -881,7 +1127,7 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</tip>
</sect2>
-</sect1>
+ </sect1>
<sect1 id="routine-reindex">
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10ef699fa..8aa332fcf 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1515,7 +1515,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
and/or <command>ANALYZE</command> operations on this table following the rules
discussed in <xref linkend="autovacuum"/>.
If false, this table will not be autovacuumed, except to prevent
- transaction ID wraparound. See <xref linkend="vacuum-for-wraparound"/> for
+ transaction ID wraparound. See <xref linkend="freezing-xid-space"/> for
more about wraparound prevention.
Note that the autovacuum daemon does not run at all (except to prevent
transaction ID wraparound) if the <xref linkend="guc-autovacuum"/>
diff --git a/doc/src/sgml/ref/prepare_transaction.sgml b/doc/src/sgml/ref/prepare_transaction.sgml
index f4f6118ac..ede50d6f7 100644
--- a/doc/src/sgml/ref/prepare_transaction.sgml
+++ b/doc/src/sgml/ref/prepare_transaction.sgml
@@ -128,7 +128,7 @@ PREPARE TRANSACTION <replaceable class="parameter">transaction_id</replaceable>
This will interfere with the ability of <command>VACUUM</command> to reclaim
storage, and in extreme cases could cause the database to shut down
to prevent transaction ID wraparound (see <xref
- linkend="vacuum-for-wraparound"/>). Keep in mind also that the transaction
+ linkend="freezing-xid-space"/>). Keep in mind also that the transaction
continues to hold whatever locks it held. The intended usage of the
feature is that a prepared transaction will normally be committed or
rolled back as soon as an external transaction manager has verified that
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 57bc4c23e..0c28604a6 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -123,7 +123,9 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
<term><literal>FREEZE</literal></term>
<listitem>
<para>
- Selects aggressive <quote>freezing</quote> of tuples.
+ Makes <quote>freezing</quote> <emphasis>maximally</emphasis>
+ aggressive, and forces <command>VACUUM</command> to use its
+ <link linkend="aggressive-strategy">aggressive strategy</link>.
Specifying <literal>FREEZE</literal> is equivalent to performing
<command>VACUUM</command> with the
<xref linkend="guc-vacuum-freeze-min-age"/> and
@@ -219,7 +221,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
there are many dead tuples in the table. This may be useful
when it is necessary to make <command>VACUUM</command> run as
quickly as possible to avoid imminent transaction ID wraparound
- (see <xref linkend="vacuum-for-wraparound"/>). However, the
+ (see <xref linkend="freezing-xid-space"/>). However, the
wraparound failsafe mechanism controlled by <xref
linkend="guc-vacuum-failsafe-age"/> will generally trigger
automatically to avoid transaction ID wraparound failure, and
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index da2393783..b61d523c2 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -233,7 +233,7 @@ PostgreSQL documentation
ID age of at least <replaceable class="parameter">mxid_age</replaceable>.
This setting is useful for prioritizing tables to process to prevent
multixact ID wraparound (see
- <xref linkend="vacuum-for-multixact-wraparound"/>).
+ <xref linkend="freezing-xid-space"/>).
</para>
<para>
For the purposes of this option, the multixact ID age of a relation is
@@ -254,7 +254,7 @@ PostgreSQL documentation
transaction ID age of at least
<replaceable class="parameter">xid_age</replaceable>. This setting
is useful for prioritizing tables to process to prevent transaction
- ID wraparound (see <xref linkend="vacuum-for-wraparound"/>).
+ ID wraparound (see <xref linkend="freezing-xid-space"/>).
</para>
<para>
For the purposes of this option, the transaction ID age of a relation
diff --git a/doc/src/sgml/xact.sgml b/doc/src/sgml/xact.sgml
index b467660ee..e18ad8fd3 100644
--- a/doc/src/sgml/xact.sgml
+++ b/doc/src/sgml/xact.sgml
@@ -49,7 +49,7 @@
<para>
The internal transaction ID type <type>xid</type> is 32 bits wide
- and <link linkend="vacuum-for-wraparound">wraps around</link> every
+ and <link linkend="freezing-xid-space">wraps around</link> every
4 billion transactions. A 32-bit epoch is incremented during each
wraparound. There is also a 64-bit type <type>xid8</type> which
includes this epoch and therefore does not wrap around during the
@@ -100,7 +100,7 @@
rows and can be inspected using the <xref linkend="pgrowlocks"/>
extension. Row-level read locks might also require the assignment
of multixact IDs (<literal>mxid</literal>; see <xref
- linkend="vacuum-for-multixact-wraparound"/>).
+ linkend="freezing-xid-space"/>).
</para>
</sect1>
--
2.40.0
v2-0007-Make-maintenance.sgml-more-autovacuum-orientated.patchapplication/octet-stream; name=v2-0007-Make-maintenance.sgml-more-autovacuum-orientated.patchDownload
From a6a4d387975f9f2a414b813e5035957758829136 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 12:11:10 -0700
Subject: [PATCH v2 7/9] Make maintenance.sgml more autovacuum-orientated.
Now that it's no longer in its own sect2, shorten the "Vacuuming basics"
content, and make it more autovacuum-orientated. This gives much less
prominence to VACUUM FULL, which has little place in a section about
autovacuum. We no longer define avoiding the need to run VACUUM FULL as
the purpose of vacuuming.
A later commit that overhauls "Recovering Disk Space" will add back a
passing mention of things like VACUUM FULL and TRUNCATE, but only as
something that might be relevant in extreme cases. (Use of these
commands is hopefully neither "Routine" nor "Basic" to most users).
---
doc/src/sgml/maintenance.sgml | 91 +++++++++++++++++------------------
1 file changed, 44 insertions(+), 47 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 2e18a078a..7476e5922 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -32,11 +32,12 @@
</para>
<para>
- The other main category of maintenance task is periodic <quote>vacuuming</quote>
- of the database. This activity is discussed in
- <xref linkend="routine-vacuuming"/>. Closely related to this is updating
- the statistics that will be used by the query planner, as discussed in
- <xref linkend="vacuum-for-statistics"/>.
+ The other main category of maintenance task is periodic
+ <quote><link linkend="routine-vacuuming">vacuuming</link></quote> of
+ the database by autovacuum. Configuring autovacuum scheduling is
+ discussed in <xref linkend="autovacuum"/>. Autovacuum also updates
+ the statistics that will be used by the query planner, as discussed
+ in <xref linkend="vacuum-for-statistics"/>.
</para>
<para>
@@ -244,7 +245,7 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</sect1>
<sect1 id="routine-vacuuming">
- <title>Routine Vacuuming</title>
+ <title>Autovacuum Maintenance Tasks</title>
<indexterm zone="routine-vacuuming">
<primary>vacuum</primary>
@@ -252,24 +253,20 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
<para>
<productname>PostgreSQL</productname> databases require periodic
- maintenance known as <firstterm>vacuuming</firstterm>. For many installations, it
- is sufficient to let vacuuming be performed by the <firstterm>autovacuum
- daemon</firstterm>, which is described in <xref linkend="autovacuum"/>. You might
- need to adjust the autovacuuming parameters described there to obtain best
- results for your situation. Some database administrators will want to
- supplement or replace the daemon's activities with manually-managed
- <command>VACUUM</command> commands, which typically are executed according to a
- schedule by <application>cron</application> or <application>Task
- Scheduler</application> scripts. To set up manually-managed vacuuming properly,
- it is essential to understand the issues discussed in the next few
- subsections. Administrators who rely on autovacuuming may still wish
- to skim this material to help them understand and adjust autovacuuming.
+ maintenance known as <firstterm>vacuuming</firstterm>, and require
+ periodic updates to the statistics used by the
+ <productname>PostgreSQL</productname> query planner. These
+ maintenance tasks are performed by the <link
+ linkend="sql-vacuum"><command>VACUUM</command></link> and <link
+ linkend="sql-analyze"><command>ANALYZE</command></link> commands
+ respectively. For most installations, it is sufficient to let the
+ <firstterm>autovacuum daemon</firstterm> determine when to perform
+ these maintenance tasks (which is partly determined by configurable
+ table-level thresholds; see <xref linkend="autovacuum"/>).
</para>
-
<para>
- <productname>PostgreSQL</productname>'s
- <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
- process each table on a regular basis for several reasons:
+ The autovacuum daemon has to process each table on a regular basis
+ for several reasons:
<orderedlist>
<listitem>
@@ -295,35 +292,35 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
</orderedlist>
- Each of these reasons dictates performing <command>VACUUM</command> operations
- of varying frequency and scope, as explained in the following subsections.
+ Maintenance work within the scope of items 1, 2, 3, and 4 is
+ performed by the <command>VACUUM</command> command internally.
+ Item 5 (maintenance of planner statistics) is handled by the
+ <command>ANALYZE</command> command internally. Although this
+ section presents information about autovacuum, there is no
+ difference between manually-issued <command>VACUUM</command> and
+ <command>ANALYZE</command> commands and those run by the autovacuum
+ daemon (though there are autovacuum-specific variants of a small
+ number of settings that control <command>VACUUM</command>).
</para>
-
<para>
- There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
- and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
- disk space but runs much more slowly. Also,
- the standard form of <command>VACUUM</command> can run in parallel with production
- database operations. (Commands such as <command>SELECT</command>,
- <command>INSERT</command>, <command>UPDATE</command>, and
- <command>DELETE</command> will continue to function normally, though you
- will not be able to modify the definition of a table with commands such as
- <command>ALTER TABLE</command> while it is being vacuumed.)
- <command>VACUUM FULL</command> requires an
- <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
- working on, and therefore cannot be done in parallel with other use
- of the table. Generally, therefore,
- administrators should strive to use standard <command>VACUUM</command> and
- avoid <command>VACUUM FULL</command>.
- </para>
-
- <para>
- <command>VACUUM</command> creates a substantial amount of I/O
- traffic, which can cause poor performance for other active sessions.
- There are configuration parameters that can be adjusted to reduce the
- performance impact of background vacuuming — see
+ Autovacuum creates a substantial amount of I/O traffic, which can
+ cause poor performance for other active sessions. There are
+ configuration parameters that can be adjusted to reduce the
+ performance impact of background vacuuming. See the
+ autovacuum-specific cost delay settings described in
+ <xref linkend="runtime-config-autovacuum"/>, and additional cost
+ delay settings described in
<xref linkend="runtime-config-resource-vacuum-cost"/>.
</para>
+ <para>
+ Some database administrators will want to supplement the daemon's
+ activities with manually-managed <command>VACUUM</command>
+ commands, which typically are executed according to a schedule by
+ <application>cron</application> or <application>Task
+ Scheduler</application> scripts. It can be useful to perform
+ off-hours <command>VACUUM</command> commands during periods where
+ reduced load is expected.
+ </para>
<sect2 id="vacuum-for-space-recovery">
<title>Recovering Disk Space</title>
--
2.40.0
v2-0001-Make-autovacuum-docs-into-a-sect1-of-its-own.patchapplication/octet-stream; name=v2-0001-Make-autovacuum-docs-into-a-sect1-of-its-own.patchDownload
From fa9dfe413c6ced9a3ed38cc2f295e5af737683d4 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 12 Apr 2023 14:42:06 -0700
Subject: [PATCH v2 1/9] Make autovacuum docs into a sect1 of its own.
This doesn't change any of the content itself.
---
doc/src/sgml/maintenance.sgml | 332 +++++++++++++++++-----------------
1 file changed, 166 insertions(+), 166 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a..a6295c399 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -59,6 +59,172 @@
pleasant and productive experience with the system.
</para>
+ <sect1 id="autovacuum">
+ <title>The Autovacuum Daemon</title>
+
+ <indexterm>
+ <primary>autovacuum</primary>
+ <secondary>general information</secondary>
+ </indexterm>
+ <para>
+ <productname>PostgreSQL</productname> has an optional but highly
+ recommended feature called <firstterm>autovacuum</firstterm>,
+ whose purpose is to automate the execution of
+ <command>VACUUM</command> and <command>ANALYZE</command> commands.
+ When enabled, autovacuum checks for
+ tables that have had a large number of inserted, updated or deleted
+ tuples. These checks use the statistics collection facility;
+ therefore, autovacuum cannot be used unless <xref
+ linkend="guc-track-counts"/> is set to <literal>true</literal>.
+ In the default configuration, autovacuuming is enabled and the related
+ configuration parameters are appropriately set.
+ </para>
+
+ <para>
+ The <quote>autovacuum daemon</quote> actually consists of multiple processes.
+ There is a persistent daemon process, called the
+ <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
+ <firstterm>autovacuum worker</firstterm> processes for all databases. The
+ launcher will distribute the work across time, attempting to start one
+ worker within each database every <xref linkend="guc-autovacuum-naptime"/>
+ seconds. (Therefore, if the installation has <replaceable>N</replaceable> databases,
+ a new worker will be launched every
+ <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.)
+ A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes
+ are allowed to run at the same time. If there are more than
+ <varname>autovacuum_max_workers</varname> databases to be processed,
+ the next database will be processed as soon as the first worker finishes.
+ Each worker process will check each table within its database and
+ execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed.
+ <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor
+ autovacuum workers' activity.
+ </para>
+
+ <para>
+ If several large tables all become eligible for vacuuming in a short
+ amount of time, all autovacuum workers might become occupied with
+ vacuuming those tables for a long period. This would result
+ in other tables and databases not being vacuumed until a worker becomes
+ available. There is no limit on how many workers might be in a
+ single database, but workers do try to avoid repeating work that has
+ already been done by other workers. Note that the number of running
+ workers does not count towards <xref linkend="guc-max-connections"/> or
+ <xref linkend="guc-superuser-reserved-connections"/> limits.
+ </para>
+
+ <para>
+ Tables whose <structfield>relfrozenxid</structfield> value is more than
+ <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
+ vacuumed (this also applies to those tables whose freeze max age has
+ been modified via storage parameters; see below). Otherwise, if the
+ number of tuples obsoleted since the last
+ <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
+ table is vacuumed. The vacuum threshold is defined as:
+<programlisting>
+vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
+</programlisting>
+ where the vacuum base threshold is
+ <xref linkend="guc-autovacuum-vacuum-threshold"/>,
+ the vacuum scale factor is
+ <xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
+ and the number of tuples is
+ <structname>pg_class</structname>.<structfield>reltuples</structfield>.
+ </para>
+
+ <para>
+ The table is also vacuumed if the number of tuples inserted since the last
+ vacuum has exceeded the defined insert threshold, which is defined as:
+<programlisting>
+vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
+</programlisting>
+ where the vacuum insert base threshold is
+ <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>,
+ and vacuum insert scale factor is
+ <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
+ Such vacuums may allow portions of the table to be marked as
+ <firstterm>all visible</firstterm> and also allow tuples to be frozen, which
+ can reduce the work required in subsequent vacuums.
+ For tables which receive <command>INSERT</command> operations but no or
+ almost no <command>UPDATE</command>/<command>DELETE</command> operations,
+ it may be beneficial to lower the table's
+ <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
+ tuples to be frozen by earlier vacuums. The number of obsolete tuples and
+ the number of inserted tuples are obtained from the cumulative statistics system;
+ it is a semi-accurate count updated by each <command>UPDATE</command>,
+ <command>DELETE</command> and <command>INSERT</command> operation. (It is
+ only semi-accurate because some information might be lost under heavy
+ load.) If the <structfield>relfrozenxid</structfield> value of the table
+ is more than <varname>vacuum_freeze_table_age</varname> transactions old,
+ an aggressive vacuum is performed to freeze old tuples and advance
+ <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
+ since the last vacuum are scanned.
+ </para>
+
+ <para>
+ For analyze, a similar condition is used: the threshold, defined as:
+<programlisting>
+analyze threshold = analyze base threshold + analyze scale factor * number of tuples
+</programlisting>
+ is compared to the total number of tuples inserted, updated, or deleted
+ since the last <command>ANALYZE</command>.
+ </para>
+
+ <para>
+ Partitioned tables are not processed by autovacuum. Statistics
+ should be collected by running a manual <command>ANALYZE</command> when it is
+ first populated, and again whenever the distribution of data in its
+ partitions changes significantly.
+ </para>
+
+ <para>
+ Temporary tables cannot be accessed by autovacuum. Therefore,
+ appropriate vacuum and analyze operations should be performed via
+ session SQL commands.
+ </para>
+
+ <para>
+ The default thresholds and scale factors are taken from
+ <filename>postgresql.conf</filename>, but it is possible to override them
+ (and many other autovacuum control parameters) on a per-table basis; see
+ <xref linkend="sql-createtable-storage-parameters"/> for more information.
+ If a setting has been changed via a table's storage parameters, that value
+ is used when processing that table; otherwise the global settings are
+ used. See <xref linkend="runtime-config-autovacuum"/> for more details on
+ the global settings.
+ </para>
+
+ <para>
+ When multiple workers are running, the autovacuum cost delay parameters
+ (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
+ <quote>balanced</quote> among all the running workers, so that the
+ total I/O impact on the system is the same regardless of the number
+ of workers actually running. However, any workers processing tables whose
+ per-table <literal>autovacuum_vacuum_cost_delay</literal> or
+ <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
+ are not considered in the balancing algorithm.
+ </para>
+
+ <para>
+ Autovacuum workers generally don't block other commands. If a process
+ attempts to acquire a lock that conflicts with the
+ <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
+ acquisition will interrupt the autovacuum. For conflicting lock modes,
+ see <xref linkend="table-lock-compatibility"/>. However, if the autovacuum
+ is running to prevent transaction ID wraparound (i.e., the autovacuum query
+ name in the <structname>pg_stat_activity</structname> view ends with
+ <literal>(to prevent wraparound)</literal>), the autovacuum is not
+ automatically interrupted.
+ </para>
+
+ <warning>
+ <para>
+ Regularly running commands that acquire locks conflicting with a
+ <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can
+ effectively prevent autovacuums from ever completing.
+ </para>
+ </warning>
+ </sect1>
+
<sect1 id="routine-vacuuming">
<title>Routine Vacuuming</title>
@@ -749,172 +915,6 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect3>
</sect2>
-
- <sect2 id="autovacuum">
- <title>The Autovacuum Daemon</title>
-
- <indexterm>
- <primary>autovacuum</primary>
- <secondary>general information</secondary>
- </indexterm>
- <para>
- <productname>PostgreSQL</productname> has an optional but highly
- recommended feature called <firstterm>autovacuum</firstterm>,
- whose purpose is to automate the execution of
- <command>VACUUM</command> and <command>ANALYZE</command> commands.
- When enabled, autovacuum checks for
- tables that have had a large number of inserted, updated or deleted
- tuples. These checks use the statistics collection facility;
- therefore, autovacuum cannot be used unless <xref
- linkend="guc-track-counts"/> is set to <literal>true</literal>.
- In the default configuration, autovacuuming is enabled and the related
- configuration parameters are appropriately set.
- </para>
-
- <para>
- The <quote>autovacuum daemon</quote> actually consists of multiple processes.
- There is a persistent daemon process, called the
- <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
- <firstterm>autovacuum worker</firstterm> processes for all databases. The
- launcher will distribute the work across time, attempting to start one
- worker within each database every <xref linkend="guc-autovacuum-naptime"/>
- seconds. (Therefore, if the installation has <replaceable>N</replaceable> databases,
- a new worker will be launched every
- <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.)
- A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes
- are allowed to run at the same time. If there are more than
- <varname>autovacuum_max_workers</varname> databases to be processed,
- the next database will be processed as soon as the first worker finishes.
- Each worker process will check each table within its database and
- execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed.
- <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor
- autovacuum workers' activity.
- </para>
-
- <para>
- If several large tables all become eligible for vacuuming in a short
- amount of time, all autovacuum workers might become occupied with
- vacuuming those tables for a long period. This would result
- in other tables and databases not being vacuumed until a worker becomes
- available. There is no limit on how many workers might be in a
- single database, but workers do try to avoid repeating work that has
- already been done by other workers. Note that the number of running
- workers does not count towards <xref linkend="guc-max-connections"/> or
- <xref linkend="guc-superuser-reserved-connections"/> limits.
- </para>
-
- <para>
- Tables whose <structfield>relfrozenxid</structfield> value is more than
- <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
- vacuumed (this also applies to those tables whose freeze max age has
- been modified via storage parameters; see below). Otherwise, if the
- number of tuples obsoleted since the last
- <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
- table is vacuumed. The vacuum threshold is defined as:
-<programlisting>
-vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
-</programlisting>
- where the vacuum base threshold is
- <xref linkend="guc-autovacuum-vacuum-threshold"/>,
- the vacuum scale factor is
- <xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
- and the number of tuples is
- <structname>pg_class</structname>.<structfield>reltuples</structfield>.
- </para>
-
- <para>
- The table is also vacuumed if the number of tuples inserted since the last
- vacuum has exceeded the defined insert threshold, which is defined as:
-<programlisting>
-vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
-</programlisting>
- where the vacuum insert base threshold is
- <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>,
- and vacuum insert scale factor is
- <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
- Such vacuums may allow portions of the table to be marked as
- <firstterm>all visible</firstterm> and also allow tuples to be frozen, which
- can reduce the work required in subsequent vacuums.
- For tables which receive <command>INSERT</command> operations but no or
- almost no <command>UPDATE</command>/<command>DELETE</command> operations,
- it may be beneficial to lower the table's
- <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
- tuples to be frozen by earlier vacuums. The number of obsolete tuples and
- the number of inserted tuples are obtained from the cumulative statistics system;
- it is a semi-accurate count updated by each <command>UPDATE</command>,
- <command>DELETE</command> and <command>INSERT</command> operation. (It is
- only semi-accurate because some information might be lost under heavy
- load.) If the <structfield>relfrozenxid</structfield> value of the table
- is more than <varname>vacuum_freeze_table_age</varname> transactions old,
- an aggressive vacuum is performed to freeze old tuples and advance
- <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
- since the last vacuum are scanned.
- </para>
-
- <para>
- For analyze, a similar condition is used: the threshold, defined as:
-<programlisting>
-analyze threshold = analyze base threshold + analyze scale factor * number of tuples
-</programlisting>
- is compared to the total number of tuples inserted, updated, or deleted
- since the last <command>ANALYZE</command>.
- </para>
-
- <para>
- Partitioned tables are not processed by autovacuum. Statistics
- should be collected by running a manual <command>ANALYZE</command> when it is
- first populated, and again whenever the distribution of data in its
- partitions changes significantly.
- </para>
-
- <para>
- Temporary tables cannot be accessed by autovacuum. Therefore,
- appropriate vacuum and analyze operations should be performed via
- session SQL commands.
- </para>
-
- <para>
- The default thresholds and scale factors are taken from
- <filename>postgresql.conf</filename>, but it is possible to override them
- (and many other autovacuum control parameters) on a per-table basis; see
- <xref linkend="sql-createtable-storage-parameters"/> for more information.
- If a setting has been changed via a table's storage parameters, that value
- is used when processing that table; otherwise the global settings are
- used. See <xref linkend="runtime-config-autovacuum"/> for more details on
- the global settings.
- </para>
-
- <para>
- When multiple workers are running, the autovacuum cost delay parameters
- (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
- <quote>balanced</quote> among all the running workers, so that the
- total I/O impact on the system is the same regardless of the number
- of workers actually running. However, any workers processing tables whose
- per-table <literal>autovacuum_vacuum_cost_delay</literal> or
- <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
- are not considered in the balancing algorithm.
- </para>
-
- <para>
- Autovacuum workers generally don't block other commands. If a process
- attempts to acquire a lock that conflicts with the
- <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
- acquisition will interrupt the autovacuum. For conflicting lock modes,
- see <xref linkend="table-lock-compatibility"/>. However, if the autovacuum
- is running to prevent transaction ID wraparound (i.e., the autovacuum query
- name in the <structname>pg_stat_activity</structname> view ends with
- <literal>(to prevent wraparound)</literal>), the autovacuum is not
- automatically interrupted.
- </para>
-
- <warning>
- <para>
- Regularly running commands that acquire locks conflicting with a
- <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can
- effectively prevent autovacuums from ever completing.
- </para>
- </warning>
- </sect2>
</sect1>
--
2.40.0
v2-0009-Overhaul-Recovering-Disk-Space-vacuuming-docs.patchapplication/octet-stream; name=v2-0009-Overhaul-Recovering-Disk-Space-vacuuming-docs.patchDownload
From 8ea9aae6e5e1c7482cacf19eb1930ec0bf916ab8 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 12:33:42 -0700
Subject: [PATCH v2 9/9] Overhaul "Recovering Disk Space" vacuuming docs.
Say a lot more about the possible impact of long-running transactions on
VACUUM. Remove all talk of administrators getting by without
autovacuum; at most administrators might want to schedule manual VACUUM
operations to supplement autovacuum (this documentation was written at a
time when the visibility map didn't exist, even in its most basic form).
Also describe VACUUM FULL as an entirely different kind of operation to
conventional lazy vacuum.
---
doc/src/sgml/maintenance.sgml | 173 ++++++++++++++++++----------------
1 file changed, 93 insertions(+), 80 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 675f6945d..0920855ae 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -342,100 +342,113 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
This approach is necessary to gain the benefits of multiversion
concurrency control (<acronym>MVCC</acronym>, see <xref linkend="mvcc"/>): the row version
must not be deleted while it is still potentially visible to other
- transactions. But eventually, an outdated or deleted row version is no
- longer of interest to any transaction. The space it occupies must then be
- reclaimed for reuse by new rows, to avoid unbounded growth of disk
- space requirements. This is done by running <command>VACUUM</command>.
+ transactions. A deleted row version (whether from an
+ <command>UPDATE</command> or <command>DELETE</command>) will
+ usually cease to be of interest to any still running transaction
+ shortly after the original deleting transaction commits.
</para>
<para>
- The standard form of <command>VACUUM</command> removes dead row
- versions in tables and indexes and marks the space available for
- future reuse. However, it will not return the space to the operating
- system, except in the special case where one or more pages at the
- end of a table become entirely free and an exclusive table lock can be
- easily obtained. In contrast, <command>VACUUM FULL</command> actively compacts
- tables by writing a complete new version of the table file with no dead
- space. This minimizes the size of the table, but can take a long time.
- It also requires extra disk space for the new copy of the table, until
- the operation completes.
+ The space dead tuples occupy must eventually be reclaimed for
+ reuse by new rows, to avoid unbounded growth of disk space
+ requirements. Reclaiming space from dead rows is
+ <command>VACUUM</command>'s main responsibility.
</para>
<para>
- The usual goal of routine vacuuming is to do standard <command>VACUUM</command>s
- often enough to avoid needing <command>VACUUM FULL</command>. The
- autovacuum daemon attempts to work this way, and in fact will
- never issue <command>VACUUM FULL</command>. In this approach, the idea
- is not to keep tables at their minimum size, but to maintain steady-state
- usage of disk space: each table occupies space equivalent to its
- minimum size plus however much space gets used up between vacuum runs.
- Although <command>VACUUM FULL</command> can be used to shrink a table back
- to its minimum size and return the disk space to the operating system,
- there is not much point in this if the table will just grow again in the
- future. Thus, moderately-frequent standard <command>VACUUM</command> runs are a
- better approach than infrequent <command>VACUUM FULL</command> runs for
- maintaining heavily-updated tables.
- </para>
-
- <para>
- Some administrators prefer to schedule vacuuming themselves, for example
- doing all the work at night when load is low.
- The difficulty with doing vacuuming according to a fixed schedule
- is that if a table has an unexpected spike in update activity, it may
- get bloated to the point that <command>VACUUM FULL</command> is really necessary
- to reclaim space. Using the autovacuum daemon alleviates this problem,
- since the daemon schedules vacuuming dynamically in response to update
- activity. It is unwise to disable the daemon completely unless you
- have an extremely predictable workload. One possible compromise is
- to set the daemon's parameters so that it will only react to unusually
- heavy update activity, thus keeping things from getting out of hand,
- while scheduled <command>VACUUM</command>s are expected to do the bulk of the
- work when the load is typical.
- </para>
-
- <para>
- For those not using autovacuum, a typical approach is to schedule a
- database-wide <command>VACUUM</command> once a day during a low-usage period,
- supplemented by more frequent vacuuming of heavily-updated tables as
- necessary. (Some installations with extremely high update rates vacuum
- their busiest tables as often as once every few minutes.) If you have
- multiple databases in a cluster, don't forget to
- <command>VACUUM</command> each one; the program <xref
- linkend="app-vacuumdb"/> might be helpful.
+ The XID cutoff point that <command>VACUUM</command> uses to
+ determine whether or not a deleted tuple is safe to physically
+ remove is reported under <literal>removable cutoff</literal> in
+ the server log when autovacuum logging (controlled by <xref
+ linkend="guc-log-autovacuum-min-duration"/>) reports on a
+ <command>VACUUM</command> operation executed by autovacuum.
+ Tuples that are not yet safe to remove are counted as
+ <literal>dead but not yet removable</literal> tuples in the log
+ report. <command>VACUUM</command> establishes its
+ <literal>removable cutoff</literal> once, at the start of the
+ operation. Any older snapshot (or transaction that allocates an
+ XID) that's still running when the cutoff is established may hold
+ it back.
</para>
<tip>
- <para>
- Plain <command>VACUUM</command> may not be satisfactory when
- a table contains large numbers of dead row versions as a result of
- massive update or delete activity. If you have such a table and
- you need to reclaim the excess disk space it occupies, you will need
- to use <command>VACUUM FULL</command>, or alternatively
- <link linkend="sql-cluster"><command>CLUSTER</command></link>
- or one of the table-rewriting variants of
- <link linkend="sql-altertable"><command>ALTER TABLE</command></link>.
- These commands rewrite an entire new copy of the table and build
- new indexes for it. All these options require an
- <literal>ACCESS EXCLUSIVE</literal> lock. Note that
- they also temporarily use extra disk space approximately equal to the size
- of the table, since the old copies of the table and indexes can't be
- released until the new ones are complete.
- </para>
+ <para>
+ It's important that no long running transactions ever be allowed
+ to hold back every <command>VACUUM</command> operation's cutoff
+ for an extended period. You may wish to monitor this.
+ </para>
</tip>
- <tip>
+ <note>
+ <para>
+ Tuples inserted by aborted transactions can be removed by
+ <command>VACUUM</command> immediately
+ </para>
+ </note>
+
<para>
- If you have a table whose entire contents are deleted on a periodic
- basis, consider doing it with
- <link linkend="sql-truncate"><command>TRUNCATE</command></link> rather
- than using <command>DELETE</command> followed by
- <command>VACUUM</command>. <command>TRUNCATE</command> removes the
- entire content of the table immediately, without requiring a
- subsequent <command>VACUUM</command> or <command>VACUUM
- FULL</command> to reclaim the now-unused disk space.
- The disadvantage is that strict MVCC semantics are violated.
+ <command>VACUUM</command> will not return space to the operating
+ system, except in the special case where a group of contiguous
+ pages at the end of a table become entirely free and an exclusive
+ table lock can be easily obtained. This relation truncation
+ behavior can be disabled in tables where the exclusive lock is
+ disruptive by setting the table's <varname>vacuum_truncate</varname>
+ storage parameter to <literal>off</literal>.
</para>
+
+ <tip>
+ <para>
+ If you have a table whose entire contents are deleted on a
+ periodic basis, consider doing it with <link
+ linkend="sql-truncate"><command>TRUNCATE</command></link> rather
+ than relying on <command>VACUUM</command>.
+ <command>TRUNCATE</command> removes the entire contents of the
+ table immediately, avoiding the need to set
+ <structfield>xmax</structfield> to the deleting transaction's XID.
+ One disadvantage is that strict MVCC semantics are violated.
+ </para>
</tip>
+ <tip>
+ <para>
+ <command>VACUUM FULL</command> or <command>CLUSTER</command> can
+ be useful when dealing with extreme amounts of dead tuples. It
+ can reclaim more disk space, but runs much more slowly. It
+ rewrites an entire new copy of the table and rebuilds all indexes.
+ This typically has much higher overhead than
+ <command>VACUUM</command>. Generally, therefore, administrators
+ should avoid using <command>VACUUM FULL</command> except in the
+ most extreme cases.
+ </para>
+ </tip>
+ <note>
+ <para>
+ Although <command>VACUUM FULL</command> is technically an option
+ of the <command>VACUUM</command> command, <command>VACUUM
+ FULL</command> uses a completely different implementation.
+ <command>VACUUM FULL</command> is essentially a variant of
+ <command>CLUSTER</command>. (The name <command>VACUUM
+ FULL</command> is historical; the original implementation was
+ somewhat closer to standard <command>VACUUM</command>.)
+ </para>
+ </note>
+ <warning>
+ <para>
+ <command>TRUNCATE</command>, <command>VACUUM FULL</command>, and
+ <command>CLUSTER</command> all require an <literal>ACCESS
+ EXCLUSIVE</literal> lock, which can be highly disruptive
+ (<command>SELECT</command>, <command>INSERT</command>,
+ <command>UPDATE</command>, and <command>DELETE</command> commands
+ will all be blocked).
+ </para>
+ </warning>
+ <warning>
+ <para>
+ <command>VACUUM FULL</command> (and <command>CLUSTER</command>)
+ temporarily uses extra disk space approximately equal to the size
+ of the table, since the old copies of the table and indexes can't
+ be released until the new ones are complete.
+ </para>
+ </warning>
</sect2>
<sect2 id="freezing-xid-space">
--
2.40.0
v2-0004-Reorder-routine-vacuuming-sections.patchapplication/octet-stream; name=v2-0004-Reorder-routine-vacuuming-sections.patchDownload
From b11129d51edef5778eae5de89bbcdccced4582db Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 11:19:50 -0700
Subject: [PATCH v2 4/9] Reorder routine vacuuming sections.
This doesn't change any of the content itself. It is a mechanical
change. The new order flows better because it talks about freezing
directly after talking about space recovery tasks.
Old order:
<sect2 id="vacuum-basics">
<sect2 id="vacuum-for-space-recovery">
<sect2 id="vacuum-for-statistics">
<sect2 id="vacuum-for-visibility-map">
<sect2 id="vacuum-for-wraparound">
New order:
<sect2 id="vacuum-basics">
<sect2 id="vacuum-for-space-recovery">
<sect2 id="vacuum-for-wraparound">
<sect2 id="vacuum-for-visibility-map">
<sect2 id="vacuum-for-statistics">
The new order matches processing order inside vacuumlazy.c. This order
will be easier to work with in two later commits that more or less
rewrite "vacuum-for-wraparound" and "vacuum-for-space-recovery".
(Though it doesn't seem to make the existing content any less meaningful
without the later rewrite commits.)
---
doc/src/sgml/maintenance.sgml | 306 +++++++++++++++++-----------------
1 file changed, 155 insertions(+), 151 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e8c8647cd..62e22d861 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -281,8 +281,9 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To update data statistics used by the
- <productname>PostgreSQL</productname> query planner.</simpara>
+ <simpara>To protect against loss of very old data due to
+ <firstterm>transaction ID wraparound</firstterm> or
+ <firstterm>multixact ID wraparound</firstterm>.</simpara>
</listitem>
<listitem>
@@ -292,9 +293,8 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ <simpara>To update data statistics used by the
+ <productname>PostgreSQL</productname> query planner.</simpara>
</listitem>
</orderedlist>
@@ -439,151 +439,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</tip>
</sect2>
- <sect2 id="vacuum-for-statistics">
- <title>Updating Planner Statistics</title>
-
- <indexterm zone="vacuum-for-statistics">
- <primary>statistics</primary>
- <secondary>of the planner</secondary>
- </indexterm>
-
- <indexterm zone="vacuum-for-statistics">
- <primary>ANALYZE</primary>
- </indexterm>
-
- <para>
- The <productname>PostgreSQL</productname> query planner relies on
- statistical information about the contents of tables in order to
- generate good plans for queries. These statistics are gathered by
- the <link linkend="sql-analyze"><command>ANALYZE</command></link> command,
- which can be invoked by itself or
- as an optional step in <command>VACUUM</command>. It is important to have
- reasonably accurate statistics, otherwise poor choices of plans might
- degrade database performance.
- </para>
-
- <para>
- The autovacuum daemon, if enabled, will automatically issue
- <command>ANALYZE</command> commands whenever the content of a table has
- changed sufficiently. However, administrators might prefer to rely
- on manually-scheduled <command>ANALYZE</command> operations, particularly
- if it is known that update activity on a table will not affect the
- statistics of <quote>interesting</quote> columns. The daemon schedules
- <command>ANALYZE</command> strictly as a function of the number of rows
- inserted or updated; it has no knowledge of whether that will lead
- to meaningful statistical changes.
- </para>
-
- <para>
- Tuples changed in partitions and inheritance children do not trigger
- analyze on the parent table. If the parent table is empty or rarely
- changed, it may never be processed by autovacuum, and the statistics for
- the inheritance tree as a whole won't be collected. It is necessary to
- run <command>ANALYZE</command> on the parent table manually in order to
- keep the statistics up to date.
- </para>
-
- <para>
- As with vacuuming for space recovery, frequent updates of statistics
- are more useful for heavily-updated tables than for seldom-updated
- ones. But even for a heavily-updated table, there might be no need for
- statistics updates if the statistical distribution of the data is
- not changing much. A simple rule of thumb is to think about how much
- the minimum and maximum values of the columns in the table change.
- For example, a <type>timestamp</type> column that contains the time
- of row update will have a constantly-increasing maximum value as
- rows are added and updated; such a column will probably need more
- frequent statistics updates than, say, a column containing URLs for
- pages accessed on a website. The URL column might receive changes just
- as often, but the statistical distribution of its values probably
- changes relatively slowly.
- </para>
-
- <para>
- It is possible to run <command>ANALYZE</command> on specific tables and even
- just specific columns of a table, so the flexibility exists to update some
- statistics more frequently than others if your application requires it.
- In practice, however, it is usually best to just analyze the entire
- database, because it is a fast operation. <command>ANALYZE</command> uses a
- statistically random sampling of the rows of a table rather than reading
- every single row.
- </para>
-
- <tip>
- <para>
- Although per-column tweaking of <command>ANALYZE</command> frequency might not be
- very productive, you might find it worthwhile to do per-column
- adjustment of the level of detail of the statistics collected by
- <command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal>
- clauses and have highly irregular data distributions might require a
- finer-grain data histogram than other columns. See <command>ALTER TABLE
- SET STATISTICS</command>, or change the database-wide default using the <xref
- linkend="guc-default-statistics-target"/> configuration parameter.
- </para>
-
- <para>
- Also, by default there is limited information available about
- the selectivity of functions. However, if you create a statistics
- object or an expression
- index that uses a function call, useful statistics will be
- gathered about the function, which can greatly improve query
- plans that use the expression index.
- </para>
- </tip>
-
- <tip>
- <para>
- The autovacuum daemon does not issue <command>ANALYZE</command> commands for
- foreign tables, since it has no means of determining how often that
- might be useful. If your queries require statistics on foreign tables
- for proper planning, it's a good idea to run manually-managed
- <command>ANALYZE</command> commands on those tables on a suitable schedule.
- </para>
- </tip>
-
- <tip>
- <para>
- The autovacuum daemon does not issue <command>ANALYZE</command> commands
- for partitioned tables. Inheritance parents will only be analyzed if the
- parent itself is changed - changes to child tables do not trigger
- autoanalyze on the parent table. If your queries require statistics on
- parent tables for proper planning, it is necessary to periodically run
- a manual <command>ANALYZE</command> on those tables to keep the statistics
- up to date.
- </para>
- </tip>
-
- </sect2>
-
- <sect2 id="vacuum-for-visibility-map">
- <title>Updating the Visibility Map</title>
-
- <para>
- Vacuum maintains a <link linkend="storage-vm">visibility map</link> for each
- table to keep track of which pages contain only tuples that are known to be
- visible to all active transactions (and all future transactions, until the
- page is again modified). This has two purposes. First, vacuum
- itself can skip such pages on the next run, since there is nothing to
- clean up.
- </para>
-
- <para>
- Second, it allows <productname>PostgreSQL</productname> to answer some
- queries using only the index, without reference to the underlying table.
- Since <productname>PostgreSQL</productname> indexes don't contain tuple
- visibility information, a normal index scan fetches the heap tuple for each
- matching index entry, to check whether it should be seen by the current
- transaction.
- An <link linkend="indexes-index-only-scans"><firstterm>index-only
- scan</firstterm></link>, on the other hand, checks the visibility map first.
- If it's known that all tuples on the page are
- visible, the heap fetch can be skipped. This is most useful on
- large data sets where the visibility map can prevent disk accesses.
- The visibility map is vastly smaller than the heap, so it can easily be
- cached even when the heap is very large.
- </para>
- </sect2>
-
<sect2 id="vacuum-for-wraparound">
<title>Preventing Transaction ID Wraparound Failures</title>
@@ -933,7 +788,156 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect3>
</sect2>
- </sect1>
+
+ <sect2 id="vacuum-for-visibility-map">
+ <title>Updating the Visibility Map</title>
+
+ <para>
+ Vacuum maintains a <link linkend="storage-vm">visibility
+ map</link> for each table to keep track of which pages contain
+ only tuples that are known to be visible to all active
+ transactions (and all future transactions, until the page is again
+ modified). This has two purposes. First, vacuum itself can skip
+ such pages on the next run, since there is nothing to clean up.
+ Even <command>VACUUM</command>s that use the <link
+ linkend="aggressive-strategy">aggressive strategy</link> can skip
+ pages that are both all-visible and all-frozen (the visibility map
+ keeps track of which pages are all-frozen separately).
+ </para>
+
+ <para>
+ Second, it allows <productname>PostgreSQL</productname> to answer
+ some queries using only the index, without reference to the
+ underlying table. Since <productname>PostgreSQL</productname>
+ indexes don't contain tuple visibility information, a normal index
+ scan fetches the heap tuple for each matching index entry, to
+ check whether it should be seen by the current transaction. An
+ <link linkend="indexes-index-only-scans"><firstterm>index-only
+ scan</firstterm></link>, on the other hand, checks the
+ visibility map first. If it's known that all tuples on the page
+ are visible, the heap fetch can be skipped. This is most useful
+ on large data sets where the visibility map can prevent disk
+ accesses. The visibility map is vastly smaller than the heap, so
+ it can easily be cached even when the heap is very large.
+ </para>
+ </sect2>
+
+ <sect2 id="vacuum-for-statistics">
+ <title>Updating Planner Statistics</title>
+
+ <indexterm zone="vacuum-for-statistics">
+ <primary>statistics</primary>
+ <secondary>of the planner</secondary>
+ </indexterm>
+
+ <indexterm zone="vacuum-for-statistics">
+ <primary>ANALYZE</primary>
+ </indexterm>
+
+ <para>
+ The <productname>PostgreSQL</productname> query planner relies on
+ statistical information about the contents of tables in order to
+ generate good plans for queries. These statistics are gathered by
+ the <link linkend="sql-analyze"><command>ANALYZE</command></link> command,
+ which can be invoked by itself or
+ as an optional step in <command>VACUUM</command>. It is important to have
+ reasonably accurate statistics, otherwise poor choices of plans might
+ degrade database performance.
+ </para>
+
+ <para>
+ The autovacuum daemon, if enabled, will automatically issue
+ <command>ANALYZE</command> commands whenever the content of a table has
+ changed sufficiently. However, administrators might prefer to rely
+ on manually-scheduled <command>ANALYZE</command> operations, particularly
+ if it is known that update activity on a table will not affect the
+ statistics of <quote>interesting</quote> columns. The daemon schedules
+ <command>ANALYZE</command> strictly as a function of the number of rows
+ inserted or updated; it has no knowledge of whether that will lead
+ to meaningful statistical changes.
+ </para>
+
+ <para>
+ Tuples changed in partitions and inheritance children do not trigger
+ analyze on the parent table. If the parent table is empty or rarely
+ changed, it may never be processed by autovacuum, and the statistics for
+ the inheritance tree as a whole won't be collected. It is necessary to
+ run <command>ANALYZE</command> on the parent table manually in order to
+ keep the statistics up to date.
+ </para>
+
+ <para>
+ As with vacuuming for space recovery, frequent updates of statistics
+ are more useful for heavily-updated tables than for seldom-updated
+ ones. But even for a heavily-updated table, there might be no need for
+ statistics updates if the statistical distribution of the data is
+ not changing much. A simple rule of thumb is to think about how much
+ the minimum and maximum values of the columns in the table change.
+ For example, a <type>timestamp</type> column that contains the time
+ of row update will have a constantly-increasing maximum value as
+ rows are added and updated; such a column will probably need more
+ frequent statistics updates than, say, a column containing URLs for
+ pages accessed on a website. The URL column might receive changes just
+ as often, but the statistical distribution of its values probably
+ changes relatively slowly.
+ </para>
+
+ <para>
+ It is possible to run <command>ANALYZE</command> on specific tables and even
+ just specific columns of a table, so the flexibility exists to update some
+ statistics more frequently than others if your application requires it.
+ In practice, however, it is usually best to just analyze the entire
+ database, because it is a fast operation. <command>ANALYZE</command> uses a
+ statistically random sampling of the rows of a table rather than reading
+ every single row.
+ </para>
+
+ <tip>
+ <para>
+ Although per-column tweaking of <command>ANALYZE</command> frequency might not be
+ very productive, you might find it worthwhile to do per-column
+ adjustment of the level of detail of the statistics collected by
+ <command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal>
+ clauses and have highly irregular data distributions might require a
+ finer-grain data histogram than other columns. See <command>ALTER TABLE
+ SET STATISTICS</command>, or change the database-wide default using the <xref
+ linkend="guc-default-statistics-target"/> configuration parameter.
+ </para>
+
+ <para>
+ Also, by default there is limited information available about
+ the selectivity of functions. However, if you create a statistics
+ object or an expression
+ index that uses a function call, useful statistics will be
+ gathered about the function, which can greatly improve query
+ plans that use the expression index.
+ </para>
+ </tip>
+
+ <tip>
+ <para>
+ The autovacuum daemon does not issue <command>ANALYZE</command> commands for
+ foreign tables, since it has no means of determining how often that
+ might be useful. If your queries require statistics on foreign tables
+ for proper planning, it's a good idea to run manually-managed
+ <command>ANALYZE</command> commands on those tables on a suitable schedule.
+ </para>
+ </tip>
+
+ <tip>
+ <para>
+ The autovacuum daemon does not issue <command>ANALYZE</command> commands
+ for partitioned tables. Inheritance parents will only be analyzed if the
+ parent itself is changed - changes to child tables do not trigger
+ autoanalyze on the parent table. If your queries require statistics on
+ parent tables for proper planning, it is necessary to periodically run
+ a manual <command>ANALYZE</command> on those tables to keep the statistics
+ up to date.
+ </para>
+ </tip>
+
+ </sect2>
+</sect1>
<sect1 id="routine-reindex">
--
2.40.0
v2-0002-Restructure-autovacuum-daemon-section.patchapplication/octet-stream; name=v2-0002-Restructure-autovacuum-daemon-section.patchDownload
From 234f909f8c54c4e421ff64dd5c26901a7373486f Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Mon, 24 Apr 2023 09:21:01 -0700
Subject: [PATCH v2 2/9] Restructure autovacuum daemon section.
Add sect2/sect3 subsections to autovacuum sect1. Also reorder the
content slightly for clarity.
TODO Add some basic explanations of vacuuming and relfrozenxid
advancement, since that now appears later on in the chapter.
Alternatively, move the autovacuum daemon sect1 after the routine
vacuuming sect1.
---
doc/src/sgml/maintenance.sgml | 66 ++++++++++++++++++++++-------------
1 file changed, 42 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index a6295c399..6a7ec7c1d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -100,6 +100,8 @@
autovacuum workers' activity.
</para>
+ <sect2 id="autovacuum-scheduling">
+ <title>Autovacuum Scheduling</title>
<para>
If several large tables all become eligible for vacuuming in a short
amount of time, all autovacuum workers might become occupied with
@@ -112,6 +114,8 @@
<xref linkend="guc-superuser-reserved-connections"/> limits.
</para>
+ <sect3 id="autovacuum-vacuum-thresholds">
+ <title>Configurable thresholds for vacuuming</title>
<para>
Tables whose <structfield>relfrozenxid</structfield> value is more than
<xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
@@ -159,7 +163,10 @@ vacuum insert threshold = vacuum base insert threshold + vacuum insert scale fac
<structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
since the last vacuum are scanned.
</para>
+ </sect3>
+ <sect3 id="autovacuum-analyze-thresholds">
+ <title>Configurable thresholds for <command>ANALYZE</command></title>
<para>
For analyze, a similar condition is used: the threshold, defined as:
<programlisting>
@@ -168,20 +175,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
is compared to the total number of tuples inserted, updated, or deleted
since the last <command>ANALYZE</command>.
</para>
-
- <para>
- Partitioned tables are not processed by autovacuum. Statistics
- should be collected by running a manual <command>ANALYZE</command> when it is
- first populated, and again whenever the distribution of data in its
- partitions changes significantly.
- </para>
-
- <para>
- Temporary tables cannot be accessed by autovacuum. Therefore,
- appropriate vacuum and analyze operations should be performed via
- session SQL commands.
- </para>
-
<para>
The default thresholds and scale factors are taken from
<filename>postgresql.conf</filename>, but it is possible to override them
@@ -192,18 +185,25 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
used. See <xref linkend="runtime-config-autovacuum"/> for more details on
the global settings.
</para>
+ </sect3>
+ </sect2>
- <para>
- When multiple workers are running, the autovacuum cost delay parameters
- (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
- <quote>balanced</quote> among all the running workers, so that the
- total I/O impact on the system is the same regardless of the number
- of workers actually running. However, any workers processing tables whose
- per-table <literal>autovacuum_vacuum_cost_delay</literal> or
- <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
- are not considered in the balancing algorithm.
- </para>
+ <sect2 id="autovacuum-cost-delays">
+ <title>Autovacuum Cost-based Delays</title>
+ <para>
+ When multiple workers are running, the autovacuum cost delay parameters
+ (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
+ <quote>balanced</quote> among all the running workers, so that the
+ total I/O impact on the system is the same regardless of the number
+ of workers actually running. However, any workers processing tables whose
+ per-table <literal>autovacuum_vacuum_cost_delay</literal> or
+ <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
+ are not considered in the balancing algorithm.
+ </para>
+ </sect2>
+ <sect2 id="autovacuum-lock-conflicts">
+ <title>Autovacuum and Lock Conflicts</title>
<para>
Autovacuum workers generally don't block other commands. If a process
attempts to acquire a lock that conflicts with the
@@ -223,6 +223,24 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
effectively prevent autovacuums from ever completing.
</para>
</warning>
+ </sect2>
+
+ <sect2 id="autovacuum-limitations">
+ <title>Limitations</title>
+ <para>
+ Partitioned tables are not processed by autovacuum. Statistics
+ should be collected by running a manual <command>ANALYZE</command> when it is
+ first populated, and again whenever the distribution of data in its
+ partitions changes significantly.
+ </para>
+
+ <para>
+ Temporary tables cannot be accessed by autovacuum. Therefore,
+ appropriate vacuum and analyze operations should be performed via
+ session SQL commands.
+ </para>
+ </sect2>
+
</sect1>
<sect1 id="routine-vacuuming">
--
2.40.0
v2-0003-Normalize-maintenance.sgml-indentation.patchapplication/octet-stream; name=v2-0003-Normalize-maintenance.sgml-indentation.patchDownload
From e57d8ac92105f4ffb6ff0d8ec72e5c032d749e4c Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 15:20:13 -0700
Subject: [PATCH v2 3/9] Normalize maintenance.sgml indentation.
---
doc/src/sgml/maintenance.sgml | 82 +++++++++++++++++------------------
1 file changed, 41 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 6a7ec7c1d..e8c8647cd 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -11,53 +11,53 @@
<primary>routine maintenance</primary>
</indexterm>
- <para>
- <productname>PostgreSQL</productname>, like any database software, requires that certain tasks
- be performed regularly to achieve optimum performance. The tasks
- discussed here are <emphasis>required</emphasis>, but they
- are repetitive in nature and can easily be automated using standard
- tools such as <application>cron</application> scripts or
- Windows' <application>Task Scheduler</application>. It is the database
- administrator's responsibility to set up appropriate scripts, and to
- check that they execute successfully.
- </para>
+ <para>
+ <productname>PostgreSQL</productname>, like any database software, requires that certain tasks
+ be performed regularly to achieve optimum performance. The tasks
+ discussed here are <emphasis>required</emphasis>, but they
+ are repetitive in nature and can easily be automated using standard
+ tools such as <application>cron</application> scripts or
+ Windows' <application>Task Scheduler</application>. It is the database
+ administrator's responsibility to set up appropriate scripts, and to
+ check that they execute successfully.
+ </para>
- <para>
- One obvious maintenance task is the creation of backup copies of the data on a
- regular schedule. Without a recent backup, you have no chance of recovery
- after a catastrophe (disk failure, fire, mistakenly dropping a critical
- table, etc.). The backup and recovery mechanisms available in
- <productname>PostgreSQL</productname> are discussed at length in
- <xref linkend="backup"/>.
- </para>
+ <para>
+ One obvious maintenance task is the creation of backup copies of the data on a
+ regular schedule. Without a recent backup, you have no chance of recovery
+ after a catastrophe (disk failure, fire, mistakenly dropping a critical
+ table, etc.). The backup and recovery mechanisms available in
+ <productname>PostgreSQL</productname> are discussed at length in
+ <xref linkend="backup"/>.
+ </para>
- <para>
- The other main category of maintenance task is periodic <quote>vacuuming</quote>
- of the database. This activity is discussed in
- <xref linkend="routine-vacuuming"/>. Closely related to this is updating
- the statistics that will be used by the query planner, as discussed in
- <xref linkend="vacuum-for-statistics"/>.
- </para>
+ <para>
+ The other main category of maintenance task is periodic <quote>vacuuming</quote>
+ of the database. This activity is discussed in
+ <xref linkend="routine-vacuuming"/>. Closely related to this is updating
+ the statistics that will be used by the query planner, as discussed in
+ <xref linkend="vacuum-for-statistics"/>.
+ </para>
- <para>
- Another task that might need periodic attention is log file management.
- This is discussed in <xref linkend="logfile-maintenance"/>.
- </para>
+ <para>
+ Another task that might need periodic attention is log file management.
+ This is discussed in <xref linkend="logfile-maintenance"/>.
+ </para>
- <para>
- <ulink
+ <para>
+ <ulink
url="https://bucardo.org/check_postgres/"><application>check_postgres</application></ulink>
- is available for monitoring database health and reporting unusual
- conditions. <application>check_postgres</application> integrates with
- Nagios and MRTG, but can be run standalone too.
- </para>
+ is available for monitoring database health and reporting unusual
+ conditions. <application>check_postgres</application> integrates with
+ Nagios and MRTG, but can be run standalone too.
+ </para>
- <para>
- <productname>PostgreSQL</productname> is low-maintenance compared
- to some other database management systems. Nonetheless,
- appropriate attention to these tasks will go far towards ensuring a
- pleasant and productive experience with the system.
- </para>
+ <para>
+ <productname>PostgreSQL</productname> is low-maintenance compared
+ to some other database management systems. Nonetheless,
+ appropriate attention to these tasks will go far towards ensuring a
+ pleasant and productive experience with the system.
+ </para>
<sect1 id="autovacuum">
<title>The Autovacuum Daemon</title>
--
2.40.0
v2-0005-Move-Interpreting-XID-stamps-from-tuple-headers.patchapplication/octet-stream; name=v2-0005-Move-Interpreting-XID-stamps-from-tuple-headers.patchDownload
From 6104a112d52ccc08466b9c57a287d737150ff194 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 12:41:00 -0700
Subject: [PATCH v2 5/9] Move Interpreting XID stamps from tuple headers.
This is intended to be fairly close to a mechanical change. It isn't
entirely mechanical, though, since the original wording has been
slightly modified for it to work in context.
Structuring things this way should make life a little easier for doc
translators.
---
doc/src/sgml/maintenance.sgml | 81 +++++++----------------------------
doc/src/sgml/storage.sgml | 62 +++++++++++++++++++++++++++
2 files changed, 78 insertions(+), 65 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 62e22d861..f554e12bf 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -447,75 +447,26 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
<secondary>wraparound</secondary>
</indexterm>
- <indexterm>
- <primary>wraparound</primary>
- <secondary>of transaction IDs</secondary>
- </indexterm>
+ <indexterm>
+ <primary>wraparound</primary>
+ <secondary>of transaction IDs</secondary>
+ </indexterm>
<para>
- <productname>PostgreSQL</productname>'s
- <link linkend="mvcc-intro">MVCC</link> transaction semantics
- depend on being able to compare transaction ID (<acronym>XID</acronym>)
- numbers: a row version with an insertion XID greater than the current
- transaction's XID is <quote>in the future</quote> and should not be visible
- to the current transaction. But since transaction IDs have limited size
- (32 bits) a cluster that runs for a long time (more
- than 4 billion transactions) would suffer <firstterm>transaction ID
- wraparound</firstterm>: the XID counter wraps around to zero, and all of a sudden
- transactions that were in the past appear to be in the future — which
- means their output become invisible. In short, catastrophic data loss.
- (Actually the data is still there, but that's cold comfort if you cannot
- get at it.) To avoid this, it is necessary to vacuum every table
- in every database at least once every two billion transactions.
+ <productname>PostgreSQL</productname>'s <link
+ linkend="mvcc-intro">MVCC</link> transaction semantics depend on
+ being able to compare <glossterm linkend="glossary-xid">transaction
+ ID numbers (<acronym>XID</acronym>)</glossterm> to determine
+ whether or not the row is visible to each query's MVCC snapshot
+ (see <link linkend="interpreting-xid-stamps">
+ interpreting XID stamps from tuple headers</link>). But since
+ on-disk storage of transaction IDs in heap pages uses a truncated
+ 32-bit representation to save space (rather than the full 64-bit
+ representation), it is necessary to vacuum every table in every
+ database <emphasis>at least</emphasis> once every two billion
+ transactions (though far more frequent vacuuming is typical).
</para>
- <para>
- The reason that periodic vacuuming solves the problem is that
- <command>VACUUM</command> will mark rows as <emphasis>frozen</emphasis>, indicating that
- they were inserted by a transaction that committed sufficiently far in
- the past that the effects of the inserting transaction are certain to be
- visible to all current and future transactions.
- Normal XIDs are
- compared using modulo-2<superscript>32</superscript> arithmetic. This means
- that for every normal XID, there are two billion XIDs that are
- <quote>older</quote> and two billion that are <quote>newer</quote>; another
- way to say it is that the normal XID space is circular with no
- endpoint. Therefore, once a row version has been created with a particular
- normal XID, the row version will appear to be <quote>in the past</quote> for
- the next two billion transactions, no matter which normal XID we are
- talking about. If the row version still exists after more than two billion
- transactions, it will suddenly appear to be in the future. To
- prevent this, <productname>PostgreSQL</productname> reserves a special XID,
- <literal>FrozenTransactionId</literal>, which does not follow the normal XID
- comparison rules and is always considered older
- than every normal XID.
- Frozen row versions are treated as if the inserting XID were
- <literal>FrozenTransactionId</literal>, so that they will appear to be
- <quote>in the past</quote> to all normal transactions regardless of wraparound
- issues, and so such row versions will be valid until deleted, no matter
- how long that is.
- </para>
-
- <note>
- <para>
- In <productname>PostgreSQL</productname> versions before 9.4, freezing was
- implemented by actually replacing a row's insertion XID
- with <literal>FrozenTransactionId</literal>, which was visible in the
- row's <structname>xmin</structname> system column. Newer versions just set a flag
- bit, preserving the row's original <structname>xmin</structname> for possible
- forensic use. However, rows with <structname>xmin</structname> equal
- to <literal>FrozenTransactionId</literal> (2) may still be found
- in databases <application>pg_upgrade</application>'d from pre-9.4 versions.
- </para>
- <para>
- Also, system catalogs may contain rows with <structname>xmin</structname> equal
- to <literal>BootstrapTransactionId</literal> (1), indicating that they were
- inserted during the first phase of <application>initdb</application>.
- Like <literal>FrozenTransactionId</literal>, this special XID is treated as
- older than every normal XID.
- </para>
- </note>
-
<para>
<xref linkend="guc-vacuum-freeze-min-age"/>
controls how old an XID value has to be before rows bearing that XID will be
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index e5b9f3f1f..f31a002fc 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -1072,6 +1072,68 @@ data. Empty in ordinary tables.</entry>
it might be compressed, too (see <xref linkend="storage-toast"/>).
</para>
+
+ <sect3 id="interpreting-xid-stamps">
+ <title>Interpreting XID stamps from tuple headers</title>
+
+ <para>
+ The on-disk representation of transaction IDs (the representation
+ used in <structfield>t_xmin</structfield> and
+ <structfield>t_xmax</structfield> fields) use a truncated 32-bit
+ representation of transaction IDs, not the full 64-bit
+ representation. This is not suitable for long term storage without
+ special processing by <command>VACUUM</command>.
+ </para>
+
+ <para>
+ <command>VACUUM</command> <link linkend="routine-vacuuming">will
+ mark tuple headers <emphasis>frozen</emphasis></link>, indicating
+ that all eligible rows on the page were inserted by a transaction
+ that committed sufficiently far in the past that the effects of the
+ inserting transaction are certain to be visible to all current and
+ future transactions. Normal XIDs are compared using
+ modulo-2<superscript>32</superscript> arithmetic. This means that
+ for every normal XID, there are two billion XIDs that are
+ <quote>older</quote> and two billion that are <quote>newer</quote>;
+ another way to say it is that the normal XID space is circular with
+ no endpoint. Therefore, once a row version has been created with a
+ particular normal XID, the row version will appear to be <quote>in
+ the past</quote> for the next two billion transactions, no matter
+ which normal XID we are talking about. If the row version still
+ exists after more than two billion transactions, it will suddenly
+ appear to be in the future. To prevent this,
+ <productname>PostgreSQL</productname> reserves a special XID,
+ <literal>FrozenTransactionId</literal>, which does not follow the
+ normal XID comparison rules and is always considered older than
+ every normal XID. Frozen row versions are treated as if the
+ inserting XID were <literal>FrozenTransactionId</literal>, so that
+ they will appear to be <quote>in the past</quote> to all normal
+ transactions regardless of wraparound issues, and so such row
+ versions will be valid until deleted, no matter how long that is.
+ </para>
+
+ <note>
+ <para>
+ In <productname>PostgreSQL</productname> versions before 9.4, freezing was
+ implemented by actually replacing a row's insertion XID
+ with <literal>FrozenTransactionId</literal>, which was visible in the
+ row's <structname>xmin</structname> system column. Newer versions just set a flag
+ bit, preserving the row's original <structname>xmin</structname> for possible
+ forensic use. However, rows with <structname>xmin</structname> equal
+ to <literal>FrozenTransactionId</literal> (2) may still be found
+ in databases <application>pg_upgrade</application>'d from pre-9.4 versions.
+ </para>
+ <para>
+ Also, system catalogs may contain rows with <structname>xmin</structname> equal
+ to <literal>BootstrapTransactionId</literal> (1), indicating that they were
+ inserted during the first phase of <application>initdb</application>.
+ Like <literal>FrozenTransactionId</literal>, this special XID is treated as
+ older than every normal XID.
+ </para>
+ </note>
+
+</sect3>
+
</sect2>
</sect1>
--
2.40.0
On Thu, Apr 27, 2023 at 12:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Apr 26, 2023 at 12:16 AM John Naylor
<john.naylor@enterprisedb.com> wrote:Now is a great time to revise this section, in my view. (I myself am
about ready to get back to testing and writing for the task of removing
that "obnoxious hint".)
Although I didn't mention the issue with single user mode in my
introductory email (the situation there is just appalling IMV), it
seems like I might not be able to ignore that problem while I'm
working on this patch. Declaring that as out of scope for this doc
patch series (on pragmatic grounds) feels awkward. I have to work
around something that is just wrong. For now, the doc patch just has
an "XXX" item about it. (Hopefully I'll think of a more natural way of
not fixing it.)
If it helps, I've gone ahead with some testing and polishing on that, and
it's close to ready, I think (CC'd you). I'd like that piece to be separate
and small enough to be backpatchable (at least in theory).
--
John Naylor
EDB: http://www.enterprisedb.com
On Sat, Apr 29, 2023 at 1:17 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
Although I didn't mention the issue with single user mode in my
introductory email (the situation there is just appalling IMV), it
seems like I might not be able to ignore that problem while I'm
working on this patch. Declaring that as out of scope for this doc
patch series (on pragmatic grounds) feels awkward. I have to work
around something that is just wrong. For now, the doc patch just has
an "XXX" item about it. (Hopefully I'll think of a more natural way of
not fixing it.)If it helps, I've gone ahead with some testing and polishing on that, and it's close to ready, I think (CC'd you). I'd like that piece to be separate and small enough to be backpatchable (at least in theory).
That's great news. Not least because it unblocks this patch series of mine.
--
Peter Geoghegan
On Thu, Apr 27, 2023 at 12:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
[v2]
I've done a more careful read-through, but I'll need a couple more, I
imagine.
I'll first point out some things I appreciate, and I'm glad are taken care
of as part of this work:
- Pushing the talk of scheduled manual vacuums to the last, rather than
first, para in the intro
- No longer pretending that turning off autovacuum is somehow normal
- Removing the egregiously outdated practice of referring to VACUUM FULL as
a "variant" of VACUUM
- Removing the mention of ALTER TABLE that has no earthly business in this
chapter -- for that, rewriting the table is a side effect to try to avoid,
not a tool in our smorgasbord for removing severe bloat.
Some suggestions:
- The section "Recovering Disk Space" now has 5 tips/notes/warnings in a
row. This is good information, but I wonder about:
"Note: Although VACUUM FULL is technically an option of the VACUUM command,
VACUUM FULL uses a completely different implementation. VACUUM FULL is
essentially a variant of CLUSTER. (The name VACUUM FULL is historical; the
original implementation was somewhat closer to standard VACUUM.)"
...maybe move this to a second paragraph in the warning about VACUUM FULL
and CLUSTER?
- The sentence "The XID cutoff point that VACUUM uses..." reads a bit
abruptly and unmotivated (although it is important). Part of the reason for
this is that the hyperlink "transaction ID number (XID)" which points to
the glossary is further down the page than this first mention.
- "VACUUM often marks certain pages frozen, indicating that all eligible
rows on the page were inserted by a transaction that committed sufficiently
far in the past that the effects of the inserting transaction are certain
to be visible to all current and future transactions."
-> This sentence is much harder to understand than the one it replaces.
Also, this is the first time "eligible" is mentioned. It may not need a
separate definition, but in this form it's rather circular.
- "freezing plays a crucial role in enabling _management of the XID
address_ space by VACUUM"
-> "management of the XID address space" links to the
aggressive-strategy sub-section below, but it's a strange link title
because the section we're in is itself titled "Freezing to manage the
transaction ID space".
- "The maximum “distance” that the system can tolerate..."
-> The next sentence goes on to show the "age" function, so using
different terms is a bit strange. Mixing the established age term with an
in-quotes "distance" could perhaps be done once in a definition, but then
all uses should stick to age.
--
John Naylor
EDB: http://www.enterprisedb.com
On Sat, Apr 29, 2023 at 8:54 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
I've done a more careful read-through, but I'll need a couple more, I imagine.
Yeah, it's tough to get this stuff right.
I'll first point out some things I appreciate, and I'm glad are taken care of as part of this work:
- Pushing the talk of scheduled manual vacuums to the last, rather than first, para in the intro
- No longer pretending that turning off autovacuum is somehow normal
- Removing the egregiously outdated practice of referring to VACUUM FULL as a "variant" of VACUUM
- Removing the mention of ALTER TABLE that has no earthly business in this chapter -- for that, rewriting the table is a side effect to try to avoid, not a tool in our smorgasbord for removing severe bloat.Some suggestions:
- The section "Recovering Disk Space" now has 5 tips/notes/warnings in a row.
It occurs to me that all of this stuff (TRUNCATE, VACUUM FULL, and so
on) isn't "routine" at all. And so maybe this is the wrong chapter for
this entirely. The way I dealt with it in v2 wasn't very worked out --
I just knew that I had to do something, but hadn't given much thought
to what actually made sense.
I wonder if it would make sense to move all of that stuff into its own
new sect1 of "Chapter 29. Monitoring Disk Usage" -- something along
the lines of "what to do about bloat when all else fails, when the
problem gets completely out of hand". Naturally we'd link to this new
section from "Routine Vacuuming". What do you think of that general
approach?
This is good information, but I wonder about:
(Various points)
That's good feedback. I'll get to this in a couple of days.
--
Peter Geoghegan
On Wed, Apr 26, 2023 at 1:58 PM Peter Geoghegan <pg@bowt.ie> wrote:
Why do we call wraparound wraparound, anyway? The 32-bit XID space is
circular! The whole point of the design is that unsigned integer
wraparound is meaningless -- there isn't really a point in "the
circle" that you should think of as the start point or end point.
(We're probably stuck with the term "wraparound" for now, so I'm not
proposing that it be changed here, purely on pragmatic grounds.)
To me, the fact that the XID space is circular is the whole point of
talking about wraparound. If the XID space were non-circular, it could
never try to reuse the XID values that have previously been used, and
this entire class of problems would go away. Because it is circular,
it's possible for the XID counter to arrive back at a place that it's
been before i.e. it can wrap around.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, May 1, 2023 at 8:03 AM Robert Haas <robertmhaas@gmail.com> wrote:
To me, the fact that the XID space is circular is the whole point of
talking about wraparound.
The word wraparound is ambiguous. It's not the same thing as
xidStopLimit in my view. It's literal integer wraparound.
If you think of XIDs as having a native 64-bit representation, while
using a truncated 32-bit on-disk representation in tuple headers
(which is the view promoted by the doc patch), then XIDs cannot wrap
around. There is still no possibility of "the future becoming the
past" (assuming no use of single user mode), either, because even in
the worst case we have xidStopLimit to make sure that the database
doesn't become corrupt. Why talk about what's *not* happening in a
place of prominence?
We'll still talk about literal integer wraparound with the doc patch,
but it's part of a discussion of the on-disk format in a distant
chapter. It's just an implementation detail, which is of no practical
consequence. The main discussion need only say something succinct and
vague about the use of a truncated representation (lacking a separate
epoch) in tuple headers eventually forcing freezing.
If the XID space were non-circular, it could
never try to reuse the XID values that have previously been used, and
this entire class of problems would go away. Because it is circular,
it's possible for the XID counter to arrive back at a place that it's
been before i.e. it can wrap around.
But integer wrap around isn't really aligned with anything important.
xidStopLimit will kick in when we're only halfway towards literal
integer wrap around. Users have practical concerns about avoiding
xidStopLimit -- what a world without xidStopLimit looks like just
doesn't matter. Just having some vague awareness of truncated XIDs
being insufficient at some point is all you really need, even if
you're an advanced user.
--
Peter Geoghegan
On Mon, May 1, 2023 at 12:01 PM Peter Geoghegan <pg@bowt.ie> wrote:
If the XID space were non-circular, it could
never try to reuse the XID values that have previously been used, and
this entire class of problems would go away. Because it is circular,
it's possible for the XID counter to arrive back at a place that it's
been before i.e. it can wrap around.But integer wrap around isn't really aligned with anything important.
xidStopLimit will kick in when we're only halfway towards literal
integer wrap around. Users have practical concerns about avoiding
xidStopLimit -- what a world without xidStopLimit looks like just
doesn't matter. Just having some vague awareness of truncated XIDs
being insufficient at some point is all you really need, even if
you're an advanced user.
I disagree. If you start the cluster in single-user mode, you can
actually wrap it around, unless something has changed that I don't
know about.
I'm not trying to debate the details of the patch, which I have not
read. I am saying that, while wraparound is perhaps not a perfect term
for what's happening, it is not, in my opinion, a bad term either. I
don't think it's accurate to imagine that this is a 64-bit counter
where we only store 32 bits on disk. We're trying to retcon that into
being true, but we'd have to work significantly harder to actually
make it true.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, May 1, 2023 at 9:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
I disagree. If you start the cluster in single-user mode, you can
actually wrap it around, unless something has changed that I don't
know about.
This patch relies on John's other patch which strongly discourages the
use of single-user mode. Were it not for that, I might agree.
I'm not trying to debate the details of the patch, which I have not
read. I am saying that, while wraparound is perhaps not a perfect term
for what's happening, it is not, in my opinion, a bad term either. I
don't think it's accurate to imagine that this is a 64-bit counter
where we only store 32 bits on disk. We're trying to retcon that into
being true, but we'd have to work significantly harder to actually
make it true.
The purpose of this documentation section is to give users practical
guidance, obviously. The main reason to frame it this way is because
it seems to make the material easier to understand.
--
Peter Geoghegan
On Mon, May 1, 2023 at 9:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, May 1, 2023 at 9:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
I disagree. If you start the cluster in single-user mode, you can
actually wrap it around, unless something has changed that I don't
know about.This patch relies on John's other patch which strongly discourages the
use of single-user mode. Were it not for that, I might agree.
Also, it's not clear that the term "wraparound" even describes what
happens when you corrupt the database by violating the "no more than
~2.1 billion XIDs distance between any two unfrozen XIDs" invariant in
single-user mode. What specific thing will have wrapped around? It's
possible (and very likely) that every unfrozen XID in the database is
from the same 64-XID-wise epoch.
I don't think that we need to say very much about this scenario (and
nothing at all about the specifics in "Routine Vacuuming"), so maybe
it doesn't matter much. But I maintain that it makes most sense to
describe this scenario as a violation of the "no more than ~2.1
billion XIDs distance between any two unfrozen XIDs" invariant, while
leaving the term "wraparound" out of it completely. That terms has way
too much baggage.
--
Peter Geoghegan
On Mon, May 1, 2023, 18:08 Robert Haas <robertmhaas@gmail.com> wrote:
I am saying that, while wraparound is perhaps not a perfect term
for what's happening, it is not, in my opinion, a bad term either.
I don't want to put words into Peter's mouth, but I think that he's arguing
that the term "wraparound" suggests that there is something special about
the transition between xid 2^32 and xid 0 (or, well, 3). There isn't.
There's only something special about the transition, as your current xid
advances, between the xid that's half the xid space ahead of your current
xid and the xid that's half the xid space behind the current xid, if the
latter is not frozen. I don't think that's what most users think of when
they hear "wraparound".
On Mon, May 1, 2023 at 12:03 PM Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
I don't want to put words into Peter's mouth, but I think that he's arguing that the term "wraparound" suggests that there is something special about the transition between xid 2^32 and xid 0 (or, well, 3). There isn't.
Yes, that's exactly what I mean. There are two points that seem to be
very much in tension here:
1. The scenario where you corrupt the database in single user mode by
unsafely allocating XIDs (you need single user mode to bypass the
xidStopLimit protections) generally won't involve unsigned integer
wraparound (and if it does it's *entirely* incidental to the data
corruption).
2. Actual unsigned integer wraparound is 100% harmless and routine, by design.
So why do we use the term wraparound as a synonym of "the end of the
world"? I assume that it's just an artefact of how the system worked
before the invention of freezing. Back then, you had to do a dump and
restore when the system reached about 4 billion XIDs. Wraparound
really did mean "the end of the world" over 20 years ago.
This is related to my preference for explaining the issues with
reference to a 64-bit XID space. Today we compare 64-bit XIDs using
simple unsigned integer comparisons. That's the same way that 32-bit
XID comparisons worked before freezing was invented in 2001. So it
really does seem like the natural way to explain it.
--
Peter Geoghegan
On Tue, May 2, 2023 at 12:09 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, May 1, 2023 at 9:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, May 1, 2023 at 9:08 AM Robert Haas <robertmhaas@gmail.com>
wrote:
I disagree. If you start the cluster in single-user mode, you can
actually wrap it around, unless something has changed that I don't
know about.
+1 Pretending otherwise is dishonest.
This patch relies on John's other patch which strongly discourages the
use of single-user mode. Were it not for that, I might agree.
Oh that's rich. I'll note that 5% of your review was actually helpful
(actual correction), the other 95% was needless distraction trying to
enlist me in your holy crusade against the term "wraparound". It had the
opposite effect.
Also, it's not clear that the term "wraparound" even describes what
happens when you corrupt the database by violating the "no more than
~2.1 billion XIDs distance between any two unfrozen XIDs" invariant in
single-user mode. What specific thing will have wrapped around?
In your first message you said "I'm hoping that I don't get too much push
back on this, because it's already very difficult work."
Here's some advice on how to avoid pushback:
1. Insist that all terms can only be interpreted in the most pig-headedly
literal sense possible.
2. Use that premise to pretend basic facts are a complete mystery.
3. Claim that others are holding you back, and then try to move the
goalposts in their work.
--
John Naylor
EDB: http://www.enterprisedb.com
On Mon, May 1, 2023 at 8:04 PM John Naylor <john.naylor@enterprisedb.com> wrote:
Here's some advice on how to avoid pushback:
1. Insist that all terms can only be interpreted in the most pig-headedly literal sense possible.
2. Use that premise to pretend basic facts are a complete mystery.
I can't imagine why you feel it necessary to communicate with me like
this. This is just vitriol, lacking any substance.
How we use words like wraparound is actually something of great
consequence to the Postgres project. We've needlessly scared users
with the way this information has been presented up until now -- that
much is clear. To have you talk to me like this when I'm working on
such a difficult, thankless task is a real slap in the face.
3. Claim that others are holding you back, and then try to move the goalposts in their work.
When did I say that? When did I even suggest it?
--
Peter Geoghegan
On Mon, May 1, 2023 at 8:04 PM John Naylor <john.naylor@enterprisedb.com> wrote:
Oh that's rich. I'll note that 5% of your review was actually helpful (actual correction), the other 95% was needless distraction trying to enlist me in your holy crusade against the term "wraparound". It had the opposite effect.
I went back and checked. There were exactly two short paragraphs about
wraparound terminology on the thread associated with the patch you're
working on, towards the end of this one email:
/messages/by-id/CAH2-Wzm2fpPQ_=pXpRvkNiuTYBGTAUfxRNW40kLitxj9T3Ny7w@mail.gmail.com
In what world does that amount to 95% of my review, or anything like it?
--
Peter Geoghegan
On Mon, May 1, 2023 at 11:21 PM Peter Geoghegan <pg@bowt.ie> wrote:
I can't imagine why you feel it necessary to communicate with me like
this. This is just vitriol, lacking any substance.
John's email is pretty harsh, but I can understand why he's frustrated.
I told you that I did not agree with your dislike for the term
wraparound and I explained why. You sent a couple more emails telling
me that I was wrong and, frankly, saying a lot of things that seem
only tangentially related to the point that I was actually making. You
seem to expect other people to spend a LOT OF TIME trying to
understand what you're trying to say, but you don't seem to invest
similar effort in trying to understand what they're trying to say. I
couldn't even begin to grasp what your point was until Maciek stepped
in to explain, and I still don't really agree with it, and I expect
that no matter how many emails I write about that, your position won't
budge an iota.
It's really demoralizing. If I just vote -1 on the patch set, then I'm
a useless obstruction. If I actually try to review it, we'll exchange
100 emails and I won't get anything else done for the next two weeks
and I probably won't feel much better about the patch at the end of
that process than at the beginning. I don't see that I have any
winning options here.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, May 2, 2023 at 1:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
I told you that I did not agree with your dislike for the term
wraparound and I explained why. You sent a couple more emails telling
me that I was wrong and, frankly, saying a lot of things that seem
only tangentially related to the point that I was actually making.
I agree that that's what I did. You're perfectly entitled to find that
annoying (though I maintain that my point about the 64-bit XID space
was a good one, assuming the general subject matter was of interest).
However, you're talking about this as if I dug my feet in on a
substantive issue affecting the basic shape of the patch -- I don't
believe that that conclusion is justified by anything I've said or
done. I'm not even sure that we disagree on some less important point
that will directly affect the patch (it's quite possible, but I'm not
even sure of it).
I've already said that I don't think that the term wraparound is going
anywhere anytime soon (granted, that was on the other thread). So it's
not like I'm attempting to banish all existing use of that terminology
within the scope of this patch series -- far from it. At most I tried
to avoid inventing new terms that contain the word "wraparound" (also
on the other thread).
The topic originally came up in the context of moving talk about
physical wraparound to an entirely different chapter. Which is, I
believe (based in part on previous discussions), something that all
three of us already agree on! So again, I must ask: is there actually
a substantive disagreement at all?
It's really demoralizing. If I just vote -1 on the patch set, then I'm
a useless obstruction. If I actually try to review it, we'll exchange
100 emails and I won't get anything else done for the next two weeks
and I probably won't feel much better about the patch at the end of
that process than at the beginning. I don't see that I have any
winning options here.
I've already put a huge amount of work into this. It is inherently a
very difficult thing to get right -- it's not hard to understand why
it was put off for so long. Why shouldn't I have opinions, given all
that? I'm frustrated too.
Despite all this, John basically agreed with my high level direction
-- all of the important points seemed to have been settled without any
arguments whatsoever (also based in part on previous discussions).
John's volley of abuse seemed to come from nowhere at all.
--
Peter Geoghegan
Hi,
On Mon, Apr 24, 2023 at 2:58 PM Peter Geoghegan <pg@bowt.ie> wrote:
My work on page-level freezing for PostgreSQL 16 has some remaining
loose ends to tie up with the documentation. The "Routine Vacuuming"
section of the docs has no mention of page-level freezing. It also
doesn't mention the FPI optimization added by commit 1de58df4. This
isn't a small thing to leave out; I fully expect that the FPI
optimization will very significantly alter when and how VACUUM
freezes. The cadence will look quite a lot different.It seemed almost impossible to fit in discussion of page-level
freezing to the existing structure. In part this is because the
existing documentation emphasizes the worst case scenario, rather than
talking about freezing as a maintenance task that affects physical
heap pages in roughly the same way as pruning does. There isn't a
clean separation of things that would allow me to just add a paragraph
about the FPI thing.Obviously it's important that the system never enters xidStopLimit
mode -- not being able to allocate new XIDs is a huge problem. But it
seems unhelpful to define that as the only goal of freezing, or even
the main goal. To me this seems similar to defining the goal of
cleaning up bloat as avoiding completely running out of disk space;
while it may be "the single most important thing" in some general
sense, it isn't all that important in most individual cases. There are
many very bad things that will happen before that extreme worst case
is hit, which are far more likely to be the real source of pain.There are also very big structural problems with "Routine Vacuuming",
that I also propose to do something about. Honestly, it's a huge mess
at this point. It's nobody's fault in particular; there has been
accretion after accretion added, over many years. It is time to
finally bite the bullet and do some serious restructuring. I'm hoping
that I don't get too much push back on this, because it's already very
difficult work.
Thanks for taking the time to do this. It is indeed difficult work. I'll
give my perspective as someone who has not read the vacuum code but have
learnt most of what I know about autovacuum / vacuuming by reading the
"Routine Vacuuming" page 10s of times.
Attached patch series shows what I consider to be a much better
overall structure. To make this convenient to take a quick look at, I
also attach a prebuilt version of routine-vacuuming.html (not the only
page that I've changed, but the most important set of changes by far).This initial version is still quite lacking in overall polish, but I
believe that it gets the general structure right. That's what I'd like
to get feedback on right now: can I get agreement with me about the
general nature of the problem? Does this high level direction seem
like the right one?
There are things I like about the changes you've proposed and some where I
feel that the previous section was easier to understand. I'll comment
inline on the summary below and will put in a few points about things I
think can be improved at the end.
The following list is a summary of the major changes that I propose:
1. Restructures the order of items to match the actual processing
order within VACUUM (and ANALYZE), rather than jumping from VACUUM to
ANALYZE and then back to VACUUM.This flows a lot better, which helps with later items that deal with
freezing/wraparound.
+1
2. Renamed "Preventing Transaction ID Wraparound Failures" to
"Freezing to manage the transaction ID space". Now we talk about
wraparound as a subtopic of freezing, not vice-versa. (This is a
complete rewrite, as described by later items in this list).
+1 on this too. Freezing is a normal part of vacuuming and while the
aggressive vacuums are different, I think just talking about the worst case
scenario while referring to it is alarmist.
3. All of the stuff about modulo-2^32 arithmetic is moved to the
storage chapter, where we describe the heap tuple header format.It seems crazy to me that the second sentence in our discussion of
wraparound/freezing is still:"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future"Here we start the whole discussion of wraparound (a particularly
delicate topic) by describing how VACUUM used to work 20 years ago,
before the invention of freezing. That was the last time that a
PostgreSQL cluster could run for 4 billion XIDs without freezing. The
invariant is that we activate xidStopLimit mode protections to avoid a
"distance" between any two unfrozen XIDs that exceeds about 2 billion
XIDs. So why on earth are we talking about 4 billion XIDs? This is the
most confusing, least useful way of describing freezing that I can
think of.4. No more separate section for MultiXactID freezing -- that's
discussed as part of the discussion of page-level freezing.Page-level freezing takes place without regard to the trigger
condition for freezing. So the new approach to freezing has a fixed
idea of what it means to freeze a given page (what physical
modifications it entails). This means that having a separate sect3
subsection for MultiXactIds now makes no sense (if it ever did).5. The top-level list of maintenance tasks has a new addition: "To
truncate obsolescent transaction status information, when possible".It makes a lot of sense to talk about this as something that happens
last (or last among those steps that take place during VACUUM). It's
far less important than avoiding xidStopLimit outages, obviously
(using some extra disk space is almost certainly the least of your
worries when you're near to xidStopLimit). The current documentation
seems to take precisely the opposite view, when it says the following:"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more
space"This sentence is dangerously bad advice. It is precisely backwards. At
the same time, we'd better say something about the need to truncate
pg_xact/clog here. Besides all this, the new section for this is a far
more accurate reflection of what's really going on: most individual
VACUUMs (even most aggressive VACUUMs) won't ever truncate
pg_xact/clog (or the other relevant SLRUs). Truncation only happens
after a VACUUM that advances the relfrozenxid of the table which
previously had the oldest relfrozenxid among all tables in the entire
cluster -- so we need to talk about it as an issue with the high
watermark storage for pg_xact.6. Rename the whole "Routine Vacuuming" section to "Autovacuum
Maintenance Tasks".This is what we should be emphasizing over manually run VACUUMs.
Besides, the current title just seems wrong -- we're talking about
ANALYZE just as much as VACUUM.
+1 on this. Talking about autovacuum as the default and how to get the most
out of it seems like the right way to go.
I read through the new version a couple times and here is some of my
feedback. I haven't yet reviewed individual patches or done a very detailed
comparison with the previous version.
1) While I agree that bundling VACUUM and VACUUM FULL is not the right way,
moving all VACUUM FULL references into tips and warnings also seems
excessive. I think it's probably best to just have a single paragraph which
talks about VACUUM FULL as I do think it should be mentioned in the
reclaiming disk space section.
2) I felt that the new section, "Freezing to manage the transaction ID
space" could be made simpler to understand. As an example, I understood
what the parameters (autovacuum_freeze_max_age, vacuum_freeze_table_age) do
and how they interact better in the previous version of the docs.
3) In the "VACUUMs aggressive strategy" section, we should first introduce
what an aggressive VACUUM is before going into when it's triggered, where
metadata is stored etc. It's only several paragraphs later that I get to
know what we are referring to as an "aggressive" autovacuum.
4) I think we should explicitly call out that seeing an anti-wraparound
VACUUM or "VACUUM table (to prevent wraparound)" is normal and that it's
just a VACUUM triggered due to the table having unfrozen rows with an XID
older than autovacuum_freeze_max_age. I've seen many users panicking on
seeing this and feeling that they are close to a wraparound. Also, we
should be more clear about how it's different from VACUUMs triggered due to
the scale factors (cancellation behavior, being triggered when autovacuum
is disabled etc.). I think you do some of this but given the panic around
transactionid wraparounds, being more clear about this is better.
5) Can we use a better name for the XidStopLimit mode? It seems like a very
implementation centric name. Maybe a better version of "Running out of the
XID space" or something like that?
6) In the XidStopLimit mode section, it would be good to explain briefly
why you could get to this scenario. It's not something which should happen
in a normal running system unless you have a long running transaction or
inactive replication slots or a badly configured system or something of
that sort. If you got to this point, other than running VACUUM to get out
of the situation, it's also important to figure out what got you there in
the first place as many VACUUMs should have attempted to advance the
relfrozenxid and failed.
There are a few other small things I noticed along the way but my goal was
to look at the overall structure. As we address some of these, I'm happy to
do more detailed review of individual patches.
Regards,
Samay
Thoughts?
Show quoted text
--
Peter Geoghegan
Hi Samay,
On Tue, May 2, 2023 at 11:40 PM samay sharma <smilingsamay@gmail.com> wrote:
Thanks for taking the time to do this. It is indeed difficult work.
Thanks for the review! I think that this is something that would
definitely benefit from a perspective such as yours.
There are things I like about the changes you've proposed and some where I feel that the previous section was easier to understand.
That makes sense, and I think that I agree with every point you've
raised, bar none. I'm pleased to see that you basically agree with the
high level direction.
I would estimate that the version you looked at (v2) is perhaps 35%
complete. So some of the individual problems you noticed were a direct
consequence of the work just not being anywhere near complete. I'll
try to do a better job of tracking the relative maturity of each
commit/patch in each commit message, going forward.
Anything that falls under "25.2.1. Recovering Disk Space" is
particularly undeveloped in v2. The way that I broke that up into a
bunch of WARNINGs/NOTEs/TIPs was just a short term way of breaking it
up into pieces, so that the structure was very approximately what I
wanted. I actually think that the stuff about CLUSTER and VACUUM FULL
belongs in a completely different chapter. Since it is not "Routine
Vacuuming" at all.
2. Renamed "Preventing Transaction ID Wraparound Failures" to
"Freezing to manage the transaction ID space". Now we talk about
wraparound as a subtopic of freezing, not vice-versa. (This is a
complete rewrite, as described by later items in this list).+1 on this too. Freezing is a normal part of vacuuming and while the aggressive vacuums are different, I think just talking about the worst case scenario while referring to it is alarmist.
Strangely enough, Postgres 16 is the first version that instruments
freezing in its autovacuum log reports. I suspect that some long term
users will find it quite surprising to see how much (or how little)
freezing takes place in non-aggressive VACUUMs.
The introduction of page-level freezing will make it easier and more
natural to tune settings like vacuum_freeze_min_age, with the aim of
smoothing out the burden of freezing over time (particularly by making
non-aggressive VACUUMs freeze more). Page-level freezing removes any
question of not freezing every tuple on a page (barring cases where
"removable cutoff" is noticeably held back by an old MVCC snapshot).
This makes it more natural to think of freezing as a process that
makes it okay to store data in individual physical heap pages, long
term.
1) While I agree that bundling VACUUM and VACUUM FULL is not the right way, moving all VACUUM FULL references into tips and warnings also seems excessive. I think it's probably best to just have a single paragraph which talks about VACUUM FULL as I do think it should be mentioned in the reclaiming disk space section.
As I mentioned briefly already, my intention is to move it to another
chapter entirely. I was thinking of "Chapter 29. Monitoring Disk
Usage". The "Routine Vacuuming" docs would then link to this sect1 --
something along the lines of "non-routine commands to reclaim a lot of
disk space in the event of extreme bloat".
2) I felt that the new section, "Freezing to manage the transaction ID space" could be made simpler to understand. As an example, I understood what the parameters (autovacuum_freeze_max_age, vacuum_freeze_table_age) do and how they interact better in the previous version of the docs.
Agreed. I'm going to split it up some more. I think that the current
"25.2.2.1. VACUUM's Aggressive Strategy" should be split in two, so we
go from talking about aggressive VACUUMs to Antiwraparound
autovacuums. Finding the least confusing way of explaining it has been
a focus of mine in the last few days.
4) I think we should explicitly call out that seeing an anti-wraparound VACUUM or "VACUUM table (to prevent wraparound)" is normal and that it's just a VACUUM triggered due to the table having unfrozen rows with an XID older than autovacuum_freeze_max_age. I've seen many users panicking on seeing this and feeling that they are close to a wraparound.
That has also been my exact experience. Users are terrified, usually
for no good reason at all. I'll make sure that this comes across in
the next revision of the patch series.
Also, we should be more clear about how it's different from VACUUMs triggered due to the scale factors (cancellation behavior, being triggered when autovacuum is disabled etc.).
Right. Though I think that the biggest point of confusion for users is
how *few* differences there really are between antiwraparound
autovacuum, and any other kind of autovacuum that happens to use
VACUUM's aggressive strategy. There is really only one important
difference: the autocancellation behavior. This is an autovacuum
behavior, not a VACUUM behavior -- so the "VACUUM side" doesn't know
anything about that at all.
5) Can we use a better name for the XidStopLimit mode? It seems like a very implementation centric name. Maybe a better version of "Running out of the XID space" or something like that?
Coming up with a new user-facing name for xidStopLimit is already on
my TODO list (it's surprisingly hard). I have used that name so far
because it unambiguously refers to the exact thing that I want to talk
about when discussing the worst case. Other than that, it's a terrible
name.
6) In the XidStopLimit mode section, it would be good to explain briefly why you could get to this scenario. It's not something which should happen in a normal running system unless you have a long running transaction or inactive replication slots or a badly configured system or something of that sort.
I agree that that's important. Note that there is already something
about "removable cutoff" being held back at the start of the
discussion of freezing -- that will prevent freezing in exactly the
same way as it prevents cleanup of dead tuples.
That will become a WARNING box in the next revision. There should also
be a similar, analogous WARNING box (about "removable cutoff" being
held back) much earlier on in the docs -- this should appear in
"25.2.1. Recovering Disk Space". Obviously this structure suggests
that there is an isomorphism between freezing and removing bloat. For
example, if you cannot "freeze" an XID that appears in some tuple's
xmax, then you also cannot remove that tuple because VACUUM only sees
it as a recently dead tuple (if xmax is >= OldestXmin/removable
cutoff, and from a deleter that already committed).
I don't think that we need to spell the "isomorphism" point out to the
reader directly, but having a subtle cue that that's how it works
seems like a good idea.
If you got to this point, other than running VACUUM to get out of the situation, it's also important to figure out what got you there in the first place as many VACUUMs should have attempted to advance the relfrozenxid and failed.
It's also true that problems that can lead to the system entering
xidStopLimit mode aren't limited to cases where doing required
freezing is fundamentally impossible due to something holding back
"removable cutoff". It's also possible that VACUUM simply can't keep
up (though the failsafe has helped with that problem a lot).
I tend to agree that there needs to be more about this in the
xidStopLimit subsection (discussion of freezing being held back by
"removable cutoff" is insufficient), but FWIW that seems like it
should probably be treated as out of scope for this patch. It is more
the responsibility of the other patch [1]/messages/by-id/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com -- Peter Geoghegan that aims to put the
xidStopLimit documentation on a better footing (and remove that
terrible HINT about single user mode).
Of course, that other patch is closely related to this patch -- the
precise boundaries are unclear at this point. In any case I think that
this should happen, because I think that it's a good idea.
There are a few other small things I noticed along the way but my goal was to look at the overall structure.
Thanks again! This is very helpful.
[1]: /messages/by-id/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com -- Peter Geoghegan
--
Peter Geoghegan
On Wed, May 3, 2023 at 2:59 PM Peter Geoghegan <pg@bowt.ie> wrote:
Coming up with a new user-facing name for xidStopLimit is already on
my TODO list (it's surprisingly hard). I have used that name so far
because it unambiguously refers to the exact thing that I want to talk
about when discussing the worst case. Other than that, it's a terrible
name.
What about "XID allocation overload"? The implication that I'm going
for here is that the system was misconfigured, or there was otherwise
some kind of imbalance between XID supply and demand. It also seems to
convey the true gravity of the situation -- it's *bad*, to be sure,
but in many environments it's a survivable condition.
One possible downside of this name is that it could suggest that all
that needs to happen is for autovacuum to catch up on vacuuming. In
reality the user *will* probably have to do more than just wait before
the system's ability to allocate new XIDs returns, because (in all
likelihood) autovacuum just won't be able to catch up unless and until
the user (say) drops a replication slot. Even still, the name seems to
work; it describes the conceptual model of the system accurately. Even
before the user drops the replication slot, autovacuum will at least
*try* to get the system back to being able to allocate new XIDs once
more.
--
Peter Geoghegan
Hi,
On Wed, May 3, 2023 at 2:59 PM Peter Geoghegan <pg@bowt.ie> wrote:
Hi Samay,
On Tue, May 2, 2023 at 11:40 PM samay sharma <smilingsamay@gmail.com>
wrote:Thanks for taking the time to do this. It is indeed difficult work.
Thanks for the review! I think that this is something that would
definitely benefit from a perspective such as yours.
Glad to hear that my feedback was helpful.
There are things I like about the changes you've proposed and some where
I feel that the previous section was easier to understand.
That makes sense, and I think that I agree with every point you've
raised, bar none. I'm pleased to see that you basically agree with the
high level direction.I would estimate that the version you looked at (v2) is perhaps 35%
complete. So some of the individual problems you noticed were a direct
consequence of the work just not being anywhere near complete. I'll
try to do a better job of tracking the relative maturity of each
commit/patch in each commit message, going forward.Anything that falls under "25.2.1. Recovering Disk Space" is
particularly undeveloped in v2. The way that I broke that up into a
bunch of WARNINGs/NOTEs/TIPs was just a short term way of breaking it
up into pieces, so that the structure was very approximately what I
wanted. I actually think that the stuff about CLUSTER and VACUUM FULL
belongs in a completely different chapter. Since it is not "Routine
Vacuuming" at all.2. Renamed "Preventing Transaction ID Wraparound Failures" to
"Freezing to manage the transaction ID space". Now we talk about
wraparound as a subtopic of freezing, not vice-versa. (This is a
complete rewrite, as described by later items in this list).+1 on this too. Freezing is a normal part of vacuuming and while the
aggressive vacuums are different, I think just talking about the worst case
scenario while referring to it is alarmist.Strangely enough, Postgres 16 is the first version that instruments
freezing in its autovacuum log reports. I suspect that some long term
users will find it quite surprising to see how much (or how little)
freezing takes place in non-aggressive VACUUMs.The introduction of page-level freezing will make it easier and more
natural to tune settings like vacuum_freeze_min_age, with the aim of
smoothing out the burden of freezing over time (particularly by making
non-aggressive VACUUMs freeze more). Page-level freezing removes any
question of not freezing every tuple on a page (barring cases where
"removable cutoff" is noticeably held back by an old MVCC snapshot).
This makes it more natural to think of freezing as a process that
makes it okay to store data in individual physical heap pages, long
term.1) While I agree that bundling VACUUM and VACUUM FULL is not the right
way, moving all VACUUM FULL references into tips and warnings also seems
excessive. I think it's probably best to just have a single paragraph which
talks about VACUUM FULL as I do think it should be mentioned in the
reclaiming disk space section.As I mentioned briefly already, my intention is to move it to another
chapter entirely. I was thinking of "Chapter 29. Monitoring Disk
Usage". The "Routine Vacuuming" docs would then link to this sect1 --
something along the lines of "non-routine commands to reclaim a lot of
disk space in the event of extreme bloat".2) I felt that the new section, "Freezing to manage the transaction ID
space" could be made simpler to understand. As an example, I understood
what the parameters (autovacuum_freeze_max_age, vacuum_freeze_table_age) do
and how they interact better in the previous version of the docs.Agreed. I'm going to split it up some more. I think that the current
"25.2.2.1. VACUUM's Aggressive Strategy" should be split in two, so we
go from talking about aggressive VACUUMs to Antiwraparound
autovacuums. Finding the least confusing way of explaining it has been
a focus of mine in the last few days.
To be honest, this was not super simple to understand even in the previous
version. However, as our goal is to simplify this and make it easier to
understand, I'll hold this patch-set to a higher standard :).
I wish there was a simple representation (maybe even a table or something)
which would explain the differences between a VACUUM which is not
aggressive, a VACUUM which ends up being aggressive due to
vacuum_freeze_table_age and an antiwraparound autovacuum.
4) I think we should explicitly call out that seeing an anti-wraparound
VACUUM or "VACUUM table (to prevent wraparound)" is normal and that it's
just a VACUUM triggered due to the table having unfrozen rows with an XID
older than autovacuum_freeze_max_age. I've seen many users panicking on
seeing this and feeling that they are close to a wraparound.That has also been my exact experience. Users are terrified, usually
for no good reason at all. I'll make sure that this comes across in
the next revision of the patch series.
Thinking about it a bit more, I wonder if there's value in changing the
"(to prevent wraparound)" to something else. It's understandable why people
who just see that in pg_stat_activity and don't read docs might assume they
are close to a wraparound.
Regards,
Samay
Show quoted text
Also, we should be more clear about how it's different from VACUUMs
triggered due to the scale factors (cancellation behavior, being triggered
when autovacuum is disabled etc.).Right. Though I think that the biggest point of confusion for users is
how *few* differences there really are between antiwraparound
autovacuum, and any other kind of autovacuum that happens to use
VACUUM's aggressive strategy. There is really only one important
difference: the autocancellation behavior. This is an autovacuum
behavior, not a VACUUM behavior -- so the "VACUUM side" doesn't know
anything about that at all.
5) Can we use a better name for the XidStopLimit mode? It seems like a
very implementation centric name. Maybe a better version of "Running out of
the XID space" or something like that?Coming up with a new user-facing name for xidStopLimit is already on
my TODO list (it's surprisingly hard). I have used that name so far
because it unambiguously refers to the exact thing that I want to talk
about when discussing the worst case. Other than that, it's a terrible
name.6) In the XidStopLimit mode section, it would be good to explain briefly
why you could get to this scenario. It's not something which should happen
in a normal running system unless you have a long running transaction or
inactive replication slots or a badly configured system or something of
that sort.I agree that that's important. Note that there is already something
about "removable cutoff" being held back at the start of the
discussion of freezing -- that will prevent freezing in exactly the
same way as it prevents cleanup of dead tuples.That will become a WARNING box in the next revision. There should also
be a similar, analogous WARNING box (about "removable cutoff" being
held back) much earlier on in the docs -- this should appear in
"25.2.1. Recovering Disk Space". Obviously this structure suggests
that there is an isomorphism between freezing and removing bloat. For
example, if you cannot "freeze" an XID that appears in some tuple's
xmax, then you also cannot remove that tuple because VACUUM only sees
it as a recently dead tuple (if xmax is >= OldestXmin/removable
cutoff, and from a deleter that already committed).I don't think that we need to spell the "isomorphism" point out to the
reader directly, but having a subtle cue that that's how it works
seems like a good idea.If you got to this point, other than running VACUUM to get out of the
situation, it's also important to figure out what got you there in the
first place as many VACUUMs should have attempted to advance the
relfrozenxid and failed.It's also true that problems that can lead to the system entering
xidStopLimit mode aren't limited to cases where doing required
freezing is fundamentally impossible due to something holding back
"removable cutoff". It's also possible that VACUUM simply can't keep
up (though the failsafe has helped with that problem a lot).I tend to agree that there needs to be more about this in the
xidStopLimit subsection (discussion of freezing being held back by
"removable cutoff" is insufficient), but FWIW that seems like it
should probably be treated as out of scope for this patch. It is more
the responsibility of the other patch [1] that aims to put the
xidStopLimit documentation on a better footing (and remove that
terrible HINT about single user mode).Of course, that other patch is closely related to this patch -- the
precise boundaries are unclear at this point. In any case I think that
this should happen, because I think that it's a good idea.There are a few other small things I noticed along the way but my goal
was to look at the overall structure.
Thanks again! This is very helpful.
[1]
/messages/by-id/CAJ7c6TM2D277U2wH8X78kg8pH3tdUqebV3_JCJqAkYQFHCFzeg@mail.gmail.com
--
Peter Geoghegan
Hi,
On Wed, May 3, 2023 at 3:48 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, May 3, 2023 at 2:59 PM Peter Geoghegan <pg@bowt.ie> wrote:
Coming up with a new user-facing name for xidStopLimit is already on
my TODO list (it's surprisingly hard). I have used that name so far
because it unambiguously refers to the exact thing that I want to talk
about when discussing the worst case. Other than that, it's a terrible
name.What about "XID allocation overload"? The implication that I'm going
for here is that the system was misconfigured, or there was otherwise
some kind of imbalance between XID supply and demand. It also seems to
convey the true gravity of the situation -- it's *bad*, to be sure,
but in many environments it's a survivable condition.
My concern with the term "overload" is similar to what you expressed below.
It indicates that the situation is due to extra load on the system (or due
to too many XIDs being allocated) and people might assume that the
situation will resolve itself if the load were to be reduced / removed.
However, it's due to that along with some misconfiguration or some other
thing holding back the "removable cutoff".
What do you think about the term "Exhaustion"? Maybe something like "XID
allocation exhaustion" or "Exhaustion of allocatable XIDs"? The term
indicates that we are running out of XIDs to allocate without necessarily
pointing towards a reason.
Regards,
Samay
Show quoted text
One possible downside of this name is that it could suggest that all
that needs to happen is for autovacuum to catch up on vacuuming. In
reality the user *will* probably have to do more than just wait before
the system's ability to allocate new XIDs returns, because (in all
likelihood) autovacuum just won't be able to catch up unless and until
the user (say) drops a replication slot. Even still, the name seems to
work; it describes the conceptual model of the system accurately. Even
before the user drops the replication slot, autovacuum will at least
*try* to get the system back to being able to allocate new XIDs once
more.--
Peter Geoghegan
On Thu, May 4, 2023 at 3:18 PM samay sharma <smilingsamay@gmail.com> wrote:
What do you think about the term "Exhaustion"?
I'm really not sure.
Attached is v3, which (as with v1 and v2) comes with a prebuilt html
"Routine Vacuuming", for the convenience of reviewers.
v3 does have some changes based on your feedback (and feedback from
John), but overall v3 can be thought of as v2 with lots and lots of
additional copy-editing -- though still not enough, I'm sure.
v3 does add some (still incomplete) introductory remarks about the
intended audience and goals for "Routine Vacuuming". But most of the
changes are to the way the docs describe freezing and aggressive
VACUUM, which continued to be my focus for v3.
--
Peter Geoghegan
Attachments:
v3-0007-Make-maintenance.sgml-more-autovacuum-orientated.patchapplication/octet-stream; name=v3-0007-Make-maintenance.sgml-more-autovacuum-orientated.patchDownload
From b1bba6df919d98545ee4d49f6bb8b4892ed55b27 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 12:11:10 -0700
Subject: [PATCH v3 7/9] Make maintenance.sgml more autovacuum-orientated.
Now that it's no longer in its own sect2, shorten the "Vacuuming basics"
content, and make it more autovacuum-orientated. This gives much less
prominence to VACUUM FULL, which has little place in a section about
autovacuum. We no longer define avoiding the need to run VACUUM FULL as
the purpose of vacuuming.
A later commit that overhauls "Recovering Disk Space" will add back a
passing mention of things like VACUUM FULL and TRUNCATE, but only as
something that might be relevant in extreme cases. (Use of these
commands is hopefully neither "Routine" nor "Basic" to most users).
Also add some introductory information about the audience and goals of
the "Routine Vacuuming" section of the docs.
---
doc/src/sgml/maintenance.sgml | 122 +++++++++++++++++++++-------------
1 file changed, 76 insertions(+), 46 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index a05e880fc..36f481aba 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -32,11 +32,12 @@
</para>
<para>
- The other main category of maintenance task is periodic <quote>vacuuming</quote>
- of the database. This activity is discussed in
- <xref linkend="routine-vacuuming"/>. Closely related to this is updating
- the statistics that will be used by the query planner, as discussed in
- <xref linkend="vacuum-for-statistics"/>.
+ The other main category of maintenance task is periodic
+ <quote><link linkend="routine-vacuuming">vacuuming</link></quote> of
+ the database by autovacuum. Configuring autovacuum scheduling is
+ discussed in <xref linkend="autovacuum"/>. Autovacuum also updates
+ the statistics that will be used by the query planner, as discussed
+ in <xref linkend="vacuum-for-statistics"/>.
</para>
<para>
@@ -244,7 +245,7 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</sect1>
<sect1 id="routine-vacuuming">
- <title>Routine Vacuuming</title>
+ <title>Autovacuum Maintenance Tasks</title>
<indexterm zone="routine-vacuuming">
<primary>vacuum</primary>
@@ -252,24 +253,18 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
<para>
<productname>PostgreSQL</productname> databases require periodic
- maintenance known as <firstterm>vacuuming</firstterm>. For many installations, it
- is sufficient to let vacuuming be performed by the <firstterm>autovacuum
- daemon</firstterm>, which is described in <xref linkend="autovacuum"/>. You might
- need to adjust the autovacuuming parameters described there to obtain best
- results for your situation. Some database administrators will want to
- supplement or replace the daemon's activities with manually-managed
- <command>VACUUM</command> commands, which typically are executed according to a
- schedule by <application>cron</application> or <application>Task
- Scheduler</application> scripts. To set up manually-managed vacuuming properly,
- it is essential to understand the issues discussed in the next few
- subsections. Administrators who rely on autovacuuming may still wish
- to skim this material to help them understand and adjust autovacuuming.
+ maintenance known as <firstterm>vacuuming</firstterm>, and require
+ periodic updates to the statistics used by the
+ <productname>PostgreSQL</productname> query planner. The <link
+ linkend="sql-vacuum"><command>VACUUM</command></link> and <link
+ linkend="sql-analyze"><command>ANALYZE</command></link> commands
+ perform these maintenance tasks. The <firstterm>autovacuum
+ daemon</firstterm> automatically schedules execution of
+ maintenance, based on the requirements of the workload.
</para>
-
<para>
- <productname>PostgreSQL</productname>'s
- <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
- process each table on a regular basis for several reasons:
+ The autovacuum daemon has to process each table regularly for
+ several reasons:
<orderedlist>
<listitem>
@@ -295,34 +290,69 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
</orderedlist>
- Each of these reasons dictates performing <command>VACUUM</command> operations
- of varying frequency and scope, as explained in the following subsections.
+ Maintenance work within the scope of items 1, 2, 3, and 4 is
+ performed by the <command>VACUUM</command> command internally. The
+ <command>ANALYZE</command> command handles maintenance work within
+ the scope of item 5 (maintenance of planner statistics) internally.
</para>
-
<para>
- There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
- and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
- disk space but runs much more slowly. Also,
- the standard form of <command>VACUUM</command> can run in parallel with production
- database operations. (Commands such as <command>SELECT</command>,
- <command>INSERT</command>, <command>UPDATE</command>, and
- <command>DELETE</command> will continue to function normally, though you
- will not be able to modify the definition of a table with commands such as
- <command>ALTER TABLE</command> while it is being vacuumed.)
- <command>VACUUM FULL</command> requires an
- <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
- working on, and therefore cannot be done in parallel with other use
- of the table. Generally, therefore,
- administrators should strive to use standard <command>VACUUM</command> and
- avoid <command>VACUUM FULL</command>.
+ Generally speaking, database administrators that are new to tuning
+ autovacuum should start by considering adjusting autovacuum's
+ scheduling. Autovacuum scheduling is controlled via threshold
+ settings. These settings determine when autovacuum should launch a
+ worker to run <command>VACUUM</command> and/or
+ <command>ANALYZE</command>; see the previous section, <xref
+ linkend="autovacuum"/>. This section provides additional
+ information about the design and goals of autovacuum,
+ <command>VACUUM</command>, and <command>ANALYZE</command>. The
+ intended audience is database administrators that wish to perform
+ more advanced tuning of autovacuum, with any of the following goals
+ in mind:
</para>
-
+ <itemizedlist>
+ <listitem>
+ <para>
+ Tuning <command>VACUUM</command> to improve query response times.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Making sure that <command>VACUUM</command>'s management of the
+ Transaction ID address space is operating normally.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Tuning <command>VACUUM</command> for performance stability.
+ </para>
+ </listitem>
+ </itemizedlist>
<para>
- <command>VACUUM</command> creates a substantial amount of I/O
- traffic, which can cause poor performance for other active sessions.
- There are configuration parameters that can be adjusted to reduce the
- performance impact of background vacuuming — see
- <xref linkend="runtime-config-resource-vacuum-cost"/>.
+ With larger installations, tuning autovacuum usually won't be a
+ once-off task; it is best to approach tuning as an iterative,
+ applied process. FIXME Expand this to describe the intended
+ audience on goals in a fully worked out way.
+ </para>
+ <para>
+ Autovacuum creates a substantial amount of I/O traffic, which can
+ cause poor performance for other active sessions. There are
+ configuration parameters that you can adjust to reduce the
+ performance impact of background vacuuming. See the
+ autovacuum-specific cost delay settings described in <xref
+ linkend="runtime-config-autovacuum"/>, and additional cost delay
+ settings described in <xref
+ linkend="runtime-config-resource-vacuum-cost"/>.
+ </para>
+ <para>
+ Some database administrators will want to supplement the daemon's
+ activities with manually-managed <command>VACUUM</command>
+ commands. Scripting tools like <application>cron</application> and
+ <application>Task Manager</application> can be of help with this.
+ It can be useful to perform off-hours <command>VACUUM</command>
+ commands during periods where reduced load is expected. Almost all
+ of the contents of this section apply equally to manually-issued
+ <command>VACUUM</command> and <command>ANALYZE</command>
+ operations.
</para>
<sect2 id="vacuum-for-space-recovery">
--
2.40.1
v3-0001-Make-autovacuum-docs-into-a-sect1-of-its-own.patchapplication/octet-stream; name=v3-0001-Make-autovacuum-docs-into-a-sect1-of-its-own.patchDownload
From 0870911aab7bc3a7bff71e93d8d355b0e48640b7 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 12 Apr 2023 14:42:06 -0700
Subject: [PATCH v3 1/9] Make autovacuum docs into a sect1 of its own.
This doesn't change any of the content itself. Though it does move it
from the end of "Routine Vacuuming" (which is itself a sect1) to a whole
new sect1 that appears _before_ "Routine Vacuuming".
XXX Open question: does it make more sense to move the sect1 to before
"Routine Vacuuming", or should it go after instead? There are arguments
for both.
Arguments for "before":
"Before" gives greater prominence to the autovacuum scheduling tunables,
such as autovacuum_vacuum_scale_factor. These are the most important
individual tunables, which argues for putting them earlier than "Routine
Vacuuming".
Arguments for "after":
Although the discussion in "Routine Vacuuming" is rather involved, it is
arguably still introductory material that informs how the user will tune
autovacuum_vacuum_scale_factor. (Assuming that the user doesn't just go
by trial and error, which seems more likely in practice but not
necessarily the most useful working assumption for our purposes.)
---
doc/src/sgml/maintenance.sgml | 332 +++++++++++++++++-----------------
1 file changed, 166 insertions(+), 166 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a..a6295c399 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -59,6 +59,172 @@
pleasant and productive experience with the system.
</para>
+ <sect1 id="autovacuum">
+ <title>The Autovacuum Daemon</title>
+
+ <indexterm>
+ <primary>autovacuum</primary>
+ <secondary>general information</secondary>
+ </indexterm>
+ <para>
+ <productname>PostgreSQL</productname> has an optional but highly
+ recommended feature called <firstterm>autovacuum</firstterm>,
+ whose purpose is to automate the execution of
+ <command>VACUUM</command> and <command>ANALYZE</command> commands.
+ When enabled, autovacuum checks for
+ tables that have had a large number of inserted, updated or deleted
+ tuples. These checks use the statistics collection facility;
+ therefore, autovacuum cannot be used unless <xref
+ linkend="guc-track-counts"/> is set to <literal>true</literal>.
+ In the default configuration, autovacuuming is enabled and the related
+ configuration parameters are appropriately set.
+ </para>
+
+ <para>
+ The <quote>autovacuum daemon</quote> actually consists of multiple processes.
+ There is a persistent daemon process, called the
+ <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
+ <firstterm>autovacuum worker</firstterm> processes for all databases. The
+ launcher will distribute the work across time, attempting to start one
+ worker within each database every <xref linkend="guc-autovacuum-naptime"/>
+ seconds. (Therefore, if the installation has <replaceable>N</replaceable> databases,
+ a new worker will be launched every
+ <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.)
+ A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes
+ are allowed to run at the same time. If there are more than
+ <varname>autovacuum_max_workers</varname> databases to be processed,
+ the next database will be processed as soon as the first worker finishes.
+ Each worker process will check each table within its database and
+ execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed.
+ <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor
+ autovacuum workers' activity.
+ </para>
+
+ <para>
+ If several large tables all become eligible for vacuuming in a short
+ amount of time, all autovacuum workers might become occupied with
+ vacuuming those tables for a long period. This would result
+ in other tables and databases not being vacuumed until a worker becomes
+ available. There is no limit on how many workers might be in a
+ single database, but workers do try to avoid repeating work that has
+ already been done by other workers. Note that the number of running
+ workers does not count towards <xref linkend="guc-max-connections"/> or
+ <xref linkend="guc-superuser-reserved-connections"/> limits.
+ </para>
+
+ <para>
+ Tables whose <structfield>relfrozenxid</structfield> value is more than
+ <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
+ vacuumed (this also applies to those tables whose freeze max age has
+ been modified via storage parameters; see below). Otherwise, if the
+ number of tuples obsoleted since the last
+ <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
+ table is vacuumed. The vacuum threshold is defined as:
+<programlisting>
+vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
+</programlisting>
+ where the vacuum base threshold is
+ <xref linkend="guc-autovacuum-vacuum-threshold"/>,
+ the vacuum scale factor is
+ <xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
+ and the number of tuples is
+ <structname>pg_class</structname>.<structfield>reltuples</structfield>.
+ </para>
+
+ <para>
+ The table is also vacuumed if the number of tuples inserted since the last
+ vacuum has exceeded the defined insert threshold, which is defined as:
+<programlisting>
+vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
+</programlisting>
+ where the vacuum insert base threshold is
+ <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>,
+ and vacuum insert scale factor is
+ <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
+ Such vacuums may allow portions of the table to be marked as
+ <firstterm>all visible</firstterm> and also allow tuples to be frozen, which
+ can reduce the work required in subsequent vacuums.
+ For tables which receive <command>INSERT</command> operations but no or
+ almost no <command>UPDATE</command>/<command>DELETE</command> operations,
+ it may be beneficial to lower the table's
+ <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
+ tuples to be frozen by earlier vacuums. The number of obsolete tuples and
+ the number of inserted tuples are obtained from the cumulative statistics system;
+ it is a semi-accurate count updated by each <command>UPDATE</command>,
+ <command>DELETE</command> and <command>INSERT</command> operation. (It is
+ only semi-accurate because some information might be lost under heavy
+ load.) If the <structfield>relfrozenxid</structfield> value of the table
+ is more than <varname>vacuum_freeze_table_age</varname> transactions old,
+ an aggressive vacuum is performed to freeze old tuples and advance
+ <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
+ since the last vacuum are scanned.
+ </para>
+
+ <para>
+ For analyze, a similar condition is used: the threshold, defined as:
+<programlisting>
+analyze threshold = analyze base threshold + analyze scale factor * number of tuples
+</programlisting>
+ is compared to the total number of tuples inserted, updated, or deleted
+ since the last <command>ANALYZE</command>.
+ </para>
+
+ <para>
+ Partitioned tables are not processed by autovacuum. Statistics
+ should be collected by running a manual <command>ANALYZE</command> when it is
+ first populated, and again whenever the distribution of data in its
+ partitions changes significantly.
+ </para>
+
+ <para>
+ Temporary tables cannot be accessed by autovacuum. Therefore,
+ appropriate vacuum and analyze operations should be performed via
+ session SQL commands.
+ </para>
+
+ <para>
+ The default thresholds and scale factors are taken from
+ <filename>postgresql.conf</filename>, but it is possible to override them
+ (and many other autovacuum control parameters) on a per-table basis; see
+ <xref linkend="sql-createtable-storage-parameters"/> for more information.
+ If a setting has been changed via a table's storage parameters, that value
+ is used when processing that table; otherwise the global settings are
+ used. See <xref linkend="runtime-config-autovacuum"/> for more details on
+ the global settings.
+ </para>
+
+ <para>
+ When multiple workers are running, the autovacuum cost delay parameters
+ (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
+ <quote>balanced</quote> among all the running workers, so that the
+ total I/O impact on the system is the same regardless of the number
+ of workers actually running. However, any workers processing tables whose
+ per-table <literal>autovacuum_vacuum_cost_delay</literal> or
+ <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
+ are not considered in the balancing algorithm.
+ </para>
+
+ <para>
+ Autovacuum workers generally don't block other commands. If a process
+ attempts to acquire a lock that conflicts with the
+ <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
+ acquisition will interrupt the autovacuum. For conflicting lock modes,
+ see <xref linkend="table-lock-compatibility"/>. However, if the autovacuum
+ is running to prevent transaction ID wraparound (i.e., the autovacuum query
+ name in the <structname>pg_stat_activity</structname> view ends with
+ <literal>(to prevent wraparound)</literal>), the autovacuum is not
+ automatically interrupted.
+ </para>
+
+ <warning>
+ <para>
+ Regularly running commands that acquire locks conflicting with a
+ <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can
+ effectively prevent autovacuums from ever completing.
+ </para>
+ </warning>
+ </sect1>
+
<sect1 id="routine-vacuuming">
<title>Routine Vacuuming</title>
@@ -749,172 +915,6 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect3>
</sect2>
-
- <sect2 id="autovacuum">
- <title>The Autovacuum Daemon</title>
-
- <indexterm>
- <primary>autovacuum</primary>
- <secondary>general information</secondary>
- </indexterm>
- <para>
- <productname>PostgreSQL</productname> has an optional but highly
- recommended feature called <firstterm>autovacuum</firstterm>,
- whose purpose is to automate the execution of
- <command>VACUUM</command> and <command>ANALYZE</command> commands.
- When enabled, autovacuum checks for
- tables that have had a large number of inserted, updated or deleted
- tuples. These checks use the statistics collection facility;
- therefore, autovacuum cannot be used unless <xref
- linkend="guc-track-counts"/> is set to <literal>true</literal>.
- In the default configuration, autovacuuming is enabled and the related
- configuration parameters are appropriately set.
- </para>
-
- <para>
- The <quote>autovacuum daemon</quote> actually consists of multiple processes.
- There is a persistent daemon process, called the
- <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
- <firstterm>autovacuum worker</firstterm> processes for all databases. The
- launcher will distribute the work across time, attempting to start one
- worker within each database every <xref linkend="guc-autovacuum-naptime"/>
- seconds. (Therefore, if the installation has <replaceable>N</replaceable> databases,
- a new worker will be launched every
- <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.)
- A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes
- are allowed to run at the same time. If there are more than
- <varname>autovacuum_max_workers</varname> databases to be processed,
- the next database will be processed as soon as the first worker finishes.
- Each worker process will check each table within its database and
- execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed.
- <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor
- autovacuum workers' activity.
- </para>
-
- <para>
- If several large tables all become eligible for vacuuming in a short
- amount of time, all autovacuum workers might become occupied with
- vacuuming those tables for a long period. This would result
- in other tables and databases not being vacuumed until a worker becomes
- available. There is no limit on how many workers might be in a
- single database, but workers do try to avoid repeating work that has
- already been done by other workers. Note that the number of running
- workers does not count towards <xref linkend="guc-max-connections"/> or
- <xref linkend="guc-superuser-reserved-connections"/> limits.
- </para>
-
- <para>
- Tables whose <structfield>relfrozenxid</structfield> value is more than
- <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
- vacuumed (this also applies to those tables whose freeze max age has
- been modified via storage parameters; see below). Otherwise, if the
- number of tuples obsoleted since the last
- <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
- table is vacuumed. The vacuum threshold is defined as:
-<programlisting>
-vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
-</programlisting>
- where the vacuum base threshold is
- <xref linkend="guc-autovacuum-vacuum-threshold"/>,
- the vacuum scale factor is
- <xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
- and the number of tuples is
- <structname>pg_class</structname>.<structfield>reltuples</structfield>.
- </para>
-
- <para>
- The table is also vacuumed if the number of tuples inserted since the last
- vacuum has exceeded the defined insert threshold, which is defined as:
-<programlisting>
-vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
-</programlisting>
- where the vacuum insert base threshold is
- <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>,
- and vacuum insert scale factor is
- <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
- Such vacuums may allow portions of the table to be marked as
- <firstterm>all visible</firstterm> and also allow tuples to be frozen, which
- can reduce the work required in subsequent vacuums.
- For tables which receive <command>INSERT</command> operations but no or
- almost no <command>UPDATE</command>/<command>DELETE</command> operations,
- it may be beneficial to lower the table's
- <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
- tuples to be frozen by earlier vacuums. The number of obsolete tuples and
- the number of inserted tuples are obtained from the cumulative statistics system;
- it is a semi-accurate count updated by each <command>UPDATE</command>,
- <command>DELETE</command> and <command>INSERT</command> operation. (It is
- only semi-accurate because some information might be lost under heavy
- load.) If the <structfield>relfrozenxid</structfield> value of the table
- is more than <varname>vacuum_freeze_table_age</varname> transactions old,
- an aggressive vacuum is performed to freeze old tuples and advance
- <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
- since the last vacuum are scanned.
- </para>
-
- <para>
- For analyze, a similar condition is used: the threshold, defined as:
-<programlisting>
-analyze threshold = analyze base threshold + analyze scale factor * number of tuples
-</programlisting>
- is compared to the total number of tuples inserted, updated, or deleted
- since the last <command>ANALYZE</command>.
- </para>
-
- <para>
- Partitioned tables are not processed by autovacuum. Statistics
- should be collected by running a manual <command>ANALYZE</command> when it is
- first populated, and again whenever the distribution of data in its
- partitions changes significantly.
- </para>
-
- <para>
- Temporary tables cannot be accessed by autovacuum. Therefore,
- appropriate vacuum and analyze operations should be performed via
- session SQL commands.
- </para>
-
- <para>
- The default thresholds and scale factors are taken from
- <filename>postgresql.conf</filename>, but it is possible to override them
- (and many other autovacuum control parameters) on a per-table basis; see
- <xref linkend="sql-createtable-storage-parameters"/> for more information.
- If a setting has been changed via a table's storage parameters, that value
- is used when processing that table; otherwise the global settings are
- used. See <xref linkend="runtime-config-autovacuum"/> for more details on
- the global settings.
- </para>
-
- <para>
- When multiple workers are running, the autovacuum cost delay parameters
- (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
- <quote>balanced</quote> among all the running workers, so that the
- total I/O impact on the system is the same regardless of the number
- of workers actually running. However, any workers processing tables whose
- per-table <literal>autovacuum_vacuum_cost_delay</literal> or
- <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
- are not considered in the balancing algorithm.
- </para>
-
- <para>
- Autovacuum workers generally don't block other commands. If a process
- attempts to acquire a lock that conflicts with the
- <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
- acquisition will interrupt the autovacuum. For conflicting lock modes,
- see <xref linkend="table-lock-compatibility"/>. However, if the autovacuum
- is running to prevent transaction ID wraparound (i.e., the autovacuum query
- name in the <structname>pg_stat_activity</structname> view ends with
- <literal>(to prevent wraparound)</literal>), the autovacuum is not
- automatically interrupted.
- </para>
-
- <warning>
- <para>
- Regularly running commands that acquire locks conflicting with a
- <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can
- effectively prevent autovacuums from ever completing.
- </para>
- </warning>
- </sect2>
</sect1>
--
2.40.1
v3-0009-Overhaul-freezing-and-wraparound-docs.patchapplication/octet-stream; name=v3-0009-Overhaul-freezing-and-wraparound-docs.patchDownload
From 27edfce6bc6414e79d79741248cfe5eae258e92a Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 13:04:13 -0700
Subject: [PATCH v3 9/9] Overhaul freezing and wraparound docs.
This is almost a complete rewrite. "Preventing Transaction ID
Wraparound Failures" becomes "Freezing to manage the transaction ID
space". This is follow-up work to commit 1de58df4, which added
page-level freezing to VACUUM.
The emphasis is now on the physical work of freezing pages. This flows
a little better than it otherwise would due to recent structural
cleanups to maintenance.sgml; discussion about freezing now immediately
follows discussion of cleanup of dead tuples. We still talk about the
problem of the system activating xidStopLimit protections in the same
section, but we use much less alarmist language about data corruption,
and are no longer overly concerned about the very worst case. We don't
rescind the recommendation that users recover from an xidStopLimit
outage by using single user mode, though that seems like something we
should aim to do in the near future.
There is no longer a separate sect3 to discuss MultiXactId related
issues. VACUUM now performs exactly the same processing steps when it
freezes a page, independent of the trigger condition.
Also describe the page-level freezing FPI optimization added by commit
1de58df4. This is expected to trigger the majority of all freezing with
many types of workloads.
---
doc/src/sgml/config.sgml | 20 +-
doc/src/sgml/logicaldecoding.sgml | 2 +-
doc/src/sgml/maintenance.sgml | 967 ++++++++++++++++------
doc/src/sgml/ref/create_table.sgml | 2 +-
doc/src/sgml/ref/prepare_transaction.sgml | 2 +-
doc/src/sgml/ref/vacuum.sgml | 6 +-
doc/src/sgml/ref/vacuumdb.sgml | 4 +-
doc/src/sgml/xact.sgml | 2 +-
8 files changed, 724 insertions(+), 281 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b56f073a9..a4ac4e740 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8359,7 +8359,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
Note that even when this parameter is disabled, the system
will launch autovacuum processes if necessary to
prevent transaction ID wraparound. See <xref
- linkend="vacuum-for-wraparound"/> for more information.
+ linkend="freezing-xid-space"/> for more information.
</para>
</listitem>
</varlistentry>
@@ -8548,7 +8548,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
This parameter can only be set at server start, but the setting
can be reduced for individual tables by
changing table storage parameters.
- For more information see <xref linkend="vacuum-for-wraparound"/>.
+ For more information see <xref linkend="freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -8577,7 +8577,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
400 million multixacts.
This parameter can only be set at server start, but the setting can
be reduced for individual tables by changing table storage parameters.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ For more information see <xref linkend="anti-wraparound-autovacuums"/>.
</para>
</listitem>
</varlistentry>
@@ -9284,7 +9284,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
periodic manual <command>VACUUM</command> has a chance to run before an
anti-wraparound autovacuum is launched for the table. For more
information see
- <xref linkend="vacuum-for-wraparound"/>.
+ <xref linkend="aggressive-vacuum"/>.
</para>
</listitem>
</varlistentry>
@@ -9306,7 +9306,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
the value of <xref linkend="guc-autovacuum-freeze-max-age"/>, so
that there is not an unreasonably short time between forced
autovacuums. For more information see <xref
- linkend="vacuum-for-wraparound"/>.
+ linkend="freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -9343,7 +9343,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
set this value anywhere from zero to 2.1 billion,
<command>VACUUM</command> will silently adjust the effective
value to no less than 105% of <xref
- linkend="guc-autovacuum-freeze-max-age"/>.
+ linkend="guc-autovacuum-freeze-max-age"/>. For more
+ information see <xref linkend="xid-stop-limit"/>.
</para>
</listitem>
</varlistentry>
@@ -9367,7 +9368,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<xref linkend="guc-autovacuum-multixact-freeze-max-age"/>, so that a
periodic manual <command>VACUUM</command> has a chance to run before an
anti-wraparound is launched for the table.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ For more information see <xref linkend="anti-wraparound-autovacuums"/>.
</para>
</listitem>
</varlistentry>
@@ -9388,7 +9389,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
the value of <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>,
so that there is not an unreasonably short time between forced
autovacuums.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ For more information see <xref linkend="freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -9421,7 +9422,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
this value anywhere from zero to 2.1 billion,
<command>VACUUM</command> will silently adjust the effective
value to no less than 105% of <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. For more
+ information see <xref linkend="xid-stop-limit"/>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index cbd3aa804..80dade3be 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -353,7 +353,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
because neither required WAL nor required rows from the system catalogs
can be removed by <command>VACUUM</command> as long as they are required by a replication
slot. In extreme cases this could cause the database to shut down to prevent
- transaction ID wraparound (see <xref linkend="vacuum-for-wraparound"/>).
+ transaction ID wraparound (see <xref linkend="freezing-xid-space"/>).
So if a slot is no longer required it should be dropped.
</para>
</caution>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 5546d8c7d..a480e4f8e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -148,13 +148,8 @@ vacuum insert threshold = vacuum base insert threshold + vacuum insert scale fac
<xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
Such vacuums may allow portions of the table to be marked as
<firstterm>all visible</firstterm> and also allow tuples to be frozen, which
- can reduce the work required in subsequent vacuums.
- For tables which receive <command>INSERT</command> operations but no or
- almost no <command>UPDATE</command>/<command>DELETE</command> operations,
- it may be beneficial to lower the table's
- <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
- tuples to be frozen by earlier vacuums. The number of obsolete tuples and
- the number of inserted tuples are obtained from the cumulative statistics system;
+ can reduce the work required in subsequent vacuums. The number of obsolete tuples
+ and the number of inserted tuples are obtained from the cumulative statistics system;
it is a semi-accurate count updated by each <command>UPDATE</command>,
<command>DELETE</command> and <command>INSERT</command> operation. (It is
only semi-accurate because some information might be lost under heavy
@@ -273,15 +268,20 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ <simpara>To maintain the system's ability to allocated new
+ transaction IDs through freezing.</simpara>
</listitem>
<listitem>
<simpara>To update the visibility map, which speeds
up <link linkend="indexes-index-only-scans">index-only
- scans</link>.</simpara>
+ scans</link>, and helps the next <command>VACUUM</command>
+ operation avoid needlessly scanning already-frozen pages.</simpara>
+ </listitem>
+
+ <listitem>
+ <simpara>To truncate obsolescent transaction status information,
+ when possible.</simpara>
</listitem>
<listitem>
@@ -483,302 +483,671 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</warning>
</sect2>
- <sect2 id="vacuum-for-wraparound">
- <title>Preventing Transaction ID Wraparound Failures</title>
-
- <indexterm zone="vacuum-for-wraparound">
- <primary>transaction ID</primary>
- <secondary>wraparound</secondary>
- </indexterm>
+ <sect2 id="freezing-xid-space">
+ <title>Freezing to manage the transaction ID space</title>
<indexterm>
- <primary>wraparound</primary>
- <secondary>of transaction IDs</secondary>
+ <primary>Freezing</primary>
+ <secondary>of transaction IDs and MultiXact IDs</secondary>
</indexterm>
<para>
- <productname>PostgreSQL</productname>'s <link
- linkend="mvcc-intro">MVCC</link> transaction semantics depend on
- being able to compare <glossterm linkend="glossary-xid">transaction
- ID numbers (<acronym>XID</acronym>)</glossterm> to determine
- whether or not the row is visible to each query's MVCC snapshot
- (see <xref linkend="interpreting-xid-stamps"/>). But since
- on-disk storage of transaction IDs in heap pages uses a truncated
- 32-bit representation to save space (rather than the full 64-bit
- representation), it is necessary to vacuum every table in every
- database <emphasis>at least</emphasis> once every two billion
- transactions (though far more frequent vacuuming is typical).
+ <command>VACUUM</command> often marks some of the pages that it
+ scans <emphasis>frozen</emphasis>, indicating that all eligible
+ rows on the page were inserted by a transaction that committed
+ sufficiently far in the past that the effects of the inserting
+ transaction are certain to be visible to all current and future
+ transactions. The specific Transaction ID number
+ (<acronym>XID</acronym>) stored in a frozen heap row's
+ <structfield>xmin</structfield> field is no longer needed to
+ determine anything about the row's visibility. Furthermore, when
+ a row undergoing freezing happens to have an XID set in its
+ <structfield>xmax</structfield> field (possibly an XID left behind
+ by an earlier <command>SELECT FOR UPDATE</command> row locker),
+ the <structfield>xmax</structfield> field's XID is usually also
+ removed.
</para>
<para>
- <xref linkend="guc-vacuum-freeze-min-age"/>
- controls how old an XID value has to be before rows bearing that XID will be
- frozen. Increasing this setting may avoid unnecessary work if the
- rows that would otherwise be frozen will soon be modified again,
- but decreasing this setting increases
- the number of transactions that can elapse before the table must be
- vacuumed again.
+ Once frozen, heap pages are <quote>self-contained</quote>. Every
+ query can read all of the page's rows in a way that assumes that
+ the inserting transaction committed and is visible to its
+ <acronym>MVCC</acronym> snapshot. No query will ever have to
+ consult external transaction status metadata to interpret the
+ page's contents, either. In particular,
+ <filename>pg_xact</filename> transaction XID commit/abort status
+ lookups won't take place during query execution.
</para>
<para>
- <command>VACUUM</command> uses the <link linkend="storage-vm">visibility map</link>
- to determine which pages of a table must be scanned. Normally, it
- will skip pages that don't have any dead row versions even if those pages
- might still have row versions with old XID values. Therefore, normal
- <command>VACUUM</command>s won't always freeze every old row version in the table.
- When that happens, <command>VACUUM</command> will eventually need to perform an
- <firstterm>aggressive vacuum</firstterm>, which will freeze all eligible unfrozen
- XID and MXID values, including those from all-visible but not all-frozen pages.
- In practice most tables require periodic aggressive vacuuming.
- <xref linkend="guc-vacuum-freeze-table-age"/>
- controls when <command>VACUUM</command> does that: all-visible but not all-frozen
- pages are scanned if the number of transactions that have passed since the
- last such scan is greater than <varname>vacuum_freeze_table_age</varname> minus
- <varname>vacuum_freeze_min_age</varname>. Setting
- <varname>vacuum_freeze_table_age</varname> to 0 forces <command>VACUUM</command> to
- always use its aggressive strategy.
+ Freezing is a <acronym>WAL</acronym>-logged operation, so when
+ <command>VACUUM</command> freezes a heap page, any copy of the
+ page located on a physical replication standby server will itself
+ be <quote>frozen</quote> shortly thereafter (when the relevant
+ <literal>FREEZE_PAGE</literal> <acronym>WAL</acronym> record is
+ replayed on the standby). Queries that run on physical
+ replication standbys thereby avoid <filename>pg_xact</filename>
+ lookups when reading from frozen pages, in just the same way as
+ queries that run on the primary server
+ <footnote>
+ <para>
+ In this regard freezing is unlike setting transaction status
+ <quote>hint bits</quote> in tuple headers: setting hint bits
+ doesn't usually need to be <acronym>WAL</acronym>-logged, and
+ can take place on physical replication standby servers without
+ the involvement of the primary server. The purpose of hint bits
+ is to avoid repeat <filename>pg_xact</filename> lookups for the
+ same tuples, strictly as an optimization. The purpose of
+ freezing (from the point of view of individual tuples) is to
+ <emphasis>reliably</emphasis> remove each tuple's dependency on
+ <filename>pg_xact</filename>, ultimately making it safe to
+ truncate <filename>pg_xact</filename> from time to time.
+ </para>
+ </footnote>.
</para>
<para>
- The maximum time that a table can go unvacuumed is two billion
- transactions minus the <varname>vacuum_freeze_min_age</varname> value at
- the time of the last aggressive vacuum. If it were to go
- unvacuumed for longer than
- that, data loss could result. To ensure that this does not happen,
- autovacuum is invoked on any table that might contain unfrozen rows with
- XIDs older than the age specified by the configuration parameter <xref
- linkend="guc-autovacuum-freeze-max-age"/>. (This will happen even if
- autovacuum is disabled.)
+ It can be useful for <command>VACUUM</command> to put off some of
+ the work of freezing, but <command>VACUUM</command> cannot put off
+ freezing forever. Since on-disk storage of transaction IDs in
+ heap row headers uses a truncated 32-bit representation to save
+ space (rather than the full 64-bit representation), freezing plays
+ a crucial role in enabling <link
+ linkend="aggressive-vacuum">management of the XID address
+ space</link> by <command>VACUUM</command>. If, for whatever
+ reason, <command>VACUUM</command> is unable to freeze older XIDs
+ on behalf of an application that continues to require new XID
+ allocations, the system will eventually
+ <link linkend="xid-stop-limit">refuse to allocate new transaction IDs</link>.
+ The system generally only enters this state when autovacuum is
+ misconfigured.
</para>
<para>
- This implies that if a table is not otherwise vacuumed,
- autovacuum will be invoked on it approximately once every
- <varname>autovacuum_freeze_max_age</varname> minus
- <varname>vacuum_freeze_min_age</varname> transactions.
- For tables that are regularly vacuumed for space reclamation purposes,
- this is of little importance. However, for static tables
- (including tables that receive inserts, but no updates or deletes),
- there is no need to vacuum for space reclamation, so it can
- be useful to try to maximize the interval between forced autovacuums
- on very large static tables. Obviously one can do this either by
- increasing <varname>autovacuum_freeze_max_age</varname> or decreasing
- <varname>vacuum_freeze_min_age</varname>.
+ <xref linkend="guc-vacuum-freeze-min-age"/> controls when freezing
+ takes place. When <command>VACUUM</command> scans a heap page
+ containing even one XID that has already attained an age exceeding
+ this value, the page is frozen.
+ </para>
+
+ <indexterm>
+ <primary>MultiXact ID</primary>
+ <secondary>Freezing of</secondary>
+ </indexterm>
+
+ <para>
+ <firstterm>MultiXact IDs</firstterm> are used to support row
+ locking by multiple transactions. Since there is only limited
+ space in a tuple header to store lock information, that
+ information is encoded as a <quote>multiple transaction
+ ID</quote>, or MultiXact ID for short, whenever there is more
+ than one transaction concurrently locking a row. Information
+ about which transaction IDs are included in any particular
+ MultiXact ID is stored separately in
+ <filename>pg_multixact</filename>, and only the MultiXact ID
+ itself (a 32-bit unsigned integer) appears in the tuple's
+ <structfield>xmax</structfield> field. This creates a dependency
+ on external transaction status information similar to the
+ dependency that ordinary unfrozen XIDs have on commit status
+ information stored in <filename>pg_xact</filename>.
+ <command>VACUUM</command> must therefore occasionally remove
+ MultiXact IDs from tuples during freezing.
</para>
<para>
- The effective maximum for <varname>vacuum_freeze_table_age</varname> is 0.95 *
- <varname>autovacuum_freeze_max_age</varname>; a setting higher than that will be
- capped to the maximum. A value higher than
- <varname>autovacuum_freeze_max_age</varname> wouldn't make sense because an
- anti-wraparound autovacuum would be triggered at that point anyway, and
- the 0.95 multiplier leaves some breathing room to run a manual
- <command>VACUUM</command> before that happens. As a rule of thumb,
- <command>vacuum_freeze_table_age</command> should be set to a value somewhat
- below <varname>autovacuum_freeze_max_age</varname>, leaving enough gap so that
- a regularly scheduled <command>VACUUM</command> or an autovacuum triggered by
- normal delete and update activity is run in that window. Setting it too
- close could lead to anti-wraparound autovacuums, even though the table
- was recently vacuumed to reclaim space, whereas lower values lead to more
- frequent aggressive vacuuming.
+ <xref linkend="guc-vacuum-multixact-freeze-min-age"/> also
+ controls when freezing takes place. It is analogous to
+ <varname>vacuum_freeze_min_age</varname>, but <quote>age</quote>
+ is expressed in units of MultiXact ID.
+ Lowering <varname>vacuum_multixact_freeze_min_age</varname>
+ <emphasis>forces</emphasis> <command>VACUUM</command> to process
+ <structfield>xmax</structfield> fields containing a MultiXact ID
+ in cases where it would otherwise opt to put off the work of
+ processing <structfield>xmax</structfield> until the next
+ <command>VACUUM</command> <footnote>
+ <para>
+ <quote>Freezing</quote> of <structfield>xmax</structfield>
+ fields (whether they were found to contain an XID or a MultiXact
+ ID) generally means clearing <structfield>xmax</structfield>.
+ <command>VACUUM</command> may occasionally encounter an
+ individual MultiXact ID that must be removed to advance
+ <structfield>relminmxid</structfield> by the required amount,
+ which can only be processed by generating a replacement
+ MultiXact ID (containing just the non-removable subset of member
+ XIDs from the original MultiXact ID), and then setting the
+ tuple's <structfield>xmax</structfield> to the new/replacement
+ MultiXact ID value.
+ </para>
+ </footnote>. The setting generally doesn't significantly
+ influence the total number of pages <command>VACUUM</command>
+ freezes, even in tables that contain relatively many MultiXact
+ IDs. This is because <command>VACUUM</command> generally prefers
+ proactively processing for most individual
+ <structfield>xmax</structfield> fields that contain a MultiXact ID
+ (eager proactive processing is typically cheaper).
</para>
<para>
- The sole disadvantage of increasing <varname>autovacuum_freeze_max_age</varname>
- (and <varname>vacuum_freeze_table_age</varname> along with it) is that
- the <filename>pg_xact</filename> and <filename>pg_commit_ts</filename>
- subdirectories of the database cluster will take more space, because it
- must store the commit status and (if <varname>track_commit_timestamp</varname> is
- enabled) timestamp of all transactions back to
- the <varname>autovacuum_freeze_max_age</varname> horizon. The commit status uses
- two bits per transaction, so if
- <varname>autovacuum_freeze_max_age</varname> is set to its maximum allowed value
- of two billion, <filename>pg_xact</filename> can be expected to grow to about half
- a gigabyte and <filename>pg_commit_ts</filename> to about 20GB. If this
- is trivial compared to your total database size,
- setting <varname>autovacuum_freeze_max_age</varname> to its maximum allowed value
- is recommended. Otherwise, set it depending on what you are willing to
- allow for <filename>pg_xact</filename> and <filename>pg_commit_ts</filename> storage.
- (The default, 200 million transactions, translates to about 50MB
- of <filename>pg_xact</filename> storage and about 2GB of <filename>pg_commit_ts</filename>
- storage.)
+ Managing the added <acronym>WAL</acronym> volume from freezing
+ over time is an important consideration for
+ <command>VACUUM</command>. It is why <command>VACUUM</command>
+ doesn't just freeze every eligible tuple at the earliest
+ opportunity: the <acronym>WAL</acronym> written to freeze a page's
+ tuples <quote>goes to waste</quote> in cases where the resulting
+ frozen tuples are soon deleted or updated anyway. It's also why
+ <command>VACUUM</command> <emphasis>will</emphasis> freeze all
+ eligible tuples from a heap page once the decision to freeze at
+ least one tuple is taken: at that point the added cost to freeze
+ all eligible tuples eagerly (measured in <quote>extra bytes of
+ <acronym>WAL</acronym> written</quote>) is far lower than the
+ probable cost of deferring freezing until a future
+ <command>VACUUM</command> operation against the same table.
+ Furthermore, once the page is frozen it can generally be marked as
+ all-frozen in the visibility map right away.
</para>
- <para>
- One disadvantage of decreasing <varname>vacuum_freeze_min_age</varname> is that
- it might cause <command>VACUUM</command> to do useless work: freezing a row
- version is a waste of time if the row is modified
- soon thereafter (causing it to acquire a new XID). So the setting should
- be large enough that rows are not frozen until they are unlikely to change
- any more.
- </para>
+ <note>
+ <para>
+ In <productname>PostgreSQL</productname> versions before 16,
+ <command>VACUUM</command> triggered freezing at the level of
+ individual <structfield>xmin</structfield> and
+ <structfield>xmax</structfield> fields. Freezing only affected
+ the exact XIDs that had already attained an age of
+ <varname>vacuum_freeze_min_age</varname> or greater.
+ </para>
+ </note>
<para>
- To track the age of the oldest unfrozen XIDs in a database,
- <command>VACUUM</command> stores XID
- statistics in the system tables <structname>pg_class</structname> and
- <structname>pg_database</structname>. In particular,
- the <structfield>relfrozenxid</structfield> column of a table's
- <structname>pg_class</structname> row contains the oldest remaining unfrozen
- XID at the end of the most recent <command>VACUUM</command> that successfully
- advanced <structfield>relfrozenxid</structfield> (typically the most recent
- aggressive VACUUM). Similarly, the
- <structfield>datfrozenxid</structfield> column of a database's
- <structname>pg_database</structname> row is a lower bound on the unfrozen XIDs
- appearing in that database — it is just the minimum of the
- per-table <structfield>relfrozenxid</structfield> values within the database.
- A convenient way to
- examine this information is to execute queries such as:
-
-<programlisting>
-SELECT c.oid::regclass as table_name,
- greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
-FROM pg_class c
-LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
-WHERE c.relkind IN ('r', 'm');
-
-SELECT datname, age(datfrozenxid) FROM pg_database;
-</programlisting>
-
- The <literal>age</literal> column measures the number of transactions from the
- cutoff XID to the current transaction's XID.
+ <command>VACUUM</command> also triggers freezing of a page in
+ cases where it already proved necessary to write out a full page
+ image (<acronym>FPI</acronym>) as part of a <acronym>WAL</acronym>
+ record describing how dead tuples were removed <footnote>
+ <para>
+ Actually, the <quote>freeze on an <acronym>FPI</acronym>
+ write</quote> mechanism isn't just triggered whenever
+ <command>VACUUM</command> needed to write an
+ <acronym>FPI</acronym> for torn page protection as part of
+ writing a <literal>PRUNE</literal> <acronym>WAL</acronym> record
+ describing how dead tuples were removed. The
+ <acronym>FPI</acronym> mechanism can also be triggered when hint
+ bits are set by <command>VACUUM</command>, if and only if doing
+ so necessitates writing an <acronym>FPI</acronym>.
+ <acronym>WAL</acronym>-logging in order to set hint bits is only
+ possible when the <xref linkend="guc-wal-log-hints"/> option is
+ enabled in <filename>postgresql.conf</filename>, or when data
+ checksums were enabled when the cluster was initialized with
+ <xref linkend="app-initdb"/>.
+ </para>
+ </footnote> (see <xref linkend="wal-reliability"/> for background
+ information about how <acronym>FPI</acronym>s provide torn page
+ protection). This <quote>freeze on an <acronym>FPI</acronym>
+ write</quote> batching mechanism often avoids the need for some
+ future <command>VACUUM</command> operation to write an additional
+ <acronym>FPI</acronym> for the same page as part of a
+ <acronym>WAL</acronym> record describing how live tuples were
+ frozen. In effect, <command>VACUUM</command> writes slightly more
+ <acronym>WAL</acronym> in the short term with the aim of
+ ultimately needing to write much less <acronym>WAL</acronym> in
+ the long term.
</para>
<tip>
<para>
- When the <command>VACUUM</command> command's <literal>VERBOSE</literal>
- parameter is specified, <command>VACUUM</command> prints various
- statistics about the table. This includes information about how
- <structfield>relfrozenxid</structfield> and
- <structfield>relminmxid</structfield> advanced, and the number of
- newly frozen pages. The same details appear in the server log when
- autovacuum logging (controlled by <xref
- linkend="guc-log-autovacuum-min-duration"/>) reports on a
- <command>VACUUM</command> operation executed by autovacuum.
+ For tables which receive <command>INSERT</command> operations,
+ but few or no <command>UPDATE</command>/<command>DELETE</command>
+ operations, it may be beneficial to selectively lower <xref
+ linkend="reloption-autovacuum-freeze-min-age"/> for the table.
+ <command>VACUUM</command> may thereby be able to freeze the
+ table's pages <quote>eagerly</quote> during earlier autovacuums
+ triggered by <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
</para>
</tip>
- <para>
- <command>VACUUM</command> normally only scans pages that have been modified
- since the last vacuum, but <structfield>relfrozenxid</structfield> can only be
- advanced when every page of the table
- that might contain unfrozen XIDs is scanned. This happens when
- <structfield>relfrozenxid</structfield> is more than
- <varname>vacuum_freeze_table_age</varname> transactions old, when
- <command>VACUUM</command>'s <literal>FREEZE</literal> option is used, or when all
- pages that are not already all-frozen happen to
- require vacuuming to remove dead row versions. When <command>VACUUM</command>
- scans every page in the table that is not already all-frozen, it should
- set <literal>age(relfrozenxid)</literal> to a value just a little more than the
- <varname>vacuum_freeze_min_age</varname> setting
- that was used (more by the number of transactions started since the
- <command>VACUUM</command> started). <command>VACUUM</command>
- will set <structfield>relfrozenxid</structfield> to the oldest XID
- that remains in the table, so it's possible that the final value
- will be much more recent than strictly required.
- If no <structfield>relfrozenxid</structfield>-advancing
- <command>VACUUM</command> is issued on the table until
- <varname>autovacuum_freeze_max_age</varname> is reached, an autovacuum will soon
- be forced for the table.
- </para>
+ <caution>
+ <para>
+ <command>VACUUM</command> may not be able to freeze every tuple's
+ <structfield>xmin</structfield> in relatively rare cases. The
+ criteria that determines basic eligibility for freezing is the
+ same as the one that determines if a deleted tuple can be
+ removed: the XID-based <literal>removable cutoff</literal> that
+ appears in the server log's autovacuum log reports (controlled by
+ <xref linkend="guc-log-autovacuum-min-duration"/>).
+ </para>
+ <para>
+ In extreme cases, a long-running transaction can hold back every
+ <command>VACUUM</command>'s removable cutoff for so long that the
+ system is forced to activate <link
+ linkend="xid-stop-limit"><literal>xidStopLimit</literal> mode
+ protections</link>.
+ </para>
+ </caution>
- <para>
- If for some reason autovacuum fails to clear old XIDs from a table, the
- system will begin to emit warning messages like this when the database's
- oldest XIDs reach forty million transactions from the wraparound point:
+ <sect3 id="aggressive-vacuum">
+ <title>Aggressive <command>VACUUM</command></title>
+
+ <indexterm zone="aggressive-vacuum">
+ <primary>transaction ID</primary>
+ <secondary>wraparound</secondary>
+ </indexterm>
+
+ <indexterm>
+ <primary>wraparound</primary>
+ <secondary>of transaction IDs and MultiXact IDs</secondary>
+ </indexterm>
+
+ <para>
+ As noted already, freezing doesn't just allow queries to avoid
+ lookups of subsidiary transaction status information in
+ structures such as <filename>pg_xact</filename>. Freezing also
+ plays a crucial role in enabling management of the XID address
+ space by <command>VACUUM</command>. <command>VACUUM</command>
+ maintains information about the oldest unfrozen XID that remains
+ in the table when it uses its <firstterm>aggressive strategy</firstterm>.
+ </para>
+
+ <para>
+ Aggressive <command>VACUUM</command> will update the table's
+ <structname>pg_class</structname>.<structfield>relfrozenxid</structfield>
+ to the value that it determined to be the oldest remaining XID;
+ the table's <structfield>relfrozenxid</structfield>
+ <quote>advances</quote> by a certain number of XIDs. Aggressive
+ <command>VACUUM</command> may also need to update the
+ <structfield>datfrozenxid</structfield> column of the database's
+ <structname>pg_database</structname> row in turn.
+ <structfield>datfrozenxid</structfield> is a lower bound on the
+ unfrozen XIDs appearing in that database — it is just the
+ minimum of the per-table <structfield>relfrozenxid</structfield>
+ values (the <structfield>relfrozenxid</structfield> that has
+ attained the greatest age) within the database.
+ </para>
+
+ <para>
+ Aggressive <command>VACUUM</command> also maintains the
+ <structname>pg_class</structname>.<structfield>relminmxid</structfield>
+ and <structname>pg_database</structname>.<structfield>datminmxid</structfield>
+ fields. These are needed to track the oldest MultiXact ID that
+ remains in the table and database, respectively.
+ </para>
+
+ <para>
+ The extra steps performed within every aggressive
+ <command>VACUUM</command> against every table have the overall
+ effect of tracking the oldest remaining unfrozen transaction ID
+ in the entire cluster (every table from every database).
+ Aggressive <command>VACUUM</command>s will (in the aggregate and
+ over time) make sure that the oldest unfrozen transaction ID in
+ the entire system is never too far in the past.
+ </para>
+
+ <note>
+ <title>Managing the Transaction ID Space</title>
+ <para>
+ Freezing removes <emphasis>local</emphasis> dependencies on
+ external transaction status information from individual heap
+ pages. Advancing <structfield>relfrozenxid</structfield>
+ removes <emphasis>global</emphasis> dependencies from whole
+ tables in turn.
+ </para>
+ <para>
+ The oldest XID in the entire cluster can be thought of as the
+ beginning of the XID space, while the next unallocated XID can
+ be thought of as the end of the XID space. This space
+ represents the range of XIDs that might still require
+ transaction commit/abort status lookups in <filename>pg_xact</filename>.
+ </para>
+ </note>
+
+ <para>
+ The maximum XID age that the system can tolerate (i.e., the
+ maximum <quote>distance</quote> between the oldest unfrozen
+ transaction ID in any table according to
+ <structname>pg_class</structname>.<structfield>relfrozenxid</structfield>,
+ and the next unallocated transaction ID) is about 2.1 billion
+ transaction IDs. This <quote>maximum XID age</quote> invariant
+ makes it fundamentally impossible to put off aggressive
+ <command>VACUUM</command>s (and freezing) forever
+ <footnote>
+ <para>
+ Aggressive <command>VACUUM</command>s cannot be put off
+ forever, <emphasis>barring the edge-case where the
+ installation is never expected to consume more than about 2.1
+ billion XIDs</emphasis>. In practice this has practical
+ relevance.
+ </para>
+ </footnote>. The invariant imposes an absolute hard limit on how
+ long any table can go without an aggressive <command>VACUUM</command>.
+ </para>
+
+ <para>
+ If the hard limit is ever reached, then the system will activate
+ <link linkend="xid-stop-limit"><literal>xidStopLimit</literal>
+ mode</link>, which temporarily prevents the allocation of new
+ permanent transaction IDs. The system will only deactive
+ <literal>xidStopLimit</literal> mode when
+ <command>VACUUM</command> (typically run by autovacuum) succeeds
+ in advancing the oldest <literal>datfrozenxid</literal> in the
+ cluster (via an aggressive <command>VACUUM</command> that runs to
+ completion against the table that has the oldest
+ <structfield>relfrozenxid</structfield>).
+ </para>
+
+ <para>
+ The 2.1 billion XIDs <quote>maximum XID age</quote> invariant
+ must be preserved because transaction IDs stored in heap row
+ headers use a truncated 32-bit representation (rather than the
+ full 64-bit representation). Since all unfrozen transaction IDs
+ from heap tuple headers <emphasis>must</emphasis> be from the
+ same transaction ID epoch (or from a space in the 64-bit
+ representation that spans two adjoining transaction ID epochs),
+ there isn't any need to store a separate epoch field in each
+ tuple header (see <xref linkend="interpreting-xid-stamps"/> for
+ further details). This scheme has the advantage of requiring
+ much less on-disk storage space than a design that stores an XID
+ epoch alongside each XID stored in each heap tuple header. It
+ has the disadvantage of constraining the system's ability to
+ allocate new XIDs in the worst case scenario where
+ <literal>xidStopLimit</literal> mode is used to preserve the
+ <quote>maximum XID age</quote> invariant.
+ </para>
+
+ <para>
+ There is only one major runtime behavioral difference between
+ aggressive mode <command>VACUUM</command>s and non-aggressive
+ <command>VACUUM</command>s: only non-aggressive
+ <command>VACUUM</command>s will skip pages that don't have any
+ dead row versions even if those pages still have row versions
+ with old XID values (pages marked as all-visible in the
+ visibility map). Aggressive <command>VACUUM</command>s can only
+ skip pages that are marked as both all-visible and all-frozen.
+ Consequently, non-aggressive <command>VACUUM</command>s usually
+ won't freeze <emphasis>every</emphasis> page containing an XID
+ that has already attained an age of
+ <varname>vacuum_freeze_min_age</varname> or more. Failing to
+ freeze older pages during non-aggressive
+ <command>VACUUM</command>s may lead to aggressive
+ <command>VACUUM</command>s that perform a disproportionately
+ large amount of the work of freezing required by one particular
+ table.
+ </para>
+
+ <tip>
+ <para>
+ When the <command>VACUUM</command> command's
+ <literal>VERBOSE</literal> parameter is specified,
+ <command>VACUUM</command> prints various statistics about the
+ table. Its output includes information about how
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> advanced, and the number
+ of newly frozen pages. The same details appear in the server
+ log when autovacuum logging (controlled by <xref
+ linkend="guc-log-autovacuum-min-duration"/>) reports on a
+ <command>VACUUM</command> operation executed by autovacuum.
+ </para>
+ </tip>
+
+ <note>
+ <para>
+ In practice, most tables require periodic aggressive vacuuming.
+ However, some individual non-aggressive
+ <command>VACUUM</command> operations may be able to advance
+ <structfield>relfrozenxid</structfield> and/or
+ <structfield>relminmxid</structfield>. Non-aggressive
+ <structfield>relfrozenxid</structfield>/<structfield>relminmxid</structfield>
+ advancement is most common in small, frequently modified tables.
+ </para>
+ </note>
+
+ <para>
+ Most individual tables will eventually need an aggressive
+ <command>VACUUM</command>, which will reliably freeze all pages
+ with XID (or MultiXact ID) values older than
+ <varname>vacuum_freeze_min_age</varname> (or older than
+ <varname>vacuum_multixact_freeze_min_age</varname>), including
+ those from all-visible but not all-frozen pages (and then advance
+ <structname>pg_class</structname>.<structfield>relfrozenxid</structfield>
+ to a value that reflects all that). <xref
+ linkend="guc-vacuum-freeze-table-age"/> controls when
+ <command>VACUUM</command> must use its aggressive strategy. If
+ <literal>age(relfrozenxid)</literal> exceeds
+ <varname>vacuum_freeze_table_age</varname> at the start of
+ <command>VACUUM</command>, that <command>VACUUM</command> will
+ use the aggressive strategy; otherwise the standard
+ non-aggressive strategy is used. Setting
+ <varname>vacuum_freeze_table_age</varname> to 0 forces
+ <command>VACUUM</command> to always use its aggressive strategy.
+ </para>
+ </sect3>
+
+ <sect3 id="anti-wraparound-autovacuums">
+ <title>Anti-Wraparound Autovacuums</title>
+
+ <para>
+ To ensure that every table has its
+ <structfield>relfrozenxid</structfield> advanced at somewhat
+ regular intervals, even in the case of completely static tables,
+ autovacuum runs against any table that might contain unfrozen
+ rows with XIDs older than the age specified by the configuration
+ parameter <xref linkend="guc-autovacuum-freeze-max-age"/>. These
+ are <firstterm>anti-wraparound autovacuums</firstterm>.
+ Anti-wraparound autovacuums can happen even when autovacuum is
+ nominally disabled in <filename>postgresql.conf</filename>.
+ </para>
+
+ <para>
+ In practice, all anti-wraparound autovacuums will use
+ <command>VACUUM</command>'s aggressive strategy (if they didn't,
+ then it would defeat the whole purpose of anti-wraparound
+ autovacuuming). Use of <command>VACUUM</command>'s aggressive
+ strategy is certain, because the effective value of
+ <varname>vacuum_freeze_table_age</varname> is silently
+ <quote>clamped</quote> to a value no greater than 95% of the
+ current value of <varname>autovacuum_freeze_max_age</varname>.
+ </para>
+
+ <para>
+ As a rule of thumb, <command>vacuum_freeze_table_age</command>
+ should be set to a value somewhat below
+ <varname>autovacuum_freeze_max_age</varname>, so that there is a
+ window during which any autovacuum triggered by inserts, updates,
+ or deletes (or any manually issued <command>VACUUM</command>)
+ will become an aggressive <command>VACUUM</command>. Such
+ <command>VACUUM</command>s will reliably advance
+ <structfield>relfrozenxid</structfield> in passing, even though
+ autovacuum won't have specifically set out to make sure
+ <structfield>relfrozenxid</structfield> advances through
+ anti-wraparound autovacuuming. Anti-wraparound autovacuums may
+ never be required at all in tables that regularly require
+ vacuuming to <link linkend="vacuum-for-space-recovery">reclaim
+ space from dead tuples</link> and/or to <link
+ linkend="vacuum-for-visibility-map">set pages all-visible in the
+ visibility map</link> (especially if
+ <varname>vacuum_freeze_table_age</varname> is set to a value
+ significantly below
+ <varname>autovacuum_freeze_max_age</varname>).
+ </para>
+
+ <note>
+ <title>Note on terminology</title>
+ <para>
+ Aggressive <command>VACUUM</command> is a special form of
+ <command>VACUUM</command>. An aggressive
+ <command>VACUUM</command> must advance
+ <structfield>relfrozenxid</structfield> up to an XID value that
+ is no greater than <varname>vacuum_freeze_min_age</varname> XIDs
+ in age as of the <emphasis>start</emphasis> of the
+ <command>VACUUM</command> operation.
+ </para>
+ <para>
+ Anti-wraparound autovacuum is a special form of Autovacuum. Its
+ purpose is to make sure that
+ <structfield>relfrozenxid</structfield> is advanced when no
+ earlier aggressive <command>VACUUM</command> ran and advanced
+ <structfield>relfrozenxid</structfield> in passing (often
+ because no <command>VACUUM</command> needed to run against the
+ table at all).
+ </para>
+ <para>
+ There is only one runtime behavioral difference between
+ anti-wraparound autovacuums and other autovacuums that happen to
+ end up running an aggressive <command>VACUUM</command>:
+ Anti-wraparound autovacuums <emphasis>cannot be
+ autocancelled</emphasis>. This means that autovacuum workers
+ that perform anti-wraparound autovacuuming do not yield to
+ conflicting relation-level lock requests (e.g., from
+ <command>ALTER TABLE</command>). See <xref
+ linkend="autovacuum-lock-conflicts"/> for a full explanation.
+ </para>
+ </note>
+
+ <para>
+ <command>VACUUM</command> also applies <xref
+ linkend="guc-vacuum-multixact-freeze-table-age"/> and <xref
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. These are
+ independent MultiXact ID based triggers of aggressive
+ <command>VACUUM</command> (and anti-wraparound autovacuum). They
+ are applied by following rules analogous to the rules already
+ described for <varname>vacuum_freeze_table_age</varname> and
+ <varname>autovacuum_freeze_max_age</varname>, respectively
+ <footnote>
+ <para>
+ Though note that autovacuum (and <command>VACUUM</command>) use a lower
+ <quote>effective</quote>
+ <varname>autovacuum_multixact_freeze_max_age</varname>
+ value (determined dynamically) to deal with issues with
+ truncation of the <acronym>SLRU</acronym> storage areas, as
+ explained in <xref linkend="vacuum-truncate-xact-status"/>
+ </para>
+ </footnote>.
+ </para>
+
+ <para>
+ It doesn't matter if it was <varname>vacuum_freeze_table_age</varname> or
+ <varname>vacuum_multixact_freeze_table_age</varname> that
+ triggered <command>VACUUM</command>'s decision to use its
+ aggressive strategy. <emphasis>Every</emphasis> aggressive
+ <command>VACUUM</command> will advance
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> by following the same
+ generic steps at runtime.
+ </para>
+
+ <para>
+ A convenient way to examine information about
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> is to execute queries such as:
+
+<programlisting>
+SELECT c.oid::regclass as table_name,
+greatest(age(c.relfrozenxid),
+ age(t.relfrozenxid)) as xid_age,
+mxid_age(c.relminmxid)
+FROM pg_class c
+LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
+WHERE c.relkind IN ('r', 'm');
+
+SELECT datname,
+age(datfrozenxid) as xid_age,
+mxid_age(datminmxid)
+FROM pg_database;
+</programlisting>
+
+ The <function>age</function> function returns the number of
+ transactions from <structfield>relfrozenxid</structfield> to the
+ next unallocated transaction ID. The
+ <function>mxid_age</function> function the number of MultiXact
+ IDs from <structfield>relminmxid</structfield> to the next
+ unallocated MultiXact ID.
+ </para>
+
+ <para>
+ The system should always have significant XID allocation slack
+ capacity. Ideally, the greatest
+ <literal>age(relfrozenxid)</literal>/<literal>age(datfrozenxid)</literal>
+ in the system will never be more than a fraction of the 2.1
+ billion XID hard limit described in <xref
+ linkend="aggressive-vacuum"/>. The default
+ <varname>vacuum_freeze_table_age</varname> setting of 200 million
+ transactions implies that the system should never use
+ significantly more than about 10% of that hard limit.
+ </para>
+
+ <para>
+ There is little advantage in routinely allowing the greatest
+ <literal>age(relfrozenxid)</literal> in the system to get
+ anywhere near to the 2.1 billion XID hard limit. Putting off the
+ work of freezing can only reduce the absolute amount of
+ <acronym>WAL</acronym> written by <command>VACUUM</command> when
+ <command>VACUUM</command> thereby completely avoids freezing rows
+ that are deleted before long anyway. There is little or no
+ disadvantage from lowering <varname>vacuum_freeze_table_age</varname>
+ to make aggressive <command>VACUUM</command>s more frequent, at
+ least in tables where newly frozen pages almost always remain
+ all-frozen forever. Note also that anything that leads to
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> advancing less frequently
+ (such as a higher <varname>vacuum_freeze_table_age</varname>
+ setting) will also increase the on-disk space required to store
+ additional transaction status information, as described in <xref
+ linkend="vacuum-truncate-xact-status"/>.
+ </para>
+
+ </sect3>
+
+ <sect3 id="xid-stop-limit">
+ <title><literal>xidStopLimit</literal> mode</title>
+ <para>
+ If for some reason autovacuum utterly fails to advance any
+ table's <structfield>relfrozenxid</structfield> or
+ <structfield>relminmxid</structfield> for an extended period, and
+ if XIDs and/or MultiXact IDs continue to be allocated, the system
+ will begin to emit warning messages like this when the database's
+ oldest XIDs reach forty million transactions from the 2.1 billion
+ XID hard limit described in <xref linkend="aggressive-vacuum"/>:
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
</programlisting>
- (A manual <command>VACUUM</command> should fix the problem, as suggested by the
- hint; but note that the <command>VACUUM</command> must be performed by a
- superuser, else it will fail to process system catalogs and thus not
- be able to advance the database's <structfield>datfrozenxid</structfield>.)
- If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ (A manual <command>VACUUM</command> should fix the problem, as suggested by the
+ hint; but note that the <command>VACUUM</command> must be performed by a
+ superuser, else it will fail to process system catalogs and thus not
+ be able to advance the database's <structfield>datfrozenxid</structfield>.)
+ If these warnings are ignored, the system will eventually refuse
+ to start any new transactions. This happens at the point that
+ there are fewer than three million transactions left:
<programlisting>
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
- The three-million-transaction safety margin exists to let the
- administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
- </para>
-
- <sect3 id="vacuum-for-multixact-wraparound">
- <title>Multixacts and Wraparound</title>
-
- <indexterm>
- <primary>MultiXactId</primary>
- </indexterm>
-
- <indexterm>
- <primary>wraparound</primary>
- <secondary>of multixact IDs</secondary>
- </indexterm>
-
- <para>
- <firstterm>Multixact IDs</firstterm> are used to support row locking by
- multiple transactions. Since there is only limited space in a tuple
- header to store lock information, that information is encoded as
- a <quote>multiple transaction ID</quote>, or multixact ID for short,
- whenever there is more than one transaction concurrently locking a
- row. Information about which transaction IDs are included in any
- particular multixact ID is stored separately in
- the <filename>pg_multixact</filename> subdirectory, and only the multixact ID
- appears in the <structfield>xmax</structfield> field in the tuple header.
- Like transaction IDs, multixact IDs are implemented as a
- 32-bit counter and corresponding storage, all of which requires
- careful aging management, storage cleanup, and wraparound handling.
- There is a separate storage area which holds the list of members in
- each multixact, which also uses a 32-bit counter and which must also
- be managed.
+ The three-million-transaction safety margin exists to let the
+ administrator recover without data loss, by manually executing the
+ required <command>VACUUM</command> commands. However, since the system will not
+ execute commands once it has gone into the safety shutdown mode,
+ the only way to do this is to stop the server and start the server in single-user
+ mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
+ in single-user mode. See the <xref linkend="app-postgres"/> reference
+ page for details about using single-user mode.
</para>
<para>
- Whenever <command>VACUUM</command> scans any part of a table, it will replace
- any multixact ID it encounters which is older than
- <xref linkend="guc-vacuum-multixact-freeze-min-age"/>
- by a different value, which can be the zero value, a single
- transaction ID, or a newer multixact ID. For each table,
- <structname>pg_class</structname>.<structfield>relminmxid</structfield> stores the oldest
- possible multixact ID still appearing in any tuple of that table.
- If this value is older than
- <xref linkend="guc-vacuum-multixact-freeze-table-age"/>, an aggressive
- vacuum is forced. As discussed in the previous section, an aggressive
- vacuum means that only those pages which are known to be all-frozen will
- be skipped. <function>mxid_age()</function> can be used on
- <structname>pg_class</structname>.<structfield>relminmxid</structfield> to find its age.
- </para>
-
- <para>
- Aggressive <command>VACUUM</command>s, regardless of what causes
- them, are <emphasis>guaranteed</emphasis> to be able to advance
- the table's <structfield>relminmxid</structfield>.
- Eventually, as all tables in all databases are scanned and their
- oldest multixact values are advanced, on-disk storage for older
- multixacts can be removed.
- </para>
-
- <para>
- As a safety device, an aggressive vacuum scan will
- occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds 2GB, aggressive vacuum
- scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled.
+ In emergencies, <command>VACUUM</command> will take extraordinary
+ measures to avoid <literal>xidStopLimit</literal> mode. A
+ failsafe mechanism is triggered when the table's
+ <structfield>relfrozenxid</structfield> attains an age of <xref
+ linkend="guc-vacuum-failsafe-age"/> XIDs, or when the table's
+ <structfield>relminmxid</structfield> attains an age of <xref
+ linkend="guc-vacuum-multixact-failsafe-age"/> MultiXact IDs.
+ The failsafe prioritizes advancing
+ <structfield>relfrozenxid</structfield> and/or
+ <structfield>relminmxid</structfield> as quickly as possible.
+ Once the failsafe triggers, <command>VACUUM</command> bypasses
+ all remaining non-essential maintenance tasks, and stops applying
+ any cost-based delay that was in effect. Any <glossterm
+ linkend="glossary-buffer-access-strategy">Buffer Access
+ Strategy</glossterm> in use will also be disabled.
</para>
</sect3>
</sect2>
@@ -787,12 +1156,23 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
<title>Updating the Visibility Map</title>
<para>
- Vacuum maintains a <link linkend="storage-vm">visibility
- map</link> for each table to keep track of which pages contain
- only tuples that are known to be visible to all active
- transactions (and all future transactions, until the page is again
- modified). This has two purposes. First, vacuum itself can skip
- such pages on the next run, since there is nothing to clean up.
+ <command>VACUUM</command> maintains a <link
+ linkend="storage-vm">visibility map</link> for each table to keep
+ track of which pages contain only tuples that are known to be
+ visible to all active transactions (and all future transactions,
+ at least until the page is modified). A separate bit tracks
+ whether all of the tuples are frozen.
+ </para>
+
+ <para>
+ The visibility map serves two purposes.
+ </para>
+
+ <para>
+ First, <command>VACUUM</command> itself can skip such pages on the
+ next run, since there is nothing to clean up. Even <link
+ linkend="aggressive-vacuum">aggressive <command>VACUUM</command>s</link>
+ can skip pages that are both all-visible and all-frozen.
</para>
<para>
@@ -812,6 +1192,65 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect2>
+ <sect2 id="vacuum-truncate-xact-status">
+ <title>Truncating transaction status information</title>
+
+ <para>
+ Anything that influences when and how
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> advance will also directly
+ affect the high watermark storage overhead needed to store
+ historical transaction status information. For example,
+ increasing <varname>autovacuum_freeze_max_age</varname> (and
+ <varname>vacuum_freeze_table_age</varname> along with it) will
+ make the <filename>pg_xact</filename> and
+ <filename>pg_commit_ts</filename> subdirectories of the database
+ cluster take more space, because they store the commit/abort
+ status and (if <varname>track_commit_timestamp</varname> is enabled)
+ timestamp of all transactions back to the
+ <varname>datfrozenxid</varname> horizon (the earliest
+ <varname>datfrozenxid</varname> among all databases in the
+ cluster).
+ </para>
+
+ <para>
+ The commit status uses two bits per transaction. The default
+ <varname>autovacuum_freeze_max_age</varname> setting of 200
+ million transactions translates to about 50MB of
+ <filename>pg_xact</filename> storage. When
+ <varname>track_commit_timestamp</varname> is enabled, about 2GB of
+ <filename>pg_commit_ts</filename> storage will also be required.
+ </para>
+
+ <para>
+ MultiXact ID status information storage uses two separate
+ underlying <acronym>SLRU</acronym> storage areas:
+ <filename>pg_multixact/members</filename>, and
+ <filename>pg_multixact/offsets</filename>. There is no simple
+ formula to determine the storage overhead per MultiXact ID, since
+ in general MultiXact IDs have a variable number of member XIDs.
+ Note, however, that if <filename>pg_multixact/members</filename>
+ exceeds 2GB, then the effective value of
+ <varname>autovacuum_multixact_freeze_max_age</varname> used by
+ <command>VACUUM</command> will be lower, resulting in more
+ frequent aggressive mode <command>VACUUM</command>s.
+ </para>
+
+ <para>
+ Truncation of transaction status information is only possible at
+ the end of <command>VACUUM</command>s that advance the earliest
+ <structfield>relfrozenxid</structfield> (in the case of
+ <filename>pg_xact</filename> and
+ <filename>pg_commit_ts</filename>), or the earliest
+ <structfield>relminmxid</structfield> (in the case of
+ <filename>pg_multixact/members</filename> and
+ <filename>pg_multixact/offsets</filename>) among all tables in the
+ entire database (assuming that its the database with the earliest
+ <structfield>datfrozenxid</structfield> and
+ <structfield>datminmxid</structfield> in the entire cluster).
+ </para>
+ </sect2>
+
<sect2 id="vacuum-for-statistics">
<title>Updating Planner Statistics</title>
@@ -927,7 +1366,7 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</tip>
</sect2>
-</sect1>
+ </sect1>
<sect1 id="routine-reindex">
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10ef699fa..8aa332fcf 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1515,7 +1515,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
and/or <command>ANALYZE</command> operations on this table following the rules
discussed in <xref linkend="autovacuum"/>.
If false, this table will not be autovacuumed, except to prevent
- transaction ID wraparound. See <xref linkend="vacuum-for-wraparound"/> for
+ transaction ID wraparound. See <xref linkend="freezing-xid-space"/> for
more about wraparound prevention.
Note that the autovacuum daemon does not run at all (except to prevent
transaction ID wraparound) if the <xref linkend="guc-autovacuum"/>
diff --git a/doc/src/sgml/ref/prepare_transaction.sgml b/doc/src/sgml/ref/prepare_transaction.sgml
index f4f6118ac..ede50d6f7 100644
--- a/doc/src/sgml/ref/prepare_transaction.sgml
+++ b/doc/src/sgml/ref/prepare_transaction.sgml
@@ -128,7 +128,7 @@ PREPARE TRANSACTION <replaceable class="parameter">transaction_id</replaceable>
This will interfere with the ability of <command>VACUUM</command> to reclaim
storage, and in extreme cases could cause the database to shut down
to prevent transaction ID wraparound (see <xref
- linkend="vacuum-for-wraparound"/>). Keep in mind also that the transaction
+ linkend="freezing-xid-space"/>). Keep in mind also that the transaction
continues to hold whatever locks it held. The intended usage of the
feature is that a prepared transaction will normally be committed or
rolled back as soon as an external transaction manager has verified that
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 57bc4c23e..95efe7d36 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -123,7 +123,9 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
<term><literal>FREEZE</literal></term>
<listitem>
<para>
- Selects aggressive <quote>freezing</quote> of tuples.
+ Makes <quote>freezing</quote> <emphasis>maximally</emphasis>
+ aggressive, and forces <command>VACUUM</command> to use its
+ <link linkend="aggressive-vacuum">aggressive strategy</link>.
Specifying <literal>FREEZE</literal> is equivalent to performing
<command>VACUUM</command> with the
<xref linkend="guc-vacuum-freeze-min-age"/> and
@@ -219,7 +221,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
there are many dead tuples in the table. This may be useful
when it is necessary to make <command>VACUUM</command> run as
quickly as possible to avoid imminent transaction ID wraparound
- (see <xref linkend="vacuum-for-wraparound"/>). However, the
+ (see <xref linkend="freezing-xid-space"/>). However, the
wraparound failsafe mechanism controlled by <xref
linkend="guc-vacuum-failsafe-age"/> will generally trigger
automatically to avoid transaction ID wraparound failure, and
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index da2393783..b61d523c2 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -233,7 +233,7 @@ PostgreSQL documentation
ID age of at least <replaceable class="parameter">mxid_age</replaceable>.
This setting is useful for prioritizing tables to process to prevent
multixact ID wraparound (see
- <xref linkend="vacuum-for-multixact-wraparound"/>).
+ <xref linkend="freezing-xid-space"/>).
</para>
<para>
For the purposes of this option, the multixact ID age of a relation is
@@ -254,7 +254,7 @@ PostgreSQL documentation
transaction ID age of at least
<replaceable class="parameter">xid_age</replaceable>. This setting
is useful for prioritizing tables to process to prevent transaction
- ID wraparound (see <xref linkend="vacuum-for-wraparound"/>).
+ ID wraparound (see <xref linkend="freezing-xid-space"/>).
</para>
<para>
For the purposes of this option, the transaction ID age of a relation
diff --git a/doc/src/sgml/xact.sgml b/doc/src/sgml/xact.sgml
index 0762442e1..e372a7875 100644
--- a/doc/src/sgml/xact.sgml
+++ b/doc/src/sgml/xact.sgml
@@ -185,7 +185,7 @@
rows and can be inspected using the <xref linkend="pgrowlocks"/>
extension. Row-level read locks might also require the assignment
of multixact IDs (<literal>mxid</literal>; see <xref
- linkend="vacuum-for-multixact-wraparound"/>).
+ linkend="freezing-xid-space"/>).
</para>
</sect1>
--
2.40.1
v3-0008-Overhaul-Recovering-Disk-Space-vacuuming-docs.patchapplication/octet-stream; name=v3-0008-Overhaul-Recovering-Disk-Space-vacuuming-docs.patchDownload
From 7de5ab5e2389a6f5dc41f0b9e23e8a169db47834 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 12:33:42 -0700
Subject: [PATCH v3 8/9] Overhaul "Recovering Disk Space" vacuuming docs.
Say a lot more about the possible impact of long-running transactions on
VACUUM. Remove all talk of administrators getting by without
autovacuum; at most administrators might want to schedule manual VACUUM
operations to supplement autovacuum (this documentation was written at a
time when the visibility map didn't exist, even in its most basic form).
Also describe VACUUM FULL as an entirely different kind of operation to
conventional lazy vacuum.
XXX Open question for this commit:
I wonder if it would make sense to move all of that stuff into its own
new sect1 of "Chapter 29. Monitoring Disk Usage" -- something along
the lines of "what to do about bloat when all else fails, when the
problem gets completely out of hand". Naturally we'd link to this new
section from "Routine Vacuuming".
XXX For now, a lot of the information about CLUSTER and VACUUM FULL is
moved into Note/Warning boxes. This arrangement is definitely going to
be temporary.
---
doc/src/sgml/maintenance.sgml | 174 +++++++++++++++++++---------------
1 file changed, 96 insertions(+), 78 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 36f481aba..5546d8c7d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -369,100 +369,118 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
This approach is necessary to gain the benefits of multiversion
concurrency control (<acronym>MVCC</acronym>, see <xref linkend="mvcc"/>): the row version
must not be deleted while it is still potentially visible to other
- transactions. But eventually, an outdated or deleted row version is no
- longer of interest to any transaction. The space it occupies must then be
- reclaimed for reuse by new rows, to avoid unbounded growth of disk
- space requirements. This is done by running <command>VACUUM</command>.
+ transactions. A deleted row version (whether from an
+ <command>UPDATE</command> or <command>DELETE</command>) will
+ usually cease to be of interest to any still-running transaction
+ shortly after the original deleting transaction commits.
</para>
<para>
- The standard form of <command>VACUUM</command> removes dead row
- versions in tables and indexes and marks the space available for
- future reuse. However, it will not return the space to the operating
- system, except in the special case where one or more pages at the
- end of a table become entirely free and an exclusive table lock can be
- easily obtained. In contrast, <command>VACUUM FULL</command> actively compacts
- tables by writing a complete new version of the table file with no dead
- space. This minimizes the size of the table, but can take a long time.
- It also requires extra disk space for the new copy of the table, until
- the operation completes.
+ The space dead tuples occupy must eventually be reclaimed for
+ reuse by new rows, to avoid unbounded growth of disk space
+ requirements. Reclaiming space from dead rows is
+ <command>VACUUM</command>'s main responsibility.
</para>
<para>
- The usual goal of routine vacuuming is to do standard <command>VACUUM</command>s
- often enough to avoid needing <command>VACUUM FULL</command>. The
- autovacuum daemon attempts to work this way, and in fact will
- never issue <command>VACUUM FULL</command>. In this approach, the idea
- is not to keep tables at their minimum size, but to maintain steady-state
- usage of disk space: each table occupies space equivalent to its
- minimum size plus however much space gets used up between vacuum runs.
- Although <command>VACUUM FULL</command> can be used to shrink a table back
- to its minimum size and return the disk space to the operating system,
- there is not much point in this if the table will just grow again in the
- future. Thus, moderately-frequent standard <command>VACUUM</command> runs are a
- better approach than infrequent <command>VACUUM FULL</command> runs for
- maintaining heavily-updated tables.
+ The <glossterm linkend="glossary-xid">transaction ID number
+ (<acronym>XID</acronym>)</glossterm> based cutoff point that
+ <command>VACUUM</command> uses to determine if a deleted tuple is
+ safe to physically remove is reported under <literal>removable
+ cutoff</literal> in the server log when autovacuum logging
+ (controlled by <xref linkend="guc-log-autovacuum-min-duration"/>)
+ reports on a <command>VACUUM</command> operation executed by
+ autovacuum. Tuples that are not yet safe to remove are counted as
+ <literal>dead but not yet removable</literal> tuples in the log
+ report. <command>VACUUM</command> establishes its
+ <literal>removable cutoff</literal> once, at the start of the
+ operation. Any older <acronym>MVCC</acronym> snapshot (or
+ transaction that allocates an XID) that's still running when the
+ cutoff is established may hold it back.
</para>
- <para>
- Some administrators prefer to schedule vacuuming themselves, for example
- doing all the work at night when load is low.
- The difficulty with doing vacuuming according to a fixed schedule
- is that if a table has an unexpected spike in update activity, it may
- get bloated to the point that <command>VACUUM FULL</command> is really necessary
- to reclaim space. Using the autovacuum daemon alleviates this problem,
- since the daemon schedules vacuuming dynamically in response to update
- activity. It is unwise to disable the daemon completely unless you
- have an extremely predictable workload. One possible compromise is
- to set the daemon's parameters so that it will only react to unusually
- heavy update activity, thus keeping things from getting out of hand,
- while scheduled <command>VACUUM</command>s are expected to do the bulk of the
- work when the load is typical.
- </para>
+ <caution>
+ <para>
+ It's important that no long-running transactions ever be allowed
+ to hold back every <command>VACUUM</command> operation's cutoff
+ for an extended period. You may wish to add monitoring to alert
+ on this.
+ </para>
+ </caution>
+
+ <note>
+ <para>
+ <command>VACUUM</command> can remove tuples inserted by aborted
+ transactions immediately
+ </para>
+ </note>
<para>
- For those not using autovacuum, a typical approach is to schedule a
- database-wide <command>VACUUM</command> once a day during a low-usage period,
- supplemented by more frequent vacuuming of heavily-updated tables as
- necessary. (Some installations with extremely high update rates vacuum
- their busiest tables as often as once every few minutes.) If you have
- multiple databases in a cluster, don't forget to
- <command>VACUUM</command> each one; the program <xref
- linkend="app-vacuumdb"/> might be helpful.
+ <command>VACUUM</command> usually won't return space to the
+ operating system. There is one exception: space is returned to the
+ OS whenever a group of contiguous pages appears at the end of a
+ table. <command>VACUUM</command> must acquire an <literal>ACCESS
+ EXCLUSIVE</literal> lock to perform relation truncation. You can
+ disable relation truncation by setting the table's
+ <varname>vacuum_truncate</varname> storage parameter to
+ <literal>off</literal>.
</para>
<tip>
- <para>
- Plain <command>VACUUM</command> may not be satisfactory when
- a table contains large numbers of dead row versions as a result of
- massive update or delete activity. If you have such a table and
- you need to reclaim the excess disk space it occupies, you will need
- to use <command>VACUUM FULL</command>, or alternatively
- <link linkend="sql-cluster"><command>CLUSTER</command></link>
- or one of the table-rewriting variants of
- <link linkend="sql-altertable"><command>ALTER TABLE</command></link>.
- These commands rewrite an entire new copy of the table and build
- new indexes for it. All these options require an
- <literal>ACCESS EXCLUSIVE</literal> lock. Note that
- they also temporarily use extra disk space approximately equal to the size
- of the table, since the old copies of the table and indexes can't be
- released until the new ones are complete.
- </para>
+ <para>
+ If you have a table whose entire contents are deleted on a
+ periodic basis, consider doing it with <link
+ linkend="sql-truncate"><command>TRUNCATE</command></link> rather
+ than relying on <command>VACUUM</command>.
+ <command>TRUNCATE</command> removes the entire contents of the
+ table immediately, avoiding the need to set
+ <structfield>xmax</structfield> to the deleting transaction's XID.
+ One disadvantage is that strict MVCC semantics are violated.
+ </para>
</tip>
-
<tip>
- <para>
- If you have a table whose entire contents are deleted on a periodic
- basis, consider doing it with
- <link linkend="sql-truncate"><command>TRUNCATE</command></link> rather
- than using <command>DELETE</command> followed by
- <command>VACUUM</command>. <command>TRUNCATE</command> removes the
- entire content of the table immediately, without requiring a
- subsequent <command>VACUUM</command> or <command>VACUUM
- FULL</command> to reclaim the now-unused disk space.
- The disadvantage is that strict MVCC semantics are violated.
- </para>
+ <para>
+ <command>VACUUM FULL</command> or <command>CLUSTER</command> can
+ be useful when dealing with extreme amounts of dead tuples. It
+ can reclaim more disk space, but runs much more slowly. It
+ rewrites an entire new copy of the table and rebuilds all of the
+ table's indexes. As a result, <command>VACUUM FULL</command> and
+ <command>CLUSTER</command> typically have a much higher overhead
+ than <command>VACUUM</command>. Generally, therefore,
+ administrators should avoid using <command>VACUUM FULL</command>
+ except in the most extreme cases.
+ </para>
</tip>
+ <note>
+ <para>
+ Although <command>VACUUM FULL</command> is technically an option
+ of the <command>VACUUM</command> command, <command>VACUUM
+ FULL</command> uses a completely different implementation.
+ <command>VACUUM FULL</command> is essentially a variant of
+ <command>CLUSTER</command>. (The name <command>VACUUM
+ FULL</command> is historical; the original implementation was
+ somewhat closer to standard <command>VACUUM</command>.)
+ </para>
+ </note>
+ <warning>
+ <para>
+ <command>TRUNCATE</command>, <command>VACUUM FULL</command>, and
+ <command>CLUSTER</command> all require an <literal>ACCESS
+ EXCLUSIVE</literal> lock, which can be highly disruptive
+ (<command>SELECT</command>, <command>INSERT</command>,
+ <command>UPDATE</command>, and <command>DELETE</command> commands
+ won't be able to run at the same time).
+ </para>
+ </warning>
+ <warning>
+ <para>
+ <command>VACUUM FULL</command> and <command>CLUSTER</command>
+ temporarily use extra disk space. The extra space required is
+ approximately equal to the size of the table, since the old
+ copies of the table and indexes can't be released until the new
+ ones are complete.
+ </para>
+ </warning>
</sect2>
<sect2 id="vacuum-for-wraparound">
--
2.40.1
v3-0006-Merge-basic-vacuuming-sect2-into-sect1-introducti.patchapplication/octet-stream; name=v3-0006-Merge-basic-vacuuming-sect2-into-sect1-introducti.patchDownload
From 4ed889f2452e832028bc198271b2ba5b7856536c Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 11:44:45 -0700
Subject: [PATCH v3 6/9] Merge "basic vacuuming" sect2 into sect1 introduction.
This doesn't change any of the content itself. It just merges the
original text into the sect1 text that immediately preceded it.
This is preparation for the next commit, which will remove most of the
text "relocated" in this commit. This structure should make things a
little easier for doc translators.
This commit is the last one that could be considered mechanical
restructuring/refactoring of existing text.
---
doc/src/sgml/maintenance.sgml | 106 ++++++++++++++++------------------
1 file changed, 51 insertions(+), 55 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index cb6f28e1e..a05e880fc 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -266,68 +266,64 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
to skim this material to help them understand and adjust autovacuuming.
</para>
- <sect2 id="vacuum-basics">
- <title>Vacuuming Basics</title>
+ <para>
+ <productname>PostgreSQL</productname>'s
+ <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
+ process each table on a regular basis for several reasons:
- <para>
- <productname>PostgreSQL</productname>'s
- <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
- process each table on a regular basis for several reasons:
+ <orderedlist>
+ <listitem>
+ <simpara>To recover or reuse disk space occupied by updated or deleted
+ rows.</simpara>
+ </listitem>
- <orderedlist>
- <listitem>
- <simpara>To recover or reuse disk space occupied by updated or deleted
- rows.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To protect against loss of very old data due to
+ <firstterm>transaction ID wraparound</firstterm> or
+ <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ </listitem>
- <listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To update the visibility map, which speeds
+ up <link linkend="indexes-index-only-scans">index-only
+ scans</link>.</simpara>
+ </listitem>
- <listitem>
- <simpara>To update the visibility map, which speeds
- up <link linkend="indexes-index-only-scans">index-only
- scans</link>.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To update data statistics used by the
+ <productname>PostgreSQL</productname> query planner.</simpara>
+ </listitem>
+ </orderedlist>
- <listitem>
- <simpara>To update data statistics used by the
- <productname>PostgreSQL</productname> query planner.</simpara>
- </listitem>
- </orderedlist>
+ Each of these reasons dictates performing <command>VACUUM</command> operations
+ of varying frequency and scope, as explained in the following subsections.
+ </para>
- Each of these reasons dictates performing <command>VACUUM</command> operations
- of varying frequency and scope, as explained in the following subsections.
- </para>
+ <para>
+ There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
+ and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
+ disk space but runs much more slowly. Also,
+ the standard form of <command>VACUUM</command> can run in parallel with production
+ database operations. (Commands such as <command>SELECT</command>,
+ <command>INSERT</command>, <command>UPDATE</command>, and
+ <command>DELETE</command> will continue to function normally, though you
+ will not be able to modify the definition of a table with commands such as
+ <command>ALTER TABLE</command> while it is being vacuumed.)
+ <command>VACUUM FULL</command> requires an
+ <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
+ working on, and therefore cannot be done in parallel with other use
+ of the table. Generally, therefore,
+ administrators should strive to use standard <command>VACUUM</command> and
+ avoid <command>VACUUM FULL</command>.
+ </para>
- <para>
- There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
- and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
- disk space but runs much more slowly. Also,
- the standard form of <command>VACUUM</command> can run in parallel with production
- database operations. (Commands such as <command>SELECT</command>,
- <command>INSERT</command>, <command>UPDATE</command>, and
- <command>DELETE</command> will continue to function normally, though you
- will not be able to modify the definition of a table with commands such as
- <command>ALTER TABLE</command> while it is being vacuumed.)
- <command>VACUUM FULL</command> requires an
- <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
- working on, and therefore cannot be done in parallel with other use
- of the table. Generally, therefore,
- administrators should strive to use standard <command>VACUUM</command> and
- avoid <command>VACUUM FULL</command>.
- </para>
-
- <para>
- <command>VACUUM</command> creates a substantial amount of I/O
- traffic, which can cause poor performance for other active sessions.
- There are configuration parameters that can be adjusted to reduce the
- performance impact of background vacuuming — see
- <xref linkend="runtime-config-resource-vacuum-cost"/>.
- </para>
- </sect2>
+ <para>
+ <command>VACUUM</command> creates a substantial amount of I/O
+ traffic, which can cause poor performance for other active sessions.
+ There are configuration parameters that can be adjusted to reduce the
+ performance impact of background vacuuming — see
+ <xref linkend="runtime-config-resource-vacuum-cost"/>.
+ </para>
<sect2 id="vacuum-for-space-recovery">
<title>Recovering Disk Space</title>
--
2.40.1
v3-0005-Move-Interpreting-XID-stamps-from-tuple-headers.patchapplication/octet-stream; name=v3-0005-Move-Interpreting-XID-stamps-from-tuple-headers.patchDownload
From 916e6a6121f03b65f361e1ca064fb5d4b6181cee Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 12:41:00 -0700
Subject: [PATCH v3 5/9] Move Interpreting XID stamps from tuple headers.
This is intended to be fairly close to a mechanical change. It isn't
entirely mechanical, though, since the original wording has been
slightly modified for it to work in context.
Structuring things this way should make life a little easier for doc
translators.
---
doc/src/sgml/maintenance.sgml | 80 ++++------------------
doc/src/sgml/xact.sgml | 125 ++++++++++++++++++++++++++++------
2 files changed, 120 insertions(+), 85 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e130dfdbd..cb6f28e1e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -447,75 +447,25 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
<secondary>wraparound</secondary>
</indexterm>
- <indexterm>
- <primary>wraparound</primary>
- <secondary>of transaction IDs</secondary>
- </indexterm>
+ <indexterm>
+ <primary>wraparound</primary>
+ <secondary>of transaction IDs</secondary>
+ </indexterm>
<para>
- <productname>PostgreSQL</productname>'s
- <link linkend="mvcc-intro">MVCC</link> transaction semantics
- depend on being able to compare transaction ID (<acronym>XID</acronym>)
- numbers: a row version with an insertion XID greater than the current
- transaction's XID is <quote>in the future</quote> and should not be visible
- to the current transaction. But since transaction IDs have limited size
- (32 bits) a cluster that runs for a long time (more
- than 4 billion transactions) would suffer <firstterm>transaction ID
- wraparound</firstterm>: the XID counter wraps around to zero, and all of a sudden
- transactions that were in the past appear to be in the future — which
- means their output become invisible. In short, catastrophic data loss.
- (Actually the data is still there, but that's cold comfort if you cannot
- get at it.) To avoid this, it is necessary to vacuum every table
- in every database at least once every two billion transactions.
+ <productname>PostgreSQL</productname>'s <link
+ linkend="mvcc-intro">MVCC</link> transaction semantics depend on
+ being able to compare <glossterm linkend="glossary-xid">transaction
+ ID numbers (<acronym>XID</acronym>)</glossterm> to determine
+ whether or not the row is visible to each query's MVCC snapshot
+ (see <xref linkend="interpreting-xid-stamps"/>). But since
+ on-disk storage of transaction IDs in heap pages uses a truncated
+ 32-bit representation to save space (rather than the full 64-bit
+ representation), it is necessary to vacuum every table in every
+ database <emphasis>at least</emphasis> once every two billion
+ transactions (though far more frequent vacuuming is typical).
</para>
- <para>
- The reason that periodic vacuuming solves the problem is that
- <command>VACUUM</command> will mark rows as <emphasis>frozen</emphasis>, indicating that
- they were inserted by a transaction that committed sufficiently far in
- the past that the effects of the inserting transaction are certain to be
- visible to all current and future transactions.
- Normal XIDs are
- compared using modulo-2<superscript>32</superscript> arithmetic. This means
- that for every normal XID, there are two billion XIDs that are
- <quote>older</quote> and two billion that are <quote>newer</quote>; another
- way to say it is that the normal XID space is circular with no
- endpoint. Therefore, once a row version has been created with a particular
- normal XID, the row version will appear to be <quote>in the past</quote> for
- the next two billion transactions, no matter which normal XID we are
- talking about. If the row version still exists after more than two billion
- transactions, it will suddenly appear to be in the future. To
- prevent this, <productname>PostgreSQL</productname> reserves a special XID,
- <literal>FrozenTransactionId</literal>, which does not follow the normal XID
- comparison rules and is always considered older
- than every normal XID.
- Frozen row versions are treated as if the inserting XID were
- <literal>FrozenTransactionId</literal>, so that they will appear to be
- <quote>in the past</quote> to all normal transactions regardless of wraparound
- issues, and so such row versions will be valid until deleted, no matter
- how long that is.
- </para>
-
- <note>
- <para>
- In <productname>PostgreSQL</productname> versions before 9.4, freezing was
- implemented by actually replacing a row's insertion XID
- with <literal>FrozenTransactionId</literal>, which was visible in the
- row's <structname>xmin</structname> system column. Newer versions just set a flag
- bit, preserving the row's original <structname>xmin</structname> for possible
- forensic use. However, rows with <structname>xmin</structname> equal
- to <literal>FrozenTransactionId</literal> (2) may still be found
- in databases <application>pg_upgrade</application>'d from pre-9.4 versions.
- </para>
- <para>
- Also, system catalogs may contain rows with <structname>xmin</structname> equal
- to <literal>BootstrapTransactionId</literal> (1), indicating that they were
- inserted during the first phase of <application>initdb</application>.
- Like <literal>FrozenTransactionId</literal>, this special XID is treated as
- older than every normal XID.
- </para>
- </note>
-
<para>
<xref linkend="guc-vacuum-freeze-min-age"/>
controls how old an XID value has to be before rows bearing that XID will be
diff --git a/doc/src/sgml/xact.sgml b/doc/src/sgml/xact.sgml
index b467660ee..0762442e1 100644
--- a/doc/src/sgml/xact.sgml
+++ b/doc/src/sgml/xact.sgml
@@ -22,6 +22,8 @@
single-statement transactions.
</para>
+ <sect2 id="virtual-xids">
+ <title>Virtual Transaction IDs</title>
<para>
Every transaction is identified by a unique
<literal>VirtualTransactionId</literal> (also called
@@ -46,29 +48,111 @@
started, particularly if the transaction started with statements that
only performed database reads.
</para>
+ </sect2>
- <para>
- The internal transaction ID type <type>xid</type> is 32 bits wide
- and <link linkend="vacuum-for-wraparound">wraps around</link> every
- 4 billion transactions. A 32-bit epoch is incremented during each
- wraparound. There is also a 64-bit type <type>xid8</type> which
- includes this epoch and therefore does not wrap around during the
- life of an installation; it can be converted to xid by casting.
- The functions in <xref linkend="functions-pg-snapshot"/>
- return <type>xid8</type> values. Xids are used as the
- basis for <productname>PostgreSQL</productname>'s <link
- linkend="mvcc">MVCC</link> concurrency mechanism and streaming
- replication.
- </para>
+ <sect2 id="permanent-xids">
+ <title>Permanent Transaction IDs</title>
+ <para>
+ The internal transaction ID type <type>xid</type> is 32 bits wide
+ and wraps around every
+ 4 billion transactions. A 32-bit epoch is incremented during each
+ wraparound. There is also a 64-bit type <type>xid8</type> which
+ includes this epoch and therefore does not wrap around during the
+ life of an installation; it can be converted to xid by casting.
+ The functions in <xref linkend="functions-pg-snapshot"/>
+ return <type>xid8</type> values. Xids are used as the
+ basis for <productname>PostgreSQL</productname>'s <link
+ linkend="mvcc">MVCC</link> concurrency mechanism and streaming
+ replication.
+ </para>
- <para>
- When a top-level transaction with a (non-virtual) xid commits,
- it is marked as committed in the <filename>pg_xact</filename>
- directory. Additional information is recorded in the
- <filename>pg_commit_ts</filename> directory if <xref
- linkend="guc-track-commit-timestamp"/> is enabled.
- </para>
+ <para>
+ When a top-level transaction with a (non-virtual) xid commits,
+ it is marked as committed in the <filename>pg_xact</filename>
+ directory. Additional information is recorded in the
+ <filename>pg_commit_ts</filename> directory if <xref
+ linkend="guc-track-commit-timestamp"/> is enabled.
+ </para>
+ <sect3 id="interpreting-xid-stamps">
+ <title><type>TransactionId</type> comparison rules</title>
+ <para>
+ The system often needs to compare <structfield>t_xmin</structfield>
+ and <structfield>t_xmax</structfield> fields for MVCC snapshot
+ visibility checks.
+ </para>
+
+ <para>
+ We use a truncated 32-bit representation of transaction IDs, rather
+ than using the full 64-bit representation. The 2.1 billion XIDs
+ <quote>distance</quote> invariant must be preserved because
+ transaction IDs stored in heap row headers use a truncated 32-bit
+ representation (rather than the full 64-bit representation). Since
+ all unfrozen transaction IDs from heap tuple headers
+ <emphasis>must</emphasis> be from the same transaction ID epoch (or
+ from a space in the 64-bit representation that spans two adjoining
+ transaction ID epochs), there isn't any need to store a separate
+ epoch field in each tuple header (see <xref
+ linkend="interpreting-xid-stamps"/> for further details). This
+ scheme has the advantage of requiring much less space than a design
+ that stores an XID epoch alongside each XID stored in each heap
+ tuple header. It has the disadvantage of constraining the system's
+ ability to allocate new XIDs (in the worst case scenario where
+ <literal>xidStopLimit</literal> mode is used to preserve the
+ <quote>distance</quote> invariant).
+ </para>
+
+ <para>
+ <command>VACUUM</command> <link linkend="routine-vacuuming">will
+ mark tuple headers <emphasis>frozen</emphasis></link>, indicating
+ that all eligible rows on the page were inserted by a transaction
+ that committed sufficiently far in the past that the effects of the
+ inserting transaction are certain to be visible to all current and
+ future transactions. Normal XIDs are compared using
+ modulo-2<superscript>32</superscript> arithmetic. This means that
+ for every normal XID, there are two billion XIDs that are
+ <quote>older</quote> and two billion that are <quote>newer</quote>;
+ another way to say it is that the normal XID space is circular with
+ no endpoint. Therefore, once a row version has been created with a
+ particular normal XID, the row version will appear to be <quote>in
+ the past</quote> for the next two billion transactions, no matter
+ which normal XID we are talking about. If the row version still
+ exists after more than two billion transactions, it will suddenly
+ appear to be in the future. To prevent this,
+ <productname>PostgreSQL</productname> reserves a special XID,
+ <literal>FrozenTransactionId</literal>, which does not follow the
+ normal XID comparison rules and is always considered older than
+ every normal XID. Frozen row versions are treated as if the
+ inserting XID were <literal>FrozenTransactionId</literal>, so that
+ they will appear to be <quote>in the past</quote> to all normal
+ transactions regardless of wraparound issues, and so such row
+ versions will be valid until deleted, no matter how long that is.
+ </para>
+
+ <note>
+ <para>
+ In <productname>PostgreSQL</productname> versions before 9.4, freezing was
+ implemented by actually replacing a row's insertion XID
+ with <literal>FrozenTransactionId</literal>, which was visible in the
+ row's <structname>xmin</structname> system column. Newer versions just set a flag
+ bit, preserving the row's original <structname>xmin</structname> for possible
+ forensic use. However, rows with <structname>xmin</structname> equal
+ to <literal>FrozenTransactionId</literal> (2) may still be found
+ in databases <application>pg_upgrade</application>'d from pre-9.4 versions.
+ </para>
+ <para>
+ Also, system catalogs may contain rows with <structname>xmin</structname> equal
+ to <literal>BootstrapTransactionId</literal> (1), indicating that they were
+ inserted during the first phase of <application>initdb</application>.
+ Like <literal>FrozenTransactionId</literal>, this special XID is treated as
+ older than every normal XID.
+ </para>
+ </note>
+ </sect3>
+ </sect2>
+
+ <sect2 id="global-transaction-ids">
+ <title>Global Transaction Identifiers</title>
<para>
In addition to <literal>vxid</literal> and <type>xid</type>,
prepared transactions are also assigned Global Transaction
@@ -77,6 +161,7 @@
prepared transactions. The mapping of GID to xid is shown in <link
linkend="view-pg-prepared-xacts"><structname>pg_prepared_xacts</structname></link>.
</para>
+ </sect2>
</sect1>
<sect1 id="xact-locking">
--
2.40.1
v3-0004-Reorder-routine-vacuuming-sections.patchapplication/octet-stream; name=v3-0004-Reorder-routine-vacuuming-sections.patchDownload
From 6f218d033bb334b5dc42e78cbcf7b33abeafa4f8 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 11:19:50 -0700
Subject: [PATCH v3 4/9] Reorder routine vacuuming sections.
This doesn't change any of the content itself. It is a mechanical
change. The new order flows better because it talks about freezing
directly after talking about space recovery tasks.
Old order:
<sect2 id="vacuum-basics">
<sect2 id="vacuum-for-space-recovery">
<sect2 id="vacuum-for-statistics">
<sect2 id="vacuum-for-visibility-map">
<sect2 id="vacuum-for-wraparound">
New order:
<sect2 id="vacuum-basics">
<sect2 id="vacuum-for-space-recovery">
<sect2 id="vacuum-for-wraparound">
<sect2 id="vacuum-for-visibility-map">
<sect2 id="vacuum-for-statistics">
The new order matches processing order inside vacuumlazy.c. This order
will be easier to work with in two later commits that more or less
rewrite "vacuum-for-wraparound" and "vacuum-for-space-recovery".
(Though it doesn't seem to make the existing content any less meaningful
without the later rewrite commits.)
---
doc/src/sgml/maintenance.sgml | 302 +++++++++++++++++-----------------
1 file changed, 151 insertions(+), 151 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e8c8647cd..e130dfdbd 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -281,8 +281,9 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To update data statistics used by the
- <productname>PostgreSQL</productname> query planner.</simpara>
+ <simpara>To protect against loss of very old data due to
+ <firstterm>transaction ID wraparound</firstterm> or
+ <firstterm>multixact ID wraparound</firstterm>.</simpara>
</listitem>
<listitem>
@@ -292,9 +293,8 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ <simpara>To update data statistics used by the
+ <productname>PostgreSQL</productname> query planner.</simpara>
</listitem>
</orderedlist>
@@ -439,151 +439,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</tip>
</sect2>
- <sect2 id="vacuum-for-statistics">
- <title>Updating Planner Statistics</title>
-
- <indexterm zone="vacuum-for-statistics">
- <primary>statistics</primary>
- <secondary>of the planner</secondary>
- </indexterm>
-
- <indexterm zone="vacuum-for-statistics">
- <primary>ANALYZE</primary>
- </indexterm>
-
- <para>
- The <productname>PostgreSQL</productname> query planner relies on
- statistical information about the contents of tables in order to
- generate good plans for queries. These statistics are gathered by
- the <link linkend="sql-analyze"><command>ANALYZE</command></link> command,
- which can be invoked by itself or
- as an optional step in <command>VACUUM</command>. It is important to have
- reasonably accurate statistics, otherwise poor choices of plans might
- degrade database performance.
- </para>
-
- <para>
- The autovacuum daemon, if enabled, will automatically issue
- <command>ANALYZE</command> commands whenever the content of a table has
- changed sufficiently. However, administrators might prefer to rely
- on manually-scheduled <command>ANALYZE</command> operations, particularly
- if it is known that update activity on a table will not affect the
- statistics of <quote>interesting</quote> columns. The daemon schedules
- <command>ANALYZE</command> strictly as a function of the number of rows
- inserted or updated; it has no knowledge of whether that will lead
- to meaningful statistical changes.
- </para>
-
- <para>
- Tuples changed in partitions and inheritance children do not trigger
- analyze on the parent table. If the parent table is empty or rarely
- changed, it may never be processed by autovacuum, and the statistics for
- the inheritance tree as a whole won't be collected. It is necessary to
- run <command>ANALYZE</command> on the parent table manually in order to
- keep the statistics up to date.
- </para>
-
- <para>
- As with vacuuming for space recovery, frequent updates of statistics
- are more useful for heavily-updated tables than for seldom-updated
- ones. But even for a heavily-updated table, there might be no need for
- statistics updates if the statistical distribution of the data is
- not changing much. A simple rule of thumb is to think about how much
- the minimum and maximum values of the columns in the table change.
- For example, a <type>timestamp</type> column that contains the time
- of row update will have a constantly-increasing maximum value as
- rows are added and updated; such a column will probably need more
- frequent statistics updates than, say, a column containing URLs for
- pages accessed on a website. The URL column might receive changes just
- as often, but the statistical distribution of its values probably
- changes relatively slowly.
- </para>
-
- <para>
- It is possible to run <command>ANALYZE</command> on specific tables and even
- just specific columns of a table, so the flexibility exists to update some
- statistics more frequently than others if your application requires it.
- In practice, however, it is usually best to just analyze the entire
- database, because it is a fast operation. <command>ANALYZE</command> uses a
- statistically random sampling of the rows of a table rather than reading
- every single row.
- </para>
-
- <tip>
- <para>
- Although per-column tweaking of <command>ANALYZE</command> frequency might not be
- very productive, you might find it worthwhile to do per-column
- adjustment of the level of detail of the statistics collected by
- <command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal>
- clauses and have highly irregular data distributions might require a
- finer-grain data histogram than other columns. See <command>ALTER TABLE
- SET STATISTICS</command>, or change the database-wide default using the <xref
- linkend="guc-default-statistics-target"/> configuration parameter.
- </para>
-
- <para>
- Also, by default there is limited information available about
- the selectivity of functions. However, if you create a statistics
- object or an expression
- index that uses a function call, useful statistics will be
- gathered about the function, which can greatly improve query
- plans that use the expression index.
- </para>
- </tip>
-
- <tip>
- <para>
- The autovacuum daemon does not issue <command>ANALYZE</command> commands for
- foreign tables, since it has no means of determining how often that
- might be useful. If your queries require statistics on foreign tables
- for proper planning, it's a good idea to run manually-managed
- <command>ANALYZE</command> commands on those tables on a suitable schedule.
- </para>
- </tip>
-
- <tip>
- <para>
- The autovacuum daemon does not issue <command>ANALYZE</command> commands
- for partitioned tables. Inheritance parents will only be analyzed if the
- parent itself is changed - changes to child tables do not trigger
- autoanalyze on the parent table. If your queries require statistics on
- parent tables for proper planning, it is necessary to periodically run
- a manual <command>ANALYZE</command> on those tables to keep the statistics
- up to date.
- </para>
- </tip>
-
- </sect2>
-
- <sect2 id="vacuum-for-visibility-map">
- <title>Updating the Visibility Map</title>
-
- <para>
- Vacuum maintains a <link linkend="storage-vm">visibility map</link> for each
- table to keep track of which pages contain only tuples that are known to be
- visible to all active transactions (and all future transactions, until the
- page is again modified). This has two purposes. First, vacuum
- itself can skip such pages on the next run, since there is nothing to
- clean up.
- </para>
-
- <para>
- Second, it allows <productname>PostgreSQL</productname> to answer some
- queries using only the index, without reference to the underlying table.
- Since <productname>PostgreSQL</productname> indexes don't contain tuple
- visibility information, a normal index scan fetches the heap tuple for each
- matching index entry, to check whether it should be seen by the current
- transaction.
- An <link linkend="indexes-index-only-scans"><firstterm>index-only
- scan</firstterm></link>, on the other hand, checks the visibility map first.
- If it's known that all tuples on the page are
- visible, the heap fetch can be skipped. This is most useful on
- large data sets where the visibility map can prevent disk accesses.
- The visibility map is vastly smaller than the heap, so it can easily be
- cached even when the heap is very large.
- </para>
- </sect2>
-
<sect2 id="vacuum-for-wraparound">
<title>Preventing Transaction ID Wraparound Failures</title>
@@ -933,7 +788,152 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect3>
</sect2>
- </sect1>
+
+ <sect2 id="vacuum-for-visibility-map">
+ <title>Updating the Visibility Map</title>
+
+ <para>
+ Vacuum maintains a <link linkend="storage-vm">visibility
+ map</link> for each table to keep track of which pages contain
+ only tuples that are known to be visible to all active
+ transactions (and all future transactions, until the page is again
+ modified). This has two purposes. First, vacuum itself can skip
+ such pages on the next run, since there is nothing to clean up.
+ </para>
+
+ <para>
+ Second, it allows <productname>PostgreSQL</productname> to answer
+ some queries using only the index, without reference to the
+ underlying table. Since <productname>PostgreSQL</productname>
+ indexes don't contain tuple visibility information, a normal index
+ scan fetches the heap tuple for each matching index entry, to
+ check whether it should be seen by the current transaction. An
+ <link linkend="indexes-index-only-scans"><firstterm>index-only
+ scan</firstterm></link>, on the other hand, checks the
+ visibility map first. If it's known that all tuples on the page
+ are visible, the heap fetch can be skipped. This is most useful
+ on large data sets where the visibility map can prevent disk
+ accesses. The visibility map is vastly smaller than the heap, so
+ it can easily be cached even when the heap is very large.
+ </para>
+ </sect2>
+
+ <sect2 id="vacuum-for-statistics">
+ <title>Updating Planner Statistics</title>
+
+ <indexterm zone="vacuum-for-statistics">
+ <primary>statistics</primary>
+ <secondary>of the planner</secondary>
+ </indexterm>
+
+ <indexterm zone="vacuum-for-statistics">
+ <primary>ANALYZE</primary>
+ </indexterm>
+
+ <para>
+ The <productname>PostgreSQL</productname> query planner relies on
+ statistical information about the contents of tables in order to
+ generate good plans for queries. These statistics are gathered by
+ the <link linkend="sql-analyze"><command>ANALYZE</command></link> command,
+ which can be invoked by itself or
+ as an optional step in <command>VACUUM</command>. It is important to have
+ reasonably accurate statistics, otherwise poor choices of plans might
+ degrade database performance.
+ </para>
+
+ <para>
+ The autovacuum daemon, if enabled, will automatically issue
+ <command>ANALYZE</command> commands whenever the content of a table has
+ changed sufficiently. However, administrators might prefer to rely
+ on manually-scheduled <command>ANALYZE</command> operations, particularly
+ if it is known that update activity on a table will not affect the
+ statistics of <quote>interesting</quote> columns. The daemon schedules
+ <command>ANALYZE</command> strictly as a function of the number of rows
+ inserted or updated; it has no knowledge of whether that will lead
+ to meaningful statistical changes.
+ </para>
+
+ <para>
+ Tuples changed in partitions and inheritance children do not trigger
+ analyze on the parent table. If the parent table is empty or rarely
+ changed, it may never be processed by autovacuum, and the statistics for
+ the inheritance tree as a whole won't be collected. It is necessary to
+ run <command>ANALYZE</command> on the parent table manually in order to
+ keep the statistics up to date.
+ </para>
+
+ <para>
+ As with vacuuming for space recovery, frequent updates of statistics
+ are more useful for heavily-updated tables than for seldom-updated
+ ones. But even for a heavily-updated table, there might be no need for
+ statistics updates if the statistical distribution of the data is
+ not changing much. A simple rule of thumb is to think about how much
+ the minimum and maximum values of the columns in the table change.
+ For example, a <type>timestamp</type> column that contains the time
+ of row update will have a constantly-increasing maximum value as
+ rows are added and updated; such a column will probably need more
+ frequent statistics updates than, say, a column containing URLs for
+ pages accessed on a website. The URL column might receive changes just
+ as often, but the statistical distribution of its values probably
+ changes relatively slowly.
+ </para>
+
+ <para>
+ It is possible to run <command>ANALYZE</command> on specific tables and even
+ just specific columns of a table, so the flexibility exists to update some
+ statistics more frequently than others if your application requires it.
+ In practice, however, it is usually best to just analyze the entire
+ database, because it is a fast operation. <command>ANALYZE</command> uses a
+ statistically random sampling of the rows of a table rather than reading
+ every single row.
+ </para>
+
+ <tip>
+ <para>
+ Although per-column tweaking of <command>ANALYZE</command> frequency might not be
+ very productive, you might find it worthwhile to do per-column
+ adjustment of the level of detail of the statistics collected by
+ <command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal>
+ clauses and have highly irregular data distributions might require a
+ finer-grain data histogram than other columns. See <command>ALTER TABLE
+ SET STATISTICS</command>, or change the database-wide default using the <xref
+ linkend="guc-default-statistics-target"/> configuration parameter.
+ </para>
+
+ <para>
+ Also, by default there is limited information available about
+ the selectivity of functions. However, if you create a statistics
+ object or an expression
+ index that uses a function call, useful statistics will be
+ gathered about the function, which can greatly improve query
+ plans that use the expression index.
+ </para>
+ </tip>
+
+ <tip>
+ <para>
+ The autovacuum daemon does not issue <command>ANALYZE</command> commands for
+ foreign tables, since it has no means of determining how often that
+ might be useful. If your queries require statistics on foreign tables
+ for proper planning, it's a good idea to run manually-managed
+ <command>ANALYZE</command> commands on those tables on a suitable schedule.
+ </para>
+ </tip>
+
+ <tip>
+ <para>
+ The autovacuum daemon does not issue <command>ANALYZE</command> commands
+ for partitioned tables. Inheritance parents will only be analyzed if the
+ parent itself is changed - changes to child tables do not trigger
+ autoanalyze on the parent table. If your queries require statistics on
+ parent tables for proper planning, it is necessary to periodically run
+ a manual <command>ANALYZE</command> on those tables to keep the statistics
+ up to date.
+ </para>
+ </tip>
+
+ </sect2>
+</sect1>
<sect1 id="routine-reindex">
--
2.40.1
v3-0003-Normalize-maintenance.sgml-indentation.patchapplication/octet-stream; name=v3-0003-Normalize-maintenance.sgml-indentation.patchDownload
From 93fc0893e4ca17068737d14e4a9901c83b264e99 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 15:20:13 -0700
Subject: [PATCH v3 3/9] Normalize maintenance.sgml indentation.
---
doc/src/sgml/maintenance.sgml | 82 +++++++++++++++++------------------
1 file changed, 41 insertions(+), 41 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 6a7ec7c1d..e8c8647cd 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -11,53 +11,53 @@
<primary>routine maintenance</primary>
</indexterm>
- <para>
- <productname>PostgreSQL</productname>, like any database software, requires that certain tasks
- be performed regularly to achieve optimum performance. The tasks
- discussed here are <emphasis>required</emphasis>, but they
- are repetitive in nature and can easily be automated using standard
- tools such as <application>cron</application> scripts or
- Windows' <application>Task Scheduler</application>. It is the database
- administrator's responsibility to set up appropriate scripts, and to
- check that they execute successfully.
- </para>
+ <para>
+ <productname>PostgreSQL</productname>, like any database software, requires that certain tasks
+ be performed regularly to achieve optimum performance. The tasks
+ discussed here are <emphasis>required</emphasis>, but they
+ are repetitive in nature and can easily be automated using standard
+ tools such as <application>cron</application> scripts or
+ Windows' <application>Task Scheduler</application>. It is the database
+ administrator's responsibility to set up appropriate scripts, and to
+ check that they execute successfully.
+ </para>
- <para>
- One obvious maintenance task is the creation of backup copies of the data on a
- regular schedule. Without a recent backup, you have no chance of recovery
- after a catastrophe (disk failure, fire, mistakenly dropping a critical
- table, etc.). The backup and recovery mechanisms available in
- <productname>PostgreSQL</productname> are discussed at length in
- <xref linkend="backup"/>.
- </para>
+ <para>
+ One obvious maintenance task is the creation of backup copies of the data on a
+ regular schedule. Without a recent backup, you have no chance of recovery
+ after a catastrophe (disk failure, fire, mistakenly dropping a critical
+ table, etc.). The backup and recovery mechanisms available in
+ <productname>PostgreSQL</productname> are discussed at length in
+ <xref linkend="backup"/>.
+ </para>
- <para>
- The other main category of maintenance task is periodic <quote>vacuuming</quote>
- of the database. This activity is discussed in
- <xref linkend="routine-vacuuming"/>. Closely related to this is updating
- the statistics that will be used by the query planner, as discussed in
- <xref linkend="vacuum-for-statistics"/>.
- </para>
+ <para>
+ The other main category of maintenance task is periodic <quote>vacuuming</quote>
+ of the database. This activity is discussed in
+ <xref linkend="routine-vacuuming"/>. Closely related to this is updating
+ the statistics that will be used by the query planner, as discussed in
+ <xref linkend="vacuum-for-statistics"/>.
+ </para>
- <para>
- Another task that might need periodic attention is log file management.
- This is discussed in <xref linkend="logfile-maintenance"/>.
- </para>
+ <para>
+ Another task that might need periodic attention is log file management.
+ This is discussed in <xref linkend="logfile-maintenance"/>.
+ </para>
- <para>
- <ulink
+ <para>
+ <ulink
url="https://bucardo.org/check_postgres/"><application>check_postgres</application></ulink>
- is available for monitoring database health and reporting unusual
- conditions. <application>check_postgres</application> integrates with
- Nagios and MRTG, but can be run standalone too.
- </para>
+ is available for monitoring database health and reporting unusual
+ conditions. <application>check_postgres</application> integrates with
+ Nagios and MRTG, but can be run standalone too.
+ </para>
- <para>
- <productname>PostgreSQL</productname> is low-maintenance compared
- to some other database management systems. Nonetheless,
- appropriate attention to these tasks will go far towards ensuring a
- pleasant and productive experience with the system.
- </para>
+ <para>
+ <productname>PostgreSQL</productname> is low-maintenance compared
+ to some other database management systems. Nonetheless,
+ appropriate attention to these tasks will go far towards ensuring a
+ pleasant and productive experience with the system.
+ </para>
<sect1 id="autovacuum">
<title>The Autovacuum Daemon</title>
--
2.40.1
v3-0002-Restructure-autovacuum-daemon-section.patchapplication/octet-stream; name=v3-0002-Restructure-autovacuum-daemon-section.patchDownload
From b73a948bf0723ce424cf75ba23694e78c4c3aefd Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Mon, 24 Apr 2023 09:21:01 -0700
Subject: [PATCH v3 2/9] Restructure autovacuum daemon section.
Add sect2/sect3 subsections to autovacuum sect1. Also reorder the
content slightly for clarity.
TODO Add some basic explanations of vacuuming and relfrozenxid
advancement, since that now appears later on in the chapter.
Alternatively, move the autovacuum daemon sect1 after the routine
vacuuming sect1.
---
doc/src/sgml/maintenance.sgml | 66 ++++++++++++++++++++++-------------
1 file changed, 42 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index a6295c399..6a7ec7c1d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -100,6 +100,8 @@
autovacuum workers' activity.
</para>
+ <sect2 id="autovacuum-scheduling">
+ <title>Autovacuum Scheduling</title>
<para>
If several large tables all become eligible for vacuuming in a short
amount of time, all autovacuum workers might become occupied with
@@ -112,6 +114,8 @@
<xref linkend="guc-superuser-reserved-connections"/> limits.
</para>
+ <sect3 id="autovacuum-vacuum-thresholds">
+ <title>Configurable thresholds for vacuuming</title>
<para>
Tables whose <structfield>relfrozenxid</structfield> value is more than
<xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
@@ -159,7 +163,10 @@ vacuum insert threshold = vacuum base insert threshold + vacuum insert scale fac
<structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
since the last vacuum are scanned.
</para>
+ </sect3>
+ <sect3 id="autovacuum-analyze-thresholds">
+ <title>Configurable thresholds for <command>ANALYZE</command></title>
<para>
For analyze, a similar condition is used: the threshold, defined as:
<programlisting>
@@ -168,20 +175,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
is compared to the total number of tuples inserted, updated, or deleted
since the last <command>ANALYZE</command>.
</para>
-
- <para>
- Partitioned tables are not processed by autovacuum. Statistics
- should be collected by running a manual <command>ANALYZE</command> when it is
- first populated, and again whenever the distribution of data in its
- partitions changes significantly.
- </para>
-
- <para>
- Temporary tables cannot be accessed by autovacuum. Therefore,
- appropriate vacuum and analyze operations should be performed via
- session SQL commands.
- </para>
-
<para>
The default thresholds and scale factors are taken from
<filename>postgresql.conf</filename>, but it is possible to override them
@@ -192,18 +185,25 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
used. See <xref linkend="runtime-config-autovacuum"/> for more details on
the global settings.
</para>
+ </sect3>
+ </sect2>
- <para>
- When multiple workers are running, the autovacuum cost delay parameters
- (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
- <quote>balanced</quote> among all the running workers, so that the
- total I/O impact on the system is the same regardless of the number
- of workers actually running. However, any workers processing tables whose
- per-table <literal>autovacuum_vacuum_cost_delay</literal> or
- <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
- are not considered in the balancing algorithm.
- </para>
+ <sect2 id="autovacuum-cost-delays">
+ <title>Autovacuum Cost-based Delays</title>
+ <para>
+ When multiple workers are running, the autovacuum cost delay parameters
+ (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
+ <quote>balanced</quote> among all the running workers, so that the
+ total I/O impact on the system is the same regardless of the number
+ of workers actually running. However, any workers processing tables whose
+ per-table <literal>autovacuum_vacuum_cost_delay</literal> or
+ <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
+ are not considered in the balancing algorithm.
+ </para>
+ </sect2>
+ <sect2 id="autovacuum-lock-conflicts">
+ <title>Autovacuum and Lock Conflicts</title>
<para>
Autovacuum workers generally don't block other commands. If a process
attempts to acquire a lock that conflicts with the
@@ -223,6 +223,24 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
effectively prevent autovacuums from ever completing.
</para>
</warning>
+ </sect2>
+
+ <sect2 id="autovacuum-limitations">
+ <title>Limitations</title>
+ <para>
+ Partitioned tables are not processed by autovacuum. Statistics
+ should be collected by running a manual <command>ANALYZE</command> when it is
+ first populated, and again whenever the distribution of data in its
+ partitions changes significantly.
+ </para>
+
+ <para>
+ Temporary tables cannot be accessed by autovacuum. Therefore,
+ appropriate vacuum and analyze operations should be performed via
+ session SQL commands.
+ </para>
+ </sect2>
+
</sect1>
<sect1 id="routine-vacuuming">
--
2.40.1
On Wed, 3 May 2023 at 18:50, Peter Geoghegan <pg@bowt.ie> wrote:
What about "XID allocation overload"? The implication that I'm going
for here is that the system was misconfigured, or there was otherwise
some kind of imbalance between XID supply and demand.
Fwiw while "wraparound" has pitfalls I think changing it for a new
word isn't really helpful. Especially if it's a mostly meaningless
word like "overload" or "exhaustion". It suddenly makes every existing
doc hard to find and confusing to read.
I say "exhaustion" or "overload" are meaningless because their meaning
is entirely dependent on context. It's not like memory exhaustion or
i/o overload where it's a finite resource and it's just the sheer
amount in use that matters. One way or another the user needs to
understand that it's two numbers marching through a sequence and the
distance between them matters.
I feel like "wraparound" while imperfect is not any worse than any
other word. It still requires context to understand but it's context
that there are many docs online that already explain and are
googleable.
If we wanted a new word it would be "overrun" but like I say, it would
just create a new context dependent technical term that users would
need to find docs that explain and give context. I don't think that
really helps users at all
--
greg
On Thu, May 11, 2023 at 1:04 PM Greg Stark <stark@mit.edu> wrote:
Fwiw while "wraparound" has pitfalls I think changing it for a new
word isn't really helpful. Especially if it's a mostly meaningless
word like "overload" or "exhaustion". It suddenly makes every existing
doc hard to find and confusing to read.
Just to be clear, I am not proposing changing the name of
anti-wraparound autovacuum at all. What I'd like to do is use a term
like "XID exhaustion" to refer to the state that we internally refer
to as xidStopLimit. My motivation is simple: we've completely
terrified users by emphasizing wraparound, which is something that is
explicitly and prominently presented as a variety of data corruption.
The docs say this:
"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future — which means their output become invisible. In short,
catastrophic data loss."
I say "exhaustion" or "overload" are meaningless because their meaning
is entirely dependent on context. It's not like memory exhaustion or
i/o overload where it's a finite resource and it's just the sheer
amount in use that matters.
But transaction IDs are a finite resource, in the sense that you can
never have more than about 2.1 billion distinct unfrozen XIDs at any
one time. "Transaction ID exhaustion" is therefore a lot more
descriptive of the underlying problem. It's a lot better than
wraparound, which, as I've said, is inaccurate in two major ways:
1. Most cases involving xidStopLimit (or even single-user mode data
corruption) won't involve any kind of physical integer wraparound.
2. Most physical integer wraparound is harmless and perfectly routine.
But even this is fairly secondary to me. I don't actually think it's
that important that the name describe exactly what's going on here --
that's expecting rather a lot from a name. That's not really the goal.
The goal is to undo the damage of documentation that heavily implies
that data corruption is the eventual result of not doing enough
vacuuming, in its basic introductory remarks to freezing stuff.
Like Samay, my consistent experience (particularly back in my Heroku
days) has been that people imagine that data corruption would happen
when the system reached what we'd call xidStopLimit. Can you blame
them for thinking that? Almost any name for xidStopLimit that doesn't
have that historical baggage seems likely to be a vast improvement.
--
Peter Geoghegan
On Thu, May 11, 2023 at 1:40 PM Peter Geoghegan <pg@bowt.ie> wrote:
Just to be clear, I am not proposing changing the name of
anti-wraparound autovacuum at all. What I'd like to do is use a term
like "XID exhaustion" to refer to the state that we internally refer
to as xidStopLimit. My motivation is simple: we've completely
terrified users by emphasizing wraparound, which is something that is
explicitly and prominently presented as a variety of data corruption.
The docs say this:"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future — which means their output become invisible. In short,
catastrophic data loss."
Notice that this says that "catastrophic data loss" occurs when "the
XID counter wraps around to zero". I think that this was how it worked
before the invention of freezing, over 20 years ago -- the last time
the system would allocate about 4 billion XIDs without doing any
freezing.
While it is still possible to corrupt the database in single user
mode, it has precisely nothing to do with the point that "the XID
counter wraps around to zero". I believe that this wording has done
not insignificant damage to the project's reputation. But let's assume
for a moment that there's only a tiny chance that I'm right about all
of this -- let's assume I'm probably just being alarmist about how
this has been received in the wider world. Even then: why take even a
small chance?
--
Peter Geoghegan
On Thu, May 4, 2023 at 3:18 PM samay sharma <smilingsamay@gmail.com> wrote:
What do you think about the term "Exhaustion"? Maybe something like "XID allocation exhaustion" or "Exhaustion of allocatable XIDs"?
I use the term "transaction ID exhaustion" in the attached revision,
v4. Overall, v4 builds on the work that went into v2 and v3, by
continuing to polish the overhaul of everything related to freezing,
relfrozenxid advancement, and anti-wraparound autovacuum.
It would be nice if it was possible to add an animation/diagram a
little like this one: https://tuple-freezing-demo.angusd.com (this is
how I tend to think about the "transaction ID space".)
I feel that the patch that deals with freezing is really coming
together in v4. The main problem now is lack of detailed review --
though the freezing related patch is still not committable, it's
getting close now. (The changes to the docs covering freezing should
be committed separately from any further work on "25.2.1. Recovering
Disk Space". I still haven't done much there in v4, and those parts
clearly aren't anywhere near being committable. So, for now, they can
mostly be ignored.)
v4 also limits use of the term "wraparound" to places that directly
discuss anti-wraparound autovacuums (plus one place in xact.sgml,
where discussion of "true unsigned integer wraparound" and related
implementation details has been moved). Otherwise we use the term
"transaction ID exhaustion", which is pretty much the user-facing name
for "xidStopLimit". I feel that this is a huge improvement, for the
reason given to Greg earlier. I'm flexible on the details, but I feel
strongly that we should minimize use of the term wraparound wherever
it might have the connotation of "the past becoming the future". This
is not a case of inventing a new terminology for its own sake. If
anybody is skeptical I ask that they take a look at what I came up
with before declaring it a bad idea. I have made that as easy as
possible, by once again attaching a prebuilt routine-vacuuming.html.
I no longer believe that committing this patch series needs to block
on the patch that seeks to put things straight with single user mode
and xidStopLimit/transaction ID exhaustion (the one that John Naylor
is currently working on getting in shape), either (I'll explain my
reasoning if somebody wants to hear it).
Other changes in v4, compared to v3:
* Improved discussion of the differences between non-aggressive and
aggressive VACUUM.
Now mentions the issue of aggressive VACUUMs waiting for a cleanup
lock, including mention of the BufferPin wait event. This is the
second, minor difference between each kind of VACUUM. It matters much
less than the first difference, but it does merit a mention.
The discussion of aggressive VACUUM seems to be best approached by
starting with the mechanical differences, and only later going into
the consequences of those differences. (Particularly catch-up
freezing.)
* Explains "catch-up freezing" performed by aggressive VACUUMs directly.
"Catch-up" freezing is the really important "consequence" -- something
that emerges from how each type of VACUUM behaves over time. It is an
indirect consequence of the behaviors. I would like to counter the
perception that some users have about freezing only happening during
aggressive VACUUMs (or anti-wraparound autovacuums). But more than
that, talking about catch-up freezing seems essential because it is
the single most important difference.
* Much improved handling of the discussion of anti-wraparound
autovacuum, and how it relates to aggressive VACUUMs, following
feedback from Samay.
There is now only fairly minimal overlap in the discussion of
aggressive VACUUM and anti-wraparound autovacuuming. We finish the
discussion of aggressive VACUUM just after we start discussing
anti-wraparound autovacuum. This transition works well, because it
enforces the idea that anti-wraparound autovacuum isn't really special
compared to any other aggressive autovacuum. This was something that
Samay expressed particularly concern about: making anti-wraparound
autovacuums sound less scary. Though it's also a concern I had from
the outset, based on practical experience and interactions with people
that have much less knowledge of Postgres than I do.
* Anti-wraparound autovacuum is now mostly discussed as something that
happens to static or mostly-static tables.
This is related to the goal of making anti-wraparound autovacuums
sound less scary. Larger tables don't necessarily require any
anti-wraparound autovacuums these days -- we have the insert-driven
autovacuum trigger condition these days, so it's plausible (perhaps
even likely) that all aggressive VACUUMs against the largest
append-only tables can happen when autovacuum triggers VACUUMs to
processed recently inserted tuples.
This moves discussion of anti-wraparound av in the direction of:
"Anti-wraparound autovacuum is a special type of autovacuum. Its
purpose is to ensure that relfrozenxid advances when no earlier VACUUM
could advance it in passing — often because no VACUUM has run against
the table for an extended period."
* Added a couple of "Tips" about instrumentation that appears in the
server log whenever autovacuum reports on a VACUUM operation.
* Much improved "Truncating Transaction Status Information" subsection.
My explanation of the ways in which autovacuum_freeze_max_age can
affect the storage overhead of commit/abort status in pg_xact is much
clearer than it was in v3 -- pg_xact truncation is now treated as
something loosely related to the global config of anti-wraparound
autovacuum, which makes most sense.
It took a great deal of effort to find a structure that covered
everything, and that highlighted all of the important relationships
without going too far, while at the same time not being a huge mess.
That's what I feel I've arrived at with v4.
--
Peter Geoghegan
Attachments:
v4-0003-Reindent-autovacuum-daemon-sect1.patchapplication/octet-stream; name=v4-0003-Reindent-autovacuum-daemon-sect1.patchDownload
From 3d246ace93f13594b8525b3516f35560f9721599 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 6 May 2023 12:32:43 -0700
Subject: [PATCH v4 3/9] Reindent autovacuum daemon sect1.
When "The Autovacuum Daemon" became its own sect1, the sect1 tags were
left indented incorrectly (for a sect1) in order to make the initial
"move text into its own sect1" commit as mechanical as possible (we kept
the original sect2 indentation). A later commit further split up "The
Autovacuum Daemon" into further sect2 and sect3 subsections, to add
useful headings. Now we actually fix the indentation changes that were
put off at first.
It may look like these indentation changes are wrong (they look wrong
relative to the chapter-level introductory content that precedes these
changes to indentation), but it's actually the other way around: there
is a preexisting problem with the indentation for the chapter-level
introductory material (which I have opted to not fix now).
Note that the newly added "The Autovacuum Daemon" sect1 now has one
space of indentation at the top level, just like every other sect1.
As usual, the goal of structuring things this way is to make life easier
for translators.
---
doc/src/sgml/maintenance.sgml | 198 +++++++++++++++++-----------------
1 file changed, 99 insertions(+), 99 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index c27efc58d..702e2797c 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -59,46 +59,46 @@
pleasant and productive experience with the system.
</para>
- <sect1 id="autovacuum">
- <title>The Autovacuum Daemon</title>
+ <sect1 id="autovacuum">
+ <title>The Autovacuum Daemon</title>
- <indexterm>
- <primary>autovacuum</primary>
- <secondary>general information</secondary>
- </indexterm>
- <para>
- <productname>PostgreSQL</productname> has an optional but highly
- recommended feature called <firstterm>autovacuum</firstterm>,
- whose purpose is to automate the execution of
- <command>VACUUM</command> and <command>ANALYZE</command> commands.
- When enabled, autovacuum checks for
- tables that have had a large number of inserted, updated or deleted
- tuples. These checks use the statistics collection facility;
- therefore, autovacuum cannot be used unless <xref
+ <indexterm>
+ <primary>autovacuum</primary>
+ <secondary>general information</secondary>
+ </indexterm>
+ <para>
+ <productname>PostgreSQL</productname> has an optional but highly
+ recommended feature called <firstterm>autovacuum</firstterm>,
+ whose purpose is to automate the execution of
+ <command>VACUUM</command> and <command>ANALYZE</command> commands.
+ When enabled, autovacuum checks for
+ tables that have had a large number of inserted, updated or deleted
+ tuples. These checks use the statistics collection facility;
+ therefore, autovacuum cannot be used unless <xref
linkend="guc-track-counts"/> is set to <literal>true</literal>.
- In the default configuration, autovacuuming is enabled and the related
- configuration parameters are appropriately set.
- </para>
+ In the default configuration, autovacuuming is enabled and the related
+ configuration parameters are appropriately set.
+ </para>
- <para>
- The <quote>autovacuum daemon</quote> actually consists of multiple processes.
- There is a persistent daemon process, called the
- <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
- <firstterm>autovacuum worker</firstterm> processes for all databases. The
- launcher will distribute the work across time, attempting to start one
- worker within each database every <xref linkend="guc-autovacuum-naptime"/>
- seconds. (Therefore, if the installation has <replaceable>N</replaceable> databases,
- a new worker will be launched every
- <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.)
- A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes
- are allowed to run at the same time. If there are more than
- <varname>autovacuum_max_workers</varname> databases to be processed,
- the next database will be processed as soon as the first worker finishes.
- Each worker process will check each table within its database and
- execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed.
- <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor
- autovacuum workers' activity.
- </para>
+ <para>
+ The <quote>autovacuum daemon</quote> actually consists of multiple processes.
+ There is a persistent daemon process, called the
+ <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
+ <firstterm>autovacuum worker</firstterm> processes for all databases. The
+ launcher will distribute the work across time, attempting to start one
+ worker within each database every <xref linkend="guc-autovacuum-naptime"/>
+ seconds. (Therefore, if the installation has <replaceable>N</replaceable> databases,
+ a new worker will be launched every
+ <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.)
+ A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes
+ are allowed to run at the same time. If there are more than
+ <varname>autovacuum_max_workers</varname> databases to be processed,
+ the next database will be processed as soon as the first worker finishes.
+ Each worker process will check each table within its database and
+ execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed.
+ <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor
+ autovacuum workers' activity.
+ </para>
<sect2 id="autovacuum-scheduling">
<title>Autovacuum Scheduling</title>
@@ -114,78 +114,78 @@
<xref linkend="guc-superuser-reserved-connections"/> limits.
</para>
- <sect3 id="autovacuum-vacuum-thresholds">
- <title>Configurable thresholds for vacuuming</title>
- <para>
- Tables whose <structfield>relfrozenxid</structfield> value is more than
- <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
- vacuumed (this also applies to those tables whose freeze max age has
- been modified via storage parameters; see below). Otherwise, if the
- number of tuples obsoleted since the last
- <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
- table is vacuumed. The vacuum threshold is defined as:
+ <sect3 id="autovacuum-vacuum-thresholds">
+ <title>Configurable thresholds for vacuuming</title>
+ <para>
+ Tables whose <structfield>relfrozenxid</structfield> value is more than
+ <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
+ vacuumed (this also applies to those tables whose freeze max age has
+ been modified via storage parameters; see below). Otherwise, if the
+ number of tuples obsoleted since the last
+ <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
+ table is vacuumed. The vacuum threshold is defined as:
<programlisting>
vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
</programlisting>
- where the vacuum base threshold is
- <xref linkend="guc-autovacuum-vacuum-threshold"/>,
- the vacuum scale factor is
- <xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
- and the number of tuples is
- <structname>pg_class</structname>.<structfield>reltuples</structfield>.
- </para>
+ where the vacuum base threshold is
+ <xref linkend="guc-autovacuum-vacuum-threshold"/>,
+ the vacuum scale factor is
+ <xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
+ and the number of tuples is
+ <structname>pg_class</structname>.<structfield>reltuples</structfield>.
+ </para>
- <para>
- The table is also vacuumed if the number of tuples inserted since the last
- vacuum has exceeded the defined insert threshold, which is defined as:
+ <para>
+ The table is also vacuumed if the number of tuples inserted since the last
+ vacuum has exceeded the defined insert threshold, which is defined as:
<programlisting>
vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
</programlisting>
- where the vacuum insert base threshold is
- <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>,
- and vacuum insert scale factor is
- <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
- Such vacuums may allow portions of the table to be marked as
- <firstterm>all visible</firstterm> and also allow tuples to be frozen, which
- can reduce the work required in subsequent vacuums.
- For tables which receive <command>INSERT</command> operations but no or
- almost no <command>UPDATE</command>/<command>DELETE</command> operations,
- it may be beneficial to lower the table's
- <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
- tuples to be frozen by earlier vacuums. The number of obsolete tuples and
- the number of inserted tuples are obtained from the cumulative statistics system;
- it is a semi-accurate count updated by each <command>UPDATE</command>,
- <command>DELETE</command> and <command>INSERT</command> operation. (It is
- only semi-accurate because some information might be lost under heavy
- load.) If the <structfield>relfrozenxid</structfield> value of the table
- is more than <varname>vacuum_freeze_table_age</varname> transactions old,
- an aggressive vacuum is performed to freeze old tuples and advance
- <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
- since the last vacuum are scanned.
- </para>
- </sect3>
+ where the vacuum insert base threshold is
+ <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>,
+ and vacuum insert scale factor is
+ <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
+ Such vacuums may allow portions of the table to be marked as
+ <firstterm>all visible</firstterm> and also allow tuples to be frozen, which
+ can reduce the work required in subsequent vacuums.
+ For tables which receive <command>INSERT</command> operations but no or
+ almost no <command>UPDATE</command>/<command>DELETE</command> operations,
+ it may be beneficial to lower the table's
+ <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
+ tuples to be frozen by earlier vacuums. The number of obsolete tuples and
+ the number of inserted tuples are obtained from the cumulative statistics system;
+ it is a semi-accurate count updated by each <command>UPDATE</command>,
+ <command>DELETE</command> and <command>INSERT</command> operation. (It is
+ only semi-accurate because some information might be lost under heavy
+ load.) If the <structfield>relfrozenxid</structfield> value of the table
+ is more than <varname>vacuum_freeze_table_age</varname> transactions old,
+ an aggressive vacuum is performed to freeze old tuples and advance
+ <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
+ since the last vacuum are scanned.
+ </para>
+ </sect3>
- <sect3 id="autovacuum-analyze-thresholds">
- <title>Configurable thresholds for <command>ANALYZE</command></title>
- <para>
- For analyze, a similar condition is used: the threshold, defined as:
+ <sect3 id="autovacuum-analyze-thresholds">
+ <title>Configurable thresholds for <command>ANALYZE</command></title>
+ <para>
+ For analyze, a similar condition is used: the threshold, defined as:
<programlisting>
analyze threshold = analyze base threshold + analyze scale factor * number of tuples
</programlisting>
- is compared to the total number of tuples inserted, updated, or deleted
- since the last <command>ANALYZE</command>.
- </para>
- <para>
- The default thresholds and scale factors are taken from
- <filename>postgresql.conf</filename>, but it is possible to override them
- (and many other autovacuum control parameters) on a per-table basis; see
- <xref linkend="sql-createtable-storage-parameters"/> for more information.
- If a setting has been changed via a table's storage parameters, that value
- is used when processing that table; otherwise the global settings are
- used. See <xref linkend="runtime-config-autovacuum"/> for more details on
- the global settings.
- </para>
- </sect3>
+ is compared to the total number of tuples inserted, updated, or deleted
+ since the last <command>ANALYZE</command>.
+ </para>
+ <para>
+ The default thresholds and scale factors are taken from
+ <filename>postgresql.conf</filename>, but it is possible to override them
+ (and many other autovacuum control parameters) on a per-table basis; see
+ <xref linkend="sql-createtable-storage-parameters"/> for more information.
+ If a setting has been changed via a table's storage parameters, that value
+ is used when processing that table; otherwise the global settings are
+ used. See <xref linkend="runtime-config-autovacuum"/> for more details on
+ the global settings.
+ </para>
+ </sect3>
</sect2>
<sect2 id="autovacuum-cost-delays">
@@ -240,7 +240,7 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
session SQL commands.
</para>
</sect2>
- </sect1>
+ </sect1>
<sect1 id="routine-vacuuming">
<title>Routine Vacuuming</title>
--
2.40.1
v4-0002-Restructure-autovacuum-daemon-section.patchapplication/octet-stream; name=v4-0002-Restructure-autovacuum-daemon-section.patchDownload
From b41c88098364b10420bc5499cc8775f45406742c Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Mon, 24 Apr 2023 09:21:01 -0700
Subject: [PATCH v4 2/9] Restructure autovacuum daemon section.
Add sect2/sect3 subsections to autovacuum sect1. Also reorder the
content slightly for clarity by consolidating "limitations".
The next commit finishes recent changes to the autovacuum daemon content
by reindenting everything.
TODO Add some basic explanations of vacuuming and relfrozenxid
advancement, since that now appears later on in the chapter.
Alternatively, move the autovacuum daemon sect1 after the routine
vacuuming sect1.
---
doc/src/sgml/maintenance.sgml | 45 ++++++++++++++++++++++++-----------
1 file changed, 31 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index b9091e72c..c27efc58d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -100,6 +100,8 @@
autovacuum workers' activity.
</para>
+ <sect2 id="autovacuum-scheduling">
+ <title>Autovacuum Scheduling</title>
<para>
If several large tables all become eligible for vacuuming in a short
amount of time, all autovacuum workers might become occupied with
@@ -112,6 +114,8 @@
<xref linkend="guc-superuser-reserved-connections"/> limits.
</para>
+ <sect3 id="autovacuum-vacuum-thresholds">
+ <title>Configurable thresholds for vacuuming</title>
<para>
Tables whose <structfield>relfrozenxid</structfield> value is more than
<xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
@@ -159,7 +163,10 @@ vacuum insert threshold = vacuum base insert threshold + vacuum insert scale fac
<structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
since the last vacuum are scanned.
</para>
+ </sect3>
+ <sect3 id="autovacuum-analyze-thresholds">
+ <title>Configurable thresholds for <command>ANALYZE</command></title>
<para>
For analyze, a similar condition is used: the threshold, defined as:
<programlisting>
@@ -168,20 +175,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
is compared to the total number of tuples inserted, updated, or deleted
since the last <command>ANALYZE</command>.
</para>
-
- <para>
- Partitioned tables are not processed by autovacuum. Statistics
- should be collected by running a manual <command>ANALYZE</command> when it is
- first populated, and again whenever the distribution of data in its
- partitions changes significantly.
- </para>
-
- <para>
- Temporary tables cannot be accessed by autovacuum. Therefore,
- appropriate vacuum and analyze operations should be performed via
- session SQL commands.
- </para>
-
<para>
The default thresholds and scale factors are taken from
<filename>postgresql.conf</filename>, but it is possible to override them
@@ -192,7 +185,11 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
used. See <xref linkend="runtime-config-autovacuum"/> for more details on
the global settings.
</para>
+ </sect3>
+ </sect2>
+ <sect2 id="autovacuum-cost-delays">
+ <title>Autovacuum Cost-based Delays</title>
<para>
When multiple workers are running, the autovacuum cost delay parameters
(see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
@@ -203,7 +200,10 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
<literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
are not considered in the balancing algorithm.
</para>
+ </sect2>
+ <sect2 id="autovacuum-lock-conflicts">
+ <title>Autovacuum and Lock Conflicts</title>
<para>
Autovacuum workers generally don't block other commands. If a process
attempts to acquire a lock that conflicts with the
@@ -223,6 +223,23 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
effectively prevent autovacuums from ever completing.
</para>
</warning>
+ </sect2>
+
+ <sect2 id="autovacuum-limitations">
+ <title>Limitations</title>
+ <para>
+ Partitioned tables are not processed by autovacuum. Statistics
+ should be collected by running a manual <command>ANALYZE</command> when it is
+ first populated, and again whenever the distribution of data in its
+ partitions changes significantly.
+ </para>
+
+ <para>
+ Temporary tables cannot be accessed by autovacuum. Therefore,
+ appropriate vacuum and analyze operations should be performed via
+ session SQL commands.
+ </para>
+ </sect2>
</sect1>
<sect1 id="routine-vacuuming">
--
2.40.1
v4-0001-Make-autovacuum-docs-into-a-sect1-of-its-own.patchapplication/octet-stream; name=v4-0001-Make-autovacuum-docs-into-a-sect1-of-its-own.patchDownload
From cd2b42a6e986fcf70269cb14050c5113f6b84ee0 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 12 Apr 2023 14:42:06 -0700
Subject: [PATCH v4 1/9] Make autovacuum docs into a sect1 of its own.
This doesn't change any of the content itself. Though it does move it
from the end of "Routine Vacuuming" (which is itself a sect1) to a whole
new sect1 that appears _before_ "Routine Vacuuming".
This commit is as mechanical as possible, in the sense that it is
structured in such a way as to make git diff's "dimmed-zebra" mode show
as few true changes as possible (almost everything is strictly movement
of existing content). Note in particular that this commit does not fix
any indentation (we keep sect2 indentation for our new sect1). That
will happen in a later commit, after we're done splitting up content
from "The Autovacuum Daemon" into more subsections to improve its
readability.
XXX Open question: does it make more sense to move the sect1 to before
"Routine Vacuuming", or should it go after instead? There are arguments
for both.
Arguments for "before" (which is how it's done by this commit right
now):
"Before" gives greater prominence to the autovacuum scheduling tunables,
such as autovacuum_vacuum_scale_factor. These are the most important
individual tunables, which argues for putting them earlier than "Routine
Vacuuming".
Arguments for "after":
Although the discussion in "Routine Vacuuming" is rather involved, it is
arguably still introductory material that informs how the user will tune
autovacuum_vacuum_scale_factor. (Assuming that the user doesn't just go
by trial and error, which seems more likely in practice but not
necessarily the most useful working assumption for our purposes.)
---
doc/src/sgml/maintenance.sgml | 332 +++++++++++++++++-----------------
1 file changed, 166 insertions(+), 166 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 9cf9d030a..b9091e72c 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -59,6 +59,172 @@
pleasant and productive experience with the system.
</para>
+ <sect1 id="autovacuum">
+ <title>The Autovacuum Daemon</title>
+
+ <indexterm>
+ <primary>autovacuum</primary>
+ <secondary>general information</secondary>
+ </indexterm>
+ <para>
+ <productname>PostgreSQL</productname> has an optional but highly
+ recommended feature called <firstterm>autovacuum</firstterm>,
+ whose purpose is to automate the execution of
+ <command>VACUUM</command> and <command>ANALYZE</command> commands.
+ When enabled, autovacuum checks for
+ tables that have had a large number of inserted, updated or deleted
+ tuples. These checks use the statistics collection facility;
+ therefore, autovacuum cannot be used unless <xref
+ linkend="guc-track-counts"/> is set to <literal>true</literal>.
+ In the default configuration, autovacuuming is enabled and the related
+ configuration parameters are appropriately set.
+ </para>
+
+ <para>
+ The <quote>autovacuum daemon</quote> actually consists of multiple processes.
+ There is a persistent daemon process, called the
+ <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
+ <firstterm>autovacuum worker</firstterm> processes for all databases. The
+ launcher will distribute the work across time, attempting to start one
+ worker within each database every <xref linkend="guc-autovacuum-naptime"/>
+ seconds. (Therefore, if the installation has <replaceable>N</replaceable> databases,
+ a new worker will be launched every
+ <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.)
+ A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes
+ are allowed to run at the same time. If there are more than
+ <varname>autovacuum_max_workers</varname> databases to be processed,
+ the next database will be processed as soon as the first worker finishes.
+ Each worker process will check each table within its database and
+ execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed.
+ <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor
+ autovacuum workers' activity.
+ </para>
+
+ <para>
+ If several large tables all become eligible for vacuuming in a short
+ amount of time, all autovacuum workers might become occupied with
+ vacuuming those tables for a long period. This would result
+ in other tables and databases not being vacuumed until a worker becomes
+ available. There is no limit on how many workers might be in a
+ single database, but workers do try to avoid repeating work that has
+ already been done by other workers. Note that the number of running
+ workers does not count towards <xref linkend="guc-max-connections"/> or
+ <xref linkend="guc-superuser-reserved-connections"/> limits.
+ </para>
+
+ <para>
+ Tables whose <structfield>relfrozenxid</structfield> value is more than
+ <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
+ vacuumed (this also applies to those tables whose freeze max age has
+ been modified via storage parameters; see below). Otherwise, if the
+ number of tuples obsoleted since the last
+ <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
+ table is vacuumed. The vacuum threshold is defined as:
+<programlisting>
+vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
+</programlisting>
+ where the vacuum base threshold is
+ <xref linkend="guc-autovacuum-vacuum-threshold"/>,
+ the vacuum scale factor is
+ <xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
+ and the number of tuples is
+ <structname>pg_class</structname>.<structfield>reltuples</structfield>.
+ </para>
+
+ <para>
+ The table is also vacuumed if the number of tuples inserted since the last
+ vacuum has exceeded the defined insert threshold, which is defined as:
+<programlisting>
+vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
+</programlisting>
+ where the vacuum insert base threshold is
+ <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>,
+ and vacuum insert scale factor is
+ <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
+ Such vacuums may allow portions of the table to be marked as
+ <firstterm>all visible</firstterm> and also allow tuples to be frozen, which
+ can reduce the work required in subsequent vacuums.
+ For tables which receive <command>INSERT</command> operations but no or
+ almost no <command>UPDATE</command>/<command>DELETE</command> operations,
+ it may be beneficial to lower the table's
+ <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
+ tuples to be frozen by earlier vacuums. The number of obsolete tuples and
+ the number of inserted tuples are obtained from the cumulative statistics system;
+ it is a semi-accurate count updated by each <command>UPDATE</command>,
+ <command>DELETE</command> and <command>INSERT</command> operation. (It is
+ only semi-accurate because some information might be lost under heavy
+ load.) If the <structfield>relfrozenxid</structfield> value of the table
+ is more than <varname>vacuum_freeze_table_age</varname> transactions old,
+ an aggressive vacuum is performed to freeze old tuples and advance
+ <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
+ since the last vacuum are scanned.
+ </para>
+
+ <para>
+ For analyze, a similar condition is used: the threshold, defined as:
+<programlisting>
+analyze threshold = analyze base threshold + analyze scale factor * number of tuples
+</programlisting>
+ is compared to the total number of tuples inserted, updated, or deleted
+ since the last <command>ANALYZE</command>.
+ </para>
+
+ <para>
+ Partitioned tables are not processed by autovacuum. Statistics
+ should be collected by running a manual <command>ANALYZE</command> when it is
+ first populated, and again whenever the distribution of data in its
+ partitions changes significantly.
+ </para>
+
+ <para>
+ Temporary tables cannot be accessed by autovacuum. Therefore,
+ appropriate vacuum and analyze operations should be performed via
+ session SQL commands.
+ </para>
+
+ <para>
+ The default thresholds and scale factors are taken from
+ <filename>postgresql.conf</filename>, but it is possible to override them
+ (and many other autovacuum control parameters) on a per-table basis; see
+ <xref linkend="sql-createtable-storage-parameters"/> for more information.
+ If a setting has been changed via a table's storage parameters, that value
+ is used when processing that table; otherwise the global settings are
+ used. See <xref linkend="runtime-config-autovacuum"/> for more details on
+ the global settings.
+ </para>
+
+ <para>
+ When multiple workers are running, the autovacuum cost delay parameters
+ (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
+ <quote>balanced</quote> among all the running workers, so that the
+ total I/O impact on the system is the same regardless of the number
+ of workers actually running. However, any workers processing tables whose
+ per-table <literal>autovacuum_vacuum_cost_delay</literal> or
+ <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
+ are not considered in the balancing algorithm.
+ </para>
+
+ <para>
+ Autovacuum workers generally don't block other commands. If a process
+ attempts to acquire a lock that conflicts with the
+ <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
+ acquisition will interrupt the autovacuum. For conflicting lock modes,
+ see <xref linkend="table-lock-compatibility"/>. However, if the autovacuum
+ is running to prevent transaction ID wraparound (i.e., the autovacuum query
+ name in the <structname>pg_stat_activity</structname> view ends with
+ <literal>(to prevent wraparound)</literal>), the autovacuum is not
+ automatically interrupted.
+ </para>
+
+ <warning>
+ <para>
+ Regularly running commands that acquire locks conflicting with a
+ <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can
+ effectively prevent autovacuums from ever completing.
+ </para>
+ </warning>
+ </sect1>
+
<sect1 id="routine-vacuuming">
<title>Routine Vacuuming</title>
@@ -749,172 +915,6 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect3>
</sect2>
-
- <sect2 id="autovacuum">
- <title>The Autovacuum Daemon</title>
-
- <indexterm>
- <primary>autovacuum</primary>
- <secondary>general information</secondary>
- </indexterm>
- <para>
- <productname>PostgreSQL</productname> has an optional but highly
- recommended feature called <firstterm>autovacuum</firstterm>,
- whose purpose is to automate the execution of
- <command>VACUUM</command> and <command>ANALYZE</command> commands.
- When enabled, autovacuum checks for
- tables that have had a large number of inserted, updated or deleted
- tuples. These checks use the statistics collection facility;
- therefore, autovacuum cannot be used unless <xref
- linkend="guc-track-counts"/> is set to <literal>true</literal>.
- In the default configuration, autovacuuming is enabled and the related
- configuration parameters are appropriately set.
- </para>
-
- <para>
- The <quote>autovacuum daemon</quote> actually consists of multiple processes.
- There is a persistent daemon process, called the
- <firstterm>autovacuum launcher</firstterm>, which is in charge of starting
- <firstterm>autovacuum worker</firstterm> processes for all databases. The
- launcher will distribute the work across time, attempting to start one
- worker within each database every <xref linkend="guc-autovacuum-naptime"/>
- seconds. (Therefore, if the installation has <replaceable>N</replaceable> databases,
- a new worker will be launched every
- <varname>autovacuum_naptime</varname>/<replaceable>N</replaceable> seconds.)
- A maximum of <xref linkend="guc-autovacuum-max-workers"/> worker processes
- are allowed to run at the same time. If there are more than
- <varname>autovacuum_max_workers</varname> databases to be processed,
- the next database will be processed as soon as the first worker finishes.
- Each worker process will check each table within its database and
- execute <command>VACUUM</command> and/or <command>ANALYZE</command> as needed.
- <xref linkend="guc-log-autovacuum-min-duration"/> can be set to monitor
- autovacuum workers' activity.
- </para>
-
- <para>
- If several large tables all become eligible for vacuuming in a short
- amount of time, all autovacuum workers might become occupied with
- vacuuming those tables for a long period. This would result
- in other tables and databases not being vacuumed until a worker becomes
- available. There is no limit on how many workers might be in a
- single database, but workers do try to avoid repeating work that has
- already been done by other workers. Note that the number of running
- workers does not count towards <xref linkend="guc-max-connections"/> or
- <xref linkend="guc-superuser-reserved-connections"/> limits.
- </para>
-
- <para>
- Tables whose <structfield>relfrozenxid</structfield> value is more than
- <xref linkend="guc-autovacuum-freeze-max-age"/> transactions old are always
- vacuumed (this also applies to those tables whose freeze max age has
- been modified via storage parameters; see below). Otherwise, if the
- number of tuples obsoleted since the last
- <command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
- table is vacuumed. The vacuum threshold is defined as:
-<programlisting>
-vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
-</programlisting>
- where the vacuum base threshold is
- <xref linkend="guc-autovacuum-vacuum-threshold"/>,
- the vacuum scale factor is
- <xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
- and the number of tuples is
- <structname>pg_class</structname>.<structfield>reltuples</structfield>.
- </para>
-
- <para>
- The table is also vacuumed if the number of tuples inserted since the last
- vacuum has exceeded the defined insert threshold, which is defined as:
-<programlisting>
-vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of tuples
-</programlisting>
- where the vacuum insert base threshold is
- <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>,
- and vacuum insert scale factor is
- <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
- Such vacuums may allow portions of the table to be marked as
- <firstterm>all visible</firstterm> and also allow tuples to be frozen, which
- can reduce the work required in subsequent vacuums.
- For tables which receive <command>INSERT</command> operations but no or
- almost no <command>UPDATE</command>/<command>DELETE</command> operations,
- it may be beneficial to lower the table's
- <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
- tuples to be frozen by earlier vacuums. The number of obsolete tuples and
- the number of inserted tuples are obtained from the cumulative statistics system;
- it is a semi-accurate count updated by each <command>UPDATE</command>,
- <command>DELETE</command> and <command>INSERT</command> operation. (It is
- only semi-accurate because some information might be lost under heavy
- load.) If the <structfield>relfrozenxid</structfield> value of the table
- is more than <varname>vacuum_freeze_table_age</varname> transactions old,
- an aggressive vacuum is performed to freeze old tuples and advance
- <structfield>relfrozenxid</structfield>; otherwise, only pages that have been modified
- since the last vacuum are scanned.
- </para>
-
- <para>
- For analyze, a similar condition is used: the threshold, defined as:
-<programlisting>
-analyze threshold = analyze base threshold + analyze scale factor * number of tuples
-</programlisting>
- is compared to the total number of tuples inserted, updated, or deleted
- since the last <command>ANALYZE</command>.
- </para>
-
- <para>
- Partitioned tables are not processed by autovacuum. Statistics
- should be collected by running a manual <command>ANALYZE</command> when it is
- first populated, and again whenever the distribution of data in its
- partitions changes significantly.
- </para>
-
- <para>
- Temporary tables cannot be accessed by autovacuum. Therefore,
- appropriate vacuum and analyze operations should be performed via
- session SQL commands.
- </para>
-
- <para>
- The default thresholds and scale factors are taken from
- <filename>postgresql.conf</filename>, but it is possible to override them
- (and many other autovacuum control parameters) on a per-table basis; see
- <xref linkend="sql-createtable-storage-parameters"/> for more information.
- If a setting has been changed via a table's storage parameters, that value
- is used when processing that table; otherwise the global settings are
- used. See <xref linkend="runtime-config-autovacuum"/> for more details on
- the global settings.
- </para>
-
- <para>
- When multiple workers are running, the autovacuum cost delay parameters
- (see <xref linkend="runtime-config-resource-vacuum-cost"/>) are
- <quote>balanced</quote> among all the running workers, so that the
- total I/O impact on the system is the same regardless of the number
- of workers actually running. However, any workers processing tables whose
- per-table <literal>autovacuum_vacuum_cost_delay</literal> or
- <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
- are not considered in the balancing algorithm.
- </para>
-
- <para>
- Autovacuum workers generally don't block other commands. If a process
- attempts to acquire a lock that conflicts with the
- <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
- acquisition will interrupt the autovacuum. For conflicting lock modes,
- see <xref linkend="table-lock-compatibility"/>. However, if the autovacuum
- is running to prevent transaction ID wraparound (i.e., the autovacuum query
- name in the <structname>pg_stat_activity</structname> view ends with
- <literal>(to prevent wraparound)</literal>), the autovacuum is not
- automatically interrupted.
- </para>
-
- <warning>
- <para>
- Regularly running commands that acquire locks conflicting with a
- <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can
- effectively prevent autovacuums from ever completing.
- </para>
- </warning>
- </sect2>
</sect1>
--
2.40.1
v4-0009-Overhaul-freezing-and-wraparound-docs.patchapplication/octet-stream; name=v4-0009-Overhaul-freezing-and-wraparound-docs.patchDownload
From 36666e87fe05ed24e413d1cd2f512bfacd65be38 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 13:04:13 -0700
Subject: [PATCH v4 9/9] Overhaul freezing and wraparound docs.
This is almost a complete rewrite. "Preventing Transaction ID
Wraparound Failures" becomes "Freezing to manage the transaction ID
space". This is follow-up work to commit 1de58df4, which added
page-level freezing to VACUUM.
The emphasis is now on the physical work of freezing pages. This flows
a little better than it otherwise would due to recent structural
cleanups to maintenance.sgml; discussion about freezing now immediately
follows discussion of cleanup of dead tuples. We still talk about the
problem of the system activating xidStopLimit protections in the same
section, but we use much less alarmist language about data corruption,
and are no longer overly concerned about the very worst case. We don't
rescind the recommendation that users recover from an xidStopLimit
outage by using single user mode, though that seems like something we
should aim to do in the near future.
There is no longer a separate sect3 to discuss MultiXactId related
issues. VACUUM now performs exactly the same processing steps when it
freezes a page, independent of the trigger condition.
Also move recommendation about setting autovacuum_freeze_min_age
reloption in append-only tables that was originally added by the
autovacuum_vacuum_insert_scale_factor commit over to "Routine
Vacuuming", where it now appears in the form of a "Tip" box.
Also describe the page-level freezing FPI optimization added by commit
1de58df4. This is expected to trigger the majority of all freezing with
many types of workloads.
Also move "table age" monitoring query to monitoring.sgml, though leave
behind a couple of forwarding links in maintenance.sgml's discussion of
freezing and relfrozenxid advancement.
---
doc/src/sgml/catalogs.sgml | 18 +-
doc/src/sgml/config.sgml | 79 +-
doc/src/sgml/logicaldecoding.sgml | 4 +-
doc/src/sgml/maintenance.sgml | 1062 ++++++++++++++++-----
doc/src/sgml/monitoring.sgml | 80 +-
doc/src/sgml/ref/create_table.sgml | 9 +-
doc/src/sgml/ref/prepare_transaction.sgml | 14 +-
doc/src/sgml/ref/vacuum.sgml | 14 +-
doc/src/sgml/ref/vacuumdb.sgml | 17 +-
doc/src/sgml/storage.sgml | 3 +-
doc/src/sgml/xact.sgml | 2 +-
11 files changed, 963 insertions(+), 339 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 524084055..7bb123e32 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2243,8 +2243,9 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
<para>
All transaction IDs before this one have been replaced with a permanent
(<quote>frozen</quote>) transaction ID in this table. This is used to track
- whether the table needs to be vacuumed in order to prevent transaction
- ID wraparound or to allow <literal>pg_xact</literal> to be shrunk. Zero
+ whether the table needs an aggressive <command>VACUUM</command> (see
+ <xref linkend="vacuum-aggressive"/>) or an anti-wraparound autovacuum
+ (see <xref linkend="vacuum-antiwraparound-autovacuums"/>). Zero
(<symbol>InvalidTransactionId</symbol>) if the relation is not a table.
</para></entry>
</row>
@@ -2256,8 +2257,9 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
<para>
All multixact IDs before this one have been replaced by a
transaction ID in this table. This is used to track
- whether the table needs to be vacuumed in order to prevent multixact ID
- wraparound or to allow <literal>pg_multixact</literal> to be shrunk. Zero
+ whether the table needs an aggressive <command>VACUUM</command> (see
+ <xref linkend="vacuum-aggressive"/>) or an anti-wraparound autovacuum
+ (see <xref linkend="vacuum-antiwraparound-autovacuums"/>). Zero
(<symbol>InvalidMultiXactId</symbol>) if the relation is not a table.
</para></entry>
</row>
@@ -3053,8 +3055,8 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
<para>
All transaction IDs before this one have been replaced with a permanent
(<quote>frozen</quote>) transaction ID in this database. This is used to
- track whether the database needs to be vacuumed in order to prevent
- transaction ID wraparound or to allow <literal>pg_xact</literal> to be shrunk.
+ track whether the database allows <literal>pg_xact</literal> to be
+ shrunk (see <xref linkend="vacuum-truncate-xact-status"/>).
It is the minimum of the per-table
<link linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>relfrozenxid</structfield> values.
</para></entry>
@@ -3067,8 +3069,8 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
<para>
All multixact IDs before this one have been replaced with a
transaction ID in this database. This is used to
- track whether the database needs to be vacuumed in order to prevent
- multixact ID wraparound or to allow <literal>pg_multixact</literal> to be shrunk.
+ track whether the database allows <literal>pg_multixact</literal> to be
+ shrunk (see <xref linkend="vacuum-truncate-xact-status"/>).
It is the minimum of the per-table
<link linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>relminmxid</structfield> values.
</para></entry>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 909a3f28c..0a94ff46a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2812,7 +2812,9 @@ include_dir 'conf.d'
<literal>1min</literal>) are only allowed because they may sometimes be
useful for testing. While a setting as high as <literal>60d</literal> is
allowed, please note that in many workloads extreme bloat or
- transaction ID wraparound may occur in much shorter time frames.
+ transaction ID exhaustion may occur in much shorter time frames
+ (see <xref linkend="vacuum-aggressive"/> and
+ <xref linkend="vacuum-xid-exhaustion"/>).
</para>
<para>
@@ -8358,9 +8360,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</para>
<para>
Note that even when this parameter is disabled, the system
- will launch autovacuum processes if necessary to
- prevent transaction ID wraparound. See <xref
- linkend="vacuum-for-wraparound"/> for more information.
+ will launch anti-wraparound autovacuums. See <xref
+ linkend="vacuum-antiwraparound-autovacuums"/> for more information.
</para>
</listitem>
</varlistentry>
@@ -8536,20 +8537,17 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<para>
Specifies the maximum age (in transactions) that a table's
<structname>pg_class</structname>.<structfield>relfrozenxid</structfield> field can
- attain before a <command>VACUUM</command> operation is forced
- to prevent transaction ID wraparound within the table.
- Note that the system will launch autovacuum processes to
- prevent wraparound even when autovacuum is otherwise disabled.
+ attain before an anti-wraparound autovacuum is forced for the table.
+ Note that the system will launch anti-wraparound autovacuum
+ processes even when autovacuum is otherwise disabled.
</para>
<para>
- Vacuum also allows removal of old files from the
- <filename>pg_xact</filename> subdirectory, which is why the default
- is a relatively low 200 million transactions.
+ The default is 200 million transactions.
This parameter can only be set at server start, but the setting
can be reduced for individual tables by
changing table storage parameters.
- For more information see <xref linkend="vacuum-for-wraparound"/>.
+ For more information see <xref linkend="vacuum-antiwraparound-autovacuums"/>.
</para>
</listitem>
</varlistentry>
@@ -8565,20 +8563,16 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<para>
Specifies the maximum age (in multixacts) that a table's
<structname>pg_class</structname>.<structfield>relminmxid</structfield> field can
- attain before a <command>VACUUM</command> operation is forced to
- prevent multixact ID wraparound within the table.
- Note that the system will launch autovacuum processes to
- prevent wraparound even when autovacuum is otherwise disabled.
+ attain before an anti-wraparound autovacuum is forced for the table.
+ Note that the system will launch anti-wraparound autovacuum
+ processes even when autovacuum is otherwise disabled.
</para>
<para>
- Vacuuming multixacts also allows removal of old files from the
- <filename>pg_multixact/members</filename> and <filename>pg_multixact/offsets</filename>
- subdirectories, which is why the default is a relatively low
- 400 million multixacts.
+ The default is 400 million Multixact IDs.
This parameter can only be set at server start, but the setting can
be reduced for individual tables by changing table storage parameters.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ For more information see <xref linkend="vacuum-antiwraparound-autovacuums"/>.
</para>
</listitem>
</varlistentry>
@@ -9282,10 +9276,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
set this value anywhere from zero to two billion, <command>VACUUM</command>
will silently limit the effective value to 95% of
<xref linkend="guc-autovacuum-freeze-max-age"/>, so that a
- periodic manual <command>VACUUM</command> has a chance to run before an
- anti-wraparound autovacuum is launched for the table. For more
- information see
- <xref linkend="vacuum-for-wraparound"/>.
+ standard autovacuum (or a manual <command>VACUUM</command>) has a
+ chance to run using <command>VACUUM</command>'s aggressive strategy
+ before an anti-wraparound autovacuum is launched for the table. For
+ more information see <xref linkend="vacuum-aggressive"/> and <xref
+ linkend="vacuum-antiwraparound-autovacuums"/>.
</para>
</listitem>
</varlistentry>
@@ -9307,7 +9302,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
the value of <xref linkend="guc-autovacuum-freeze-max-age"/>, so
that there is not an unreasonably short time between forced
autovacuums. For more information see <xref
- linkend="vacuum-for-wraparound"/>.
+ linkend="vacuum-freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -9324,9 +9319,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<structname>pg_class</structname>.<structfield>relfrozenxid</structfield>
field can attain before <command>VACUUM</command> takes
extraordinary measures to avoid system-wide transaction ID
- wraparound failure. This is <command>VACUUM</command>'s
- strategy of last resort. The failsafe typically triggers
- when an autovacuum to prevent transaction ID wraparound has
+ exhaustion (see <xref linkend="vacuum-xid-exhaustion"/>).
+ This is <command>VACUUM</command>'s strategy of last resort. The
+ failsafe typically triggers when an
+ <link linkend="vacuum-antiwraparound-autovacuums">anti-wraparound
+ autovacuum</link> has
already been running for some time, though it's possible for
the failsafe to trigger during any <command>VACUUM</command>.
</para>
@@ -9344,7 +9341,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
set this value anywhere from zero to 2.1 billion,
<command>VACUUM</command> will silently adjust the effective
value to no less than 105% of <xref
- linkend="guc-autovacuum-freeze-max-age"/>.
+ linkend="guc-autovacuum-freeze-max-age"/>. For more
+ information see <xref linkend="vacuum-xid-exhaustion"/>.
</para>
</listitem>
</varlistentry>
@@ -9366,9 +9364,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
Although users can set this value anywhere from zero to two billion,
<command>VACUUM</command> will silently limit the effective value to 95% of
<xref linkend="guc-autovacuum-multixact-freeze-max-age"/>, so that a
- periodic manual <command>VACUUM</command> has a chance to run before an
- anti-wraparound is launched for the table.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ standard autovacuum (or a manual <command>VACUUM</command>) has a
+ chance to run using <command>VACUUM</command>'s aggressive strategy
+ before an anti-wraparound autovacuum is launched for the table. For
+ more information see <xref linkend="vacuum-aggressive"/> and <xref
+ linkend="vacuum-antiwraparound-autovacuums"/>.
</para>
</listitem>
</varlistentry>
@@ -9389,7 +9389,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
the value of <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>,
so that there is not an unreasonably short time between forced
autovacuums.
- For more information see <xref linkend="vacuum-for-multixact-wraparound"/>.
+ For more information see <xref linkend="vacuum-freezing-xid-space"/>.
</para>
</listitem>
</varlistentry>
@@ -9406,9 +9406,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<structname>pg_class</structname>.<structfield>relminmxid</structfield>
field can attain before <command>VACUUM</command> takes
extraordinary measures to avoid system-wide multixact ID
- wraparound failure. This is <command>VACUUM</command>'s
- strategy of last resort. The failsafe typically triggers when
- an autovacuum to prevent transaction ID wraparound has already
+ exhaustion (see <xref linkend="vacuum-xid-exhaustion"/>).
+ This is <command>VACUUM</command>'s strategy of last resort. The
+ failsafe typically triggers when an
+ <link linkend="vacuum-antiwraparound-autovacuums">anti-wraparound
+ autovacuum</link> has already
been running for some time, though it's possible for the
failsafe to trigger during any <command>VACUUM</command>.
</para>
@@ -9422,7 +9424,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
this value anywhere from zero to 2.1 billion,
<command>VACUUM</command> will silently adjust the effective
value to no less than 105% of <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. For more
+ information see <xref linkend="vacuum-xid-exhaustion"/>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index cbd3aa804..cc6499e36 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -352,8 +352,8 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
even when there is no connection using them. This consumes storage
because neither required WAL nor required rows from the system catalogs
can be removed by <command>VACUUM</command> as long as they are required by a replication
- slot. In extreme cases this could cause the database to shut down to prevent
- transaction ID wraparound (see <xref linkend="vacuum-for-wraparound"/>).
+ slot. In extreme cases this could cause the database to refuse to allocate new
+ transaction IDs (see <xref linkend="vacuum-xid-exhaustion"/>).
So if a slot is no longer required it should be dropped.
</para>
</caution>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index f00442564..abdf01009 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -148,13 +148,8 @@ vacuum insert threshold = vacuum base insert threshold + vacuum insert scale fac
<xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
Such vacuums may allow portions of the table to be marked as
<firstterm>all visible</firstterm> and also allow tuples to be frozen, which
- can reduce the work required in subsequent vacuums.
- For tables which receive <command>INSERT</command> operations but no or
- almost no <command>UPDATE</command>/<command>DELETE</command> operations,
- it may be beneficial to lower the table's
- <xref linkend="reloption-autovacuum-freeze-min-age"/> as this may allow
- tuples to be frozen by earlier vacuums. The number of obsolete tuples and
- the number of inserted tuples are obtained from the cumulative statistics system;
+ can reduce the work required in subsequent vacuums. The number of obsolete tuples
+ and the number of inserted tuples are obtained from the cumulative statistics system;
it is a semi-accurate count updated by each <command>UPDATE</command>,
<command>DELETE</command> and <command>INSERT</command> operation. (It is
only semi-accurate because some information might be lost under heavy
@@ -211,10 +206,11 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
<literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
acquisition will interrupt the autovacuum. For conflicting lock modes,
see <xref linkend="table-lock-compatibility"/>. However, if the autovacuum
- is running to prevent transaction ID wraparound (i.e., the autovacuum query
- name in the <structname>pg_stat_activity</structname> view ends with
+ is an anti-wraparound autovacuum (i.e., the autovacuum query name in the
+ <structname>pg_stat_activity</structname> view ends with
<literal>(to prevent wraparound)</literal>), the autovacuum is not
- automatically interrupted.
+ automatically interrupted. See <xref linkend="vacuum-antiwraparound-autovacuums"/>
+ for more details on anti-wraparound autovacuums.
</para>
<warning>
@@ -272,15 +268,21 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ <simpara>To maintain the system's ability to allocate transaction IDs
+ through freezing.</simpara>
</listitem>
<listitem>
<simpara>To update the visibility map, which speeds
up <link linkend="indexes-index-only-scans">index-only
- scans</link>.</simpara>
+ scans</link>, and helps the next <command>VACUUM</command>
+ operation avoid needlessly scanning already-frozen pages.</simpara>
+ </listitem>
+
+ <listitem>
+ <simpara>To enable truncation of obsolescent transaction status
+ information in structures such as <filename>pg_xact</filename> for the
+ entire cluster.</simpara>
</listitem>
<listitem>
@@ -477,303 +479,756 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</warning>
</sect2>
- <sect2 id="vacuum-for-wraparound">
- <title>Preventing Transaction ID Wraparound Failures</title>
-
- <indexterm zone="vacuum-for-wraparound">
- <primary>transaction ID</primary>
- <secondary>wraparound</secondary>
- </indexterm>
+ <sect2 id="vacuum-freezing-xid-space">
+ <title>Freezing to manage the transaction ID space</title>
<indexterm>
- <primary>wraparound</primary>
- <secondary>of transaction IDs</secondary>
+ <primary>Freezing</primary>
+ <secondary>of transaction IDs and Multixact IDs</secondary>
</indexterm>
<para>
- <productname>PostgreSQL</productname>'s <link
- linkend="mvcc-intro">MVCC</link> transaction semantics depend on
- being able to compare <glossterm linkend="glossary-xid">transaction
- ID numbers (<acronym>XID</acronym>)</glossterm> to determine
- whether or not the row is visible to each query's MVCC snapshot
- (see <xref linkend="interpreting-xid-stamps"/>). But since
- on-disk storage of transaction IDs in heap pages uses a truncated
- 32-bit representation to save space (rather than the full 64-bit
- representation), it is necessary to vacuum every table in every
- database <emphasis>at least</emphasis> once every two billion
- transactions (though far more frequent vacuuming is typical).
+ <command>VACUUM</command> often marks some of the pages that it scans
+ <emphasis>frozen</emphasis>, indicating that all eligible rows on the page
+ were inserted by a transaction that committed sufficiently far in the past
+ that the effects of the inserting transaction are certain to be visible to
+ all current and future transactions. The specific transaction ID number
+ (<acronym>XID</acronym>) stored in a frozen heap row's
+ <structfield>xmin</structfield> field is no longer needed to determine its
+ visibility. Furthermore, when a row undergoing freezing has an XID set in
+ its <structfield>xmax</structfield> field (e.g., an XID left behind by an
+ earlier <command>SELECT FOR UPDATE</command> row locker), the
+ <structfield>xmax</structfield> field's XID is usually also removed.
</para>
<para>
- <xref linkend="guc-vacuum-freeze-min-age"/>
- controls how old an XID value has to be before rows bearing that XID will be
- frozen. Increasing this setting may avoid unnecessary work if the
- rows that would otherwise be frozen will soon be modified again,
- but decreasing this setting increases
- the number of transactions that can elapse before the table must be
- vacuumed again.
+ Once frozen, heap pages are <quote>self-contained</quote>. Every query
+ can read all of the page's rows in a way that assumes that the inserting
+ transaction committed and is visible to its <acronym>MVCC</acronym>
+ snapshot. No query will ever have to consult external transaction status
+ metadata to interpret the page's contents, either. In particular,
+ <filename>pg_xact</filename> transaction XID commit/abort status lookups
+ won't occur during query execution.
</para>
<para>
- <command>VACUUM</command> uses the <link linkend="storage-vm">visibility map</link>
- to determine which pages of a table must be scanned. Normally, it
- will skip pages that don't have any dead row versions even if those pages
- might still have row versions with old XID values. Therefore, normal
- <command>VACUUM</command>s won't always freeze every old row version in the table.
- When that happens, <command>VACUUM</command> will eventually need to perform an
- <firstterm>aggressive vacuum</firstterm>, which will freeze all eligible unfrozen
- XID and MXID values, including those from all-visible but not all-frozen pages.
- In practice most tables require periodic aggressive vacuuming.
- <xref linkend="guc-vacuum-freeze-table-age"/>
- controls when <command>VACUUM</command> does that: all-visible but not all-frozen
- pages are scanned if the number of transactions that have passed since the
- last such scan is greater than <varname>vacuum_freeze_table_age</varname> minus
- <varname>vacuum_freeze_min_age</varname>. Setting
- <varname>vacuum_freeze_table_age</varname> to 0 forces <command>VACUUM</command> to
- always use its aggressive strategy.
+ Freezing is a <acronym>WAL</acronym>-logged operation, so when
+ <command>VACUUM</command> freezes a heap page, any copy of the page
+ located on a physical replication standby server will itself be
+ <quote>frozen</quote> shortly thereafter (when the relevant
+ <literal>FREEZE_PAGE</literal> <acronym>WAL</acronym> record is replayed
+ on the standby). Queries that run on physical replication standbys avoid
+ <filename>pg_xact</filename> lookups when reading from frozen pages, just
+ like queries that run on the primary server
+ <footnote>
+ <para>
+ In this regard, freezing is unlike setting transaction status
+ <quote>hint bits</quote> in tuple headers: setting hint bits doesn't
+ usually need to be <acronym>WAL</acronym>-logged, and can take place on
+ physical replication standby servers without input from the primary
+ server. Hint bits exist to allow query execution to avoid repeated
+ <filename>pg_xact</filename> lookups for the same tuples, strictly as an
+ optimization. On the other hand, freezing exists because the system
+ needs to reliably remove <filename>pg_xact</filename> dependencies from
+ individual tuples.
+ </para>
+ </footnote>.
</para>
<para>
- The maximum time that a table can go unvacuumed is two billion
- transactions minus the <varname>vacuum_freeze_min_age</varname> value at
- the time of the last aggressive vacuum. If it were to go
- unvacuumed for longer than
- that, data loss could result. To ensure that this does not happen,
- autovacuum is invoked on any table that might contain unfrozen rows with
- XIDs older than the age specified by the configuration parameter <xref
- linkend="guc-autovacuum-freeze-max-age"/>. (This will happen even if
- autovacuum is disabled.)
+ <command>VACUUM</command> generally postpones some freezing work as an
+ optimization, but <command>VACUUM</command> cannot delay freezing forever.
+ Since on-disk storage of transaction IDs in heap row headers uses a
+ truncated partial 32-bit representation to save space (rather than the
+ full 64-bit representation used in other contexts), it plays a crucial
+ role in enabling <link linkend="vacuum-aggressive">management of the XID
+ address space</link> by <command>VACUUM</command>. If, for whatever
+ reason, <command>VACUUM</command> is unable to freeze older XIDs on behalf
+ of an application that continues to require XID allocations, the system
+ will eventually <link linkend="vacuum-xid-exhaustion">refuse to allocate
+ transaction IDs</link> due to transaction ID exhaustion (though this is
+ unlikely to occur unless autovacuum is configured incorrectly).
</para>
<para>
- This implies that if a table is not otherwise vacuumed,
- autovacuum will be invoked on it approximately once every
- <varname>autovacuum_freeze_max_age</varname> minus
- <varname>vacuum_freeze_min_age</varname> transactions.
- For tables that are regularly vacuumed for space reclamation purposes,
- this is of little importance. However, for static tables
- (including tables that receive inserts, but no updates or deletes),
- there is no need to vacuum for space reclamation, so it can
- be useful to try to maximize the interval between forced autovacuums
- on very large static tables. Obviously one can do this either by
- increasing <varname>autovacuum_freeze_max_age</varname> or decreasing
- <varname>vacuum_freeze_min_age</varname>.
+ <xref linkend="guc-vacuum-freeze-min-age"/> controls when freezing takes
+ place. When <command>VACUUM</command> scans a heap page containing even
+ one XID that has already attained an age exceeding this value, the page is
+ frozen.
+ </para>
+
+ <indexterm>
+ <primary>Multixact ID</primary>
+ <secondary>Freezing of</secondary>
+ </indexterm>
+
+ <para>
+ <firstterm>Multixact IDs</firstterm> support row locking by multiple
+ transactions. Since there is only limited space in a <link
+ linkend="storage-tuple-layout">heap tuple header</link> to store lock
+ information, that information is encoded as a <quote>multiple transaction
+ ID</quote>, or Multixact ID for short, whenever there is more than one
+ transaction concurrently locking a row. Information about which
+ transaction IDs are included in any particular Multixact ID is stored
+ separately in <filename>pg_multixact</filename>. Only the Multixact ID
+ itself (a 32-bit integer) appears in the tuple's
+ <structfield>xmax</structfield> field. This creates a dependency on
+ external Multixact ID transaction status information. This is similar to
+ the dependency ordinary unfrozen XIDs have on commit status information
+ from <filename>pg_xact</filename>. <command>VACUUM</command> must
+ therefore occasionally remove Multixact IDs from tuples during freezing.
</para>
<para>
- The effective maximum for <varname>vacuum_freeze_table_age</varname> is 0.95 *
- <varname>autovacuum_freeze_max_age</varname>; a setting higher than that will be
- capped to the maximum. A value higher than
- <varname>autovacuum_freeze_max_age</varname> wouldn't make sense because an
- anti-wraparound autovacuum would be triggered at that point anyway, and
- the 0.95 multiplier leaves some breathing room to run a manual
- <command>VACUUM</command> before that happens. As a rule of thumb,
- <command>vacuum_freeze_table_age</command> should be set to a value somewhat
- below <varname>autovacuum_freeze_max_age</varname>, leaving enough gap so that
- a regularly scheduled <command>VACUUM</command> or an autovacuum triggered by
- normal delete and update activity is run in that window. Setting it too
- close could lead to anti-wraparound autovacuums, even though the table
- was recently vacuumed to reclaim space, whereas lower values lead to more
- frequent aggressive vacuuming.
+ <xref linkend="guc-vacuum-multixact-freeze-min-age"/> also controls when
+ freezing takes place. It is analogous to
+ <varname>vacuum_freeze_min_age</varname>, but <quote>age</quote> is
+ expressed in Multixact ID units. Lowering
+ <varname>vacuum_multixact_freeze_min_age</varname>
+ <emphasis>forces</emphasis> <command>VACUUM</command> to process
+ <structfield>xmax</structfield> fields with a Multixact ID in cases where
+ it would otherwise postpone the work of processing
+ <structfield>xmax</structfield> until the next <command>VACUUM</command>
+ <footnote>
+ <para>
+ <quote>Freezing</quote> of <structfield>xmax</structfield> fields
+ (whether they contain an XID or a Multixact ID) generally means clearing
+ <structfield>xmax</structfield> from a tuple header.
+ <command>VACUUM</command> may occasionally encounter an individual
+ Multixact ID that must be removed to advance the table's
+ <structfield>relminmxid</structfield> by the required amount, which can
+ only be processed by generating a replacement Multixact ID (containing
+ just the non-removable subset of member XIDs from the original Multixact
+ ID), and then setting <structfield>xmax</structfield> to the
+ new/replacement Multixact ID value.
+ </para>
+ </footnote>. The setting generally doesn't significantly influence the
+ total number of pages <command>VACUUM</command> freezes, even in tables
+ containing many Multixact IDs. This is because <command>VACUUM</command>
+ generally prefers proactive processing for most individual
+ <structfield>xmax</structfield> fields that contain a Multixact ID (eager
+ proactive processing is typically cheaper).
</para>
<para>
- The sole disadvantage of increasing <varname>autovacuum_freeze_max_age</varname>
- (and <varname>vacuum_freeze_table_age</varname> along with it) is that
- the <filename>pg_xact</filename> and <filename>pg_commit_ts</filename>
- subdirectories of the database cluster will take more space, because it
- must store the commit status and (if <varname>track_commit_timestamp</varname> is
- enabled) timestamp of all transactions back to
- the <varname>autovacuum_freeze_max_age</varname> horizon. The commit status uses
- two bits per transaction, so if
- <varname>autovacuum_freeze_max_age</varname> is set to its maximum allowed value
- of two billion, <filename>pg_xact</filename> can be expected to grow to about half
- a gigabyte and <filename>pg_commit_ts</filename> to about 20GB. If this
- is trivial compared to your total database size,
- setting <varname>autovacuum_freeze_max_age</varname> to its maximum allowed value
- is recommended. Otherwise, set it depending on what you are willing to
- allow for <filename>pg_xact</filename> and <filename>pg_commit_ts</filename> storage.
- (The default, 200 million transactions, translates to about 50MB
- of <filename>pg_xact</filename> storage and about 2GB of <filename>pg_commit_ts</filename>
- storage.)
+ Managing the added <acronym>WAL</acronym> volume from freezing over time
+ is a vital consideration for <command>VACUUM</command>. It is why
+ <command>VACUUM</command> doesn't just freeze every eligible tuple at the
+ earliest opportunity: the <acronym>WAL</acronym> written to freeze a
+ page's tuples is wasted in cases where the resulting frozen tuples are
+ soon deleted or updated anyway. It's also why <command>VACUUM</command>
+ <emphasis>will</emphasis> freeze all eligible tuples from a heap page once
+ the decision to freeze at least one tuple is taken: at that point, the
+ added cost of freezing all eligible tuples eagerly (measured in
+ <quote>extra bytes of <acronym>WAL</acronym> written</quote>) is far lower
+ than the probable cost of deferring freezing until a future
+ <command>VACUUM</command> operation against the same table. Furthermore,
+ once the page is frozen, it can generally be <link
+ linkend="vacuum-for-visibility-map">marked as all-frozen within the
+ visibility map</link> immediately afterwards.
</para>
- <para>
- One disadvantage of decreasing <varname>vacuum_freeze_min_age</varname> is that
- it might cause <command>VACUUM</command> to do useless work: freezing a row
- version is a waste of time if the row is modified
- soon thereafter (causing it to acquire a new XID). So the setting should
- be large enough that rows are not frozen until they are unlikely to change
- any more.
- </para>
+ <note>
+ <para>
+ In <productname>PostgreSQL</productname> versions before 16,
+ <command>VACUUM</command> triggered freezing at the level of individual
+ <structfield>xmin</structfield> and <structfield>xmax</structfield>
+ fields. Freezing only affected the exact XIDs that had already attained
+ an age of <varname>vacuum_freeze_min_age</varname> or greater.
+ </para>
+ </note>
<para>
- To track the age of the oldest unfrozen XIDs in a database,
- <command>VACUUM</command> stores XID
- statistics in the system tables <structname>pg_class</structname> and
- <structname>pg_database</structname>. In particular,
- the <structfield>relfrozenxid</structfield> column of a table's
- <structname>pg_class</structname> row contains the oldest remaining unfrozen
- XID at the end of the most recent <command>VACUUM</command> that successfully
- advanced <structfield>relfrozenxid</structfield> (typically the most recent
- aggressive VACUUM). Similarly, the
- <structfield>datfrozenxid</structfield> column of a database's
- <structname>pg_database</structname> row is a lower bound on the unfrozen XIDs
- appearing in that database — it is just the minimum of the
- per-table <structfield>relfrozenxid</structfield> values within the database.
- A convenient way to
- examine this information is to execute queries such as:
-
-<programlisting>
-SELECT c.oid::regclass as table_name,
- greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
-FROM pg_class c
-LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
-WHERE c.relkind IN ('r', 'm');
-
-SELECT datname, age(datfrozenxid) FROM pg_database;
-</programlisting>
-
- The <literal>age</literal> column measures the number of transactions from the
- cutoff XID to the current transaction's XID.
+ <command>VACUUM</command> also triggers the freezing of a page in cases
+ where it already proved necessary to write out a full page image
+ (<acronym>FPI</acronym>) as part of a <acronym>WAL</acronym> record
+ describing how dead tuples were removed <footnote>
+ <para>
+ Actually, the <quote>freeze on an <acronym>FPI</acronym> write</quote>
+ mechanism isn't just used when <command>VACUUM</command> needs to
+ generate an <acronym>FPI</acronym> (as torn page protection) for
+ inclusion in a <acronym>WAL</acronym> record describing how dead tuples
+ were removed. The <acronym>FPI</acronym> mechanism also triggers when
+ hint bits are set by <command>VACUUM</command>, if and only if it
+ necessitates writing an <acronym>FPI</acronym>. The need to write a
+ <acronym>WAL</acronym> record to set hint bits only arises when
+ <xref linkend="guc-wal-log-hints"/> is enabled in
+ <filename>postgresql.conf</filename>, or when data checksums were
+ enabled when the cluster was initialized with <xref linkend="app-initdb"/>.
+ </para>
+ </footnote> (see <xref linkend="wal-reliability"/> for background
+ information about how <acronym>FPI</acronym>s provide torn page
+ protection). This <quote>freeze on an <acronym>FPI</acronym>
+ write</quote> batching mechanism avoids an expected additional
+ <acronym>FPI</acronym> for the same page later on (this is the probable
+ outcome of lazily deferring freezing until <varname>vacuum_freeze_min_age</varname>
+ forces it). In effect, <command>VACUUM</command> generates slightly more
+ <acronym>WAL</acronym> in the short term with the aim of ultimately
+ needing to generate much less <acronym>WAL</acronym> in the long term.
</para>
<tip>
<para>
- When the <command>VACUUM</command> command's <literal>VERBOSE</literal>
- parameter is specified, <command>VACUUM</command> prints various
- statistics about the table. This includes information about how
- <structfield>relfrozenxid</structfield> and
- <structfield>relminmxid</structfield> advanced, and the number of
- newly frozen pages. The same details appear in the server log when
- autovacuum logging (controlled by <xref
- linkend="guc-log-autovacuum-min-duration"/>) reports on a
- <command>VACUUM</command> operation executed by autovacuum.
+ For tables that receive <command>INSERT</command> operations, but few or
+ no <command>UPDATE</command>/<command>DELETE</command> operations, it
+ might be beneficial to lower <xref linkend="reloption-autovacuum-freeze-min-age"/>
+ for the table. This makes <command>VACUUM</command> freeze the table's
+ pages <quote>eagerly</quote> during earlier autovacuums triggered by
+ <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>, which
+ improves performance stability for some workloads.
</para>
</tip>
- <para>
- <command>VACUUM</command> normally only scans pages that have been modified
- since the last vacuum, but <structfield>relfrozenxid</structfield> can only be
- advanced when every page of the table
- that might contain unfrozen XIDs is scanned. This happens when
- <structfield>relfrozenxid</structfield> is more than
- <varname>vacuum_freeze_table_age</varname> transactions old, when
- <command>VACUUM</command>'s <literal>FREEZE</literal> option is used, or when all
- pages that are not already all-frozen happen to
- require vacuuming to remove dead row versions. When <command>VACUUM</command>
- scans every page in the table that is not already all-frozen, it should
- set <literal>age(relfrozenxid)</literal> to a value just a little more than the
- <varname>vacuum_freeze_min_age</varname> setting
- that was used (more by the number of transactions started since the
- <command>VACUUM</command> started). <command>VACUUM</command>
- will set <structfield>relfrozenxid</structfield> to the oldest XID
- that remains in the table, so it's possible that the final value
- will be much more recent than strictly required.
- If no <structfield>relfrozenxid</structfield>-advancing
- <command>VACUUM</command> is issued on the table until
- <varname>autovacuum_freeze_max_age</varname> is reached, an autovacuum will soon
- be forced for the table.
- </para>
+ <sect3 id="vacuum-aggressive">
+ <title>Aggressive <command>VACUUM</command></title>
- <para>
- If for some reason autovacuum fails to clear old XIDs from a table, the
- system will begin to emit warning messages like this when the database's
- oldest XIDs reach forty million transactions from the wraparound point:
+ <indexterm zone="vacuum-aggressive">
+ <primary>transaction ID</primary>
+ <secondary>wraparound</secondary>
+ </indexterm>
+
+ <indexterm>
+ <primary>wraparound</primary>
+ <secondary>of transaction IDs and Multixact IDs</secondary>
+ </indexterm>
+
+ <para>
+ As noted already, freezing doesn't just allow queries to avoid lookups of
+ subsidiary transaction status information in structures such as
+ <filename>pg_xact</filename>. Freezing also plays a crucial role in
+ enabling transaction ID address space management by
+ <command>VACUUM</command> (and autovacuum). <command>VACUUM</command>
+ maintains information about the oldest unfrozen XID that remains in the
+ table when it uses its <firstterm>aggressive strategy</firstterm>.
+ </para>
+
+ <para>
+ Aggressive <command>VACUUM</command> updates the table's <link
+ linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>relfrozenxid</structfield>
+ to whatever XID was the oldest observed XID that
+ <command>VACUUM</command> <emphasis>didn't</emphasis> freeze still
+ remaining at the end of processing. The table's
+ <structfield>relfrozenxid</structfield> <quote>advances</quote> by a
+ certain number of XIDs (relative to the previous value set during the last
+ aggressive <command>VACUUM</command>) as progress on freezing the oldest
+ pages in the table permits. Aggressive <command>VACUUM</command> will
+ occasionally need to advance the whole database's <link
+ linkend="catalog-pg-database"><structname>pg_database</structname></link>.<structfield>datfrozenxid</structfield>
+ afterwards, too — this is the minimum of the per-table
+ <structfield>relfrozenxid</structfield> values (i.e., the earliest
+ <structfield>relfrozenxid</structfield>) within the database.
+ </para>
+
+ <para>
+ Aggressive <command>VACUUM</command> may need to perform significant
+ amounts of <quote>catch-up</quote> freezing missed by earlier
+ non-aggressive <command>VACUUM</command>s, because non-aggressive
+ <command>VACUUM</command> sometimes allows unfrozen pages to build up.
+ </para>
+
+ <para>
+ Over time, aggressive autovacuuming has two beneficial effects on the
+ system as a whole:
+
+ <orderedlist>
+ <listitem>
+ <simpara>It keeps track of the oldest remaining unfrozen transaction ID
+ in the entire <glossterm linkend="glossary-db-cluster">database
+ cluster</glossterm> (i.e., the oldest transaction ID across every
+ table in every database).</simpara>
+ </listitem>
+ <listitem>
+ <simpara>It avoids a cluster-wide oldest unfrozen transaction ID that
+ is <quote>too old</quote>.</simpara>
+ </listitem>
+ </orderedlist>
+ </para>
+
+ <para>
+ The maximum XID age that the system can tolerate (i.e., the maximum
+ <quote>distance</quote> between the oldest unfrozen transaction ID in any
+ table in the database cluster and the next unallocated transaction ID) is
+ about 2.1 billion transaction IDs. This <quote>maximum XID age</quote>
+ invariant makes it fundamentally impossible to postpone aggressive
+ <command>VACUUM</command>s (and freezing) forever. While there is no
+ simple formula for determining an oldest XID <quote>age</quote> for
+ database administrators to target, the invariant imposes a 2.1 billion
+ XID age hard limit — so there <emphasis>is</emphasis> a clear point
+ at which unfrozen XIDs should <emphasis>always</emphasis> be considered
+ <quote>too old</quote>, regardless of individual application requirements
+ or workload characteristics. If the hard limit is reached, the system
+ experiences <link linkend="vacuum-xid-exhaustion">transaction ID
+ exhaustion</link>, which temporarily prevents the allocation of new
+ permanent transaction IDs. The system will only regain the ability to
+ allocate new transaction IDs when <command>VACUUM</command> succeeds in
+ advancing the oldest <literal>datfrozenxid</literal> in the cluster
+ (following an aggressive <command>VACUUM</command> that runs to
+ completion against the table with the oldest
+ <structfield>relfrozenxid</structfield>).
+ </para>
+
+ <para>
+ Aggressive <command>VACUUM</command> also maintains the <link
+ linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>relminmxid</structfield> and
+ <link linkend="catalog-pg-database"><structname>pg_database</structname></link>.<structfield>datminmxid</structfield>
+ fields. These are needed to track the oldest Multixact ID in the table
+ and database, respectively. There are analogous rules, driven by
+ analogous considerations about managing the Multixact ID space. This
+ doesn't usually affect aggressive vacuuming requirements to a noticeable
+ degree, but can in databases that consume more Multixact IDs than
+ transaction IDs.
+ </para>
+
+ <caution>
+ <para>
+ <command>VACUUM</command> may not always freeze tuple
+ <structfield>xmin</structfield> XIDs that have reached
+ <varname>vacuum_freeze_min_age</varname> in age. The basic eligibility
+ criteria for freezing is the same as the criteria that determines if a
+ deleted tuple is safe for <command>VACUUM</command> to remove: the
+ XID-based <literal>removable cutoff</literal> (this is one of the
+ details that appears in the server log's reports on autovacuum
+ <footnote id="vacuum-autovacuum-log">
+ <para>
+ Autovacuum's log reports appear in the server log for autovacuums
+ whose <command>VACUUM</command> takes longer than a threshold
+ controlled by <xref linkend="guc-log-autovacuum-min-duration"/>.
+ Manual <command>VACUUM</command>s output the same details as
+ <literal>INFO</literal> messages when the <command>VACUUM</command>
+ command's <literal>VERBOSE</literal> option is used (note that manual
+ <command>VACUUM</command>s never generate reports in the server log).
+ </para>
+ </footnote>).
+ </para>
+ <para>
+ In extreme cases, a long-running transaction can hold back every
+ <command>VACUUM</command>'s <literal>removable cutoff</literal> for so
+ long that the system experiences
+ <link linkend="vacuum-xid-exhaustion">transaction ID exhaustion</link>.
+ See <xref linkend="monitoring-table-age"/> for details on how to monitor
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> age to avoid
+ transaction ID/Multixact ID exhaustion.
+ </para>
+ <para>
+ These issues can be debugged by following autovacuum log reports from
+ the server log over time: the log reports will include information about
+ the age of each <command>VACUUM</command>'s <literal>removable
+ cutoff</literal> at the point the <command>VACUUM</command> ended.
+ It may be useful to correlate the use of a cutoff with an excessively
+ high age with application-level problems such as long-running
+ transactions.
+ </para>
+ </caution>
+
+ <para>
+ The 2.1 billion XIDs <quote>maximum XID age</quote> invariant must be
+ preserved because transaction IDs stored in <link
+ linkend="storage-tuple-layout">heap tuple headers</link> use a truncated
+ 32-bit representation (rather than the full 64-bit representation used in
+ other contexts). Since all unfrozen transaction IDs from heap tuple
+ headers <emphasis>must</emphasis> be from the same transaction ID epoch
+ (or from a space in the 64-bit representation that spans two adjoining
+ transaction ID epochs), there isn't any need to include a separate epoch
+ field in each tuple header (see <xref linkend="interpreting-xid-stamps"/>
+ for further details). This scheme requires much less on-disk storage
+ space than a design that stores full 64-bit XIDs (consisting of a 32-bit
+ epoch and a 32-bit partial XID) in heap tuple headers. On the other
+ hand, it constrains the system's ability to allocate new XIDs in the
+ worst case scenario where transaction ID exhaustion occurs.
+ </para>
+
+ <para>
+ There is only one <emphasis>major</emphasis> behavioral difference
+ between aggressive <command>VACUUM</command> and non-aggressive
+ <command>VACUUM</command>: non-aggressive <command>VACUUM</command> skips
+ pages marked as all-visible using the visibility map, whereas aggressive
+ <command>VACUUM</command> only skips the subset of pages that are both
+ all-visible <emphasis>and</emphasis> all-frozen. In other words, pages
+ that are <emphasis>just</emphasis> all-visible at the beginning of an
+ aggressive <command>VACUUM</command> must be scanned, not skipped.
+ Scanning existing all-visible pages is necessary to determine the oldest
+ unfrozen XID that will remain in the table at the end of an aggressive
+ <command>VACUUM</command>.
+ </para>
+
+ <note>
+ <para>
+ In practice, most tables require periodic aggressive vacuuming.
+ However, some individual non-aggressive <command>VACUUM</command>
+ operations can advance the table's
+ <structfield>relfrozenxid</structfield> and/or
+ <structfield>relminmxid</structfield>.
+ </para>
+ <para>
+ This happens whenever a non-aggressive <command>VACUUM</command> notices
+ that it is safe without incurring any added cost from scanning
+ <quote>extra</quote> pages. It is most common in small, frequently
+ modified tables.
+ </para>
+ </note>
+
+ <para>
+ Non-aggressive <command>VACUUM</command>s can sometimes overlook older
+ XIDs from existing all-visible pages (due to their policy of always
+ skipping all-visible pages). Over time, this can even lead to a
+ significant build-up of unfrozen pages in one table (accumulated
+ all-visible pages that remain unfrozen). When that happens, it is
+ inevitable that an aggressive <command>VACUUM</command> will eventually
+ need to perform <quote>catch-up</quote> freezing that clears the table's
+ backlog of unfrozen pages.
+ </para>
+
+ <para>
+ There is also one <emphasis>minor</emphasis> behavioral difference
+ between aggressive <command>VACUUM</command> and non-aggressive
+ <command>VACUUM</command>: only aggressive <command>VACUUM</command> is
+ required to sometimes wait for a page-level cleanup lock when a page is
+ scanned and observed to contain transaction IDs/Multixact IDs that
+ <emphasis>must</emphasis> be frozen. This difference exists because
+ aggressive <command>VACUUM</command> is strictly required to advance
+ <structfield>relfrozenxid</structfield> and/or
+ <structfield>relminmxid</structfield> to
+ <emphasis>sufficiently</emphasis> recent values
+ <footnote>
+ <para>
+ Aggressive <command>VACUUM</command> is (somewhat arbitrarily) required
+ to freeze all pages containing transaction IDs older than
+ <varname>vacuum_freeze_min_age</varname> and/or Multixact ID values
+ older than <varname>vacuum_multixact_freeze_min_age</varname>, at a
+ minimum.
+ </para>
+ </footnote>. The behavior can lead to occasional waits for a conflicting
+ buffer pin to be released by another backend. These waits are
+ imperceptible and harmless most of the time. In extreme cases there can
+ be extended waits, which can be observed under the
+ <literal>BufferPin</literal> wait event in the
+ <structname>pg_stat_activity</structname> view. See <xref
+ linkend="wait-event-table"/>.
+ </para>
+
+ <note>
+ <para>
+ <quote>Catch-up</quote> freezing is not caused by any difference in how
+ <varname>vacuum_freeze_min_age</varname> is applied by each type of
+ <command>VACUUM</command>. It is an indirect result of
+ <varname>vacuum_freeze_min_age</varname> only being applied to those
+ pages that <command>VACUUM</command> scans (and cleanup locks) in the
+ first place. Therefore, it can be difficult to tune
+ <varname>vacuum_freeze_min_age</varname>, especially for tables that
+ receive frequent non-aggressive <command>VACUUM</command>s and
+ infrequent aggressive <command>VACUUM</command>s.
+ </para>
+ </note>
+
+ <tip>
+ <para>
+ Autovacuum server log reports <footnoteref
+ linkend="vacuum-autovacuum-log"/> show how many transaction IDs
+ <structfield>relfrozenxid</structfield> advanced by (if at all), and how
+ many Multixact IDs <structfield>relminmxid</structfield> advanced by (if
+ at all).
+ </para>
+ <para>
+ The number of pages frozen, and the number of pages scanned (i.e., the
+ number of pages processed because they could <emphasis>not</emphasis>
+ skipped using the visibility map) are also shown. This can provide
+ useful guidance when tuning freezing-related settings, particularly
+ <varname>vacuum_freeze_table_age</varname> and
+ <varname>vacuum_freeze_min_age</varname>.
+ </para>
+ </tip>
+
+ <para>
+ <xref linkend="guc-vacuum-freeze-table-age"/> controls when
+ <command>VACUUM</command> uses its aggressive strategy. If
+ <literal>age(relfrozenxid)</literal> exceeds
+ <varname>vacuum_freeze_table_age</varname> at the start of
+ <command>VACUUM</command>, <command>VACUUM</command> will employ its
+ aggressive strategy; otherwise, its standard non-aggressive strategy is
+ employed. Setting <varname>vacuum_freeze_table_age</varname> to 0 forces
+ <command>VACUUM</command> to always use its aggressive strategy.
+ </para>
+
+ <para>
+ <xref linkend="guc-vacuum-multixact-freeze-table-age"/> also controls
+ when <command>VACUUM</command> uses its aggressive strategy. This is an
+ independent Multixact ID based trigger for aggressive
+ <command>VACUUM</command>, which works just like
+ <varname>vacuum_freeze_table_age</varname>. It is applied against
+ <literal>mxid_age(relminmxid)</literal> at the start of each
+ <command>VACUUM</command>.
+ </para>
+
+ <para>
+ It doesn't matter if it was <varname>vacuum_freeze_table_age</varname> or
+ <varname>vacuum_multixact_freeze_table_age</varname> that triggered
+ <command>VACUUM</command>'s decision to use its aggressive strategy.
+ <emphasis>Every</emphasis> aggressive <command>VACUUM</command> will
+ advance <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> by applying the same generic policy
+ that controls which pages are frozen.
+ </para>
+
+ <para>
+ The default <varname>vacuum_freeze_table_age</varname> and
+ <varname>vacuum_multixact_freeze_table_age</varname> settings are
+ relatively low values. The <varname>vacuum_freeze_table_age</varname>
+ and <varname>vacuum_freeze_min_age</varname> defaults are intended to
+ limit the system to using only about 10% of the available transaction ID
+ space at any one time. This leaves the system with a generous amount of
+ <quote>slack capacity</quote> that allows XID allocations to continue in
+ the event of unforeseen problems with autovacuum and/or the application.
+ There might only be a negligible benefit from higher settings that aim to
+ reduce the number of <command>VACUUM</command>s that use the aggressive
+ strategy, in any case. Some applications may even
+ <emphasis>benefit</emphasis> from tuning that makes autovacuum perform
+ aggressive <command>VACUUM</command>s more often. If individual
+ aggressive <command>VACUUM</command>s can perform significantly less
+ <quote>catch-up</quote> freezing as a result, overall transaction
+ processing throughput is likely to be more stable and predictable.
+ </para>
+ </sect3>
+
+ <sect3 id="vacuum-antiwraparound-autovacuums">
+ <title>Anti-Wraparound Autovacuums</title>
+
+ <para>
+ To ensure that every table has its
+ <structfield>relfrozenxid</structfield> (and
+ <structfield>relminmxid</structfield>) advanced at regular intervals,
+ even in the case of completely static tables, autovacuum runs against any
+ table when it attains an age considered too far in the past. These are
+ <firstterm>anti-wraparound autovacuums</firstterm>. In practice, all
+ anti-wraparound autovacuums will use <command>VACUUM</command>'s
+ aggressive strategy (if they didn't, it would defeat the whole purpose of
+ anti-wraparound autovacuuming).
+ </para>
+
+ <para>
+ <xref linkend="guc-autovacuum-freeze-max-age"/> controls when the
+ autovacuum daemon launches anti-wraparound autovacuums. If the
+ <literal>age(relfrozenxid)</literal> of a table exceeds
+ <varname>autovacuum_freeze_max_age</varname> when the autovacuum daemon
+ periodically examines the database (which happens once every <xref
+ linkend="guc-autovacuum-naptime"/> seconds), then an anti-wraparound
+ autovacuum is launched against the table.
+ </para>
+
+ <para>
+ <xref linkend="guc-autovacuum-multixact-freeze-max-age"/> also controls
+ when the autovacuum daemon launches anti-wraparound autovacuums. It is
+ an independent Multixact ID based trigger for anti-wraparound
+ autovacuuming. If the <literal>mxid_age(relminmxid)</literal> of a table
+ exceeds <varname>autovacuum_multixact_freeze_max_age</varname> when the
+ autovacuum daemon periodically examines the database
+ <footnote>
+ <para>
+ Autovacuum may use a lower <quote>effective</quote> Multixact ID age
+ than the <varname>autovacuum_multixact_freeze_max_age</varname> setting
+ in <filename>postgresql.conf</filename>, though. Applying a lower
+ <quote>effective</quote> value like this avoids allowing the
+ <filename>pg_multixact/members</filename> <acronym>SLRU</acronym>
+ storage area to continue to grow in size for long, once its size
+ reaches <literal>2GB</literal>.
+ See <xref linkend="vacuum-truncate-xact-status"/>.
+ </para>
+ </footnote>, then an anti-wraparound autovacuum is launched against the
+ table.
+ </para>
+
+ <para>
+ Use of <command>VACUUM</command>'s aggressive strategy during
+ anti-wraparound autovacuuming is certain, because the effective value of
+ <varname>vacuum_freeze_table_age</varname> is silently limited to an
+ effective value no greater than 95% of the current value of
+ <varname>autovacuum_freeze_max_age</varname>. Similarly, the effective
+ value of <varname>vacuum_multixact_freeze_table_age</varname> is silently
+ limited to a value no greater than 95% of the current value of
+ <varname>autovacuum_multixact_freeze_max_age</varname>.
+ </para>
+
+ <para>
+ It doesn't matter if it was <varname>autovacuum_freeze_max_age</varname>
+ or <varname>autovacuum_multixact_freeze_max_age</varname> that triggered
+ an anti-wraparound autovacuum. <emphasis>Every</emphasis>
+ anti-wraparound autovacuum will be an aggressive
+ <command>VACUUM</command>, and will therefore advance
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> by applying the same generic policy
+ that controls which pages are frozen.
+ </para>
+
+ <para>
+ Anti-wraparound autovacuums are intended for static (and mostly static)
+ tables. There is no reason to expect that a table receiving continual
+ row inserts and/or row modifications will ever require an anti-wraparound
+ autovacuum. As a rule of thumb,
+ <varname>autovacuum_freeze_max_age</varname> should be set to a value
+ somewhat above <varname>vacuum_freeze_table_age</varname>, so that there
+ is a long window during which any autovacuum triggered by inserts,
+ updates, or deletes (or any manually issued <command>VACUUM</command>)
+ will become an aggressive <command>VACUUM</command>. This has the
+ advantage of allowing aggressive vacuuming to take place at a time when
+ vacuuming was required anyway. Each aggressive <command>VACUUM</command>
+ can therefore be expected to perform just as much useful work on
+ recovering disk space as an equivalent non-aggressive
+ <command>VACUUM</command> would have (had the non-aggressive strategy
+ been chosen instead).
+ </para>
+
+ <note>
+ <title>Aggressive/anti-wraparound differences</title>
+ <para>
+ Aggressive <command>VACUUM</command> is a special type of
+ <command>VACUUM</command>. It must advance
+ <structfield>relfrozenxid</structfield> up to a value that was no
+ greater than <varname>vacuum_freeze_min_age</varname> in age as of the
+ <emphasis>start</emphasis> of the <command>VACUUM</command> operation.
+ </para>
+ <para>
+ Anti-wraparound autovacuum is a special type of autovacuum. Its purpose
+ is to ensure that <structfield>relfrozenxid</structfield> advances when
+ no earlier <command>VACUUM</command> could advance it in passing —
+ often because no <command>VACUUM</command> has run against the table for
+ an extended period.
+ </para>
+ <para>
+ There is only one runtime behavioral difference between anti-wraparound
+ autovacuums and other autovacuums that run aggressive
+ <command>VACUUM</command>s: anti-wraparound autovacuums <emphasis>cannot
+ be autocancelled</emphasis>. This means that autovacuum workers that
+ perform anti-wraparound autovacuuming do not yield to conflicting
+ relation-level lock requests (e.g., from <command>ALTER
+ TABLE</command>). See <xref linkend="autovacuum-lock-conflicts"/> for
+ a full explanation.
+ </para>
+ </note>
+
+ <para>
+ In practice, anti-wraparound autovacuum is very likely to be the type of
+ autovacuum that updates the oldest <structfield>relfrozenxid</structfield>
+ in each database to a more recent value due to the presence of completely
+ static tables <footnote>
+ <para>
+ Anti-wraparound autovacuum is all but guaranteed to advance the oldest
+ <structfield>relfrozenxid</structfield>/<structfield>relminmxid</structfield>
+ (and therefore to advance
+ <structfield>datfrozenxid</structfield>/<structfield>datminmxid</structfield>)
+ in practice because in practice there is all but guaranteed to be at
+ least one totally static table that never gets an aggressive
+ <command>VACUUM</command> for any other reason (often just a tiny,
+ completely static system catalog table).
+ </para>
+ </footnote>. As discussed in <xref linkend="vacuum-aggressive"/>,
+ <structfield>datfrozenxid</structfield> only advances when the oldest
+ <structfield>relfrozenxid</structfield> in the database advances
+ (<structfield>relminmxid</structfield> likewise only advances when the
+ earliest <structfield>datminmxid</structfield> in the database advances).
+ This implies that anti-wraparound autovacuum is <emphasis>also</emphasis>
+ very likely to be involved when any database's
+ <structfield>datfrozenxid</structfield>/<structfield>datminmxid</structfield>
+ advances (and when the cluster-wide earliest unfrozen transaction
+ ID/Multixact ID is advanced to a more recent value, in turn).
+ </para>
+
+ <para>
+ It follows that <varname>autovacuum_freeze_max_age</varname> is usually
+ the limiting factor for advancing the cluster-wide oldest unfrozen
+ transaction ID found in <link linkend="functions-info-controldata"><filename>pg_control</filename></link>
+ (the cluster-wide oldest unfrozen Multixact ID might occasionally be
+ influenced by <varname>autovacuum_multixact_freeze_max_age</varname>,
+ too). This usually isn't much of a concern in itself, since it generally
+ doesn't predict anything about how far behind autovacuum is with freezing
+ physical heap pages. Note, however, that this effect
+ <emphasis>can</emphasis> significantly impact the amount of space
+ required to store transaction status information. The oldest transaction
+ status information (which is stored in external structures such as
+ <filename>pg_xact</filename>) cannot safely be truncated until
+ <command>VACUUM</command> can ascertain that there are no references to
+ the oldest entries remaining in any table, from any database. See <xref
+ linkend="vacuum-truncate-xact-status"/> for further details.
+ </para>
+ </sect3>
+
+ <sect3 id="vacuum-xid-exhaustion">
+ <title>Transaction ID exhaustion</title>
+ <para>
+ If for some reason autovacuum fails to advance any table's
+ <structfield>relfrozenxid</structfield> for an extended period (during
+ which transaction IDs continue to be allocated), the system will begin to
+ emit warning messages once the database's oldest XIDs attain an age
+ within forty million transactions of the 2.1 billion XID hard limit
+ described in <xref linkend="vacuum-aggressive"/>. For example:
<programlisting>
WARNING: database "mydb" must be vacuumed within 39985967 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
</programlisting>
- (A manual <command>VACUUM</command> should fix the problem, as suggested by the
- hint; but note that the <command>VACUUM</command> must be performed by a
- superuser, else it will fail to process system catalogs and thus not
- be able to advance the database's <structfield>datfrozenxid</structfield>.)
- If these warnings are
- ignored, the system will shut down and refuse to start any new
- transactions once there are fewer than three million transactions left
- until wraparound:
+ (A manual <command>VACUUM</command> should fix the problem, as suggested by the
+ hint; but note that the <command>VACUUM</command> must be performed by a
+ superuser, else it will fail to process system catalogs and thus not
+ be able to advance the database's <structfield>datfrozenxid</structfield>.)
+ If these warnings are ignored, the system will eventually refuse
+ to allocate new transaction IDs. This happens at the point that
+ there are fewer than three million transactions left:
<programlisting>
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
</programlisting>
- The three-million-transaction safety margin exists to let the
- administrator recover without data loss, by manually executing the
- required <command>VACUUM</command> commands. However, since the system will not
- execute commands once it has gone into the safety shutdown mode,
- the only way to do this is to stop the server and start the server in single-user
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
- in single-user mode. See the <xref linkend="app-postgres"/> reference
- page for details about using single-user mode.
- </para>
-
- <sect3 id="vacuum-for-multixact-wraparound">
- <title>Multixacts and Wraparound</title>
-
- <indexterm>
- <primary>MultiXactId</primary>
- </indexterm>
-
- <indexterm>
- <primary>wraparound</primary>
- <secondary>of multixact IDs</secondary>
- </indexterm>
-
- <para>
- <firstterm>Multixact IDs</firstterm> are used to support row locking by
- multiple transactions. Since there is only limited space in a tuple
- header to store lock information, that information is encoded as
- a <quote>multiple transaction ID</quote>, or multixact ID for short,
- whenever there is more than one transaction concurrently locking a
- row. Information about which transaction IDs are included in any
- particular multixact ID is stored separately in
- the <filename>pg_multixact</filename> subdirectory, and only the multixact ID
- appears in the <structfield>xmax</structfield> field in the tuple header.
- Like transaction IDs, multixact IDs are implemented as a
- 32-bit counter and corresponding storage, all of which requires
- careful aging management, storage cleanup, and wraparound handling.
- There is a separate storage area which holds the list of members in
- each multixact, which also uses a 32-bit counter and which must also
- be managed.
+ The three-million-transaction safety margin exists to let the
+ administrator recover without data loss, by manually executing the
+ required <command>VACUUM</command> commands. However, since the system will not
+ execute commands once it has gone into the safety shutdown mode,
+ the only way to do this is to stop the server and start the server in single-user
+ mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
+ in single-user mode. See the <xref linkend="app-postgres"/> reference
+ page for details about using single-user mode.
</para>
<para>
- Whenever <command>VACUUM</command> scans any part of a table, it will replace
- any multixact ID it encounters which is older than
- <xref linkend="guc-vacuum-multixact-freeze-min-age"/>
- by a different value, which can be the zero value, a single
- transaction ID, or a newer multixact ID. For each table,
- <structname>pg_class</structname>.<structfield>relminmxid</structfield> stores the oldest
- possible multixact ID still appearing in any tuple of that table.
- If this value is older than
- <xref linkend="guc-vacuum-multixact-freeze-table-age"/>, an aggressive
- vacuum is forced. As discussed in the previous section, an aggressive
- vacuum means that only those pages which are known to be all-frozen will
- be skipped. <function>mxid_age()</function> can be used on
- <structname>pg_class</structname>.<structfield>relminmxid</structfield> to find its age.
+ A similar safety mechanism is used to prevent Multixact ID allocations
+ when any table's <structfield>relminmxid</structfield> is dangerously far
+ in the past: Multixact ID exhaustion. If the system isn't experiencing
+ transaction ID exhaustion, Multixact ID exhaustion can be fixed
+ non-invasively by running a manual <command>VACUUM</command> without
+ entering single-user mode (otherwise follow the procedure for transaction
+ ID exhaustion). See <xref linkend="monitoring-table-age"/> for details
+ on how to determine which table's <structfield>relminmxid</structfield>
+ is dangerously far in the past.
</para>
- <para>
- Aggressive <command>VACUUM</command>s, regardless of what causes
- them, are <emphasis>guaranteed</emphasis> to be able to advance
- the table's <structfield>relminmxid</structfield>.
- Eventually, as all tables in all databases are scanned and their
- oldest multixact values are advanced, on-disk storage for older
- multixacts can be removed.
- </para>
-
- <para>
- As a safety device, an aggressive vacuum scan will
- occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds 2GB, aggressive vacuum
- scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled.
- </para>
+ <note>
+ <para>
+ Autovacuum has two different mechanisms that are designed to avoid
+ transaction ID exhaustion. The first mechansim is anti-wraparound
+ autovacuuming. There is an second, independent mechanism, used when
+ <structfield>relfrozenxid</structfield> and/or
+ <structfield>relminmxid</structfield> have already consumed a
+ significant fraction of the total available transaction ID space: the
+ failsafe.
+ </para>
+ <para>
+ The failsafe is triggered by <command>VACUUM</command> when the table's
+ <structfield>relfrozenxid</structfield> attains an age of <xref
+ linkend="guc-vacuum-failsafe-age"/> XIDs, or when the table's
+ <structfield>relminmxid</structfield> attains an age of <xref
+ linkend="guc-vacuum-multixact-failsafe-age"/> Multixact IDs. This
+ happens dynamically, when the risk of eventual transaction ID (or
+ Multixact ID) exhaustion is deemed to outweigh the risks of not
+ proceeding as planned with ordinary vacuuming.
+ </para>
+ <para>
+ Once the failsafe triggers, <command>VACUUM</command> prioritizes
+ advancing <structfield>relfrozenxid</structfield> and/or
+ <structfield>relminmxid</structfield> to avoid transaction ID
+ exhaustion. Most notably, <command>VACUUM</command> bypasses any
+ remaining non-essential maintenance, such as index vacuuming.
+ </para>
+ </note>
</sect3>
</sect2>
@@ -784,9 +1239,19 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
Vacuum maintains a <link linkend="storage-vm">visibility map</link> for each
table to keep track of which pages contain only tuples that are known to be
visible to all active transactions (and all future transactions, until the
- page is again modified). This has two purposes. First, vacuum
- itself can skip such pages on the next run, since there is nothing to
- clean up.
+ page is again modified). A separate bit tracks whether all of the tuples
+ are frozen.
+ </para>
+
+ <para>
+ The visibility map serves two purposes.
+ </para>
+
+ <para>
+ First, <command>VACUUM</command> itself can skip such pages on the
+ next run, since there is nothing to clean up. Even <link
+ linkend="vacuum-aggressive">aggressive <command>VACUUM</command>s</link>
+ can skip pages that are both all-visible and all-frozen.
</para>
<para>
@@ -806,6 +1271,79 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect2>
+ <sect2 id="vacuum-truncate-xact-status">
+ <title>Truncating Transaction Status Information</title>
+
+ <para>
+ As discussed in <xref linkend="vacuum-aggressive"/>, aggressive
+ autovacuuming plays a critical role in maintaining the XID address space
+ for the system as a whole. A secondary goal of this whole process is to
+ enable eventual truncation of the oldest transaction status information in
+ the <glossterm linkend="glossary-db-cluster">database cluster</glossterm>
+ as a whole. This status information is stored in dedicated <link
+ linkend="monitoring-pg-stat-slru-view">simple least-recently-used</link>
+ (<acronym>SLRU</acronym>) caches backed by external storage (see <xref
+ linkend="storage-file-layout"/>). Truncation is only possible when
+ <command>VACUUM</command> can ascertain that there are no references to
+ the oldest entries remaining in any table, from any database, by taking
+ the earliest <structfield>datfrozenxid</structfield> and
+ <structfield>datminmxid</structfield> among all databases in the cluster.
+ This isn't a maintenance task that affects individual tables; it's a
+ maintenance task that affects the whole cluster.
+ </para>
+
+ <para>
+ The space required to store transaction status information is likely to be
+ a low priority for most database administrators. It may occasionally be
+ useful to limit the maximum storage overhead used for transaction status
+ information by making anti-wraparound autovacuums happen more frequently.
+ The frequency of system-wide anti-wraparound autovacuuming increases when
+ <varname>autovacuum_freeze_max_age</varname> and
+ <varname>autovacuum_multixact_freeze_max_age</varname> are decreased in
+ <filename>postgresql.conf</filename>. This approach is effective (at
+ limiting the storage required for transaction status information) because
+ the <emphasis>oldest</emphasis> <structfield>datfrozenxid</structfield>
+ and <structfield>datminmxid</structfield> in the cluster are very likely
+ to depend on the frequency of anti-wraparound autovacuuming of completely
+ static tables. See <xref linkend="vacuum-antiwraparound-autovacuums"/> for
+ further discussion of the role of anti-wraparound autovacuuming in
+ advancing the cluster-wide oldest unfrozen transaction ID.
+ </para>
+
+ <para>
+ There are two <acronym>SLRU</acronym> storage areas associated with
+ transaction IDs. First, there is <filename>pg_xact</filename>, which
+ stores commit/abort status information. Second, there is
+ <filename>pg_commit_ts</filename>, which stores transaction commit
+ timestamps (when <xref linkend="guc-track-commit-timestamp"/> is set to
+ <literal>on</literal>). The default
+ <varname>autovacuum_freeze_max_age</varname> setting of 200 million
+ transactions translates to about 50MB of <filename>pg_xact</filename>
+ storage, and about 2GB of <filename>pg_commit_ts</filename> storage when
+ <varname>track_commit_timestamp</varname> is enabled (it is set to
+ <literal>off</literal> by default, which totally avoids the need to store
+ anything in <filename>pg_commit_ts</filename>).
+ </para>
+
+ <para>
+ There are also two <acronym>SLRU</acronym> storage areas associated with
+ Multixact IDs: <filename>pg_multixact/members</filename>, and
+ <filename>pg_multixact/offsets</filename>. These are logically one
+ storage area, implemented as two distinct storage areas. There is no
+ simple formula to determine the storage overhead per Multixact ID, since
+ Multixact IDs have a variable number of member transaction IDs (this is
+ what necessitates using two different physical storage areas). Note,
+ however, that if <filename>pg_multixact/members</filename> exceeds 2GB,
+ the effective value of <varname>autovacuum_multixact_freeze_max_age</varname>
+ used by autovacuum (and <command>VACUUM</command>) will be lower. This
+ results in more frequent
+ <link linkend="vacuum-antiwraparound-autovacuums">anti-wraparound
+ autovacuums</link> (since that's the only approach that reliably limits
+ the size of these storage areas). It might also increase the frequency of
+ aggressive <command>VACUUM</command>s more generally.
+ </para>
+ </sect2>
+
<sect2 id="vacuum-for-statistics">
<title>Updating Planner Statistics</title>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 99f7f95c3..0888342ef 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1032,9 +1032,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<row>
<entry><literal>BufferPin</literal></entry>
<entry>The server process is waiting for exclusive access to
- a data buffer. Buffer pin waits can be protracted if
- another process holds an open cursor that last read data from the
- buffer in question. See <xref linkend="wait-event-bufferpin-table"/>.
+ a data buffer. Buffer pin waits by aggressive
+ <command>VACUUM</command> (see <xref linkend="vacuum-aggressive"/>) can
+ be protracted if another process holds an for details on how to open
+ cursor that last read data from the buffer in question.
+ See <xref linkend="wait-event-bufferpin-table"/>.
</entry>
</row>
<row>
@@ -6246,6 +6248,78 @@ FROM pg_stat_get_backend_idset() AS backendid;
</sect2>
</sect1>
+ <sect1 id="monitoring-table-age">
+ <title>Monitoring table age</title>
+
+ <para>
+ It is crucial that autovacuum is able to run <command>VACUUM</command> (or
+ that every table is manually vacuumed) at somewhat regular intervals. Even
+ completely static tables need to participate in management of the
+ transaction ID space by <command>VACUUM</command>, as explained in <xref
+ linkend="vacuum-freezing-xid-space"/>.
+ </para>
+
+ <para>
+ This section provides details about how to monitor for this. You might
+ want to use a third party monitoring and alterting tool for this. Many
+ have off-the-shelf queries similar to the reference query shown here.
+ </para>
+
+ <caution>
+ <para>
+ Temporary tables are not maintained by autovacuum (see <xref
+ linkend="autovacuum-limitations"/>), but nevertheless have the same
+ requirements for freezing and <structfield>relfrozenxid</structfield>
+ advancement as permanent tables.
+ </para>
+ <para>
+ This means application code that <quote>leaks</quote> temporary tables can
+ threaten the availability of the system; eventually, the system will enter
+ a mode that makes it temporarily unable to allocate new transaction IDs
+ (see <xref linkend="vacuum-xid-exhaustion"/>), since autovacuum's usual
+ strategies for preventing that from happening cannot be used.
+ </para>
+ </caution>
+
+ <para>
+ A convenient way to examine information about
+ <structfield>relfrozenxid</structfield> and
+ <structfield>relminmxid</structfield> is to execute queries such as:
+
+<programlisting>
+SELECT c.oid::regclass as table_name,
+greatest(age(c.relfrozenxid),
+ age(t.relfrozenxid)) as xid_age,
+mxid_age(c.relminmxid)
+FROM pg_class c
+LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
+WHERE c.relkind IN ('r', 'm');
+
+SELECT datname,
+age(datfrozenxid) as xid_age,
+mxid_age(datminmxid)
+FROM pg_database;
+</programlisting>
+
+ The <function>age</function> function returns the number of transactions
+ from <structfield>relfrozenxid</structfield> to the next unallocated
+ transaction ID. The <function>mxid_age</function> function returns the
+ number of Multixact IDs from <structfield>relminmxid</structfield> to the
+ next unallocated Multixact ID.
+ </para>
+
+ <para>
+ The system should always have significant XID allocation slack capacity.
+ Ideally, the greatest
+ <literal>age(relfrozenxid)</literal>/<literal>age(datfrozenxid)</literal>
+ in the system will never be more than a fraction of the 2.1 billion XID
+ hard limit described in <xref linkend="vacuum-aggressive"/>. The default
+ <varname>vacuum_freeze_table_age</varname> setting of 200 million
+ transactions implies that the system should never use significantly more
+ than about 10% of that hard limit.
+ </para>
+ </sect1>
+
<sect1 id="monitoring-locks">
<title>Viewing Locks</title>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10ef699fa..307504acb 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1514,11 +1514,10 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
If true, the autovacuum daemon will perform automatic <command>VACUUM</command>
and/or <command>ANALYZE</command> operations on this table following the rules
discussed in <xref linkend="autovacuum"/>.
- If false, this table will not be autovacuumed, except to prevent
- transaction ID wraparound. See <xref linkend="vacuum-for-wraparound"/> for
- more about wraparound prevention.
- Note that the autovacuum daemon does not run at all (except to prevent
- transaction ID wraparound) if the <xref linkend="guc-autovacuum"/>
+ If false, this table will not be autovacuumed, except to run
+ anti-wraparound autovacuum (see <xref linkend="vacuum-antiwraparound-autovacuums"/>).
+ Note that the autovacuum daemon does not run at all (except to run
+ anti-wraparound autovacuums) if the <xref linkend="guc-autovacuum"/>
parameter is false; setting individual tables' storage parameters does
not override that. Therefore there is seldom much point in explicitly
setting this storage parameter to <literal>true</literal>, only
diff --git a/doc/src/sgml/ref/prepare_transaction.sgml b/doc/src/sgml/ref/prepare_transaction.sgml
index f4f6118ac..719e0b25a 100644
--- a/doc/src/sgml/ref/prepare_transaction.sgml
+++ b/doc/src/sgml/ref/prepare_transaction.sgml
@@ -126,13 +126,13 @@ PREPARE TRANSACTION <replaceable class="parameter">transaction_id</replaceable>
<para>
It is unwise to leave transactions in the prepared state for a long time.
This will interfere with the ability of <command>VACUUM</command> to reclaim
- storage, and in extreme cases could cause the database to shut down
- to prevent transaction ID wraparound (see <xref
- linkend="vacuum-for-wraparound"/>). Keep in mind also that the transaction
- continues to hold whatever locks it held. The intended usage of the
- feature is that a prepared transaction will normally be committed or
- rolled back as soon as an external transaction manager has verified that
- other databases are also prepared to commit.
+ storage, and in extreme cases could cause the database to refuse to
+ allocate new transaction IDs (see <xref linkend="vacuum-xid-exhaustion"/>).
+ Keep in mind also that the transaction continues to hold whatever locks it
+ held. The intended usage of the feature is that a prepared transaction
+ will normally be committed or rolled back as soon as an external
+ transaction manager has verified that other databases are also prepared to
+ commit.
</para>
<para>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 57bc4c23e..f8c010123 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -123,7 +123,9 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
<term><literal>FREEZE</literal></term>
<listitem>
<para>
- Selects aggressive <quote>freezing</quote> of tuples.
+ Makes <quote>freezing</quote> <emphasis>maximally</emphasis>
+ aggressive, and forces <command>VACUUM</command> to use its
+ <link linkend="vacuum-aggressive">aggressive strategy</link>.
Specifying <literal>FREEZE</literal> is equivalent to performing
<command>VACUUM</command> with the
<xref linkend="guc-vacuum-freeze-min-age"/> and
@@ -218,11 +220,11 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
<emphasis>always</emphasis> skip index vacuuming, even when
there are many dead tuples in the table. This may be useful
when it is necessary to make <command>VACUUM</command> run as
- quickly as possible to avoid imminent transaction ID wraparound
- (see <xref linkend="vacuum-for-wraparound"/>). However, the
- wraparound failsafe mechanism controlled by <xref
+ quickly as possible to avoid imminent transaction ID exhaustion
+ (see <xref linkend="vacuum-xid-exhaustion"/>). However, the failsafe
+ mechanism controlled by <xref
linkend="guc-vacuum-failsafe-age"/> will generally trigger
- automatically to avoid transaction ID wraparound failure, and
+ automatically to avoid transaction ID exhaustion failure, and
should be preferred. If index cleanup is not performed
regularly, performance may suffer, because as the table is
modified indexes will accumulate dead tuples and the table
@@ -232,7 +234,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
<para>
This option has no effect for tables that have no index and is
ignored if the <literal>FULL</literal> option is used. It also
- has no effect on the transaction ID wraparound failsafe
+ has no effect on the transaction ID exhaustion failsafe
mechanism. When triggered it will skip index vacuuming, even
when <literal>INDEX_CLEANUP</literal> is set to
<literal>ON</literal>.
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index da2393783..9621fe378 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -231,9 +231,11 @@ PostgreSQL documentation
<para>
Only execute the vacuum or analyze commands on tables with a multixact
ID age of at least <replaceable class="parameter">mxid_age</replaceable>.
- This setting is useful for prioritizing tables to process to prevent
- multixact ID wraparound (see
- <xref linkend="vacuum-for-multixact-wraparound"/>).
+ This setting is useful for prioritizing tables to advance
+ <structname>pg_class</structname>.<structfield>relfrozenxid</structfield>
+ for via an aggressive <command>VACUUM</command>
+ (see <xref linkend="vacuum-aggressive"/> and
+ <xref linkend="monitoring-table-age"/>).
</para>
<para>
For the purposes of this option, the multixact ID age of a relation is
@@ -252,9 +254,12 @@ PostgreSQL documentation
<para>
Only execute the vacuum or analyze commands on tables with a
transaction ID age of at least
- <replaceable class="parameter">xid_age</replaceable>. This setting
- is useful for prioritizing tables to process to prevent transaction
- ID wraparound (see <xref linkend="vacuum-for-wraparound"/>).
+ <replaceable class="parameter">xid_age</replaceable>. This
+ setting is useful for prioritizing tables to advance
+ <structname>pg_class</structname>.<structfield>relminmxid</structfield>
+ for via an aggressive <command>VACUUM</command>
+ (see <xref linkend="vacuum-aggressive"/> and
+ <xref linkend="monitoring-table-age"/>).
</para>
<para>
For the purposes of this option, the transaction ID age of a relation
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 148fb1b49..6d7ddc685 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -664,7 +664,8 @@ This information can also be used
by <link linkend="indexes-index-only-scans"><firstterm>index-only
scans</firstterm></link> to answer queries using only the index tuple.
The second bit, if set, means that all tuples on the page have been frozen.
-That means that even an anti-wraparound vacuum need not revisit the page.
+That means that even an aggressive vacuum (see <xref linkend="vacuum-aggressive"/>)
+need not revisit the page.
</para>
<para>
diff --git a/doc/src/sgml/xact.sgml b/doc/src/sgml/xact.sgml
index 8a1f9fd6f..3cba050f0 100644
--- a/doc/src/sgml/xact.sgml
+++ b/doc/src/sgml/xact.sgml
@@ -180,7 +180,7 @@
rows and can be inspected using the <xref linkend="pgrowlocks"/>
extension. Row-level read locks might also require the assignment
of multixact IDs (<literal>mxid</literal>; see <xref
- linkend="vacuum-for-multixact-wraparound"/>).
+ linkend="vacuum-freezing-xid-space"/>).
</para>
</sect1>
--
2.40.1
v4-0004-Reorder-routine-vacuuming-sections.patchapplication/octet-stream; name=v4-0004-Reorder-routine-vacuuming-sections.patchDownload
From de7e1aaead4a8f8b4a680b7489a35fedad8059d2 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 11:19:50 -0700
Subject: [PATCH v4 4/9] Reorder routine vacuuming sections.
This doesn't change any of the content itself. It is a mechanical
change. The new order talks about maintenance tasks that happen within
the scope of the VACUUM command first, and then talks about ANALYZE
last. Furthermore, we talk about each maintenance task that happens
within the scope of VACUUM in an order that matches physical processing
order within vacuumlazy.c. (If you assume that "space-recovery" mostly
deals with pruning and "for-wraparound" mostly deals with freezing).
Old order:
<sect2 id="vacuum-basics">
<sect2 id="vacuum-for-space-recovery">
<sect2 id="vacuum-for-statistics">
<sect2 id="vacuum-for-visibility-map">
<sect2 id="vacuum-for-wraparound">
New order:
<sect2 id="vacuum-basics">
<sect2 id="vacuum-for-space-recovery">
<sect2 id="vacuum-for-wraparound">
<sect2 id="vacuum-for-visibility-map">
<sect2 id="vacuum-for-statistics">
A later commit will make the content that now appears in "vacuum-basics"
appear as the "Routine Vacuuming" sect1's introductory paragraph.
That'll make it easier to move advice about when to use VACUUM FULL to
some other chapter (since it isn't intended for "routine" use at all).
The new order will be easier to work with in later commits that overhaul
both "space-recovery" and "for-wraparound". Pruning and freezing are
related conceptually (e.g., holding back "removable cutoff"/OldestXmin
disrupts both in about the same way), which will be easier to discuss
with this ground work in place.
---
doc/src/sgml/maintenance.sgml | 300 +++++++++++++++++-----------------
1 file changed, 150 insertions(+), 150 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 702e2797c..83fa7ba8b 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -280,8 +280,9 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To update data statistics used by the
- <productname>PostgreSQL</productname> query planner.</simpara>
+ <simpara>To protect against loss of very old data due to
+ <firstterm>transaction ID wraparound</firstterm> or
+ <firstterm>multixact ID wraparound</firstterm>.</simpara>
</listitem>
<listitem>
@@ -291,9 +292,8 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
<listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ <simpara>To update data statistics used by the
+ <productname>PostgreSQL</productname> query planner.</simpara>
</listitem>
</orderedlist>
@@ -438,151 +438,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</tip>
</sect2>
- <sect2 id="vacuum-for-statistics">
- <title>Updating Planner Statistics</title>
-
- <indexterm zone="vacuum-for-statistics">
- <primary>statistics</primary>
- <secondary>of the planner</secondary>
- </indexterm>
-
- <indexterm zone="vacuum-for-statistics">
- <primary>ANALYZE</primary>
- </indexterm>
-
- <para>
- The <productname>PostgreSQL</productname> query planner relies on
- statistical information about the contents of tables in order to
- generate good plans for queries. These statistics are gathered by
- the <link linkend="sql-analyze"><command>ANALYZE</command></link> command,
- which can be invoked by itself or
- as an optional step in <command>VACUUM</command>. It is important to have
- reasonably accurate statistics, otherwise poor choices of plans might
- degrade database performance.
- </para>
-
- <para>
- The autovacuum daemon, if enabled, will automatically issue
- <command>ANALYZE</command> commands whenever the content of a table has
- changed sufficiently. However, administrators might prefer to rely
- on manually-scheduled <command>ANALYZE</command> operations, particularly
- if it is known that update activity on a table will not affect the
- statistics of <quote>interesting</quote> columns. The daemon schedules
- <command>ANALYZE</command> strictly as a function of the number of rows
- inserted or updated; it has no knowledge of whether that will lead
- to meaningful statistical changes.
- </para>
-
- <para>
- Tuples changed in partitions and inheritance children do not trigger
- analyze on the parent table. If the parent table is empty or rarely
- changed, it may never be processed by autovacuum, and the statistics for
- the inheritance tree as a whole won't be collected. It is necessary to
- run <command>ANALYZE</command> on the parent table manually in order to
- keep the statistics up to date.
- </para>
-
- <para>
- As with vacuuming for space recovery, frequent updates of statistics
- are more useful for heavily-updated tables than for seldom-updated
- ones. But even for a heavily-updated table, there might be no need for
- statistics updates if the statistical distribution of the data is
- not changing much. A simple rule of thumb is to think about how much
- the minimum and maximum values of the columns in the table change.
- For example, a <type>timestamp</type> column that contains the time
- of row update will have a constantly-increasing maximum value as
- rows are added and updated; such a column will probably need more
- frequent statistics updates than, say, a column containing URLs for
- pages accessed on a website. The URL column might receive changes just
- as often, but the statistical distribution of its values probably
- changes relatively slowly.
- </para>
-
- <para>
- It is possible to run <command>ANALYZE</command> on specific tables and even
- just specific columns of a table, so the flexibility exists to update some
- statistics more frequently than others if your application requires it.
- In practice, however, it is usually best to just analyze the entire
- database, because it is a fast operation. <command>ANALYZE</command> uses a
- statistically random sampling of the rows of a table rather than reading
- every single row.
- </para>
-
- <tip>
- <para>
- Although per-column tweaking of <command>ANALYZE</command> frequency might not be
- very productive, you might find it worthwhile to do per-column
- adjustment of the level of detail of the statistics collected by
- <command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal>
- clauses and have highly irregular data distributions might require a
- finer-grain data histogram than other columns. See <command>ALTER TABLE
- SET STATISTICS</command>, or change the database-wide default using the <xref
- linkend="guc-default-statistics-target"/> configuration parameter.
- </para>
-
- <para>
- Also, by default there is limited information available about
- the selectivity of functions. However, if you create a statistics
- object or an expression
- index that uses a function call, useful statistics will be
- gathered about the function, which can greatly improve query
- plans that use the expression index.
- </para>
- </tip>
-
- <tip>
- <para>
- The autovacuum daemon does not issue <command>ANALYZE</command> commands for
- foreign tables, since it has no means of determining how often that
- might be useful. If your queries require statistics on foreign tables
- for proper planning, it's a good idea to run manually-managed
- <command>ANALYZE</command> commands on those tables on a suitable schedule.
- </para>
- </tip>
-
- <tip>
- <para>
- The autovacuum daemon does not issue <command>ANALYZE</command> commands
- for partitioned tables. Inheritance parents will only be analyzed if the
- parent itself is changed - changes to child tables do not trigger
- autoanalyze on the parent table. If your queries require statistics on
- parent tables for proper planning, it is necessary to periodically run
- a manual <command>ANALYZE</command> on those tables to keep the statistics
- up to date.
- </para>
- </tip>
-
- </sect2>
-
- <sect2 id="vacuum-for-visibility-map">
- <title>Updating the Visibility Map</title>
-
- <para>
- Vacuum maintains a <link linkend="storage-vm">visibility map</link> for each
- table to keep track of which pages contain only tuples that are known to be
- visible to all active transactions (and all future transactions, until the
- page is again modified). This has two purposes. First, vacuum
- itself can skip such pages on the next run, since there is nothing to
- clean up.
- </para>
-
- <para>
- Second, it allows <productname>PostgreSQL</productname> to answer some
- queries using only the index, without reference to the underlying table.
- Since <productname>PostgreSQL</productname> indexes don't contain tuple
- visibility information, a normal index scan fetches the heap tuple for each
- matching index entry, to check whether it should be seen by the current
- transaction.
- An <link linkend="indexes-index-only-scans"><firstterm>index-only
- scan</firstterm></link>, on the other hand, checks the visibility map first.
- If it's known that all tuples on the page are
- visible, the heap fetch can be skipped. This is most useful on
- large data sets where the visibility map can prevent disk accesses.
- The visibility map is vastly smaller than the heap, so it can easily be
- cached even when the heap is very large.
- </para>
- </sect2>
-
<sect2 id="vacuum-for-wraparound">
<title>Preventing Transaction ID Wraparound Failures</title>
@@ -932,6 +787,151 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
</para>
</sect3>
</sect2>
+
+ <sect2 id="vacuum-for-visibility-map">
+ <title>Updating the Visibility Map</title>
+
+ <para>
+ Vacuum maintains a <link linkend="storage-vm">visibility map</link> for each
+ table to keep track of which pages contain only tuples that are known to be
+ visible to all active transactions (and all future transactions, until the
+ page is again modified). This has two purposes. First, vacuum
+ itself can skip such pages on the next run, since there is nothing to
+ clean up.
+ </para>
+
+ <para>
+ Second, it allows <productname>PostgreSQL</productname> to answer some
+ queries using only the index, without reference to the underlying table.
+ Since <productname>PostgreSQL</productname> indexes don't contain tuple
+ visibility information, a normal index scan fetches the heap tuple for each
+ matching index entry, to check whether it should be seen by the current
+ transaction.
+ An <link linkend="indexes-index-only-scans"><firstterm>index-only
+ scan</firstterm></link>, on the other hand, checks the visibility map first.
+ If it's known that all tuples on the page are
+ visible, the heap fetch can be skipped. This is most useful on
+ large data sets where the visibility map can prevent disk accesses.
+ The visibility map is vastly smaller than the heap, so it can easily be
+ cached even when the heap is very large.
+ </para>
+ </sect2>
+
+ <sect2 id="vacuum-for-statistics">
+ <title>Updating Planner Statistics</title>
+
+ <indexterm zone="vacuum-for-statistics">
+ <primary>statistics</primary>
+ <secondary>of the planner</secondary>
+ </indexterm>
+
+ <indexterm zone="vacuum-for-statistics">
+ <primary>ANALYZE</primary>
+ </indexterm>
+
+ <para>
+ The <productname>PostgreSQL</productname> query planner relies on
+ statistical information about the contents of tables in order to
+ generate good plans for queries. These statistics are gathered by
+ the <link linkend="sql-analyze"><command>ANALYZE</command></link> command,
+ which can be invoked by itself or
+ as an optional step in <command>VACUUM</command>. It is important to have
+ reasonably accurate statistics, otherwise poor choices of plans might
+ degrade database performance.
+ </para>
+
+ <para>
+ The autovacuum daemon, if enabled, will automatically issue
+ <command>ANALYZE</command> commands whenever the content of a table has
+ changed sufficiently. However, administrators might prefer to rely
+ on manually-scheduled <command>ANALYZE</command> operations, particularly
+ if it is known that update activity on a table will not affect the
+ statistics of <quote>interesting</quote> columns. The daemon schedules
+ <command>ANALYZE</command> strictly as a function of the number of rows
+ inserted or updated; it has no knowledge of whether that will lead
+ to meaningful statistical changes.
+ </para>
+
+ <para>
+ Tuples changed in partitions and inheritance children do not trigger
+ analyze on the parent table. If the parent table is empty or rarely
+ changed, it may never be processed by autovacuum, and the statistics for
+ the inheritance tree as a whole won't be collected. It is necessary to
+ run <command>ANALYZE</command> on the parent table manually in order to
+ keep the statistics up to date.
+ </para>
+
+ <para>
+ As with vacuuming for space recovery, frequent updates of statistics
+ are more useful for heavily-updated tables than for seldom-updated
+ ones. But even for a heavily-updated table, there might be no need for
+ statistics updates if the statistical distribution of the data is
+ not changing much. A simple rule of thumb is to think about how much
+ the minimum and maximum values of the columns in the table change.
+ For example, a <type>timestamp</type> column that contains the time
+ of row update will have a constantly-increasing maximum value as
+ rows are added and updated; such a column will probably need more
+ frequent statistics updates than, say, a column containing URLs for
+ pages accessed on a website. The URL column might receive changes just
+ as often, but the statistical distribution of its values probably
+ changes relatively slowly.
+ </para>
+
+ <para>
+ It is possible to run <command>ANALYZE</command> on specific tables and even
+ just specific columns of a table, so the flexibility exists to update some
+ statistics more frequently than others if your application requires it.
+ In practice, however, it is usually best to just analyze the entire
+ database, because it is a fast operation. <command>ANALYZE</command> uses a
+ statistically random sampling of the rows of a table rather than reading
+ every single row.
+ </para>
+
+ <tip>
+ <para>
+ Although per-column tweaking of <command>ANALYZE</command> frequency might not be
+ very productive, you might find it worthwhile to do per-column
+ adjustment of the level of detail of the statistics collected by
+ <command>ANALYZE</command>. Columns that are heavily used in <literal>WHERE</literal>
+ clauses and have highly irregular data distributions might require a
+ finer-grain data histogram than other columns. See <command>ALTER TABLE
+ SET STATISTICS</command>, or change the database-wide default using the <xref
+ linkend="guc-default-statistics-target"/> configuration parameter.
+ </para>
+
+ <para>
+ Also, by default there is limited information available about
+ the selectivity of functions. However, if you create a statistics
+ object or an expression
+ index that uses a function call, useful statistics will be
+ gathered about the function, which can greatly improve query
+ plans that use the expression index.
+ </para>
+ </tip>
+
+ <tip>
+ <para>
+ The autovacuum daemon does not issue <command>ANALYZE</command> commands for
+ foreign tables, since it has no means of determining how often that
+ might be useful. If your queries require statistics on foreign tables
+ for proper planning, it's a good idea to run manually-managed
+ <command>ANALYZE</command> commands on those tables on a suitable schedule.
+ </para>
+ </tip>
+
+ <tip>
+ <para>
+ The autovacuum daemon does not issue <command>ANALYZE</command> commands
+ for partitioned tables. Inheritance parents will only be analyzed if the
+ parent itself is changed - changes to child tables do not trigger
+ autoanalyze on the parent table. If your queries require statistics on
+ parent tables for proper planning, it is necessary to periodically run
+ a manual <command>ANALYZE</command> on those tables to keep the statistics
+ up to date.
+ </para>
+ </tip>
+
+ </sect2>
</sect1>
--
2.40.1
v4-0007-Make-Routine-Vacuuming-autovacuum-orientated.patchapplication/octet-stream; name=v4-0007-Make-Routine-Vacuuming-autovacuum-orientated.patchDownload
From ba27796ecce418105003442e957e8739d3f44c23 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 15:20:13 -0700
Subject: [PATCH v4 7/9] Make "Routine Vacuuming" autovacuum-orientated.
Now that it's no longer in its own sect2, shorten the "Vacuuming basics"
content, and make it more autovacuum-orientated. This gives much less
prominence to VACUUM FULL, which has little place in a section about
autovacuum. We no longer define avoiding the need to run VACUUM FULL as
the purpose of vacuuming.
A later commit that overhauls "Recovering Disk Space" will add back a
passing mention of things like VACUUM FULL and TRUNCATE, but only as
something that might be relevant in extreme cases. (Use of these
commands is hopefully neither "Routine" nor "Basic" to most users).
Also add some introductory information about the audience and goals of
the "Routine Vacuuming" section of the docs.
---
doc/src/sgml/maintenance.sgml | 132 +++++++++++++++++++++-------------
1 file changed, 83 insertions(+), 49 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 3f5b83b14..db8c5724e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -32,11 +32,12 @@
</para>
<para>
- The other main category of maintenance task is periodic <quote>vacuuming</quote>
- of the database. This activity is discussed in
- <xref linkend="routine-vacuuming"/>. Closely related to this is updating
- the statistics that will be used by the query planner, as discussed in
- <xref linkend="vacuum-for-statistics"/>.
+ The other main category of maintenance task is periodic
+ <quote><link linkend="routine-vacuuming">vacuuming</link></quote> of
+ the database by autovacuum. Configuring autovacuum scheduling is
+ discussed in <xref linkend="autovacuum"/>. Autovacuum also updates
+ the statistics that will be used by the query planner, as discussed
+ in <xref linkend="vacuum-for-statistics"/>.
</para>
<para>
@@ -243,7 +244,7 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</sect1>
<sect1 id="routine-vacuuming">
- <title>Routine Vacuuming</title>
+ <title>Autovacuum Maintenance Tasks</title>
<indexterm zone="routine-vacuuming">
<primary>vacuum</primary>
@@ -251,24 +252,18 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
<para>
<productname>PostgreSQL</productname> databases require periodic
- maintenance known as <firstterm>vacuuming</firstterm>. For many installations, it
- is sufficient to let vacuuming be performed by the <firstterm>autovacuum
- daemon</firstterm>, which is described in <xref linkend="autovacuum"/>. You might
- need to adjust the autovacuuming parameters described there to obtain best
- results for your situation. Some database administrators will want to
- supplement or replace the daemon's activities with manually-managed
- <command>VACUUM</command> commands, which typically are executed according to a
- schedule by <application>cron</application> or <application>Task
- Scheduler</application> scripts. To set up manually-managed vacuuming properly,
- it is essential to understand the issues discussed in the next few
- subsections. Administrators who rely on autovacuuming may still wish
- to skim this material to help them understand and adjust autovacuuming.
+ maintenance known as <firstterm>vacuuming</firstterm>, and require
+ periodic updates to the statistics used by the
+ <productname>PostgreSQL</productname> query planner. The <link
+ linkend="sql-vacuum"><command>VACUUM</command></link> and <link
+ linkend="sql-analyze"><command>ANALYZE</command></link> commands
+ perform these maintenance tasks. The <firstterm>autovacuum
+ daemon</firstterm> automatically schedules maintenance tasks based on
+ workload requirements.
</para>
-
<para>
- <productname>PostgreSQL</productname>'s
- <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
- process each table on a regular basis for several reasons:
+ The autovacuum daemon has to process each table regularly for several
+ reasons:
<orderedlist>
<listitem>
@@ -294,35 +289,74 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</listitem>
</orderedlist>
- Each of these reasons dictates performing <command>VACUUM</command> operations
- of varying frequency and scope, as explained in the following subsections.
+ The first four maintenance tasks are handled by running
+ <command>VACUUM</command> from within an autovacuum worker process. The
+ fifth and final task (maintenance of planner statistics) is handled by
+ running <command>ANALYZE</command> from within an autovacuum worker
+ process.
+ </para>
+ <para>
+ Generally speaking, database administrators new to tuning autovacuum should
+ start by considering the need to adjust autovacuum's scheduling.
+ Autovacuum scheduling is controlled via threshold settings. These settings
+ determine when autovacuum should launch a worker to run
+ <command>VACUUM</command> and/or <command>ANALYZE</command>; see the
+ previous section, <xref linkend="autovacuum"/>. This section provides
+ additional information about the design and goals of autovacuum,
+ <command>VACUUM</command>, and <command>ANALYZE</command>. The intended
+ audience is database administrators that wish to perform more advanced
+ autovacuum tuning, with any of the following goals in mind:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ Tuning <command>VACUUM</command> to improve query response times.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Making sure that <command>VACUUM</command>'s management of the
+ transaction ID address space is functioning optimally.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Tuning <command>VACUUM</command> for performance stability.
+ </para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ With larger installations, tuning autovacuum usually won't be a once-off
+ task; it is best to approach tuning as an iterative, applied process.
+ </para>
+ <para>
+ Autovacuum might create a lot of I/O traffic at times, which can cause poor
+ performance for other active sessions. There are configuration parameters
+ you can adjust to reduce the impact on system response time. See the
+ autovacuum-specific cost delay settings described in
+ <xref linkend="runtime-config-autovacuum"/>, and additional cost delay
+ settings described in <xref linkend="runtime-config-resource-vacuum-cost"/>.
+ </para>
+ <para>
+ Database administrators might also find it useful to supplement the
+ daemon's activities with manually-managed <command>VACUUM</command>
+ commands. Scripting tools like <application>cron</application> and
+ <application>Task Manager</application> can help with this. It can be
+ useful to perform off-hours <command>VACUUM</command> commands during
+ periods when the application experiences less demand (e.g., on weekends, or
+ in the middle of the night). This section applies equally to
+ manually-issued <command>VACUUM</command> and <command>ANALYZE</command>
+ operations, except where otherwise noted.
</para>
- <para>
- There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
- and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
- disk space but runs much more slowly. Also,
- the standard form of <command>VACUUM</command> can run in parallel with production
- database operations. (Commands such as <command>SELECT</command>,
- <command>INSERT</command>, <command>UPDATE</command>, and
- <command>DELETE</command> will continue to function normally, though you
- will not be able to modify the definition of a table with commands such as
- <command>ALTER TABLE</command> while it is being vacuumed.)
- <command>VACUUM FULL</command> requires an
- <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
- working on, and therefore cannot be done in parallel with other use
- of the table. Generally, therefore,
- administrators should strive to use standard <command>VACUUM</command> and
- avoid <command>VACUUM FULL</command>.
- </para>
-
- <para>
- <command>VACUUM</command> creates a substantial amount of I/O
- traffic, which can cause poor performance for other active sessions.
- There are configuration parameters that can be adjusted to reduce the
- performance impact of background vacuuming — see
- <xref linkend="runtime-config-resource-vacuum-cost"/>.
- </para>
+ <tip>
+ <para>
+ You can monitor <command>VACUUM</command> progress (whether run by
+ autovacuum or manually) via the
+ <structname>pg_stat_progress_vacuum</structname> view. See
+ <xref linkend="vacuum-progress-reporting"/>.
+ </para>
+ </tip>
<sect2 id="vacuum-for-space-recovery">
<title>Recovering Disk Space</title>
--
2.40.1
v4-0006-Merge-basic-vacuuming-sect2-into-sect1-introducti.patchapplication/octet-stream; name=v4-0006-Merge-basic-vacuuming-sect2-into-sect1-introducti.patchDownload
From 8766671e9de5fff2cc7fa39bfc64bf14b12b9674 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 11:44:45 -0700
Subject: [PATCH v4 6/9] Merge "basic vacuuming" sect2 into sect1 introduction.
This doesn't change any of the content itself. It just merges the
original text into the sect1 text that immediately preceded it.
This is preparation for the next commit, which will remove most of the
text "relocated" in this commit. This structure should make things a
little easier for doc translators.
---
doc/src/sgml/maintenance.sgml | 106 ++++++++++++++++------------------
1 file changed, 51 insertions(+), 55 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 970c4a848..3f5b83b14 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -265,68 +265,64 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
to skim this material to help them understand and adjust autovacuuming.
</para>
- <sect2 id="vacuum-basics">
- <title>Vacuuming Basics</title>
+ <para>
+ <productname>PostgreSQL</productname>'s
+ <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
+ process each table on a regular basis for several reasons:
- <para>
- <productname>PostgreSQL</productname>'s
- <link linkend="sql-vacuum"><command>VACUUM</command></link> command has to
- process each table on a regular basis for several reasons:
+ <orderedlist>
+ <listitem>
+ <simpara>To recover or reuse disk space occupied by updated or deleted
+ rows.</simpara>
+ </listitem>
- <orderedlist>
- <listitem>
- <simpara>To recover or reuse disk space occupied by updated or deleted
- rows.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To protect against loss of very old data due to
+ <firstterm>transaction ID wraparound</firstterm> or
+ <firstterm>multixact ID wraparound</firstterm>.</simpara>
+ </listitem>
- <listitem>
- <simpara>To protect against loss of very old data due to
- <firstterm>transaction ID wraparound</firstterm> or
- <firstterm>multixact ID wraparound</firstterm>.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To update the visibility map, which speeds
+ up <link linkend="indexes-index-only-scans">index-only
+ scans</link>.</simpara>
+ </listitem>
- <listitem>
- <simpara>To update the visibility map, which speeds
- up <link linkend="indexes-index-only-scans">index-only
- scans</link>.</simpara>
- </listitem>
+ <listitem>
+ <simpara>To update data statistics used by the
+ <productname>PostgreSQL</productname> query planner.</simpara>
+ </listitem>
+ </orderedlist>
- <listitem>
- <simpara>To update data statistics used by the
- <productname>PostgreSQL</productname> query planner.</simpara>
- </listitem>
- </orderedlist>
+ Each of these reasons dictates performing <command>VACUUM</command> operations
+ of varying frequency and scope, as explained in the following subsections.
+ </para>
- Each of these reasons dictates performing <command>VACUUM</command> operations
- of varying frequency and scope, as explained in the following subsections.
- </para>
+ <para>
+ There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
+ and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
+ disk space but runs much more slowly. Also,
+ the standard form of <command>VACUUM</command> can run in parallel with production
+ database operations. (Commands such as <command>SELECT</command>,
+ <command>INSERT</command>, <command>UPDATE</command>, and
+ <command>DELETE</command> will continue to function normally, though you
+ will not be able to modify the definition of a table with commands such as
+ <command>ALTER TABLE</command> while it is being vacuumed.)
+ <command>VACUUM FULL</command> requires an
+ <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
+ working on, and therefore cannot be done in parallel with other use
+ of the table. Generally, therefore,
+ administrators should strive to use standard <command>VACUUM</command> and
+ avoid <command>VACUUM FULL</command>.
+ </para>
- <para>
- There are two variants of <command>VACUUM</command>: standard <command>VACUUM</command>
- and <command>VACUUM FULL</command>. <command>VACUUM FULL</command> can reclaim more
- disk space but runs much more slowly. Also,
- the standard form of <command>VACUUM</command> can run in parallel with production
- database operations. (Commands such as <command>SELECT</command>,
- <command>INSERT</command>, <command>UPDATE</command>, and
- <command>DELETE</command> will continue to function normally, though you
- will not be able to modify the definition of a table with commands such as
- <command>ALTER TABLE</command> while it is being vacuumed.)
- <command>VACUUM FULL</command> requires an
- <literal>ACCESS EXCLUSIVE</literal> lock on the table it is
- working on, and therefore cannot be done in parallel with other use
- of the table. Generally, therefore,
- administrators should strive to use standard <command>VACUUM</command> and
- avoid <command>VACUUM FULL</command>.
- </para>
-
- <para>
- <command>VACUUM</command> creates a substantial amount of I/O
- traffic, which can cause poor performance for other active sessions.
- There are configuration parameters that can be adjusted to reduce the
- performance impact of background vacuuming — see
- <xref linkend="runtime-config-resource-vacuum-cost"/>.
- </para>
- </sect2>
+ <para>
+ <command>VACUUM</command> creates a substantial amount of I/O
+ traffic, which can cause poor performance for other active sessions.
+ There are configuration parameters that can be adjusted to reduce the
+ performance impact of background vacuuming — see
+ <xref linkend="runtime-config-resource-vacuum-cost"/>.
+ </para>
<sect2 id="vacuum-for-space-recovery">
<title>Recovering Disk Space</title>
--
2.40.1
v4-0008-Overhaul-Recovering-Disk-Space-vacuuming-docs.patchapplication/octet-stream; name=v4-0008-Overhaul-Recovering-Disk-Space-vacuuming-docs.patchDownload
From 064cfa0b489a2c76dd8b527e119ca5c4658de295 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 12:33:42 -0700
Subject: [PATCH v4 8/9] Overhaul "Recovering Disk Space" vacuuming docs.
XXX This commit is much less worked out and polished than the work on
freezing. It should very much be considered a work in progress, and
isn't the priority for now.
Say a lot more about the possible impact of long-running transactions on
VACUUM. Remove all talk of administrators getting by without
autovacuum; at most administrators might want to schedule manual VACUUM
operations to supplement autovacuum (this documentation was written at a
time when the visibility map didn't exist, even in its most basic form).
Also describe VACUUM FULL as an entirely different kind of operation to
conventional lazy vacuum.
XXX Open question for this commit:
I wonder if it would make sense to move all of that stuff into its own
new sect1 of "Chapter 29. Monitoring Disk Usage" -- something along
the lines of "what to do about bloat when all else fails, when the
problem gets completely out of hand". Naturally we'd link to this new
section from "Routine Vacuuming".
XXX For now, a lot of the information about CLUSTER and VACUUM FULL is
moved into Note/Warning boxes. This arrangement is definitely going to
be temporary.
---
doc/src/sgml/maintenance.sgml | 165 ++++++++++++++++++----------------
1 file changed, 87 insertions(+), 78 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index db8c5724e..f00442564 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -372,100 +372,109 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
This approach is necessary to gain the benefits of multiversion
concurrency control (<acronym>MVCC</acronym>, see <xref linkend="mvcc"/>): the row version
must not be deleted while it is still potentially visible to other
- transactions. But eventually, an outdated or deleted row version is no
- longer of interest to any transaction. The space it occupies must then be
- reclaimed for reuse by new rows, to avoid unbounded growth of disk
- space requirements. This is done by running <command>VACUUM</command>.
+ transactions. A deleted row version (whether from an
+ <command>UPDATE</command> or <command>DELETE</command>) will usually cease
+ to be of interest to any still-running transaction shortly after the
+ original deleting transaction commits.
</para>
<para>
- The standard form of <command>VACUUM</command> removes dead row
- versions in tables and indexes and marks the space available for
- future reuse. However, it will not return the space to the operating
- system, except in the special case where one or more pages at the
- end of a table become entirely free and an exclusive table lock can be
- easily obtained. In contrast, <command>VACUUM FULL</command> actively compacts
- tables by writing a complete new version of the table file with no dead
- space. This minimizes the size of the table, but can take a long time.
- It also requires extra disk space for the new copy of the table, until
- the operation completes.
+ The space dead tuples occupy must eventually be reclaimed for reuse by new
+ rows, to avoid unbounded growth of disk space requirements. Reclaiming
+ space from dead rows is <command>VACUUM</command>'s main responsibility.
</para>
<para>
- The usual goal of routine vacuuming is to do standard <command>VACUUM</command>s
- often enough to avoid needing <command>VACUUM FULL</command>. The
- autovacuum daemon attempts to work this way, and in fact will
- never issue <command>VACUUM FULL</command>. In this approach, the idea
- is not to keep tables at their minimum size, but to maintain steady-state
- usage of disk space: each table occupies space equivalent to its
- minimum size plus however much space gets used up between vacuum runs.
- Although <command>VACUUM FULL</command> can be used to shrink a table back
- to its minimum size and return the disk space to the operating system,
- there is not much point in this if the table will just grow again in the
- future. Thus, moderately-frequent standard <command>VACUUM</command> runs are a
- better approach than infrequent <command>VACUUM FULL</command> runs for
- maintaining heavily-updated tables.
+ The <glossterm linkend="glossary-xid">transaction ID number
+ (<acronym>XID</acronym>)</glossterm> based cutoff point that
+ <command>VACUUM</command> uses to determine if a deleted tuple is safe to
+ physically remove is reported under <literal>removable cutoff</literal> in
+ the server log when autovacuum logging (controlled by <xref
+ linkend="guc-log-autovacuum-min-duration"/>) reports on a
+ <command>VACUUM</command> operation executed by autovacuum. Tuples that
+ are not yet safe to remove are counted as <literal>dead but not yet
+ removable</literal> tuples in the log report. <command>VACUUM</command>
+ establishes its <literal>removable cutoff</literal> once, at the start of
+ the operation. Any older <acronym>MVCC</acronym> snapshot (or transaction
+ that allocates an XID) that's still running when the cutoff is established
+ may hold it back.
</para>
- <para>
- Some administrators prefer to schedule vacuuming themselves, for example
- doing all the work at night when load is low.
- The difficulty with doing vacuuming according to a fixed schedule
- is that if a table has an unexpected spike in update activity, it may
- get bloated to the point that <command>VACUUM FULL</command> is really necessary
- to reclaim space. Using the autovacuum daemon alleviates this problem,
- since the daemon schedules vacuuming dynamically in response to update
- activity. It is unwise to disable the daemon completely unless you
- have an extremely predictable workload. One possible compromise is
- to set the daemon's parameters so that it will only react to unusually
- heavy update activity, thus keeping things from getting out of hand,
- while scheduled <command>VACUUM</command>s are expected to do the bulk of the
- work when the load is typical.
- </para>
+ <caution>
+ <para>
+ It's critical that no long-running transactions are allowed to hold back
+ every <command>VACUUM</command> operation's cutoff for an extended
+ period. It may be a good idea to add monitoring to alert you about this.
+ </para>
+ </caution>
+
+ <note>
+ <para>
+ <command>VACUUM</command> can remove tuples inserted by aborted
+ transactions immediately
+ </para>
+ </note>
<para>
- For those not using autovacuum, a typical approach is to schedule a
- database-wide <command>VACUUM</command> once a day during a low-usage period,
- supplemented by more frequent vacuuming of heavily-updated tables as
- necessary. (Some installations with extremely high update rates vacuum
- their busiest tables as often as once every few minutes.) If you have
- multiple databases in a cluster, don't forget to
- <command>VACUUM</command> each one; the program <xref
- linkend="app-vacuumdb"/> might be helpful.
+ <command>VACUUM</command> usually doesn't return space to the operating
+ system. There is one exception: space is returned to the OS whenever a
+ group of contiguous pages appears at the end of a table.
+ <command>VACUUM</command> must acquire an <literal>ACCESS
+ EXCLUSIVE</literal> lock to perform relation truncation. You can disable
+ relation truncation by setting the table's
+ <varname>vacuum_truncate</varname> storage parameter to
+ <literal>off</literal>.
</para>
<tip>
- <para>
- Plain <command>VACUUM</command> may not be satisfactory when
- a table contains large numbers of dead row versions as a result of
- massive update or delete activity. If you have such a table and
- you need to reclaim the excess disk space it occupies, you will need
- to use <command>VACUUM FULL</command>, or alternatively
- <link linkend="sql-cluster"><command>CLUSTER</command></link>
- or one of the table-rewriting variants of
- <link linkend="sql-altertable"><command>ALTER TABLE</command></link>.
- These commands rewrite an entire new copy of the table and build
- new indexes for it. All these options require an
- <literal>ACCESS EXCLUSIVE</literal> lock. Note that
- they also temporarily use extra disk space approximately equal to the size
- of the table, since the old copies of the table and indexes can't be
- released until the new ones are complete.
- </para>
+ <para>
+ If you have a table whose entire contents are deleted periodically,
+ consider using <command>TRUNCATE</command> rather than
+ <command>DELETE</command>. <command>TRUNCATE</command> removes the entire
+ table's contents immediately, obviating the need for
+ <command>VACUUM</command>. One disadvantage is that strict
+ <acronym>MVCC</acronym> semantics are violated.
+ </para>
</tip>
-
<tip>
- <para>
- If you have a table whose entire contents are deleted on a periodic
- basis, consider doing it with
- <link linkend="sql-truncate"><command>TRUNCATE</command></link> rather
- than using <command>DELETE</command> followed by
- <command>VACUUM</command>. <command>TRUNCATE</command> removes the
- entire content of the table immediately, without requiring a
- subsequent <command>VACUUM</command> or <command>VACUUM
- FULL</command> to reclaim the now-unused disk space.
- The disadvantage is that strict MVCC semantics are violated.
- </para>
+ <para>
+ <command>VACUUM FULL</command> (or <command>CLUSTER</command>) can be
+ useful when dealing with extreme amounts of dead tuples. It can reclaim
+ more disk space, but it is much slower, and usually more disruptive.
+ <command>VACUUM FULL</command> rewrites an entire new copy of the table
+ and rebuilds all of the table's indexes. This makes it suitable for
+ highly fragmented tables, and tables where significant amounts of space
+ can be reclaimed.
+ </para>
</tip>
+ <note>
+ <para>
+ Although <command>VACUUM FULL</command> is technically an option of the
+ <command>VACUUM</command> command, <command>VACUUM FULL</command> uses a
+ completely different implementation. <command>VACUUM FULL</command> is
+ essentially a variant of <command>CLUSTER</command>. (The name
+ <command>VACUUM FULL</command> is historical; the original implementation
+ was closer to standard <command>VACUUM</command>.)
+ </para>
+ </note>
+ <warning>
+ <para>
+ <command>TRUNCATE</command>, <command>VACUUM FULL</command>, and
+ <command>CLUSTER</command> all require an <literal>ACCESS
+ EXCLUSIVE</literal> lock, which can be highly disruptive
+ (<command>SELECT</command>, <command>INSERT</command>,
+ <command>UPDATE</command>, and <command>DELETE</command> commands can't
+ run at the same time).
+ </para>
+ </warning>
+ <warning>
+ <para>
+ <command>VACUUM FULL</command> and <command>CLUSTER</command> temporarily
+ use extra disk space. The extra space required is approximately equal to
+ the size of the table, since the old copies of the table and indexes
+ can't be released until the new ones are complete.
+ </para>
+ </warning>
</sect2>
<sect2 id="vacuum-for-wraparound">
--
2.40.1
v4-0005-Move-Interpreting-XID-stamps-from-tuple-headers.patchapplication/octet-stream; name=v4-0005-Move-Interpreting-XID-stamps-from-tuple-headers.patchDownload
From fbd1260730a7e9a22033b24db2a1a8e7c4d58ef7 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sat, 22 Apr 2023 12:41:00 -0700
Subject: [PATCH v4 5/9] Move Interpreting XID stamps from tuple headers.
Move handling of 32-bit XID comparisons/physical wraparound from
"Routine Vacuuming" to chapter about transaction internals.
This is intended to be fairly close to a mechanical change. It isn't
entirely mechanical, though, since the original wording has been
slightly modified for it to work in context.
TODO fix xact.sgml indentation. The new content is indented correctly
already, but the existing content will need to be re-indented to match
in a later commit. As always, structuring things this way is intended
to make life a little bit easier for doc translators.
---
doc/src/sgml/maintenance.sgml | 80 +++++++---------------------------
doc/src/sgml/xact.sgml | 82 ++++++++++++++++++++++++++++++++++-
2 files changed, 96 insertions(+), 66 deletions(-)
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 83fa7ba8b..970c4a848 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -446,75 +446,25 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
<secondary>wraparound</secondary>
</indexterm>
- <indexterm>
- <primary>wraparound</primary>
- <secondary>of transaction IDs</secondary>
- </indexterm>
+ <indexterm>
+ <primary>wraparound</primary>
+ <secondary>of transaction IDs</secondary>
+ </indexterm>
<para>
- <productname>PostgreSQL</productname>'s
- <link linkend="mvcc-intro">MVCC</link> transaction semantics
- depend on being able to compare transaction ID (<acronym>XID</acronym>)
- numbers: a row version with an insertion XID greater than the current
- transaction's XID is <quote>in the future</quote> and should not be visible
- to the current transaction. But since transaction IDs have limited size
- (32 bits) a cluster that runs for a long time (more
- than 4 billion transactions) would suffer <firstterm>transaction ID
- wraparound</firstterm>: the XID counter wraps around to zero, and all of a sudden
- transactions that were in the past appear to be in the future — which
- means their output become invisible. In short, catastrophic data loss.
- (Actually the data is still there, but that's cold comfort if you cannot
- get at it.) To avoid this, it is necessary to vacuum every table
- in every database at least once every two billion transactions.
+ <productname>PostgreSQL</productname>'s <link
+ linkend="mvcc-intro">MVCC</link> transaction semantics depend on
+ being able to compare <glossterm linkend="glossary-xid">transaction
+ ID numbers (<acronym>XID</acronym>)</glossterm> to determine
+ whether or not the row is visible to each query's MVCC snapshot
+ (see <xref linkend="interpreting-xid-stamps"/>). But since
+ on-disk storage of transaction IDs in heap pages uses a truncated
+ 32-bit representation to save space (rather than the full 64-bit
+ representation), it is necessary to vacuum every table in every
+ database <emphasis>at least</emphasis> once every two billion
+ transactions (though far more frequent vacuuming is typical).
</para>
- <para>
- The reason that periodic vacuuming solves the problem is that
- <command>VACUUM</command> will mark rows as <emphasis>frozen</emphasis>, indicating that
- they were inserted by a transaction that committed sufficiently far in
- the past that the effects of the inserting transaction are certain to be
- visible to all current and future transactions.
- Normal XIDs are
- compared using modulo-2<superscript>32</superscript> arithmetic. This means
- that for every normal XID, there are two billion XIDs that are
- <quote>older</quote> and two billion that are <quote>newer</quote>; another
- way to say it is that the normal XID space is circular with no
- endpoint. Therefore, once a row version has been created with a particular
- normal XID, the row version will appear to be <quote>in the past</quote> for
- the next two billion transactions, no matter which normal XID we are
- talking about. If the row version still exists after more than two billion
- transactions, it will suddenly appear to be in the future. To
- prevent this, <productname>PostgreSQL</productname> reserves a special XID,
- <literal>FrozenTransactionId</literal>, which does not follow the normal XID
- comparison rules and is always considered older
- than every normal XID.
- Frozen row versions are treated as if the inserting XID were
- <literal>FrozenTransactionId</literal>, so that they will appear to be
- <quote>in the past</quote> to all normal transactions regardless of wraparound
- issues, and so such row versions will be valid until deleted, no matter
- how long that is.
- </para>
-
- <note>
- <para>
- In <productname>PostgreSQL</productname> versions before 9.4, freezing was
- implemented by actually replacing a row's insertion XID
- with <literal>FrozenTransactionId</literal>, which was visible in the
- row's <structname>xmin</structname> system column. Newer versions just set a flag
- bit, preserving the row's original <structname>xmin</structname> for possible
- forensic use. However, rows with <structname>xmin</structname> equal
- to <literal>FrozenTransactionId</literal> (2) may still be found
- in databases <application>pg_upgrade</application>'d from pre-9.4 versions.
- </para>
- <para>
- Also, system catalogs may contain rows with <structname>xmin</structname> equal
- to <literal>BootstrapTransactionId</literal> (1), indicating that they were
- inserted during the first phase of <application>initdb</application>.
- Like <literal>FrozenTransactionId</literal>, this special XID is treated as
- older than every normal XID.
- </para>
- </note>
-
<para>
<xref linkend="guc-vacuum-freeze-min-age"/>
controls how old an XID value has to be before rows bearing that XID will be
diff --git a/doc/src/sgml/xact.sgml b/doc/src/sgml/xact.sgml
index b467660ee..8a1f9fd6f 100644
--- a/doc/src/sgml/xact.sgml
+++ b/doc/src/sgml/xact.sgml
@@ -22,6 +22,8 @@
single-statement transactions.
</para>
+ <sect2 id="virtual-xids">
+ <title>Virtual Transaction IDs</title>
<para>
Every transaction is identified by a unique
<literal>VirtualTransactionId</literal> (also called
@@ -46,10 +48,13 @@
started, particularly if the transaction started with statements that
only performed database reads.
</para>
+ </sect2>
+ <sect2 id="permanent-xids">
+ <title>Permanent Transaction IDs</title>
<para>
The internal transaction ID type <type>xid</type> is 32 bits wide
- and <link linkend="vacuum-for-wraparound">wraps around</link> every
+ and wraps around every
4 billion transactions. A 32-bit epoch is incremented during each
wraparound. There is also a 64-bit type <type>xid8</type> which
includes this epoch and therefore does not wrap around during the
@@ -69,6 +74,80 @@
linkend="guc-track-commit-timestamp"/> is enabled.
</para>
+ <sect3 id="interpreting-xid-stamps">
+ <title><type>TransactionId</type> comparison rules</title>
+ <para>
+ The system often needs to compare <structfield>t_xmin</structfield>
+ and <structfield>t_xmax</structfield> fields for MVCC snapshot
+ visibility checks.
+ </para>
+
+ <para>
+ We use a truncated 32-bit representation of transaction IDs, rather than
+ using the full 64-bit representation. The 2.1 billion XIDs
+ <quote>distance</quote> invariant must be preserved because transaction
+ IDs stored in heap row headers use a truncated 32-bit representation
+ (rather than the full 64-bit representation). Since all unfrozen
+ transaction IDs from heap tuple headers <emphasis>must</emphasis> be from
+ the same transaction ID epoch (or from a space in the 64-bit
+ representation that spans two adjoining transaction ID epochs), there
+ isn't any need to store a separate epoch field in each tuple header.
+ This scheme has the advantage of requiring much less space than a design
+ that stores an XID epoch alongside each XID stored in each heap tuple
+ header. It has the disadvantage of constraining the system's ability to
+ allocate new XIDs (in the worst case scenario where transaction ID
+ exhaustion occurs).
+ </para>
+
+ <para>
+ <command>VACUUM</command> will mark tuple headers
+ <emphasis>frozen</emphasis>, indicating that all eligible rows on the
+ page were inserted by a transaction that committed sufficiently far in
+ the past that the effects of the inserting transaction are certain to be
+ visible to all current and future transactions. Normal XIDs are compared
+ using modulo-2<superscript>32</superscript> arithmetic. This means that
+ for every normal XID, there are two billion XIDs that are
+ <quote>older</quote> and two billion that are <quote>newer</quote>;
+ another way to say it is that the normal XID space is circular with no
+ endpoint. Therefore, once a row version has been created with a
+ particular normal XID, the row version will appear to be <quote>in the
+ past</quote> for the next two billion transactions, no matter which
+ normal XID we are talking about. If the row version still exists after
+ more than two billion transactions, it will suddenly appear to be in the
+ future. To prevent this, <productname>PostgreSQL</productname> reserves a
+ special XID, <literal>FrozenTransactionId</literal>, which does not
+ follow the normal XID comparison rules and is always considered older
+ than every normal XID. Frozen row versions are treated as if the
+ inserting XID were <literal>FrozenTransactionId</literal>, so that they
+ will appear to be <quote>in the past</quote> to all normal transactions
+ regardless of wraparound issues, and so such row versions will be valid
+ until deleted, no matter how long that is.
+ </para>
+
+ <note>
+ <para>
+ In <productname>PostgreSQL</productname> versions before 9.4, freezing was
+ implemented by actually replacing a row's insertion XID
+ with <literal>FrozenTransactionId</literal>, which was visible in the
+ row's <structname>xmin</structname> system column. Newer versions just set a flag
+ bit, preserving the row's original <structname>xmin</structname> for possible
+ forensic use. However, rows with <structname>xmin</structname> equal
+ to <literal>FrozenTransactionId</literal> (2) may still be found
+ in databases <application>pg_upgrade</application>'d from pre-9.4 versions.
+ </para>
+ <para>
+ Also, system catalogs may contain rows with <structname>xmin</structname> equal
+ to <literal>BootstrapTransactionId</literal> (1), indicating that they were
+ inserted during the first phase of <application>initdb</application>.
+ Like <literal>FrozenTransactionId</literal>, this special XID is treated as
+ older than every normal XID.
+ </para>
+ </note>
+ </sect3>
+ </sect2>
+
+ <sect2 id="global-transaction-ids">
+ <title>Global Transaction Identifiers</title>
<para>
In addition to <literal>vxid</literal> and <type>xid</type>,
prepared transactions are also assigned Global Transaction
@@ -77,6 +156,7 @@
prepared transactions. The mapping of GID to xid is shown in <link
linkend="view-pg-prepared-xacts"><structname>pg_prepared_xacts</structname></link>.
</para>
+ </sect2>
</sect1>
<sect1 id="xact-locking">
--
2.40.1
Thanks for the continued work, Peter. I hate to be the guy that starts this way,
but this is my first ever response on pgsql-hackers. (insert awkward
smile face).
Hopefully I've followed etiquette well, but please forgive any
missteps, and I'm
happy for any help in making better contributions in the future.
On Thu, May 11, 2023 at 9:19 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, May 4, 2023 at 3:18 PM samay sharma <smilingsamay@gmail.com> wrote:
What do you think about the term "Exhaustion"? Maybe something like "XID allocation exhaustion" or "Exhaustion of allocatable XIDs"?
I use the term "transaction ID exhaustion" in the attached revision,
v4. Overall, v4 builds on the work that went into v2 and v3, by
continuing to polish the overhaul of everything related to freezing,
relfrozenxid advancement, and anti-wraparound autovacuum.
Just to say on the outset, as has been said earlier in the tread by others,
that this is herculean work. Thank you for putting the effort you have thus far.
There's a lot of good from where I sit in the modification efforts.
It's a heavy,
dense topic, so there's probably never going to be a perfect way to
get it all in,
but some of the context early on, especially, is helpful for framing.
It would be nice if it was possible to add an animation/diagram a
little like this one: https://tuple-freezing-demo.angusd.com (this is
how I tend to think about the "transaction ID space".)
Indeed. With volunteer docs, illustrations/diagrams are hard for sure. But,
this or something akin to the "clock" image I've seen elsewhere when
describing the transaction ID space would probably be helpful if it were ever
possible. In fact, there's just a lot about the MVCC stuff in general that
would benefit from diagrams. But alas, I guess that's why we have some
good go-to community talks/slide decks. :-)
v4 also limits use of the term "wraparound" to places that directly
discuss anti-wraparound autovacuums (plus one place in xact.sgml,
where discussion of "true unsigned integer wraparound" and related
implementation details has been moved). Otherwise we use the term
"transaction ID exhaustion", which is pretty much the user-facing name
for "xidStopLimit". I feel that this is a huge improvement, for the
reason given to Greg earlier. I'm flexible on the details, but I feel
strongly that we should minimize use of the term wraparound wherever
it might have the connotation of "the past becoming the future". This
is not a case of inventing a new terminology for its own sake. If
anybody is skeptical I ask that they take a look at what I came up
with before declaring it a bad idea. I have made that as easy as
possible, by once again attaching a prebuilt routine-vacuuming.html.
Thanks again for doing this. Really helpful for doc newbies like me that
want to help but are still working through the process. Really helpful
and appreciated.
Other changes in v4, compared to v3:
* Improved discussion of the differences between non-aggressive and
aggressive VACUUM.
This was helpful for me and not something I've previously put much thought
into. Helpful context that is missing from the current docs.
* Explains "catch-up freezing" performed by aggressive VACUUMs directly.
"Catch-up" freezing is the really important "consequence" -- something
that emerges from how each type of VACUUM behaves over time. It is an
indirect consequence of the behaviors. I would like to counter the
perception that some users have about freezing only happening during
aggressive VACUUMs (or anti-wraparound autovacuums). But more than
that, talking about catch-up freezing seems essential because it is
the single most important difference.
Similarly, this was helpful overall context of various things
happening with freezing.
* Much improved handling of the discussion of anti-wraparound
autovacuum, and how it relates to aggressive VACUUMs, following
feedback from Samay.There is now only fairly minimal overlap in the discussion of
aggressive VACUUM and anti-wraparound autovacuuming. We finish the
discussion of aggressive VACUUM just after we start discussing
anti-wraparound autovacuum. This transition works well, because it
enforces the idea that anti-wraparound autovacuum isn't really special
compared to any other aggressive autovacuum. This was something that
Samay expressed particularly concern about: making anti-wraparound
autovacuums sound less scary. Though it's also a concern I had from
the outset, based on practical experience and interactions with people
that have much less knowledge of Postgres than I do.
Agree. This flows fairly well and helps the user understand that each
"next step"
in the vacuum/freezing process has a distinct job based on previous work.
* Anti-wraparound autovacuum is now mostly discussed as something that
happens to static or mostly-static tables....
...This moves discussion of anti-wraparound av in the direction of:
"Anti-wraparound autovacuum is a special type of autovacuum. Its
purpose is to ensure that relfrozenxid advances when no earlier VACUUM
could advance it in passing — often because no VACUUM has run against
the table for an extended period."
Again, learned something new here, at least in how I think about it and talk
with others. In total, I do think these changes make wraparound/exhaustion
seem less "the sky is falling".
* Added a couple of "Tips" about instrumentation that appears in the
server log whenever autovacuum reports on a VACUUM operation.* Much improved "Truncating Transaction Status Information" subsection.
My explanation of the ways in which autovacuum_freeze_max_age can
affect the storage overhead of commit/abort status in pg_xact is much
clearer than it was in v3 -- pg_xact truncation is now treated as
something loosely related to the global config of anti-wraparound
autovacuum, which makes most sense.
This one isn't totally sinking in with me yet. Need another read.
It took a great deal of effort to find a structure that covered
everything, and that highlighted all of the important relationships
without going too far, while at the same time not being a huge mess.
That's what I feel I've arrived at with v4.
In most respects I agree with the overall flow of changes w.r.t the current doc.
Focusing on all of this as something that should normally just be happening
as part of autovacuum is helpful. Working through it as an order of operations
(and I'm just assuming this is the general order) feels like it ties
things together
a lot more. I honestly come away from this document with more of a "I understand
the process" feel than I did previously.
For now, I'd add the following few comments on the intro section,
2.5.1 and 2.5.2. I
haven't gotten to the bottom sections yet for much feedback.
Intro Comments:
1) "The autovacuum daemon automatically schedules maintenance tasks based on
workload requirements." feels at tension with "Autovacuum scheduling
is controlled
via threshold settings."
Owing to the lingering belief that many users have whereby hosting providers
have magically enabled Postgres to do all of this for you, there is
still a need to
actively deal with these thresholds based on load. That is, as far as
I understand,
Postgres doesn't automatically adjust based on load. Someone/thing
still has to modify
the thresholds as load and data size changes.
If the "workload requirements" is pointing towards aggressive
freezing/wraparound
tasks that happen regardless of thresholds, then for me at least that
isn't clear
in that sentence and it feels like there's an implication that
Postgres/autovacuum
is going to magically adjust overall vacuum work based on database workload.
2) "The intended audience is database administrators that wish to
perform more advanced
autovacuum tuning, with any of the following goals in mind:"
I love calling out the audience in some way. That's really helpful, as are the
stated goals in the bullet list. However, as someone feeling pretty novice
after reading all of this, I can't honestly connect how the content on this page
helps me to more advanced tuning. I have a much better idea how freezing,
in particular, works (yay!), but I'm feeling a bit dense how almost anything
here helps me tune vacuum, at least as it relates to the bullets.
I'm sure you have a connection in mind for each, and certainly understanding the
inner workings of what's happening under the covers is tremendously beneficial,
but when I search for "response" or "performance" in this document, it refers
back to another page (not included in this patch) that talks about the
thresholds.
It might be as simple as adding something to the end of each bullet to draw
that relationship, but as is, it's hard for me to do it mentally (although I can
conjecture a few things on my own)
That said, I definitely appreciate the callout that tuning is an
iterative process
and the minor switch from "creates a substantial amount of I/O
traffic" to "may create...".
** Section 2.5.1 - Recovering Disk Space **
3). "The space dead tuples occupy must eventually be reclaimed for reuse
by new rows, to avoid unbounded growth of disk space requirements. Reclaiming
space from dead rows is VACUUM's main responsibility."
It feels like one connection you could make to the bullet list above
is in this area
and not mentioned. By freeing up space and reducing the number of pages that
need to be read for satisfying a query, vacuum and recovering disk space
(theoretically) improves query performance. Not 100% how to add it in context
of these first two paragraphs.
4) Caution: "It may be a good idea to add monitoring to alert you about this."
I hate to be pedantic about it, but I think we should spell out
"this". Do we have
a pointer in documentation to what kinds of things to monitor for? Am monitoring
long-running transactions or some metric that shows me that VACUUM is being
"held back"? I know what you mean, but it's not clear to me how to do the right
thing in my environment here.
5) The plethora of tips/notes/warnings.
As you and others have mentioned, as presented these really have no context
for me. Individually they are good/helpful information, but it's
really hard to make
a connection to what I should "do" about it.
It seems to me that this would be a good place to put a subsection which is
something like, "A note about reclaiming disk space" or something. In my
experience, most people hear about and end up using VACUUM FULL because
things got out of control and they want to get into a better spot (I have been
in that boat). I think with a small section that says, in essence,
"hey, now that
you understand why/how vacuum reclaims disk resources normally, if you're
in a position where things aren't in a good state, this is what you need to know
if you want to reclaim space from a really inefficient table"
For me, at least, I think it would be easier to read/grok what you're sharing in
these callouts.
6) One last missing piece that very well might be in another page not referenced
(I obviously need to get the PG16 docs pulled and built locally so
that I can have
better overall reference. My apologies).
In my experience, one of the biggest issues with the thresholds and recovering
space is the idea of tuning individual tables, not just the entire
database. 5/10/20%
might be fine for most tables, but it's usually the really active ones
that need the
tuning, specifically lowering the thresholds. That doesn't come across to me in
this section at all. Again, maybe I've missed something on another page and
it's all good, but it felt worth calling out.
Plus, it may provide an opportunity to bring in the threshold formulas again if
they aren't referenced elsewhere (although they probably are).
Hope that makes sense.
** Section 2.5.2: Freezing to manage... **
As stated above, the effort here overall is great IMO. I like the flow
and reduction
in alarmist tone for things like wraparound, etc. I understand more
about freezing,
aggressive and otherwise, than I did before.
7) That said, totally speaking as a non-contributor, this section is
obviously very long
for good reason. But, by the time I've gotten down to 25.2.2.3, my
brain is a bit
bewildered on where we've gotten to. That's more a comment on my capability
to process it all, but I wonder if a slightly more explicit intro
could help set the
stage at least.
"One side-effect of vacuum and transaction ID management at the row level is
that PostgreSQL would normally need to inspect each row for every query to
ensure it is visible to each requesting transaction. In order to
reduce the need to
read and inspect excessive amounts of data at query time or when normal vacuum
maintenance kicks in, VACUUM has a second job called freezing, which
accomplishes three goals: (attempting to tie in the three sections)
* speeding up queries and vacuum operations by...
* advancing the transaction ID space on generally static tables...
* ensure there are always free transaction IDs available for normal
operation...
"
Maybe totally worthless and too much, but something like that might set a reader
up for just a bit more context. Then you could take most of what comes before
"2.5.2.2.1 Aggressive Vacuum" as a subsection (would require a renumber below)
with something like "2.5.2.2.1 Normal Freezing Activity"
8) Note "In PostgreSQL versions before 16..."
Showing my naivety, somehow this isn't connecting with me totally. If
it's important
to call out, then maybe we need a connecting sentence. Based on the content
above, I think you're pointing to "It's also why VACUUM will freeze all eligible
tuples from a heap page once the decision to freeze at least one tuple
is taken:"
If that's it, it's just not clear to me what's totally changed. Sorry,
more learning. :-)
---
Hope something in there is helpful.
Ryan Booz
Show quoted text
--
Peter Geoghegan
And, of course, I forgot that I switch to text-mode after writing most
of this, so the carriage returns were unnecessary. (facepalm... sigh)
--
Ryan
Show quoted text
On Fri, May 12, 2023 at 1:36 PM Ryan Booz <ryan@softwareandbooz.com> wrote:
Thanks for the continued work, Peter. I hate to be the guy that starts this way,
but this is my first ever response on pgsql-hackers. (insert awkward
smile face).
Hopefully I've followed etiquette well, but please forgive any
missteps, and I'm
happy for any help in making better contributions in the future.On Thu, May 11, 2023 at 9:19 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, May 4, 2023 at 3:18 PM samay sharma <smilingsamay@gmail.com> wrote:
What do you think about the term "Exhaustion"? Maybe something like "XID allocation exhaustion" or "Exhaustion of allocatable XIDs"?
I use the term "transaction ID exhaustion" in the attached revision,
v4. Overall, v4 builds on the work that went into v2 and v3, by
continuing to polish the overhaul of everything related to freezing,
relfrozenxid advancement, and anti-wraparound autovacuum.Just to say on the outset, as has been said earlier in the tread by others,
that this is herculean work. Thank you for putting the effort you have thus far.
There's a lot of good from where I sit in the modification efforts.
It's a heavy,
dense topic, so there's probably never going to be a perfect way to
get it all in,
but some of the context early on, especially, is helpful for framing.It would be nice if it was possible to add an animation/diagram a
little like this one: https://tuple-freezing-demo.angusd.com (this is
how I tend to think about the "transaction ID space".)Indeed. With volunteer docs, illustrations/diagrams are hard for sure. But,
this or something akin to the "clock" image I've seen elsewhere when
describing the transaction ID space would probably be helpful if it were ever
possible. In fact, there's just a lot about the MVCC stuff in general that
would benefit from diagrams. But alas, I guess that's why we have some
good go-to community talks/slide decks. :-)v4 also limits use of the term "wraparound" to places that directly
discuss anti-wraparound autovacuums (plus one place in xact.sgml,
where discussion of "true unsigned integer wraparound" and related
implementation details has been moved). Otherwise we use the term
"transaction ID exhaustion", which is pretty much the user-facing name
for "xidStopLimit". I feel that this is a huge improvement, for the
reason given to Greg earlier. I'm flexible on the details, but I feel
strongly that we should minimize use of the term wraparound wherever
it might have the connotation of "the past becoming the future". This
is not a case of inventing a new terminology for its own sake. If
anybody is skeptical I ask that they take a look at what I came up
with before declaring it a bad idea. I have made that as easy as
possible, by once again attaching a prebuilt routine-vacuuming.html.Thanks again for doing this. Really helpful for doc newbies like me that
want to help but are still working through the process. Really helpful
and appreciated.Other changes in v4, compared to v3:
* Improved discussion of the differences between non-aggressive and
aggressive VACUUM.This was helpful for me and not something I've previously put much thought
into. Helpful context that is missing from the current docs.* Explains "catch-up freezing" performed by aggressive VACUUMs directly.
"Catch-up" freezing is the really important "consequence" -- something
that emerges from how each type of VACUUM behaves over time. It is an
indirect consequence of the behaviors. I would like to counter the
perception that some users have about freezing only happening during
aggressive VACUUMs (or anti-wraparound autovacuums). But more than
that, talking about catch-up freezing seems essential because it is
the single most important difference.Similarly, this was helpful overall context of various things
happening with freezing.* Much improved handling of the discussion of anti-wraparound
autovacuum, and how it relates to aggressive VACUUMs, following
feedback from Samay.There is now only fairly minimal overlap in the discussion of
aggressive VACUUM and anti-wraparound autovacuuming. We finish the
discussion of aggressive VACUUM just after we start discussing
anti-wraparound autovacuum. This transition works well, because it
enforces the idea that anti-wraparound autovacuum isn't really special
compared to any other aggressive autovacuum. This was something that
Samay expressed particularly concern about: making anti-wraparound
autovacuums sound less scary. Though it's also a concern I had from
the outset, based on practical experience and interactions with people
that have much less knowledge of Postgres than I do.Agree. This flows fairly well and helps the user understand that each
"next step"
in the vacuum/freezing process has a distinct job based on previous work.* Anti-wraparound autovacuum is now mostly discussed as something that
happens to static or mostly-static tables....
...This moves discussion of anti-wraparound av in the direction of:
"Anti-wraparound autovacuum is a special type of autovacuum. Its
purpose is to ensure that relfrozenxid advances when no earlier VACUUM
could advance it in passing — often because no VACUUM has run against
the table for an extended period."Again, learned something new here, at least in how I think about it and talk
with others. In total, I do think these changes make wraparound/exhaustion
seem less "the sky is falling".* Added a couple of "Tips" about instrumentation that appears in the
server log whenever autovacuum reports on a VACUUM operation.* Much improved "Truncating Transaction Status Information" subsection.
My explanation of the ways in which autovacuum_freeze_max_age can
affect the storage overhead of commit/abort status in pg_xact is much
clearer than it was in v3 -- pg_xact truncation is now treated as
something loosely related to the global config of anti-wraparound
autovacuum, which makes most sense.This one isn't totally sinking in with me yet. Need another read.
It took a great deal of effort to find a structure that covered
everything, and that highlighted all of the important relationships
without going too far, while at the same time not being a huge mess.
That's what I feel I've arrived at with v4.In most respects I agree with the overall flow of changes w.r.t the current doc.
Focusing on all of this as something that should normally just be happening
as part of autovacuum is helpful. Working through it as an order of operations
(and I'm just assuming this is the general order) feels like it ties
things together
a lot more. I honestly come away from this document with more of a "I understand
the process" feel than I did previously.For now, I'd add the following few comments on the intro section,
2.5.1 and 2.5.2. I
haven't gotten to the bottom sections yet for much feedback.Intro Comments:
1) "The autovacuum daemon automatically schedules maintenance tasks based on
workload requirements." feels at tension with "Autovacuum scheduling
is controlled
via threshold settings."Owing to the lingering belief that many users have whereby hosting providers
have magically enabled Postgres to do all of this for you, there is
still a need to
actively deal with these thresholds based on load. That is, as far as
I understand,
Postgres doesn't automatically adjust based on load. Someone/thing
still has to modify
the thresholds as load and data size changes.If the "workload requirements" is pointing towards aggressive
freezing/wraparound
tasks that happen regardless of thresholds, then for me at least that
isn't clear
in that sentence and it feels like there's an implication that
Postgres/autovacuum
is going to magically adjust overall vacuum work based on database workload.2) "The intended audience is database administrators that wish to
perform more advanced
autovacuum tuning, with any of the following goals in mind:"I love calling out the audience in some way. That's really helpful, as are the
stated goals in the bullet list. However, as someone feeling pretty novice
after reading all of this, I can't honestly connect how the content on this page
helps me to more advanced tuning. I have a much better idea how freezing,
in particular, works (yay!), but I'm feeling a bit dense how almost anything
here helps me tune vacuum, at least as it relates to the bullets.I'm sure you have a connection in mind for each, and certainly understanding the
inner workings of what's happening under the covers is tremendously beneficial,
but when I search for "response" or "performance" in this document, it refers
back to another page (not included in this patch) that talks about the
thresholds.It might be as simple as adding something to the end of each bullet to draw
that relationship, but as is, it's hard for me to do it mentally (although I can
conjecture a few things on my own)That said, I definitely appreciate the callout that tuning is an
iterative process
and the minor switch from "creates a substantial amount of I/O
traffic" to "may create...".** Section 2.5.1 - Recovering Disk Space **
3). "The space dead tuples occupy must eventually be reclaimed for reuse
by new rows, to avoid unbounded growth of disk space requirements. Reclaiming
space from dead rows is VACUUM's main responsibility."It feels like one connection you could make to the bullet list above
is in this area
and not mentioned. By freeing up space and reducing the number of pages that
need to be read for satisfying a query, vacuum and recovering disk space
(theoretically) improves query performance. Not 100% how to add it in context
of these first two paragraphs.4) Caution: "It may be a good idea to add monitoring to alert you about this."
I hate to be pedantic about it, but I think we should spell out
"this". Do we have
a pointer in documentation to what kinds of things to monitor for? Am monitoring
long-running transactions or some metric that shows me that VACUUM is being
"held back"? I know what you mean, but it's not clear to me how to do the right
thing in my environment here.5) The plethora of tips/notes/warnings.
As you and others have mentioned, as presented these really have no context
for me. Individually they are good/helpful information, but it's
really hard to make
a connection to what I should "do" about it.It seems to me that this would be a good place to put a subsection which is
something like, "A note about reclaiming disk space" or something. In my
experience, most people hear about and end up using VACUUM FULL because
things got out of control and they want to get into a better spot (I have been
in that boat). I think with a small section that says, in essence,
"hey, now that
you understand why/how vacuum reclaims disk resources normally, if you're
in a position where things aren't in a good state, this is what you need to know
if you want to reclaim space from a really inefficient table"For me, at least, I think it would be easier to read/grok what you're sharing in
these callouts.6) One last missing piece that very well might be in another page not referenced
(I obviously need to get the PG16 docs pulled and built locally so
that I can have
better overall reference. My apologies).In my experience, one of the biggest issues with the thresholds and recovering
space is the idea of tuning individual tables, not just the entire
database. 5/10/20%
might be fine for most tables, but it's usually the really active ones
that need the
tuning, specifically lowering the thresholds. That doesn't come across to me in
this section at all. Again, maybe I've missed something on another page and
it's all good, but it felt worth calling out.Plus, it may provide an opportunity to bring in the threshold formulas again if
they aren't referenced elsewhere (although they probably are).Hope that makes sense.
** Section 2.5.2: Freezing to manage... **
As stated above, the effort here overall is great IMO. I like the flow
and reduction
in alarmist tone for things like wraparound, etc. I understand more
about freezing,
aggressive and otherwise, than I did before.7) That said, totally speaking as a non-contributor, this section is
obviously very long
for good reason. But, by the time I've gotten down to 25.2.2.3, my
brain is a bit
bewildered on where we've gotten to. That's more a comment on my capability
to process it all, but I wonder if a slightly more explicit intro
could help set the
stage at least."One side-effect of vacuum and transaction ID management at the row level is
that PostgreSQL would normally need to inspect each row for every query to
ensure it is visible to each requesting transaction. In order to
reduce the need to
read and inspect excessive amounts of data at query time or when normal vacuum
maintenance kicks in, VACUUM has a second job called freezing, which
accomplishes three goals: (attempting to tie in the three sections)
* speeding up queries and vacuum operations by...
* advancing the transaction ID space on generally static tables...
* ensure there are always free transaction IDs available for normal
operation...
"Maybe totally worthless and too much, but something like that might set a reader
up for just a bit more context. Then you could take most of what comes before
"2.5.2.2.1 Aggressive Vacuum" as a subsection (would require a renumber below)
with something like "2.5.2.2.1 Normal Freezing Activity"8) Note "In PostgreSQL versions before 16..."
Showing my naivety, somehow this isn't connecting with me totally. If
it's important
to call out, then maybe we need a connecting sentence. Based on the content
above, I think you're pointing to "It's also why VACUUM will freeze all eligible
tuples from a heap page once the decision to freeze at least one tuple
is taken:"
If that's it, it's just not clear to me what's totally changed. Sorry,
more learning. :-)---
Hope something in there is helpful.Ryan Booz
--
Peter Geoghegan
On Fri, May 12, 2023 at 10:36 AM Ryan Booz <ryan@softwareandbooz.com> wrote:
Just to say on the outset, as has been said earlier in the tread by others,
that this is herculean work. Thank you for putting the effort you have thus far.
Thanks!
It would be nice if it was possible to add an animation/diagram a
little like this one: https://tuple-freezing-demo.angusd.com (this is
how I tend to think about the "transaction ID space".)Indeed. With volunteer docs, illustrations/diagrams are hard for sure. But,
this or something akin to the "clock" image I've seen elsewhere when
describing the transaction ID space would probably be helpful if it were ever
possible. In fact, there's just a lot about the MVCC stuff in general that
would benefit from diagrams. But alas, I guess that's why we have some
good go-to community talks/slide decks. :-)
A picture is worth a thousand words. This particular image may be
worth even more, though.
It happens to be *exactly* what I'd have done if I was tasked with
coming up with an animation that conveys the central ideas. Obviously
I brought this image up because I think that it would be great if we
could find a way to do something like that directly (not impossible,
there are a few images already). However, there is a less obvious
reason why I brought it to your attention: it's a very intuitive way
of understanding what I actually intend to convey through words -- at
least as far as talk about the cluster-wide XID space is concerned. It
might better equip you to review the patch series.
Sure, the animation will make the general idea clearer to just about
anybody -- that's a big part of what I like about it. But it also
captures the nuance that might matter to experts (e.g., the oldest XID
moves forward in jerky discrete jumps, while the next/unallocated XID
moves forward in a smooth, continuous fashion). So it works on
multiple levels, for multiple audiences/experience levels, without any
conflicts -- which is no small thing.
Do my words make you think of something a little like the animation?
If so, good.
Thanks again for doing this. Really helpful for doc newbies like me that
want to help but are still working through the process. Really helpful
and appreciated.
I think that this is the kind of thing that particularly benefits from
diversity in perspectives.
Agree. This flows fairly well and helps the user understand that each
"next step"
in the vacuum/freezing process has a distinct job based on previous work.
I'm trying to make it possible to read in short bursts, and to skim.
The easiest wins in this area will come from simply having more
individual sections/headings, and a more consistent structure. The
really difficult part is coming up with prose that can sort of work
for all audiences at the same time -- without alienating anybody.
Here is an example of what I mean:
The general idea of freezing can reasonably be summarized as "a
process that VACUUM uses to make pages self-contained (no need to do
pg_xact lookups anymore), that also has a role in avoiding transaction
ID exhaustion". That is a totally reasonable beginner-level (well,
relative-beginner-level) understanding of freezing. It *isn't* dumbed
down. You, as a beginner, have a truly useful take-away. At the same
time, you have avoided learning anything that you'll need to unlearn
some day. If I can succeed in doing that, I'll feel a real sense of
accomplishment.
* Much improved "Truncating Transaction Status Information" subsection.
My explanation of the ways in which autovacuum_freeze_max_age can
affect the storage overhead of commit/abort status in pg_xact is much
clearer than it was in v3 -- pg_xact truncation is now treated as
something loosely related to the global config of anti-wraparound
autovacuum, which makes most sense.This one isn't totally sinking in with me yet. Need another read.
"Truncating Transaction Status Information" is explicitly supposed to
matter much less than the rest of the stuff on freezing. The main
benefit that the DBA can expect from understanding this content is how
to save a few GB of disk space for pg_xact, which isn't particularly
likely to be possible, and is very very unlikely to be of any real
consequence, compared to everything else. If you were reading the
revised "Routine Vacuuming" as the average DBA, what you'd probably
have ended up doing is just not reading this part at all. And that
would probably be the ideal outcome. It's roughly the opposite of what
you'll get right now, by the way (bizarrely, the current docs place a
great deal of emphasis on this).
(Of course I welcome your feedback here too. Just giving you the context.)
It took a great deal of effort to find a structure that covered
everything, and that highlighted all of the important relationships
without going too far, while at the same time not being a huge mess.
That's what I feel I've arrived at with v4.In most respects I agree with the overall flow of changes w.r.t the current doc.
Focusing on all of this as something that should normally just be happening
as part of autovacuum is helpful. Working through it as an order of operations
(and I'm just assuming this is the general order) feels like it ties
things together
a lot more. I honestly come away from this document with more of a "I understand
the process" feel than I did previously.
That's great news. It might be helpful to give you more context about
the particular approach I've taken here, and how it falls short of
what I'd ideally like to do, in my own mind.
There are some rather strange things that happen to be true about
VACUUM and freezing today, that definitely influenced the way I
structured the docs. I can imagine an improved version of VACUUM that
is not so different to the real VACUUM that we have today (one that
still has freezing as we know it), that still has a much simpler UI --
some policy-based process for deciding which pages to freeze that was
much smarter than a simple trigger. If we were living in a world where
VACUUM actually worked like that, then I'd have been able to come up
with a structure that is a lot closer to what you might have been
hoping for from this patch series. At the very least, I'd have been
able to add some "TL;DR" text at the start of each section, that just
gave the main practical takeaway.
Take vacuum_freeze_min_age. It's a *really* bad design, even on its
own terms, even if we assume that nothing can change about how
freezing works. Yet it's probably still the most important
freezing-related GUC, even in Postgres 16. History matters here. The
GUC was invented in a world before the visibility map existed. When
the visibility map was invented, aggressive VACUUM was also invented
(before then the name for what we now call "aggressive VACUUM" was
actually just "VACUUM"). This development utterly changed the way that
vacuum_freeze_min_age actually works, but we still talk about it as if
its idea of "age" can be considered in isolation, as a universal
catch-all that can be tuned iteratively. The reality is that it is
interpreted in a way that is *hopelessly* tied to other things.
This isn't a minor point. There are really bizarre implications, with
real practical consequences. For example, suppose you want to make
autovacuums run more often against a simple append-only table -- so
you lower autovacuum_vacuum_insert_scale_factor with that in mind.
It's entirely possible that you'll now do *less* useful work, even
though you specifically set out to vacuum more aggressively! This is
due to the way the GUCs interact with each other, of course: the more
often VACUUM runs, the less likely it is that it'll find XIDs before
vacuum_freeze_min_age to trigger freezing during any individual VACUUM
operation, the less useful work you'll do (you'll just accumulate
unfrozen all-visible pages until you finally have an aggressive
VACUUM).
This is exactly as illogical as it sounds. Postgres 16 will be the
first version that even shows instrumentation around freezing at all
in the log reports from autovacuum. This will be a real eye-opener, I
suspect -- I predict that people will be surprised at how freezing
works with their workload, when they finally have the opportunity to
see it for themselves.
Owing to the lingering belief that many users have whereby hosting providers
have magically enabled Postgres to do all of this for you, there is
still a need to
actively deal with these thresholds based on load. That is, as far as
I understand,
Postgres doesn't automatically adjust based on load. Someone/thing
still has to modify
the thresholds as load and data size changes.
Well, vacuum_freeze_min_age (anything based on XID age) runs into the
following problem: what is the relationship between XID age, and
freezing work? This is a question whose answer is much too
complicated, suggesting that it's just the wrong question. There is
absolutely no reason to expect a linear relationship (or anything like
it) between XIDs consumed and WAL required to freeze rows from those
XIDs. It's a totally chaotic thing.
The reason for this is: of course it is, why wouldn't it be? On Monday
you'll do a bulk load, and 1 XID will write 1TB to one table. On
Tuesday, there might be only one row per XID consumed, with millions
and millions of rows inserted. This is 100% common sense, and yet is
kinda at odds with the whole idea of basing the decision to freeze on
age (as if vacuum_freeze_min_age didn't have enough problems
already!).
For now, I think our best bet is to signal the importance of avoiding
disaster to intermediate users, and signal the importance of iterative
tuning to advanced users.
If the "workload requirements" is pointing towards aggressive
freezing/wraparound
tasks that happen regardless of thresholds, then for me at least that
isn't clear
in that sentence and it feels like there's an implication that
Postgres/autovacuum
is going to magically adjust overall vacuum work based on database workload.
That's a good point.
2) "The intended audience is database administrators that wish to
perform more advanced
autovacuum tuning, with any of the following goals in mind:"I love calling out the audience in some way. That's really helpful, as are the
stated goals in the bullet list. However, as someone feeling pretty novice
after reading all of this, I can't honestly connect how the content on this page
helps me to more advanced tuning.
You're right to point that out; the actual content here was written
half-heartedly, in part because it depends on the dead-tuple-space
patch, which is not my focus at all right now.
Here is what I'd like the message to be, roughly:
1. This isn't something that you read once. You read it in small
bites. You come back to it from time to time (or you will if you need
to).
At one point Samay said: "I'll give my perspective as someone who has
not read the vacuum code but have learnt most of what I know about
autovacuum / vacuuming by reading the "Routine Vacuuming" page 10s of
times". I fully expect that a minority of users will want to do the
same with these revised docs. The content is very much not supposed to
be read through in one sitting (not if you expect to get any value out
of it). It is very calorie dense, and I don't think that that's really
a problem to be solved.
You have a much better chance of getting value out of it if you as a
user refer back to it as problems emerge. Some things may only click
after the second or third read, based on the experience of trying to
put something else into action in production.
2. If you don't like that it's calorie dense, then that's probably
okay -- just don't read past the parts that seem useful.
3. There are one or two exceptions (e.g., the "Tip" about freezing for
append-only tables), but overall there isn't going to be a simple
formula to follow -- the closest thing might be "don't bother doing
anything until it proves necessary".
This is because too much depends on individual workload requirements.
It is also partly due to it just being really hard to tune things like
vacuum_freeze_min_age very well right now.
4. It's an applied process. The emphasis should be on solving
practical, observed problems that are directly observed -- this isn't
a cookbook (though there are a couple of straightforward recipes,
covering one or two specific things).
** Section 2.5.1 - Recovering Disk Space **
It should be noted that what I've done in this area is quite
incomplete. I have only really done structural things here, and some
of these may not be much good.
It feels like one connection you could make to the bullet list above
is in this area
and not mentioned. By freeing up space and reducing the number of pages that
need to be read for satisfying a query, vacuum and recovering disk space
(theoretically) improves query performance. Not 100% how to add it in context
of these first two paragraphs.
It's hard, because it's not so much that vacuuming improves query
performance. It's more like *not* vacuuming hurts it. The exact point
that it starts to hurt is rather hard to predict -- and there might be
little point in trying to predict it with precision.
I tend to think that I'd probably be better off saying nothing about
query response times. Or saying something negative (what to definitely
avoid), not something positive (what to do) -- I would expect it to
generalize a lot better that way.
4) Caution: "It may be a good idea to add monitoring to alert you about this."
I hate to be pedantic about it, but I think we should spell out
"this". Do we have
a pointer in documentation to what kinds of things to monitor for? Am monitoring
long-running transactions or some metric that shows me that VACUUM is being
"held back"? I know what you mean, but it's not clear to me how to do the right
thing in my environment here.
Will do.
5) The plethora of tips/notes/warnings.
As you and others have mentioned, as presented these really have no context
for me. Individually they are good/helpful information, but it's
really hard to make
a connection to what I should "do" about it.
Yeah, I call that out in the relevant commit message of the patch as
bad, as temporary.
It seems to me that this would be a good place to put a subsection which is
something like, "A note about reclaiming disk space" or something. In my
experience, most people hear about and end up using VACUUM FULL because
things got out of control and they want to get into a better spot (I have been
in that boat). I think with a small section that says, in essence,
"hey, now that
you understand why/how vacuum reclaims disk resources normally, if you're
in a position where things aren't in a good state, this is what you need to know
if you want to reclaim space from a really inefficient table"For me, at least, I think it would be easier to read/grok what you're sharing in
these callouts.
That's the kind of thing that I had planned on with VACUUM FULL,
actually. You know, once I'm done with freezing. There is passing
mention of this in the relevant commit message.
In my experience, one of the biggest issues with the thresholds and recovering
space is the idea of tuning individual tables, not just the entire
database. 5/10/20%
might be fine for most tables, but it's usually the really active ones
that need the
tuning, specifically lowering the thresholds. That doesn't come across to me in
this section at all. Again, maybe I've missed something on another page and
it's all good, but it felt worth calling out.
I think that that's true. The rules are kind of different for larger tables.
read and inspect excessive amounts of data at query time or when normal vacuum
maintenance kicks in, VACUUM has a second job called freezing, which
accomplishes three goals: (attempting to tie in the three sections)
* speeding up queries and vacuum operations by...
* advancing the transaction ID space on generally static tables...
* ensure there are always free transaction IDs available for normal
operation...
"Maybe totally worthless and too much, but something like that might set a reader
up for just a bit more context. Then you could take most of what comes before
"2.5.2.2.1 Aggressive Vacuum" as a subsection (would require a renumber below)
with something like "2.5.2.2.1 Normal Freezing Activity"
I think I know what you mean. What I've tried to do here is start with
freezing, and describe it as something that has immediate benefits,
that can be understood as useful, independently of its role in
advancing relfrozenxid later on. So now you wonder: what specific
benefits do I get?
It's hard to be too concrete about those benefits, because you have
things like hint bits. I could say something like that, but I think
I'd have to hedge too much, because you also have hint bits, that help
query response times in roughly the same way (albeit less reliably,
albeit without being set on physical replication standbys when they're
set on the primary).
8) Note "In PostgreSQL versions before 16..."
Showing my naivety, somehow this isn't connecting with me totally. If
it's important
to call out, then maybe we need a connecting sentence. Based on the content
above, I think you're pointing to "It's also why VACUUM will freeze all eligible
tuples from a heap page once the decision to freeze at least one tuple
is taken:"
If that's it, it's just not clear to me what's totally changed. Sorry,
more learning. :-)
In Postgres 15, vacuum_freeze_min_age was applied in a way that only
froze whatever XIDs could be frozen from the page -- so if you had
half the tuples that were older, and half that were younger, you'd
only freeze the older half. Even when it might have cost you
practically nothing to freeze them all in one go. Now, as the text
you've quoted points out, vacuum_freeze_min_age triggers freezing at
the level of whole pages, including for new XIDs (though only if
they're eligible to be frozen, meaning that everybody agrees that
they're all visible now). So vacuum_freeze_min_age picks pages to
freeze, not individual tuples to freeze (this optimization is so
obvious that it's a little surprising that it took as long as it did
to get in).
Page-level freezing justifies the following statement from the patch,
for example:
"It doesn't matter if it was vacuum_freeze_table_age or
vacuum_multixact_freeze_table_age that made VACUUM use its aggressive
strategy. Every aggressive VACUUM will advance relfrozenxid and
relminmxid by applying the same generic policy that controls which
pages are frozen."
Now, since freezing works at the level of physical heap pages in 16,
the thing that triggers aggressive VACUUM matters less (just as the
thing that triggers freezing of individual pages matters much less --
freezing is freezing). There is minimal risk of freezing the same page
3 times during each of 3 different aggressive VACUUMs. To a much
greater extent, 3 aggressive VACUUMs isn't that different to only 1
aggressive VACUUM for those pages that were already "settled" from the
start. As a result, the addition of page-level freezing made
vacuum_freeze_min_age somewhat less bad -- in 16, its behavior was a
little less dependent on the phase of the moon (especially during
aggressive VACUUMs).
I really value stuff like that -- cases where you as a user can think
of something as independent to some other thing that you also need to
tune. There needs to be a lot more such improvements, but at least we
have this one now.
--
Peter Geoghegan
On Thu, 11 May 2023 at 16:41, Peter Geoghegan <pg@bowt.ie> wrote:
I say "exhaustion" or "overload" are meaningless because their meaning
is entirely dependent on context. It's not like memory exhaustion or
i/o overload where it's a finite resource and it's just the sheer
amount in use that matters.But transaction IDs are a finite resource, in the sense that you can
never have more than about 2.1 billion distinct unfrozen XIDs at any
one time. "Transaction ID exhaustion" is therefore a lot more
descriptive of the underlying problem. It's a lot better than
wraparound, which, as I've said, is inaccurate in two major ways:
I realize that's literally true that xids are a finite resource. But
that's not what people think of when you talk about exhausting a
finite resource.
It's not like there are 2 billion XIDs in a big pool being used and
returned and as long as you don't use too many XIDs leaving the pool
empty you're ok. When people talk about resource exhaustion they
imagine that they just need a faster machine or some other way of just
putting more XIDs in the pool so they can keep using them at a faster
rate.
I really think focusing on changing one term of art for another,
neither of which is at all meaningful without an extensive technical
explanation helps anyone. All it does is hide all the existing
explanations that are all over the internet from them.
1. Most cases involving xidStopLimit (or even single-user mode data
corruption) won't involve any kind of physical integer wraparound.
Fwiw I've never actually bumped into anyone talking about integer
overflow (which isn't usually called "wraparound" anyways). And in any
case it's not a terrible misconception, it at least gives users a
reasonable model for how XID space consumption works. In fact it's not
even entirely wrong -- it's not the XID itself that's overflowing but
the difference between the XID and the frozen XID.
--
greg
On Sat, May 13, 2023 at 7:47 PM Greg Stark <stark@mit.edu> wrote:
It's not like there are 2 billion XIDs in a big pool being used and
returned and as long as you don't use too many XIDs leaving the pool
empty you're ok. When people talk about resource exhaustion they
imagine that they just need a faster machine or some other way of just
putting more XIDs in the pool so they can keep using them at a faster
rate.I really think focusing on changing one term of art for another,
neither of which is at all meaningful without an extensive technical
explanation helps anyone. All it does is hide all the existing
explanations that are all over the internet from them.
Have you read the documentation in question recently? The first two
paragraphs, in particular:
https://www.postgresql.org/docs/devel/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
As I keep pointing out, we literally introduce the whole topic of
freezing/wraparound by telling users that VACUUM needs to avoid
wraparound to stop your database from becoming corrupt. Which is when
"the past becomes the future", or in simple terms data corruption
(without any qualification about single user mode being required to
corrupt the DB). Users think that that's what "wraparound" means
because we've taught them to think that. We've already been giving
users "an extensive technical explanation" for many years -- one that
happens to be both harmful and factually incorrect.
I agree that users basically don't care about unsigned vs signed vs
whatever. But there is a sense that that matters, because the docs
have made it matter. That's my starting point. That's the damage that
I'm trying to undo.
1. Most cases involving xidStopLimit (or even single-user mode data
corruption) won't involve any kind of physical integer wraparound.Fwiw I've never actually bumped into anyone talking about integer
overflow (which isn't usually called "wraparound" anyways). And in any
case it's not a terrible misconception, it at least gives users a
reasonable model for how XID space consumption works. In fact it's not
even entirely wrong -- it's not the XID itself that's overflowing but
the difference between the XID and the frozen XID.
Even your "not entirely wrong" version is entirely wrong. What you
describe literally cannot happen (outside of single user mode),
because xidStopLimit stops it from happening. To me your argument
seems similar to arguing that it's okay to call chemotherapy "cancer"
on the grounds that "cancer" refers to something that you really ought
to avoid in the first place in any case, which makes the whole
distinction irrelevant to non-oncologists.
That said, I concede that the term wraparound is too established to
just get rid of now. The only viable way forward now may be to
encourage users to think about it in the way that you suppose they
must already think about it. That is, to prominently point out that
"wraparound" actually refers to a protective mode of operation where
XID allocations are temporarily disallowed. And not pretty much
nothing to do with "wraparound" of the kind that the user may be
familiar with from other contexts. Including (and especially) all
earlier versions of the Postgres docs.
--
Peter Geoghegan
On Sun, May 14, 2023 at 1:59 PM Peter Geoghegan <pg@bowt.ie> wrote:
Have you read the documentation in question recently? The first two
paragraphs, in particular:https://www.postgresql.org/docs/devel/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
As I keep pointing out, we literally introduce the whole topic of
freezing/wraparound by telling users that VACUUM needs to avoid
wraparound to stop your database from becoming corrupt. Which is when
"the past becomes the future"
I went through the history of maintenance.sgml. "Routine Vacuuming"
dates back to 2001. Sure enough, our current "25.1.5. Preventing
Transaction ID Wraparound Failures" introductory paragraphs (the ones
that I find so misleading and alarmist) appear in the original
version, too. But in 2001, they weren't alarmist -- they were
proportionate to the risk that existed at the time. This becomes
totally clear once you see the original. In particular, once you see
the current introductory paragraphs next to another paragraph in the
original 2001 version:
The later paragraph follows up by saying: "In practice this [somewhat
regular vacuuming] isn't an onerous requirement, but since the
consequences of failing to meet it can be ___complete data loss___
(not just wasted disk space or slow performance), some special
provisions...". This means that my particular interpretation of the
25.1.5. introductory paragraphs are absolutely consistent with the
original intent from the time they were written. I'm now more
confident than ever that all of the stuff about "catastrophic data
loss" should have been removed in 2005 or 2006 at the latest. It
*almost* was removed around that time, but for whatever reason it
wasn't removed in full. And for whatever reason it didn't quite
register with anybody in a position to do much about it.
--
Peter Geoghegan