pgsql: Update guidance for running vacuumdb after pg_upgrade.
Update guidance for running vacuumdb after pg_upgrade.
Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-stats-only option
to only analyze relations that are missing statistics.
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: /messages/by-id/Z5O1bpcwDrMgyrYy@nathan
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/c9d502eb68094d817fe94c2e954f6fdbb62f4d48
Modified Files
--------------
doc/src/sgml/ref/pgupgrade.sgml | 9 +++++----
src/bin/pg_upgrade/check.c | 2 +-
2 files changed, 6 insertions(+), 5 deletions(-)
Re: Nathan Bossart
Update guidance for running vacuumdb after pg_upgrade.
Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-stats-only option
to only analyze relations that are missing statistics.
I've been looking at vacuumdb --missing-stats-only because Debian's
pg_upgradecluster is using that now.
I am wondering if this is really good advice in the pg_upgrade
documentation. Sure it's nice that optimizer statistics are carried
over by pg_upgrade, but the pg_stat_user_tables statistics are not
carried over, and afaict these are the numbers that determine when the
next autovacuum or autoanalyze run is going to happen. By removing the
"please run vacuumdb on all tables" step from the pg_upgrade docs, we
are effectively telling everyone that they should be starting with
these numbers all 0, postponing the next run to some indeterminate
point. Running `vacuumdb --missing-stats-only` does not fix that
because it's skipping the tables. Is that the message we want to send?
(If I am misinterpreting the situation the docs should still explain
why this is ok.)
Christoph
On Tue, Apr 22, 2025 at 09:43:56PM +0200, Christoph Berg wrote:
Re: Nathan Bossart
Update guidance for running vacuumdb after pg_upgrade.
Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-stats-only option
to only analyze relations that are missing statistics.I've been looking at vacuumdb --missing-stats-only because Debian's
pg_upgradecluster is using that now.I am wondering if this is really good advice in the pg_upgrade
documentation. Sure it's nice that optimizer statistics are carried
over by pg_upgrade, but the pg_stat_user_tables statistics are not
carried over, and afaict these are the numbers that determine when the
next autovacuum or autoanalyze run is going to happen. By removing the
"please run vacuumdb on all tables" step from the pg_upgrade docs, we
are effectively telling everyone that they should be starting with
these numbers all 0, postponing the next run to some indeterminate
point. Running `vacuumdb --missing-stats-only` does not fix that
because it's skipping the tables. Is that the message we want to send?(If I am misinterpreting the situation the docs should still explain
why this is ok.)
relation_needs_vacanalyze() uses dead_tuples, ins_since_vacuum, and
mod_since_analyze. IIUC a full post-upgrade vacuumdb run would only set
dead_tuples to a nonzero value, so the worst-case scenario is that it would
take longer before a vacuum is triggered based on
autovacuum_vacuum_{threshold,max_threshold,scale_factor}. To address this,
I think we'd need to recommend using "vacuumdb --all --analyze-only"
instead. We could alternatively suggest first running "vacuumdb --all
--analyze-in-stages --missing-stats-only" (to fill in any missing stats)
followed by "vacuumdb --all --analyze-only" (to update dead_tuples).
However, I'm not sure how concerned to be about this. It does seem bad
that it might take longer for tables to be vacuumed for the first time
after upgrade, but I believe that's already the case for any type of
unclean shutdown (e.g., immediate shutdown, server crash, starting from a
base backup, point-in-time recovery). I see that we do recommend running
ANALYZE after pg_stat_reset(), though. In any case, IMO it's unfortunate
that we might end up recommending roughly the same post-upgrade steps as
before even though the optimizer statistics are carried over.
--
nathan
Re: Nathan Bossart
In any case, IMO it's unfortunate
that we might end up recommending roughly the same post-upgrade steps as
before even though the optimizer statistics are carried over.
Maybe the docs (and the pg_upgrade scripts) should recommend the old
procedure by default until this gap is closed? People could then still
opt to use the new procedure in specific cases.
Christoph
On Tue, Apr 22, 2025 at 11:03:29PM +0200, Christoph Berg wrote:
Re: Nathan Bossart
In any case, IMO it's unfortunate
that we might end up recommending roughly the same post-upgrade steps as
before even though the optimizer statistics are carried over.Maybe the docs (and the pg_upgrade scripts) should recommend the old
procedure by default until this gap is closed? People could then still
opt to use the new procedure in specific cases.
I think we'd still want to modify the --analyze-in-stages recommendation
(from what is currently recommended for supported versions). If we don't,
you'll wipe out the optimizer stats you brought over from the old version.
Here is a rough draft of what I am thinking.
--
nathan
Attachments:
post_upgrade_guidance.patchtext/plain; charset=us-asciiDownload
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index df13365b287..648c6e2967c 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -833,17 +833,19 @@ psql --username=postgres --file=script.sql postgres
<para>
Because not all statistics are not transferred by
- <command>pg_upgrade</command>, you will be instructed to run a command to
+ <command>pg_upgrade</command>, you will be instructed to run commands to
regenerate that information at the end of the upgrade. You might need to
set connection parameters to match your new cluster.
</para>
<para>
- Using <command>vacuumdb --all --analyze-only --missing-stats-only</command>
- can efficiently generate such statistics. Alternatively,
+ First, use
<command>vacuumdb --all --analyze-in-stages --missing-stats-only</command>
- can be used to generate minimal statistics quickly. For either command,
- the use of <option>--jobs</option> can speed it up.
+ to quickly generate minimal optimizer statistics for relations without
+ any. Then, use <command>vacuumdb --all --analyze-only</command> to ensure
+ all relations have updated cumulative statistics for triggering vacuum and
+ analyze. For both commands, the use of <option>--jobs</option> can speed
+ it up.
If <varname>vacuum_cost_delay</varname> is set to a non-zero
value, this can be overridden to speed up statistics generation
using <envar>PGOPTIONS</envar>, e.g., <literal>PGOPTIONS='-c
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 18c2d652bb6..f1b90c5957e 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -814,9 +814,12 @@ output_completion_banner(char *deletion_script_file_name)
}
pg_log(PG_REPORT,
- "Some optimizer statistics may not have been transferred by pg_upgrade.\n"
+ "Some statistics are not transferred by pg_upgrade.\n"
"Once you start the new server, consider running:\n"
- " %s/vacuumdb %s--all --analyze-in-stages --missing-stats-only", new_cluster.bindir, user_specification.data);
+ " %s/vacuumdb %s--all --analyze-in-stages --missing-stats-only\n"
+ " %s/vacuumdb %s--all --analyze-only",
+ new_cluster.bindir, user_specification.data,
+ new_cluster.bindir, user_specification.data);
if (deletion_script_file_name)
pg_log(PG_REPORT,
Re: To Nathan Bossart
Update guidance for running vacuumdb after pg_upgrade.
Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-stats-only option
to only analyze relations that are missing statistics.I've been looking at vacuumdb --missing-stats-only because Debian's
pg_upgradecluster is using that now.
The reason I was looking closely yesterday is because Debian's
regression tests were tripping over it, but I only figured out the
problem today:
If I create a table in a PG13-or-earlier cluster, never ANALYZE it,
and then pg_upgrade to 18 and run vacuumdb --analyze-only
--missing-stats-only, the table will not get analyzed. The only table
visited there is pg_largeobject.
Upgrades from 14..17 are fine.
Christoph
Re: Nathan Bossart
pg_log(PG_REPORT, + "Some statistics are not transferred by pg_upgrade.\n" "Once you start the new server, consider running:\n" + " %s/vacuumdb %s--all --analyze-in-stages --missing-stats-only\n" + " %s/vacuumdb %s--all --analyze-only",
I would make it "Once you start the new server, run these two commands:"
to make it explicit that both should be run, not alternatively either.
This patch addresses my concern, thanks.
Christoph
On Wed, Apr 23, 2025 at 04:01:33PM +0200, Christoph Berg wrote:
If I create a table in a PG13-or-earlier cluster, never ANALYZE it,
and then pg_upgrade to 18 and run vacuumdb --analyze-only
--missing-stats-only, the table will not get analyzed. The only table
visited there is pg_largeobject.
I suspect this is due to commit 3d351d9, which started using -1 for
reltuples before the first vacuum/analyze. Before that, we set it to 0,
which could also mean the table is empty. --missing-stats-only checks for
reltuples != 0.
My first reaction is that we should just remove the reltuples != 0 check.
That means vacuumdb might analyze some empty tables, but that doesn't seem
too terrible.
--
nathan
Attachments:
fix_missing_stats_only.patchtext/plain; charset=us-asciiDownload
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 22067faaf7d..79b1096eb08 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -954,7 +954,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
" EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
" WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
" AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
" AND NOT a.attisdropped\n"
" AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
@@ -967,7 +966,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
" OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
" WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
" AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
" AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
" WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
@@ -979,7 +977,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
" JOIN pg_catalog.pg_index i"
" ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
" WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
" AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
" OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
" AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
@@ -994,7 +991,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
" OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
" WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
" AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
" AND NOT a.attisdropped\n"
" AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
@@ -1011,7 +1007,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
" OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
" WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
" AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
" AND c.relhassubclass\n"
" AND NOT p.inherited\n"
Re: Nathan Bossart
My first reaction is that we should just remove the reltuples != 0 check.
That means vacuumdb might analyze some empty tables, but that doesn't seem
too terrible.
Or some tables that aren't empty but were never analyzed when they
should have been. Sounds like a good thing.
Thanks!
Christoph
Here is what I have staged for commit. I'll aim to commit these patches
sometime next week to give time for additional feedback.
--
nathan
Attachments:
v2-0001-Further-adjust-guidance-for-running-vacuumdb-afte.patchtext/plain; charset=us-asciiDownload
From 6d1e608edb17d9bbaaf7d57ace35fee68ff869c0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Wed, 23 Apr 2025 10:11:46 -0500
Subject: [PATCH v2 1/2] Further adjust guidance for running vacuumdb after
pg_upgrade.
Since pg_upgrade does not transfer the cumulative statistics used
for triggering autovacuum and autoanalyze, the server may take much
longer than expected to process them post-upgrade. Currently, the
pg_upgrade documentation recommends analyzing only relations for
which optimizer statistics were not carried over during upgrade.
This commit appends another recommendation to also analyze all
relations to update the relevant cumulative statistics, similar to
the recommendation for pg_stat_reset().
Reported-by: Christoph Berg <myon@debian.org>
Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aAfxfKC82B9NvJDj%40msg.df7cb.de
---
doc/src/sgml/ref/pgupgrade.sgml | 12 +++++++-----
src/bin/pg_upgrade/check.c | 9 ++++++---
2 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index df13365b287..648c6e2967c 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -833,17 +833,19 @@ psql --username=postgres --file=script.sql postgres
<para>
Because not all statistics are not transferred by
- <command>pg_upgrade</command>, you will be instructed to run a command to
+ <command>pg_upgrade</command>, you will be instructed to run commands to
regenerate that information at the end of the upgrade. You might need to
set connection parameters to match your new cluster.
</para>
<para>
- Using <command>vacuumdb --all --analyze-only --missing-stats-only</command>
- can efficiently generate such statistics. Alternatively,
+ First, use
<command>vacuumdb --all --analyze-in-stages --missing-stats-only</command>
- can be used to generate minimal statistics quickly. For either command,
- the use of <option>--jobs</option> can speed it up.
+ to quickly generate minimal optimizer statistics for relations without
+ any. Then, use <command>vacuumdb --all --analyze-only</command> to ensure
+ all relations have updated cumulative statistics for triggering vacuum and
+ analyze. For both commands, the use of <option>--jobs</option> can speed
+ it up.
If <varname>vacuum_cost_delay</varname> is set to a non-zero
value, this can be overridden to speed up statistics generation
using <envar>PGOPTIONS</envar>, e.g., <literal>PGOPTIONS='-c
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 18c2d652bb6..940fc77fc2e 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -814,9 +814,12 @@ output_completion_banner(char *deletion_script_file_name)
}
pg_log(PG_REPORT,
- "Some optimizer statistics may not have been transferred by pg_upgrade.\n"
- "Once you start the new server, consider running:\n"
- " %s/vacuumdb %s--all --analyze-in-stages --missing-stats-only", new_cluster.bindir, user_specification.data);
+ "Some statistics are not transferred by pg_upgrade.\n"
+ "Once you start the new server, consider running these two commands:\n"
+ " %s/vacuumdb %s--all --analyze-in-stages --missing-stats-only\n"
+ " %s/vacuumdb %s--all --analyze-only",
+ new_cluster.bindir, user_specification.data,
+ new_cluster.bindir, user_specification.data);
if (deletion_script_file_name)
pg_log(PG_REPORT,
--
2.39.5 (Apple Git-154)
v2-0002-vacuumdb-Don-t-skip-empty-relations-in-missing-st.patchtext/plain; charset=us-asciiDownload
From 299c8bea787fb1637b58390d351a07aaa2521ac9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Wed, 23 Apr 2025 10:37:42 -0500
Subject: [PATCH v2 2/2] vacuumdb: Don't skip empty relations in
--missing-stats-only.
Presently, --missing-stats-only skips relations with reltuples set
to 0 because empty relations don't get optimizer statistics.
However, before v14, a reltuples value of 0 was ambiguous: it could
either mean the relation is empty, or it could mean that it hadn't
yet been vacuumed or analyzed. (Commit 3d351d916b taught Postgres
14 and newer to use -1 for the latter case.) This ambiguity can
cause --missing-stats-only to inadvertently skip relations that
need optimizer statistics.
To fix, simply remove the check for reltuples != 0. This will
cause --missing-stats-only to analyze some empty tables, but that
doesn't seem too terrible a trade-off.
Reported-by: Christoph Berg <myon@debian.org>
Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aAjyvW5_fRGNr7yF%40msg.df7cb.de
---
src/bin/scripts/vacuumdb.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 22067faaf7d..79b1096eb08 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -954,7 +954,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
" EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
" WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
" AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
" AND NOT a.attisdropped\n"
" AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
@@ -967,7 +966,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
" OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
" WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
" AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
" AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
" WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
@@ -979,7 +977,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
" JOIN pg_catalog.pg_index i"
" ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
" WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
" AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
" OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
" AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
@@ -994,7 +991,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
" OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
" WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
" AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
" AND NOT a.attisdropped\n"
" AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
@@ -1011,7 +1007,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
" OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
" WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
" AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
" AND c.relhassubclass\n"
" AND NOT p.inherited\n"
--
2.39.5 (Apple Git-154)
Re: Nathan Bossart
Here is what I have staged for commit. I'll aim to commit these patches
sometime next week to give time for additional feedback.
I confirm my PG13 test table gets analyzed now with the patch.
Christoph
On Thu, Apr 24, 2025 at 03:25:55PM +0200, Christoph Berg wrote:
Re: Nathan Bossart
Here is what I have staged for commit. I'll aim to commit these patches
sometime next week to give time for additional feedback.I confirm my PG13 test table gets analyzed now with the patch.
Committed.
--
nathan