pgsql: Update guidance for running vacuumdb after pg_upgrade.

Started by Nathan Bossart12 months ago12 messages

nathandbossart@gmail.com

12 months ago

Update guidance for running vacuumdb after pg_upgrade.

Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-stats-only option
to only analyze relations that are missing statistics.

Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: /messages/by-id/Z5O1bpcwDrMgyrYy@nathan

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/c9d502eb68094d817fe94c2e954f6fdbb62f4d48

Modified Files
--------------
doc/src/sgml/ref/pgupgrade.sgml | 9 +++++----
src/bin/pg_upgrade/check.c | 2 +-
2 files changed, 6 insertions(+), 5 deletions(-)

Christoph Berg

myon@debian.org

11 months ago

In reply to: Nathan Bossart (#1)

Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.

Re: Nathan Bossart

Update guidance for running vacuumdb after pg_upgrade.

Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-stats-only option
to only analyze relations that are missing statistics.

I've been looking at vacuumdb --missing-stats-only because Debian's
pg_upgradecluster is using that now.

I am wondering if this is really good advice in the pg_upgrade
documentation. Sure it's nice that optimizer statistics are carried
over by pg_upgrade, but the pg_stat_user_tables statistics are not
carried over, and afaict these are the numbers that determine when the
next autovacuum or autoanalyze run is going to happen. By removing the
"please run vacuumdb on all tables" step from the pg_upgrade docs, we
are effectively telling everyone that they should be starting with
these numbers all 0, postponing the next run to some indeterminate
point. Running `vacuumdb --missing-stats-only` does not fix that
because it's skipping the tables. Is that the message we want to send?

(If I am misinterpreting the situation the docs should still explain
why this is ok.)

Christoph

Nathan Bossart

nathandbossart@gmail.com

11 months ago

In reply to: Christoph Berg (#2)

Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.

On Tue, Apr 22, 2025 at 09:43:56PM +0200, Christoph Berg wrote:

Re: Nathan Bossart

Update guidance for running vacuumdb after pg_upgrade.

Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-stats-only option
to only analyze relations that are missing statistics.

I've been looking at vacuumdb --missing-stats-only because Debian's
pg_upgradecluster is using that now.

I am wondering if this is really good advice in the pg_upgrade
documentation. Sure it's nice that optimizer statistics are carried
over by pg_upgrade, but the pg_stat_user_tables statistics are not
carried over, and afaict these are the numbers that determine when the
next autovacuum or autoanalyze run is going to happen. By removing the
"please run vacuumdb on all tables" step from the pg_upgrade docs, we
are effectively telling everyone that they should be starting with
these numbers all 0, postponing the next run to some indeterminate
point. Running `vacuumdb --missing-stats-only` does not fix that
because it's skipping the tables. Is that the message we want to send?

(If I am misinterpreting the situation the docs should still explain
why this is ok.)

relation_needs_vacanalyze() uses dead_tuples, ins_since_vacuum, and
mod_since_analyze. IIUC a full post-upgrade vacuumdb run would only set
dead_tuples to a nonzero value, so the worst-case scenario is that it would
take longer before a vacuum is triggered based on
autovacuum_vacuum_{threshold,max_threshold,scale_factor}. To address this,
I think we'd need to recommend using "vacuumdb --all --analyze-only"
instead. We could alternatively suggest first running "vacuumdb --all
--analyze-in-stages --missing-stats-only" (to fill in any missing stats)
followed by "vacuumdb --all --analyze-only" (to update dead_tuples).

However, I'm not sure how concerned to be about this. It does seem bad
that it might take longer for tables to be vacuumed for the first time
after upgrade, but I believe that's already the case for any type of
unclean shutdown (e.g., immediate shutdown, server crash, starting from a
base backup, point-in-time recovery). I see that we do recommend running
ANALYZE after pg_stat_reset(), though. In any case, IMO it's unfortunate
that we might end up recommending roughly the same post-upgrade steps as
before even though the optimizer statistics are carried over.

--
nathan

Christoph Berg

myon@debian.org

11 months ago

In reply to: Nathan Bossart (#3)

Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.

Re: Nathan Bossart

In any case, IMO it's unfortunate
that we might end up recommending roughly the same post-upgrade steps as
before even though the optimizer statistics are carried over.

Maybe the docs (and the pg_upgrade scripts) should recommend the old
procedure by default until this gap is closed? People could then still
opt to use the new procedure in specific cases.

Christoph

Nathan Bossart

nathandbossart@gmail.com

11 months ago

In reply to: Christoph Berg (#4)

Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.

On Tue, Apr 22, 2025 at 11:03:29PM +0200, Christoph Berg wrote:

Re: Nathan Bossart

In any case, IMO it's unfortunate
that we might end up recommending roughly the same post-upgrade steps as
before even though the optimizer statistics are carried over.

Maybe the docs (and the pg_upgrade scripts) should recommend the old
procedure by default until this gap is closed? People could then still
opt to use the new procedure in specific cases.

I think we'd still want to modify the --analyze-in-stages recommendation
(from what is currently recommended for supported versions). If we don't,
you'll wipe out the optimizer stats you brought over from the old version.

Here is a rough draft of what I am thinking.

--
nathan

Christoph Berg

myon@debian.org

11 months ago

In reply to: Christoph Berg (#2)

vacuumdb --missing-stats-only and pg_upgrade from PG13

Re: To Nathan Bossart

Update guidance for running vacuumdb after pg_upgrade.

Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-stats-only option
to only analyze relations that are missing statistics.

I've been looking at vacuumdb --missing-stats-only because Debian's
pg_upgradecluster is using that now.

The reason I was looking closely yesterday is because Debian's
regression tests were tripping over it, but I only figured out the
problem today:

If I create a table in a PG13-or-earlier cluster, never ANALYZE it,
and then pg_upgrade to 18 and run vacuumdb --analyze-only
--missing-stats-only, the table will not get analyzed. The only table
visited there is pg_largeobject.

Upgrades from 14..17 are fine.

Christoph

Christoph Berg

myon@debian.org

11 months ago

In reply to: Nathan Bossart (#5)

Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.

Re: Nathan Bossart

pg_log(PG_REPORT,
+		   "Some statistics are not transferred by pg_upgrade.\n"
"Once you start the new server, consider running:\n"
+		   "    %s/vacuumdb %s--all --analyze-in-stages --missing-stats-only\n"
+		   "    %s/vacuumdb %s--all --analyze-only",

I would make it "Once you start the new server, run these two commands:"
to make it explicit that both should be run, not alternatively either.

This patch addresses my concern, thanks.

Christoph

Nathan Bossart

nathandbossart@gmail.com

11 months ago

In reply to: Christoph Berg (#6)

Re: vacuumdb --missing-stats-only and pg_upgrade from PG13

On Wed, Apr 23, 2025 at 04:01:33PM +0200, Christoph Berg wrote:

If I create a table in a PG13-or-earlier cluster, never ANALYZE it,
and then pg_upgrade to 18 and run vacuumdb --analyze-only
--missing-stats-only, the table will not get analyzed. The only table
visited there is pg_largeobject.

I suspect this is due to commit 3d351d9, which started using -1 for
reltuples before the first vacuum/analyze. Before that, we set it to 0,
which could also mean the table is empty. --missing-stats-only checks for
reltuples != 0.

My first reaction is that we should just remove the reltuples != 0 check.
That means vacuumdb might analyze some empty tables, but that doesn't seem
too terrible.

--
nathan

Christoph Berg

myon@debian.org

11 months ago

In reply to: Nathan Bossart (#8)

Re: vacuumdb --missing-stats-only and pg_upgrade from PG13

Re: Nathan Bossart

My first reaction is that we should just remove the reltuples != 0 check.
That means vacuumdb might analyze some empty tables, but that doesn't seem
too terrible.

Or some tables that aren't empty but were never analyzed when they
should have been. Sounds like a good thing.

Thanks!

Christoph

#10

Nathan Bossart

nathandbossart@gmail.com

11 months ago

In reply to: Christoph Berg (#9)

Re: vacuumdb --missing-stats-only and pg_upgrade from PG13

Here is what I have staged for commit. I'll aim to commit these patches
sometime next week to give time for additional feedback.

--
nathan

#11

Christoph Berg

myon@debian.org

11 months ago

In reply to: Nathan Bossart (#10)

Re: vacuumdb --missing-stats-only and pg_upgrade from PG13

Re: Nathan Bossart

Here is what I have staged for commit. I'll aim to commit these patches
sometime next week to give time for additional feedback.

I confirm my PG13 test table gets analyzed now with the patch.

Christoph

#12

Nathan Bossart

nathandbossart@gmail.com

10 months ago

In reply to: Christoph Berg (#11)

Re: vacuumdb --missing-stats-only and pg_upgrade from PG13

On Thu, Apr 24, 2025 at 03:25:55PM +0200, Christoph Berg wrote:

Re: Nathan Bossart

Here is what I have staged for commit. I'll aim to commit these patches
sometime next week to give time for additional feedback.

I confirm my PG13 test table gets analyzed now with the patch.

Committed.

--
nathan

pgsql: Update guidance for running vacuumdb after pg_upgrade.

Attachments:

Attachments:

Attachments: