vacuumdb changes for stats import/export
On Mon, Jan 06, 2025 at 03:27:18PM -0600, Nathan Bossart wrote:
On Mon, Dec 30, 2024 at 03:45:03PM -0500, Bruce Momjian wrote:
On Mon, Dec 30, 2024 at 12:02:47PM -0800, Jeff Davis wrote:
I suggest that we make a new thread about the vacuumdb changes and
focus this thread and patch series on the pg_dump changes (and minor
flag adjustments to pg_upgrade).

Unless you think that the pg_dump changes should block on the vacuumdb
changes? In which case please let me know because the pg_dump changes
are otherwise close to commit.

I think that is a good idea. I don't see vacuumdb blocking this.
+1, I've been reviewing the vacuumdb portion and am planning to start a new
thread in the near future. IIUC the bulk of the vacuumdb changes are
relatively noncontroversial; we just haven't reached consensus on the user
interface.
As promised, I'm starting a new thread for this. The original thread [0]
has some preliminary discussion about the subject.
As you may be aware, there is an ongoing effort to carry over statistics
during pg_upgrade. Today, we encourage users to use vacuumdb to run
ANALYZE on all relations after upgrading. There's even a special
--analyze-in-stages option that fast-tracks an initial set of minimal
statistics for this use-case. Once the statistics are carried over by
pg_upgrade, there will be little need to do this, except perhaps for
extended statistics if they aren't carried over. But there are patches in
flight for that, too [1].
This thread is dedicated to figuring out what, if anything, to do about
vacuumdb. I see the following general categories of options:
* Do nothing. Other than updating our recommended guidance for
post-upgrade analyzing, we'd leave vacuumdb alone. While this is
certainly a simple option, it has a couple of key drawbacks. For one,
anyone who doesn't see the new vacuumdb guidance may continue to do
unnecessary post-upgrade analyzes. Also, if we don't get the extended
statistics piece completed for v18, users will have to manually construct
ANALYZE commands for those to run post-upgrade.
* Add a breaking change so that users are forced to fix any outdated
post-upgrade scripts. This is what the attached patches do. In short,
they add a required parameter to --analyze-in-stages that can be set to
either "all" or "missing". The new "missing" mode generates ANALYZE
commands for relations that are missing statistics, while the "all" mode
does the same thing that --analyze-in-stages does today. While the
"missing" mode might be useful outside of upgrade cases, we could also
recommend it as a post-upgrade step if the extended statistics work
doesn't get committed for v18.
* Add a new option that will make it easy to ANALYZE any relations that are
missing statistics, but don't make any breaking changes to existing
post-upgrade scripts. This option isn't strictly necessary if we
get the extended statistics parts committed, but it could be a nice
feature, anyway.
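To make the first option's drawback concrete: absent both the vacuumdb and the extended-statistics work, a post-upgrade script would have to generate the ANALYZE commands itself. A hedged sketch of such a generator follows; it only prints the SQL (the catalog names are real, but actually running the output against a database is left to the reader):

```shell
# Print a query that emits one ANALYZE per table that has an extended
# statistics object; piping its output back into psql would run them.
# Sketch only -- assumes nothing beyond the system catalogs.
sql=$(cat <<'SQL'
SELECT pg_catalog.format('ANALYZE %I.%I;', n.nspname, c.relname)
  FROM pg_catalog.pg_statistic_ext e
  JOIN pg_catalog.pg_class c ON c.oid = e.stxrelid
  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace;
SQL
)
echo "$sql"
```

Something like `psql -AtXq -c "$sql" mydb | psql -X mydb` (with "mydb" a placeholder) would then execute the generated commands.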
I chose the second approach because it had the most support in the other
thread, but I definitely wouldn't characterize it as a consensus. 0001
simply refactors the main catalog query to its own function so that its
results can be reused in later stages of --analyze-in-stages. This might
require a bit more memory and make --analyze-in-stages less responsive to
concurrent changes, but it wasn't all that responsive to begin with. 0002
adds the new "missing" mode functionality. Note that it regenerates all
statistics for a relation if any applicable statistics types are missing.
It's not clear whether we can or should do any better than that. Corey and
I put a lot of effort into the catalog query changes, and we think we've
covered everything, but we would of course appreciate some review on that
part.
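In other words, the rule 0002 implements is an OR over the statistics kinds a relation should ordinarily have. A shell sketch of that decision (the flag names are mine, standing in for the four EXISTS clauses in the patch's catalog query):

```shell
# Decision rule behind "missing" mode: re-ANALYZE a relation if ANY
# applicable statistics kind is absent.  Illustrative flags only.
column_stats_missing=false
extended_stats_missing=true    # e.g. no pg_statistic_ext_data row
index_expr_stats_missing=false
inheritance_stats_missing=false

decision=skip
if $column_stats_missing || $extended_stats_missing || \
   $index_expr_stats_missing || $inheritance_stats_missing; then
    decision="analyze whole relation"
fi
echo "$decision"
```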
BTW, as long as the basic "missing" mode idea seems reasonable, it's easy
enough to adjust the user interface to whatever we want, and I'm happy to
do so as needed.
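For reference, the interface as currently proposed boils down to two invocations: `vacuumdb --analyze-in-stages=all mydb` (today's behavior) and `vacuumdb --analyze-in-stages=missing mydb`, with anything else rejected ("mydb" is a placeholder). A standalone shell mirror of the argument check 0002 adds (not vacuumdb itself):

```shell
# Mirrors the optarg validation in 0002: only "all" and "missing" are
# accepted for --analyze-in-stages; anything else errors out.
mode="missing"    # stands in for optarg
case "$mode" in
    all)     missing_only=false ;;
    missing) missing_only=true ;;
    *)       echo "unrecognized --analyze-in-stages option: $mode" >&2
             exit 1 ;;
esac
echo "missing_only=$missing_only"
```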
Finally, I think another open question is whether any of this should apply
to --analyze and/or --analyze-only. We do recommend the latter as a
post-upgrade step in our pg_upgrade documentation, and I could see the
"missing" mode being useful on its own for these modes, too.
Thoughts?
[0]: /messages/by-id/CADkLM=fR7TwH0cLREQkf5_=KLcOYVxJw0Km0i5MpaWeuDwVo6g@mail.gmail.com
[1]: /messages/by-id/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
--
nathan
Attachments:
v1-0001-vacuumdb-Save-catalog-query-results-for-analyze-i.patch (text/plain)
From 99d653f59e3ed7a7f5809e6546f0514ea8f0beda Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Fri, 24 Jan 2025 09:29:30 -0600
Subject: [PATCH v1 1/2] vacuumdb: Save catalog query results for
--analyze-in-stages.
Presently, each call to vacuum_one_database() for each stage of
--analyze-in-stages mode performs the catalog query to retrieve the
list of tables to process. A proposed follow-up commit would add a
"missing only" feature to --analyze-in-stages, which requires us to
save the results of the catalog query (since tables without
statistics would have them after the first stage). This commit
adds this ability via a new parameter for vacuum_one_database()
that specifies either a previously-retrieved list to process or a
place to store the results of the catalog query for later use.
This commit also makes use of this new parameter for
--analyze-in-stages.
The trade-offs of this approach are increased memory usage and less
responsiveness to concurrent catalog changes in later stages,
neither of which is expected to bother anyone.
Author: Corey Huinker
---
src/bin/scripts/vacuumdb.c | 316 +++++++++++++++++++++----------------
1 file changed, 180 insertions(+), 136 deletions(-)
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 3dc428674ae..88ceb42746d 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -62,10 +62,16 @@ typedef enum
static VacObjFilter objfilter = OBJFILTER_NONE;
+static SimpleStringList *retrieve_objects(PGconn *conn,
+ vacuumingOptions *vacopts,
+ SimpleStringList *objects,
+ bool echo);
+
static void vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet);
@@ -400,12 +406,13 @@ main(int argc, char *argv[])
if (analyze_in_stages)
{
int stage;
+ SimpleStringList *found_objs = NULL;
for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
{
vacuum_one_database(&cparams, &vacopts,
stage,
- &objects,
+ &objects, &found_objs,
concurrentCons,
progname, echo, quiet);
}
@@ -413,7 +420,7 @@ main(int argc, char *argv[])
else
vacuum_one_database(&cparams, &vacopts,
ANALYZE_NO_STAGE,
- &objects,
+ &objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -461,8 +468,17 @@ escape_quotes(const char *src)
/*
* vacuum_one_database
*
- * Process tables in the given database. If the 'objects' list is empty,
- * process all tables in the database.
+ * Process tables in the given database.
+ *
+ * 'found_objs' should be a fully qualified list of objects to process, as
+ * returned by a previous call to vacuum_one_database(). If *found_objs is
+ * NULL, it is set to the results of the catalog query discussed below. If
+ * found_objs is NULL, the results of the catalog query are not returned.
+ *
+ * If *found_objs is NULL, this function performs a catalog query to retrieve
+ * the list of tables to process. When 'objects' is NULL, all tables in the
+ * database are processed. Otherwise, the catalog query performs a lookup for
+ * the objects listed in 'objects'.
*
* Note that this function is only concerned with running exactly one stage
* when in analyze-in-stages mode; caller must iterate on us if necessary.
@@ -475,22 +491,18 @@ vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet)
{
PQExpBufferData sql;
- PQExpBufferData buf;
- PQExpBufferData catalog_query;
- PGresult *res;
PGconn *conn;
SimpleStringListCell *cell;
ParallelSlotArray *sa;
- SimpleStringList dbtables = {NULL, NULL};
- int i;
- int ntups;
+ int ntups = 0;
bool failed = false;
- bool objects_listed = false;
const char *initcmd;
+ SimpleStringList *ret = NULL;
const char *stage_commands[] = {
"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
"SET default_statistics_target=10; RESET vacuum_cost_delay;",
@@ -587,19 +599,155 @@ vacuum_one_database(ConnParams *cparams,
}
/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path
- * (which could be unsafe), everything in this query MUST be fully
- * qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their
- * optional set of columns. This is used to match any provided column
- * lists with the generated qualified identifiers and to filter for the
- * tables provided via --table. If a listed table does not exist, the
- * catalog query will fail.
+ * If the caller provided the results of a previous catalog query, just
+ * use that. Otherwise, run the catalog query ourselves and set the
+ * return variable if provided.
+ */
+ if (found_objs && *found_objs)
+ ret = *found_objs;
+ else
+ {
+ ret = retrieve_objects(conn, vacopts, objects, echo);
+ if (found_objs)
+ *found_objs = ret;
+ }
+
+ /*
+ * Count the number of objects in the catalog query result. If there are
+ * none, we are done.
+ */
+ for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+ ntups++;
+
+ if (ntups == 0)
+ {
+ PQfinish(conn);
+ return;
+ }
+
+ /*
+ * Ensure concurrentCons is sane. If there are more connections than
+ * vacuumable relations, we don't need to use them all.
+ */
+ if (concurrentCons > ntups)
+ concurrentCons = ntups;
+ if (concurrentCons <= 0)
+ concurrentCons = 1;
+
+ /*
+ * All slots need to be prepared to run the appropriate analyze stage, if
+ * caller requested that mode. We have to prepare the initial connection
+ * ourselves before setting up the slots.
+ */
+ if (stage == ANALYZE_NO_STAGE)
+ initcmd = NULL;
+ else
+ {
+ initcmd = stage_commands[stage];
+ executeCommand(conn, initcmd, echo);
+ }
+
+ /*
+ * Setup the database connections. We reuse the connection we already have
+ * for the first slot. If not in parallel mode, the first slot in the
+ * array contains the connection.
*/
+ sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+ ParallelSlotsAdoptConn(sa, conn);
+
+ initPQExpBuffer(&sql);
+
+ cell = ret->head;
+ do
+ {
+ const char *tabname = cell->val;
+ ParallelSlot *free_slot;
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ free_slot = ParallelSlotsGetIdle(sa, NULL);
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+ vacopts, tabname);
+
+ /*
+ * Execute the vacuum. All errors are handled in processQueryResult
+ * through ParallelSlotsGetIdle.
+ */
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, sql.data,
+ echo, tabname);
+
+ cell = cell->next;
+ } while (cell != NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+ if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
+ {
+ const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+ ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ failed = true;
+ }
+
+finish:
+ ParallelSlotsTerminate(sa);
+ pg_free(sa);
+
+ termPQExpBuffer(&sql);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns. This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table. If a listed table does not exist, the catalog query will fail.
+ */
+static SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+ SimpleStringList *objects, bool echo)
+{
+ PQExpBufferData buf;
+ PQExpBufferData catalog_query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+ bool objects_listed = false;
+
initPQExpBuffer(&catalog_query);
for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
{
@@ -753,23 +901,12 @@ vacuum_one_database(ConnParams *cparams,
termPQExpBuffer(&catalog_query);
PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
- /*
- * If no rows are returned, there are no matching tables, so we are done.
- */
- ntups = PQntuples(res);
- if (ntups == 0)
- {
- PQclear(res);
- PQfinish(conn);
- return;
- }
-
/*
* Build qualified identifiers for each table, including the column list
* if given.
*/
initPQExpBuffer(&buf);
- for (i = 0; i < ntups; i++)
+ for (int i = 0; i < PQntuples(res); i++)
{
appendPQExpBufferStr(&buf,
fmtQualifiedId(PQgetvalue(res, i, 1),
@@ -778,110 +915,13 @@ vacuum_one_database(ConnParams *cparams,
if (objects_listed && !PQgetisnull(res, i, 2))
appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
- simple_string_list_append(&dbtables, buf.data);
+ simple_string_list_append(found_objs, buf.data);
resetPQExpBuffer(&buf);
}
termPQExpBuffer(&buf);
PQclear(res);
- /*
- * Ensure concurrentCons is sane. If there are more connections than
- * vacuumable relations, we don't need to use them all.
- */
- if (concurrentCons > ntups)
- concurrentCons = ntups;
- if (concurrentCons <= 0)
- concurrentCons = 1;
-
- /*
- * All slots need to be prepared to run the appropriate analyze stage, if
- * caller requested that mode. We have to prepare the initial connection
- * ourselves before setting up the slots.
- */
- if (stage == ANALYZE_NO_STAGE)
- initcmd = NULL;
- else
- {
- initcmd = stage_commands[stage];
- executeCommand(conn, initcmd, echo);
- }
-
- /*
- * Setup the database connections. We reuse the connection we already have
- * for the first slot. If not in parallel mode, the first slot in the
- * array contains the connection.
- */
- sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
- ParallelSlotsAdoptConn(sa, conn);
-
- initPQExpBuffer(&sql);
-
- cell = dbtables.head;
- do
- {
- const char *tabname = cell->val;
- ParallelSlot *free_slot;
-
- if (CancelRequested)
- {
- failed = true;
- goto finish;
- }
-
- free_slot = ParallelSlotsGetIdle(sa, NULL);
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
- vacopts, tabname);
-
- /*
- * Execute the vacuum. All errors are handled in processQueryResult
- * through ParallelSlotsGetIdle.
- */
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, sql.data,
- echo, tabname);
-
- cell = cell->next;
- } while (cell != NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- {
- failed = true;
- goto finish;
- }
-
- /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
- if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
- {
- const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
- ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- failed = true;
- }
-
-finish:
- ParallelSlotsTerminate(sa);
- pg_free(sa);
-
- termPQExpBuffer(&sql);
-
- if (failed)
- exit(1);
+ return found_objs;
}
/*
@@ -912,6 +952,10 @@ vacuum_all_databases(ConnParams *cparams,
if (analyze_in_stages)
{
+ SimpleStringList **found_objs;
+
+ found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
/*
* When analyzing all databases in stages, we analyze them all in the
* fastest stage first, so that initial statistics become available
@@ -928,7 +972,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
stage,
- objects,
+ objects, &found_objs[i],
concurrentCons,
progname, echo, quiet);
}
@@ -942,7 +986,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
ANALYZE_NO_STAGE,
- objects,
+ objects, NULL,
concurrentCons,
progname, echo, quiet);
}
--
2.39.5 (Apple Git-154)
v1-0002-vacuumdb-Allow-analyzing-in-stages-only-relations.patch (text/plain)
From 320c0491dea628a0b56926563c94fa7044eb191c Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Fri, 24 Jan 2025 09:30:14 -0600
Subject: [PATCH v1 2/2] vacuumdb: Allow analyzing-in-stages only relations
missing stats.
This commit adds a required argument to --analyze-in-stages that
accepts either "missing" or "all". "all" provides the same
behavior that --analyze-in-stages has today. "missing" can be used
to instead analyze only relations that are missing statistics.
This new argument is intentionally marked as required to break any
existing post-pg_upgrade vacuumdb scripts. Since stats are going
to be carried over by pg_upgrade to v18 or later, running
--analyze-in-stages=all is unnecessary for the vast majority of
users and is likely to even hurt performance post-upgrade.
XXX: I think the above paragraph changes depending on whether
extended statistics are carried over by pg_upgrade. If they
aren't, we should probably recommend --analyze-in-stages=missing.
If they are, then we probably shouldn't recommend anything at all.
Does this also change how we structure the user-facing part of this
change?
Note that the new "missing" mode will generate ANALYZE commands for
a relation if it is missing any statistics it should ordinarily
have. For example, if a table has statistics for one column but
not another, we will analyze the whole table. A similar principle
applies to extended statistics, expression indexes, table
inheritance, and partitioned tables.
XXX: Do we need to make a similar change to --analyze-only and/or
--analyze (e.g., a new --missing-stats-only option)?
Author: Corey Huinker
---
doc/src/sgml/ref/vacuumdb.sgml | 12 +++-
src/bin/scripts/t/102_vacuumdb_stages.pl | 64 ++++++++++++++++++++-
src/bin/scripts/vacuumdb.c | 73 +++++++++++++++++++++++-
src/test/perl/PostgreSQL/Test/Cluster.pm | 27 +++++++++
4 files changed, 171 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 66fccb30a2d..3ef8a52a163 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -429,7 +429,7 @@ PostgreSQL documentation
</varlistentry>
<varlistentry>
- <term><option>--analyze-in-stages</option></term>
+ <term><option>--analyze-in-stages=<replaceable class="parameter">analyze_in_stages_option</replaceable></option></term>
<listitem>
<para>
Only calculate statistics for use by the optimizer (no vacuum),
@@ -441,6 +441,8 @@ PostgreSQL documentation
</para>
<para>
+ When set to <literal>all</literal>, vacuumdb will calculate statistics
+ for all relations, even relations with existing statistics.
This option is only useful to analyze a database that currently has
no statistics or has wholly incorrect ones, such as if it is newly
populated from a restored dump or by <command>pg_upgrade</command>.
@@ -449,6 +451,14 @@ PostgreSQL documentation
transiently worse due to the low statistics targets of the early
stages.
</para>
+
+ <para>
+ When set to <literal>missing</literal>, vacuumdb will calculate
+ statistics only for relations that are missing statistics for a column,
+ index expression, or extended statistics object. This option prevents
+ vacuumdb from deleting existing statistics so that the query
+ optimizer's choices do not become transiently worse.
+ </para>
</listitem>
</varlistentry>
diff --git a/src/bin/scripts/t/102_vacuumdb_stages.pl b/src/bin/scripts/t/102_vacuumdb_stages.pl
index 984c8d06de6..0f560accb46 100644
--- a/src/bin/scripts/t/102_vacuumdb_stages.pl
+++ b/src/bin/scripts/t/102_vacuumdb_stages.pl
@@ -12,7 +12,7 @@ $node->init;
$node->start;
$node->issues_sql_like(
- [ 'vacuumdb', '--analyze-in-stages', 'postgres' ],
+ [ 'vacuumdb', '--analyze-in-stages', 'all', 'postgres' ],
qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
.*statement:\ ANALYZE
.*statement:\ SET\ default_statistics_target=10;\ RESET\ vacuum_cost_delay;
@@ -21,8 +21,68 @@ $node->issues_sql_like(
.*statement:\ ANALYZE/sx,
'analyze three times');
+$node->safe_psql('postgres',
+ 'CREATE TABLE regression_vacuumdb_test AS select generate_series(1, 10) a, generate_series(2, 11) b;');
$node->issues_sql_like(
- [ 'vacuumdb', '--analyze-in-stages', '--all' ],
+ [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--analyze-in-stages=missing with missing stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--analyze-in-stages=missing with no missing stats');
+
+$node->safe_psql('postgres',
+ 'CREATE INDEX regression_vacuumdb_test_idx ON regression_vacuumdb_test (mod(a, 2));');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--analyze-in-stages=missing with missing index expression stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--analyze-in-stages=missing with no missing index expression stats');
+
+$node->safe_psql('postgres',
+ 'CREATE STATISTICS regression_vacuumdb_test_stat ON a, b FROM regression_vacuumdb_test;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--analyze-in-stages=missing with missing extended stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--analyze-in-stages=missing with no missing extended stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_child (a INT) INHERITS (regression_vacuumdb_test);\n"
+ . "INSERT INTO regression_vacuumdb_child VALUES (1, 2);\n"
+ . "ANALYZE regression_vacuumdb_child;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--analyze-in-stages=missing with missing inherited stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--analyze-in-stages=missing with no missing inherited stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_parted (a INT) PARTITION BY LIST (a);\n"
+ . "CREATE TABLE regression_vacuumdb_part1 PARTITION OF regression_vacuumdb_parted FOR VALUES IN (1);\n"
+ . "INSERT INTO regression_vacuumdb_parted VALUES (1);\n"
+ . "ANALYZE regression_vacuumdb_part1;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--analyze-in-stages=missing with missing partition stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', 'missing', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--analyze-in-stages=missing with no missing partition stats');
+
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', 'all', '--all' ],
qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
.*statement:\ ANALYZE
.*statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 88ceb42746d..ddc01e9d723 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -61,6 +61,7 @@ typedef enum
} VacObjFilter;
static VacObjFilter objfilter = OBJFILTER_NONE;
+static bool missing_only;
static SimpleStringList *retrieve_objects(PGconn *conn,
vacuumingOptions *vacopts,
@@ -123,7 +124,7 @@ main(int argc, char *argv[])
{"schema", required_argument, NULL, 'n'},
{"exclude-schema", required_argument, NULL, 'N'},
{"maintenance-db", required_argument, NULL, 2},
- {"analyze-in-stages", no_argument, NULL, 3},
+ {"analyze-in-stages", required_argument, NULL, 3},
{"disable-page-skipping", no_argument, NULL, 4},
{"skip-locked", no_argument, NULL, 5},
{"min-xid-age", required_argument, NULL, 6},
@@ -246,6 +247,16 @@ main(int argc, char *argv[])
break;
case 3:
analyze_in_stages = vacopts.analyze_only = true;
+ if (strcmp(optarg, "all") == 0)
+ missing_only = false;
+ else if (strcmp(optarg, "missing") == 0)
+ missing_only = true;
+ else
+ {
+ pg_log_error("unrecognized --analyze-in-stages option: %s",
+ optarg);
+ exit(1);
+ }
break;
case 4:
vacopts.disable_page_skipping = true;
@@ -808,6 +819,7 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
" FROM pg_catalog.pg_class c\n"
" JOIN pg_catalog.pg_namespace ns"
" ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+ " CROSS JOIN LATERAL (SELECT c.relkind IN ('p', 'I')) as p (inherited)\n"
" LEFT JOIN pg_catalog.pg_class t"
" ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
@@ -891,6 +903,63 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
vacopts->min_mxid_age);
}
+ if (missing_only)
+ {
+ appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+ /* regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid))\n");
+
+ /* expression indexes */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_index i\n"
+ " CROSS JOIN LATERAL pg_catalog.unnest(i.indkey) WITH ORDINALITY u (attnum, ord)\n"
+ " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND i.indexprs IS NOT NULL\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) i.indexrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) u.ord\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* table inheritance */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit))\n");
+
+ appendPQExpBufferStr(&catalog_query, " )\n");
+ }
+
/*
* Execute the catalog query. We use the default search_path for this
* query for consistency with table lookups done elsewhere by the user.
@@ -1226,7 +1295,7 @@ help(const char *progname)
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -z, --analyze update optimizer statistics\n"));
printf(_(" -Z, --analyze-only only update optimizer statistics; no vacuum\n"));
- printf(_(" --analyze-in-stages only update optimizer statistics, in multiple\n"
+ printf(_(" --analyze-in-stages=OPTION only update optimizer statistics, in multiple\n"
" stages for faster results; no vacuum\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nConnection options:\n"));
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index f521ad0b12f..d30d97aa96c 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2802,6 +2802,33 @@ sub issues_sql_like
=pod
+=item $node->issues_sql_unlike(cmd, unexpected_sql, test_name)
+
+Run a command on the node, then verify that $unexpected_sql does not appear in
+the server log file.
+
+=cut
+
+sub issues_sql_unlike
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($self, $cmd, $unexpected_sql, $test_name) = @_;
+
+ local %ENV = $self->_get_env();
+
+ my $log_location = -s $self->logfile;
+
+ my $result = PostgreSQL::Test::Utils::run_log($cmd);
+ ok($result, "@$cmd exit code 0");
+ my $log =
+ PostgreSQL::Test::Utils::slurp_file($self->logfile, $log_location);
+ unlike($log, $unexpected_sql, "$test_name: SQL not found in server log");
+ return;
+}
+
+=pod
+
=item $node->log_content()
Returns the contents of log of the node
--
2.39.5 (Apple Git-154)
I don't have anything to add to what Nathan said, but thought I should say
so since this thread was broken off from my earlier thread. Eagerly
awaiting feedback.
On Fri, Jan 24, 2025 at 7:44 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Jan 06, 2025 at 03:27:18PM -0600, Nathan Bossart wrote:
On Mon, Dec 30, 2024 at 03:45:03PM -0500, Bruce Momjian wrote:
On Mon, Dec 30, 2024 at 12:02:47PM -0800, Jeff Davis wrote:
I suggest that we make a new thread about the vacuumdb changes and
focus this thread and patch series on the pg_dump changes (and minor
flag adjustments to pg_upgrade).Unless you think that the pg_dump changes should block on the vacuumdb
changes? In which case please let me know because the pg_dump changes
are otherwise close to commit.I think that is a good idea. I don't see vacuumdb blocking this.
+1, I've been reviewing the vacuumdb portion and am planning to start a new
thread in the near future. IIUC the bulk of the vacuumdb changes are
relatively noncontroversial, we just haven't reached consensus on the user
interface.As promised, I'm starting a new thread for this. The original thread [0]
has some preliminary discussion about the subject.As you may be aware, there is an ongoing effort to carry over statistics
during pg_upgrade. Today, we encourage users to use vacuumdb to run
ANALYZE on all relations after upgrading. There's even a special
--analyze-in-stages option that fast-tracks an initial set of minimal
statistics for this use-case. Once the statistics are carried over by
pg_upgrade, there will be little need to do this, except for perhaps
extended statistics if they aren't carried over. But there are patches in
flight for that, too [1].

This thread is dedicated to figuring out what, if anything, to do about
vacuumdb.  I see the following general categories of options:

* Do nothing.  Other than updating our recommended guidance for
post-upgrade analyzing, we'd leave vacuumdb alone. While this is
certainly a simple option, it has a couple of key drawbacks. For one,
anyone who doesn't see the new vacuumdb guidance may continue to do
unnecessary post-upgrade analyzes. Also, if we don't get the extended
statistics piece completed for v18, users will have to manually construct
ANALYZE commands for those to run post-upgrade.

* Add a breaking change so that users are forced to fix any outdated
post-upgrade scripts. This is what the attached patches do. In short,
they add a required parameter to --analyze-in-stages that can be set to
either "all" or "missing". The new "missing" mode generates ANALYZE
commands for relations that are missing statistics, while the "all" mode
does the same thing that --analyze-in-stages does today. While the
"missing" mode might be useful outside of upgrade cases, we could also
recommend it as a post-upgrade step if the extended statistics work
doesn't get committed for v18.

* Add a new option that will make it easy to ANALYZE any relations that are
missing statistics, but don't make any breaking changes to existing
post-upgrade scripts. This option isn't really strictly necessary if we
get the extended statistics parts committed, but it could be a nice
feature, anyway.

I chose the second approach because it had the most support in the other
thread, but I definitely wouldn't characterize it as a consensus. 0001
simply refactors the main catalog query to its own function so that its
results can be reused in later stages of --analyze-in-stages. This might
require a bit more memory and make --analyze-in-stages less responsive to
concurrent changes, but it wasn't all that responsive to begin with. 0002
adds the new "missing" mode functionality. Note that it regenerates all
statistics for a relation if any applicable statistics types are missing.
It's not clear whether we can or should do any better than that. Corey and
I put a lot of effort into the catalog query changes, and we think we've
covered everything, but we would of course appreciate some review on that
part.

BTW as long as the basic "missing" mode idea seems reasonable, it's easy
enough to adjust the user interface to whatever we want, and I'm happy to
do so as needed.

Finally, I think another open question is whether any of this should apply
to --analyze and/or --analyze-only. We do recommend the latter as a
post-upgrade step in our pg_upgrade documentation, and I could see the
"missing" mode being useful on its own for these modes, too.

Thoughts?
I've not closely reviewed the patches yet but I find that the second
approach is reasonable to me. IIUC if the extended statistics
import/export work gets committed for v18, we don't need a new option
for post-upgrade analyze. But as you mentioned, these new modes seem
useful for other use cases too. Given that it could be used outside of
post-upgrading, I think it would make more sense to apply the "all"
and "missing" modes to --analyze and --analyze-only options too.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
I had the opportunity to bring this patch set up for discussion at the
developer meeting at FOSDEM PGDay last week [0].  There seemed to be a
strong consensus that the idea of a "missing only" mode for vacuumdb's
analyze options was useful (especially so if the extended stats piece of
the stats import/export project doesn't make it into v18), but that we
shouldn't change the default behavior of the existing options. Given that,
I have modified the patch set to instead introduce a --missing-only option
that can be used in conjunction with --analyze-only and --analyze-in-stages.
The underlying implementation is the same as in v1 of the patch set, except
for the following changes:
* I've modified the extended stats part of the query to also check for
pg_statistic_ext.stxstattarget IS DISTINCT FROM 0.
* I've added a new clause to check for extended statistics on tables with
inheritance children, i.e., those with pg_statistic_ext_data.stxdinherit
set to true.
* I've added a server version check that disallows --missing-only on
servers older than v15. The catalog query would need some adjustments to
work on older versions, but that didn't seem critically important. We
could always revisit this in the future.
[0]: https://2025.fosdempgday.org/devmeeting
--
nathan
Attachments:
v2-0001-vacuumdb-Save-catalog-query-results-for-analyze-i.patch (text/plain; charset=us-ascii)
From 39ce7c57c3b193061f3ae9db34d74e83249d3c4b Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Fri, 24 Jan 2025 09:29:30 -0600
Subject: [PATCH v2 1/2] vacuumdb: Save catalog query results for
--analyze-in-stages.
Presently, each call to vacuum_one_database() for each stage of
--analyze-in-stages mode performs the catalog query to retrieve the
list of tables to process. A proposed follow-up commit would add a
"missing only" feature to --analyze-in-stages, which requires us to
save the results of the catalog query (since tables without
statistics would have them after the first stage). This commit
adds this ability via a new parameter for vacuum_one_database()
that specifies either a previously-retrieved list to process or a
place to store the results of the catalog query for later use.
This commit also makes use of this new parameter for
--analyze-in-stages.
The trade-offs of this approach are increased memory usage and less
responsiveness to concurrent catalog changes in later stages,
neither of which is expected to bother anyone.
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: TODO
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
src/bin/scripts/vacuumdb.c | 316 +++++++++++++++++++++----------------
1 file changed, 180 insertions(+), 136 deletions(-)
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 74fbc7ef033..72986135764 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -62,10 +62,16 @@ typedef enum
static VacObjFilter objfilter = OBJFILTER_NONE;
+static SimpleStringList *retrieve_objects(PGconn *conn,
+ vacuumingOptions *vacopts,
+ SimpleStringList *objects,
+ bool echo);
+
static void vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet);
@@ -400,12 +406,13 @@ main(int argc, char *argv[])
if (analyze_in_stages)
{
int stage;
+ SimpleStringList *found_objs = NULL;
for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
{
vacuum_one_database(&cparams, &vacopts,
stage,
- &objects,
+ &objects, &found_objs,
concurrentCons,
progname, echo, quiet);
}
@@ -413,7 +420,7 @@ main(int argc, char *argv[])
else
vacuum_one_database(&cparams, &vacopts,
ANALYZE_NO_STAGE,
- &objects,
+ &objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -461,8 +468,17 @@ escape_quotes(const char *src)
/*
* vacuum_one_database
*
- * Process tables in the given database. If the 'objects' list is empty,
- * process all tables in the database.
+ * Process tables in the given database.
+ *
+ * 'found_objs' should be a fully qualified list of objects to process, as
+ * returned by a previous call to vacuum_one_database(). If *found_objs is
+ * NULL, it is set to the results of the catalog query discussed below. If
+ * found_objs is NULL, the results of the catalog query are not returned.
+ *
+ * If *found_objs is NULL, this function performs a catalog query to retrieve
+ * the list of tables to process. When 'objects' is NULL, all tables in the
+ * database are processed. Otherwise, the catalog query performs a lookup for
+ * the objects listed in 'objects'.
*
* Note that this function is only concerned with running exactly one stage
* when in analyze-in-stages mode; caller must iterate on us if necessary.
@@ -475,22 +491,18 @@ vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet)
{
PQExpBufferData sql;
- PQExpBufferData buf;
- PQExpBufferData catalog_query;
- PGresult *res;
PGconn *conn;
SimpleStringListCell *cell;
ParallelSlotArray *sa;
- SimpleStringList dbtables = {NULL, NULL};
- int i;
- int ntups;
+ int ntups = 0;
bool failed = false;
- bool objects_listed = false;
const char *initcmd;
+ SimpleStringList *ret = NULL;
const char *stage_commands[] = {
"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
"SET default_statistics_target=10; RESET vacuum_cost_delay;",
@@ -599,19 +611,155 @@ vacuum_one_database(ConnParams *cparams,
}
/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path
- * (which could be unsafe), everything in this query MUST be fully
- * qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their
- * optional set of columns. This is used to match any provided column
- * lists with the generated qualified identifiers and to filter for the
- * tables provided via --table. If a listed table does not exist, the
- * catalog query will fail.
+ * If the caller provided the results of a previous catalog query, just
+ * use that. Otherwise, run the catalog query ourselves and set the
+ * return variable if provided.
+ */
+ if (found_objs && *found_objs)
+ ret = *found_objs;
+ else
+ {
+ ret = retrieve_objects(conn, vacopts, objects, echo);
+ if (found_objs)
+ *found_objs = ret;
+ }
+
+ /*
+ * Count the number of objects in the catalog query result. If there are
+ * none, we are done.
+ */
+ for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+ ntups++;
+
+ if (ntups == 0)
+ {
+ PQfinish(conn);
+ return;
+ }
+
+ /*
+ * Ensure concurrentCons is sane. If there are more connections than
+ * vacuumable relations, we don't need to use them all.
+ */
+ if (concurrentCons > ntups)
+ concurrentCons = ntups;
+ if (concurrentCons <= 0)
+ concurrentCons = 1;
+
+ /*
+ * All slots need to be prepared to run the appropriate analyze stage, if
+ * caller requested that mode. We have to prepare the initial connection
+ * ourselves before setting up the slots.
+ */
+ if (stage == ANALYZE_NO_STAGE)
+ initcmd = NULL;
+ else
+ {
+ initcmd = stage_commands[stage];
+ executeCommand(conn, initcmd, echo);
+ }
+
+ /*
+ * Setup the database connections. We reuse the connection we already have
+ * for the first slot. If not in parallel mode, the first slot in the
+ * array contains the connection.
*/
+ sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+ ParallelSlotsAdoptConn(sa, conn);
+
+ initPQExpBuffer(&sql);
+
+ cell = ret->head;
+ do
+ {
+ const char *tabname = cell->val;
+ ParallelSlot *free_slot;
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ free_slot = ParallelSlotsGetIdle(sa, NULL);
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+ vacopts, tabname);
+
+ /*
+ * Execute the vacuum. All errors are handled in processQueryResult
+ * through ParallelSlotsGetIdle.
+ */
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, sql.data,
+ echo, tabname);
+
+ cell = cell->next;
+ } while (cell != NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+ if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
+ {
+ const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+ ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ failed = true;
+ }
+
+finish:
+ ParallelSlotsTerminate(sa);
+ pg_free(sa);
+
+ termPQExpBuffer(&sql);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns. This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table. If a listed table does not exist, the catalog query will fail.
+ */
+static SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+ SimpleStringList *objects, bool echo)
+{
+ PQExpBufferData buf;
+ PQExpBufferData catalog_query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+ bool objects_listed = false;
+
initPQExpBuffer(&catalog_query);
for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
{
@@ -765,23 +913,12 @@ vacuum_one_database(ConnParams *cparams,
termPQExpBuffer(&catalog_query);
PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
- /*
- * If no rows are returned, there are no matching tables, so we are done.
- */
- ntups = PQntuples(res);
- if (ntups == 0)
- {
- PQclear(res);
- PQfinish(conn);
- return;
- }
-
/*
* Build qualified identifiers for each table, including the column list
* if given.
*/
initPQExpBuffer(&buf);
- for (i = 0; i < ntups; i++)
+ for (int i = 0; i < PQntuples(res); i++)
{
appendPQExpBufferStr(&buf,
fmtQualifiedId(PQgetvalue(res, i, 1),
@@ -790,110 +927,13 @@ vacuum_one_database(ConnParams *cparams,
if (objects_listed && !PQgetisnull(res, i, 2))
appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
- simple_string_list_append(&dbtables, buf.data);
+ simple_string_list_append(found_objs, buf.data);
resetPQExpBuffer(&buf);
}
termPQExpBuffer(&buf);
PQclear(res);
- /*
- * Ensure concurrentCons is sane. If there are more connections than
- * vacuumable relations, we don't need to use them all.
- */
- if (concurrentCons > ntups)
- concurrentCons = ntups;
- if (concurrentCons <= 0)
- concurrentCons = 1;
-
- /*
- * All slots need to be prepared to run the appropriate analyze stage, if
- * caller requested that mode. We have to prepare the initial connection
- * ourselves before setting up the slots.
- */
- if (stage == ANALYZE_NO_STAGE)
- initcmd = NULL;
- else
- {
- initcmd = stage_commands[stage];
- executeCommand(conn, initcmd, echo);
- }
-
- /*
- * Setup the database connections. We reuse the connection we already have
- * for the first slot. If not in parallel mode, the first slot in the
- * array contains the connection.
- */
- sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
- ParallelSlotsAdoptConn(sa, conn);
-
- initPQExpBuffer(&sql);
-
- cell = dbtables.head;
- do
- {
- const char *tabname = cell->val;
- ParallelSlot *free_slot;
-
- if (CancelRequested)
- {
- failed = true;
- goto finish;
- }
-
- free_slot = ParallelSlotsGetIdle(sa, NULL);
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
- vacopts, tabname);
-
- /*
- * Execute the vacuum. All errors are handled in processQueryResult
- * through ParallelSlotsGetIdle.
- */
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, sql.data,
- echo, tabname);
-
- cell = cell->next;
- } while (cell != NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- {
- failed = true;
- goto finish;
- }
-
- /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
- if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
- {
- const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
- ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- failed = true;
- }
-
-finish:
- ParallelSlotsTerminate(sa);
- pg_free(sa);
-
- termPQExpBuffer(&sql);
-
- if (failed)
- exit(1);
+ return found_objs;
}
/*
@@ -924,6 +964,10 @@ vacuum_all_databases(ConnParams *cparams,
if (analyze_in_stages)
{
+ SimpleStringList **found_objs;
+
+ found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
/*
* When analyzing all databases in stages, we analyze them all in the
* fastest stage first, so that initial statistics become available
@@ -940,7 +984,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
stage,
- objects,
+ objects, &found_objs[i],
concurrentCons,
progname, echo, quiet);
}
@@ -954,7 +998,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
ANALYZE_NO_STAGE,
- objects,
+ objects, NULL,
concurrentCons,
progname, echo, quiet);
}
--
2.39.5 (Apple Git-154)
v2-0002-vacuumdb-Add-option-for-analyzing-only-relations-.patch (text/plain; charset=us-ascii)
From 7036c0c4213e887a651790bcee8a42a9b89ee595 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Tue, 4 Feb 2025 15:07:54 -0600
Subject: [PATCH v2 2/2] vacuumdb: Add option for analyzing only relations
missing stats.
This commit adds a new --missing-only option that can be used in
conjunction with --analyze-only and --analyze-in-stages. When this
option is specified, vacuumdb will generate ANALYZE commands for a
relation if it is missing any statistics it should ordinarily have.
For example, if a table has statistics for one column but not
another, we will analyze the whole table. A similar principle
applies to extended statistics, expression indexes, and table
inheritance.
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: TODO
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
doc/src/sgml/ref/vacuumdb.sgml | 16 +++++
src/bin/scripts/t/102_vacuumdb_stages.pl | 60 ++++++++++++++++
src/bin/scripts/vacuumdb.c | 92 ++++++++++++++++++++++++
src/test/perl/PostgreSQL/Test/Cluster.pm | 27 +++++++
4 files changed, 195 insertions(+)
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 66fccb30a2d..5295a61f083 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -277,6 +277,22 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--missing-only</option></term>
+ <listitem>
+ <para>
+ Only analyze relations that are missing statistics for a column, index
+ expression, or extended statistics object. This option prevents
+ <application>vacuumdb</application> from deleting existing statistics
+ so that the query optimizer's choices do not become transiently worse.
+ </para>
+ <para>
+ This option can only be used in conjunction with
+ <option>--analyze-only</option> and <option>--analyze-in-stages</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
<term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
diff --git a/src/bin/scripts/t/102_vacuumdb_stages.pl b/src/bin/scripts/t/102_vacuumdb_stages.pl
index 984c8d06de6..b216fb0c2c6 100644
--- a/src/bin/scripts/t/102_vacuumdb_stages.pl
+++ b/src/bin/scripts/t/102_vacuumdb_stages.pl
@@ -21,6 +21,66 @@ $node->issues_sql_like(
.*statement:\ ANALYZE/sx,
'analyze three times');
+$node->safe_psql('postgres',
+ 'CREATE TABLE regression_vacuumdb_test AS select generate_series(1, 10) a, generate_series(2, 11) b;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing stats');
+
+$node->safe_psql('postgres',
+ 'CREATE INDEX regression_vacuumdb_test_idx ON regression_vacuumdb_test (mod(a, 2));');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing index expression stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing index expression stats');
+
+$node->safe_psql('postgres',
+ 'CREATE STATISTICS regression_vacuumdb_test_stat ON a, b FROM regression_vacuumdb_test;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing extended stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing extended stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_child (a INT) INHERITS (regression_vacuumdb_test);\n"
+ . "INSERT INTO regression_vacuumdb_child VALUES (1, 2);\n"
+ . "ANALYZE regression_vacuumdb_child;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing inherited stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing inherited stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_parted (a INT) PARTITION BY LIST (a);\n"
+ . "CREATE TABLE regression_vacuumdb_part1 PARTITION OF regression_vacuumdb_parted FOR VALUES IN (1);\n"
+ . "INSERT INTO regression_vacuumdb_parted VALUES (1);\n"
+ . "ANALYZE regression_vacuumdb_part1;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing partition stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing partition stats');
+
$node->issues_sql_like(
[ 'vacuumdb', '--analyze-in-stages', '--all' ],
qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 72986135764..fba5f466411 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -47,6 +47,7 @@ typedef struct vacuumingOptions
bool process_toast;
bool skip_database_stats;
char *buffer_usage_limit;
+ bool missing_only;
} vacuumingOptions;
/* object filter options */
@@ -134,6 +135,7 @@ main(int argc, char *argv[])
{"no-process-toast", no_argument, NULL, 11},
{"no-process-main", no_argument, NULL, 12},
{"buffer-usage-limit", required_argument, NULL, 13},
+ {"missing-only", no_argument, NULL, 14},
{NULL, 0, NULL, 0}
};
@@ -281,6 +283,9 @@ main(int argc, char *argv[])
case 13:
vacopts.buffer_usage_limit = escape_quotes(optarg);
break;
+ case 14:
+ vacopts.missing_only = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -366,6 +371,11 @@ main(int argc, char *argv[])
pg_fatal("cannot use the \"%s\" option with the \"%s\" option",
"buffer-usage-limit", "full");
+ /* Prohibit --missing-only without --analyze-only or --analyze-in-stages */
+ if (vacopts.missing_only && !vacopts.analyze_only)
+ pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
+ "missing-only", "analyze-only", "analyze-in-stages");
+
/* fill cparams except for dbname, which is set below */
cparams.pghost = host;
cparams.pgport = port;
@@ -596,6 +606,13 @@ vacuum_one_database(ConnParams *cparams,
"--buffer-usage-limit", "16");
}
+ if (vacopts->missing_only && PQserverVersion(conn) < 150000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--missing-only", "15");
+ }
+
/* skip_database_stats is used automatically if server supports it */
vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
@@ -820,6 +837,7 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
" FROM pg_catalog.pg_class c\n"
" JOIN pg_catalog.pg_namespace ns"
" ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+ " CROSS JOIN LATERAL (SELECT c.relkind IN ('p', 'I')) as p (inherited)\n"
" LEFT JOIN pg_catalog.pg_class t"
" ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
@@ -903,6 +921,79 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
vacopts->min_mxid_age);
}
+ if (vacopts->missing_only)
+ {
+ appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+ /* regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* expression indexes */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_index i\n"
+ " CROSS JOIN LATERAL pg_catalog.unnest(i.indkey) WITH ORDINALITY u (attnum, ord)\n"
+ " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND i.indexprs IS NOT NULL\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) i.indexrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) u.ord\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* table inheritance and regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit))\n");
+
+ /* table inheritance and extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit))\n");
+
+ appendPQExpBufferStr(&catalog_query, " )\n");
+ }
+
/*
* Execute the catalog query. We use the default search_path for this
* query for consistency with table lookups done elsewhere by the user.
@@ -1224,6 +1315,7 @@ help(const char *progname)
printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --min-mxid-age=MXID_AGE minimum multixact ID age of tables to vacuum\n"));
printf(_(" --min-xid-age=XID_AGE minimum transaction ID age of tables to vacuum\n"));
+ printf(_(" --missing-only only analyze relations with missing statistics\n"));
printf(_(" --no-index-cleanup don't remove index entries that point to dead tuples\n"));
printf(_(" --no-process-main skip the main relation\n"));
printf(_(" --no-process-toast skip the TOAST table associated with the table to vacuum\n"));
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index f521ad0b12f..d30d97aa96c 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2802,6 +2802,33 @@ sub issues_sql_like
=pod
+=item $node->issues_sql_unlike(cmd, unexpected_sql, test_name)
+
+Run a command on the node, then verify that $unexpected_sql does not appear in
+the server log file.
+
+=cut
+
+sub issues_sql_unlike
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($self, $cmd, $unexpected_sql, $test_name) = @_;
+
+ local %ENV = $self->_get_env();
+
+ my $log_location = -s $self->logfile;
+
+ my $result = PostgreSQL::Test::Utils::run_log($cmd);
+ ok($result, "@$cmd exit code 0");
+ my $log =
+ PostgreSQL::Test::Utils::slurp_file($self->logfile, $log_location);
+ unlike($log, $unexpected_sql, "$test_name: SQL not found in server log");
+ return;
+}
+
+=pod
+
=item $node->log_content()
Returns the contents of log of the node
--
2.39.5 (Apple Git-154)
On Wed, Feb 5, 2025 at 4:44 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
[v2]
I started looking just at 0001, and it seems like a fairly
straightforward rearrangement. I found this comment quite hard to
read:
+ * 'found_objs' should be a fully qualified list of objects to process, as
+ * returned by a previous call to vacuum_one_database(). If *found_objs is
+ * NULL, it is set to the results of the catalog query discussed below. If
+ * found_objs is NULL, the results of the catalog query are not returned.
+ *
+ * If *found_objs is NULL, this function performs a catalog query to retrieve
+ * the list of tables to process. When 'objects' is NULL, all tables in the
I had to read it several times before I noticed the difference between
"* found_objs" and "*found_objs". Maybe some extra spacing and breaks
would help, or other reorganization.
--
John Naylor
Amazon Web Services
On Thu, Feb 27, 2025 at 04:36:04PM +0700, John Naylor wrote:
On Wed, Feb 5, 2025 at 4:44 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
[v2]
I started looking just at 0001, and it seems like a fairly
straightforward rearrangement.
Thanks for taking a look.
I found this comment quite hard to read:
+ * 'found_objs' should be a fully qualified list of objects to process, as
+ * returned by a previous call to vacuum_one_database(). If *found_objs is
+ * NULL, it is set to the results of the catalog query discussed below. If
+ * found_objs is NULL, the results of the catalog query are not returned.
+ *
+ * If *found_objs is NULL, this function performs a catalog query to retrieve
+ * the list of tables to process. When 'objects' is NULL, all tables in the
I had to read it several times before I noticed the difference between
"* found_objs" and "*found_objs". Maybe some extra spacing and breaks
would help, or other reorganization.
Yeah, it's pretty atrocious. I think the main problem is that the
interface is just too complicated, so I'll take a step back and see if I
can make it more understandable to humans. In the meantime, here's an
attempt at adjusting the comment:
* found_objs is a double pointer to a fully qualified list of objects to
* process, as returned by a previous call to vacuum_one_database(). If
* *found_objs (the once-dereferenced double pointer) is NULL, it is set to the
* results of the catalog query discussed below. If found_objs (the double
* pointer) is NULL, the results of the catalog query are not returned.
*
* If *found_objs (the once-dereferenced double pointer) is NULL, this function
* performs a catalog query to retrieve the list of tables to process. When
* "objects" is NULL, all tables in the database are processed. Otherwise, the
* catalog query performs a lookup for the objects listed in "objects".
--
nathan
On Sat, Mar 1, 2025 at 3:42 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Feb 27, 2025 at 04:36:04PM +0700, John Naylor wrote:
I had to read it several times before I noticed the difference between
"* found_objs" and "*found_objs". Maybe some extra spacing and breaks
would help, or other reorganization.
Yeah, it's pretty atrocious. I think the main problem is that the
interface is just too complicated, so I'll take a step back and see if I
can make it more understandable to humans.
The interface is awkward, but on the other hand only a small part has
to really know about it. It's worth trying to make it more readable if
you can.
In the meantime, here's an
attempt at adjusting the comment:
That's better, and if we end up with this interface, we'll want quotes
around the names so the eye can tell where the "*" belong.
--
John Naylor
Amazon Web Services
On Mon, Mar 03, 2025 at 05:58:43PM +0700, John Naylor wrote:
On Sat, Mar 1, 2025 at 3:42 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Feb 27, 2025 at 04:36:04PM +0700, John Naylor wrote:
I had to read it several times before I noticed the difference between
"* found_objs" and "*found_objs". Maybe some extra spacing and breaks
would help, or other reorganization.
Yeah, it's pretty atrocious. I think the main problem is that the
interface is just too complicated, so I'll take a step back and see if I
can make it more understandable to humans.
The interface is awkward, but on the other hand only a small part has
to really know about it. It's worth trying to make it more readable if
you can.
True. One small thing we could do is to require "found_objs" (the double
pointer) to always be non-NULL, but that just compels some callers to
provide otherwise-unused variables. I've left the interface alone for now.
In the meantime, here's an attempt at adjusting the comment:
That's better, and if we end up with this interface, we'll want quotes
around the names so the eye can tell where the "*" belong.
I did that in v3. I also tried to break up this comment into bullet points
for better separation and logical flow.
--
nathan
Attachments:
v3-0001-vacuumdb-Save-catalog-query-results-for-analyze-i.patch (text/plain; charset=us-ascii)
From ecbcdc52324cb6cb962158bd0cdeb7cb9e6a304e Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Mon, 3 Mar 2025 09:43:38 -0600
Subject: [PATCH v3 1/2] vacuumdb: Save catalog query results for
--analyze-in-stages.
Presently, each call to vacuum_one_database() for each stage of
--analyze-in-stages mode performs the catalog query to retrieve the
list of tables to process. A proposed follow-up commit would add a
"missing only" feature to --analyze-in-stages, which requires us to
save the results of the catalog query (since tables without
statistics would have them after the first stage). This commit
adds this ability via a new parameter for vacuum_one_database()
that specifies either a previously-retrieved list to process or a
place to store the results of the catalog query for later use.
This commit also makes use of this new parameter for
--analyze-in-stages.
The trade-offs of this approach are increased memory usage and less
responsiveness to concurrent catalog changes in later stages,
neither of which is expected to bother anyone.
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
src/bin/scripts/vacuumdb.c | 335 ++++++++++++++++++++++---------------
1 file changed, 199 insertions(+), 136 deletions(-)
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 982bf070be6..52b91837492 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -62,10 +62,16 @@ typedef enum
static VacObjFilter objfilter = OBJFILTER_NONE;
+static SimpleStringList *retrieve_objects(PGconn *conn,
+ vacuumingOptions *vacopts,
+ SimpleStringList *objects,
+ bool echo);
+
static void vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet);
@@ -400,12 +406,13 @@ main(int argc, char *argv[])
if (analyze_in_stages)
{
int stage;
+ SimpleStringList *found_objs = NULL;
for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
{
vacuum_one_database(&cparams, &vacopts,
stage,
- &objects,
+ &objects, &found_objs,
concurrentCons,
progname, echo, quiet);
}
@@ -413,7 +420,7 @@ main(int argc, char *argv[])
else
vacuum_one_database(&cparams, &vacopts,
ANALYZE_NO_STAGE,
- &objects,
+ &objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -461,8 +468,36 @@ escape_quotes(const char *src)
/*
* vacuum_one_database
*
- * Process tables in the given database. If the 'objects' list is empty,
- * process all tables in the database.
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ * of objects to process, as returned by a previous call to
+ * vacuum_one_database().
+ *
+ * a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ * once-dereferenced double pointer) are not NULL, this list takes
+ * priority, and anything specified in "objects" is ignored.
+ *
+ * b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ * (the once-dereferenced double pointer) _is_ NULL, the "objects"
+ * parameter takes priority, and the results of the catalog query
+ * described in (2) are stored in "found_objs".
+ *
+ * c) If "found_objs" (the double pointer) is NULL, the "objects"
+ * parameter again takes priority, and the results of the catalog query
+ * are not saved.
+ *
+ * 2) The "objects" parameter is a user-specified list of objects to process.
+ * When (1b) or (1c) applies, this function performs a catalog query to
+ * retrieve a fully qualified list of objects to process, as described
+ * below.
+ *
+ * a) If "objects" is not NULL, the catalog query gathers only the objects
+ * listed in "objects".
+ *
+ * b) If "objects" is NULL, all tables in the database are gathered.
*
* Note that this function is only concerned with running exactly one stage
* when in analyze-in-stages mode; caller must iterate on us if necessary.
@@ -475,22 +510,18 @@ vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet)
{
PQExpBufferData sql;
- PQExpBufferData buf;
- PQExpBufferData catalog_query;
- PGresult *res;
PGconn *conn;
SimpleStringListCell *cell;
ParallelSlotArray *sa;
- SimpleStringList dbtables = {NULL, NULL};
- int i;
- int ntups;
+ int ntups = 0;
bool failed = false;
- bool objects_listed = false;
const char *initcmd;
+ SimpleStringList *ret = NULL;
const char *stage_commands[] = {
"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
"SET default_statistics_target=10; RESET vacuum_cost_delay;",
@@ -599,19 +630,155 @@ vacuum_one_database(ConnParams *cparams,
}
/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path
- * (which could be unsafe), everything in this query MUST be fully
- * qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their
- * optional set of columns. This is used to match any provided column
- * lists with the generated qualified identifiers and to filter for the
- * tables provided via --table. If a listed table does not exist, the
- * catalog query will fail.
+ * If the caller provided the results of a previous catalog query, just
+ * use that. Otherwise, run the catalog query ourselves and set the
+ * return variable if provided.
+ */
+ if (found_objs && *found_objs)
+ ret = *found_objs;
+ else
+ {
+ ret = retrieve_objects(conn, vacopts, objects, echo);
+ if (found_objs)
+ *found_objs = ret;
+ }
+
+ /*
+ * Count the number of objects in the catalog query result. If there are
+ * none, we are done.
+ */
+ for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+ ntups++;
+
+ if (ntups == 0)
+ {
+ PQfinish(conn);
+ return;
+ }
+
+ /*
+ * Ensure concurrentCons is sane. If there are more connections than
+ * vacuumable relations, we don't need to use them all.
*/
+ if (concurrentCons > ntups)
+ concurrentCons = ntups;
+ if (concurrentCons <= 0)
+ concurrentCons = 1;
+
+ /*
+ * All slots need to be prepared to run the appropriate analyze stage, if
+ * caller requested that mode. We have to prepare the initial connection
+ * ourselves before setting up the slots.
+ */
+ if (stage == ANALYZE_NO_STAGE)
+ initcmd = NULL;
+ else
+ {
+ initcmd = stage_commands[stage];
+ executeCommand(conn, initcmd, echo);
+ }
+
+ /*
+ * Setup the database connections. We reuse the connection we already have
+ * for the first slot. If not in parallel mode, the first slot in the
+ * array contains the connection.
+ */
+ sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+ ParallelSlotsAdoptConn(sa, conn);
+
+ initPQExpBuffer(&sql);
+
+ cell = ret->head;
+ do
+ {
+ const char *tabname = cell->val;
+ ParallelSlot *free_slot;
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ free_slot = ParallelSlotsGetIdle(sa, NULL);
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+ vacopts, tabname);
+
+ /*
+ * Execute the vacuum. All errors are handled in processQueryResult
+ * through ParallelSlotsGetIdle.
+ */
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, sql.data,
+ echo, tabname);
+
+ cell = cell->next;
+ } while (cell != NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+ if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
+ {
+ const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+ ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ failed = true;
+ }
+
+finish:
+ ParallelSlotsTerminate(sa);
+ pg_free(sa);
+
+ termPQExpBuffer(&sql);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns. This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table. If a listed table does not exist, the catalog query will fail.
+ */
+static SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+ SimpleStringList *objects, bool echo)
+{
+ PQExpBufferData buf;
+ PQExpBufferData catalog_query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+ bool objects_listed = false;
+
initPQExpBuffer(&catalog_query);
for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
{
@@ -765,23 +932,12 @@ vacuum_one_database(ConnParams *cparams,
termPQExpBuffer(&catalog_query);
PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
- /*
- * If no rows are returned, there are no matching tables, so we are done.
- */
- ntups = PQntuples(res);
- if (ntups == 0)
- {
- PQclear(res);
- PQfinish(conn);
- return;
- }
-
/*
* Build qualified identifiers for each table, including the column list
* if given.
*/
initPQExpBuffer(&buf);
- for (i = 0; i < ntups; i++)
+ for (int i = 0; i < PQntuples(res); i++)
{
appendPQExpBufferStr(&buf,
fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
@@ -791,110 +947,13 @@ vacuum_one_database(ConnParams *cparams,
if (objects_listed && !PQgetisnull(res, i, 2))
appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
- simple_string_list_append(&dbtables, buf.data);
+ simple_string_list_append(found_objs, buf.data);
resetPQExpBuffer(&buf);
}
termPQExpBuffer(&buf);
PQclear(res);
- /*
- * Ensure concurrentCons is sane. If there are more connections than
- * vacuumable relations, we don't need to use them all.
- */
- if (concurrentCons > ntups)
- concurrentCons = ntups;
- if (concurrentCons <= 0)
- concurrentCons = 1;
-
- /*
- * All slots need to be prepared to run the appropriate analyze stage, if
- * caller requested that mode. We have to prepare the initial connection
- * ourselves before setting up the slots.
- */
- if (stage == ANALYZE_NO_STAGE)
- initcmd = NULL;
- else
- {
- initcmd = stage_commands[stage];
- executeCommand(conn, initcmd, echo);
- }
-
- /*
- * Setup the database connections. We reuse the connection we already have
- * for the first slot. If not in parallel mode, the first slot in the
- * array contains the connection.
- */
- sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
- ParallelSlotsAdoptConn(sa, conn);
-
- initPQExpBuffer(&sql);
-
- cell = dbtables.head;
- do
- {
- const char *tabname = cell->val;
- ParallelSlot *free_slot;
-
- if (CancelRequested)
- {
- failed = true;
- goto finish;
- }
-
- free_slot = ParallelSlotsGetIdle(sa, NULL);
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
- vacopts, tabname);
-
- /*
- * Execute the vacuum. All errors are handled in processQueryResult
- * through ParallelSlotsGetIdle.
- */
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, sql.data,
- echo, tabname);
-
- cell = cell->next;
- } while (cell != NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- {
- failed = true;
- goto finish;
- }
-
- /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
- if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
- {
- const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
- ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- failed = true;
- }
-
-finish:
- ParallelSlotsTerminate(sa);
- pg_free(sa);
-
- termPQExpBuffer(&sql);
-
- if (failed)
- exit(1);
+ return found_objs;
}
/*
@@ -925,6 +984,10 @@ vacuum_all_databases(ConnParams *cparams,
if (analyze_in_stages)
{
+ SimpleStringList **found_objs;
+
+ found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
/*
* When analyzing all databases in stages, we analyze them all in the
* fastest stage first, so that initial statistics become available
@@ -941,7 +1004,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
stage,
- objects,
+ objects, &found_objs[i],
concurrentCons,
progname, echo, quiet);
}
@@ -955,7 +1018,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
ANALYZE_NO_STAGE,
- objects,
+ objects, NULL,
concurrentCons,
progname, echo, quiet);
}
--
2.39.5 (Apple Git-154)
v3-0002-vacuumdb-Add-option-for-analyzing-only-relations-.patch (text/plain; charset=us-ascii)
From 26c7d17c807e5c7b77621c0412d2d56ddbcc38aa Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Tue, 4 Feb 2025 15:07:54 -0600
Subject: [PATCH v3 2/2] vacuumdb: Add option for analyzing only relations
missing stats.
This commit adds a new --missing-only option that can be used in
conjunction with --analyze-only and --analyze-in-stages. When this
option is specified, vacuumdb will generate ANALYZE commands for a
relation if it is missing any statistics it should ordinarily have.
For example, if a table has statistics for one column but not
another, we will analyze the whole table. A similar principle
applies to extended statistics, expression indexes, and table
inheritance.
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: TODO
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
doc/src/sgml/ref/vacuumdb.sgml | 16 +++++
src/bin/scripts/t/102_vacuumdb_stages.pl | 60 ++++++++++++++++
src/bin/scripts/vacuumdb.c | 92 ++++++++++++++++++++++++
src/test/perl/PostgreSQL/Test/Cluster.pm | 27 +++++++
4 files changed, 195 insertions(+)
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 66fccb30a2d..5295a61f083 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -277,6 +277,22 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--missing-only</option></term>
+ <listitem>
+ <para>
+ Only analyze relations that are missing statistics for a column, index
+ expression, or extended statistics object. This option prevents
+ <application>vacuumdb</application> from deleting existing statistics
+ so that the query optimizer's choices do not become transiently worse.
+ </para>
+ <para>
+ This option can only be used in conjunction with
+ <option>--analyze-only</option> and <option>--analyze-in-stages</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
<term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
diff --git a/src/bin/scripts/t/102_vacuumdb_stages.pl b/src/bin/scripts/t/102_vacuumdb_stages.pl
index 984c8d06de6..b216fb0c2c6 100644
--- a/src/bin/scripts/t/102_vacuumdb_stages.pl
+++ b/src/bin/scripts/t/102_vacuumdb_stages.pl
@@ -21,6 +21,66 @@ $node->issues_sql_like(
.*statement:\ ANALYZE/sx,
'analyze three times');
+$node->safe_psql('postgres',
+ 'CREATE TABLE regression_vacuumdb_test AS select generate_series(1, 10) a, generate_series(2, 11) b;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing stats');
+
+$node->safe_psql('postgres',
+ 'CREATE INDEX regression_vacuumdb_test_idx ON regression_vacuumdb_test (mod(a, 2));');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing index expression stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing index expression stats');
+
+$node->safe_psql('postgres',
+ 'CREATE STATISTICS regression_vacuumdb_test_stat ON a, b FROM regression_vacuumdb_test;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing extended stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing extended stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_child (a INT) INHERITS (regression_vacuumdb_test);\n"
+ . "INSERT INTO regression_vacuumdb_child VALUES (1, 2);\n"
+ . "ANALYZE regression_vacuumdb_child;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing inherited stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing inherited stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_parted (a INT) PARTITION BY LIST (a);\n"
+ . "CREATE TABLE regression_vacuumdb_part1 PARTITION OF regression_vacuumdb_parted FOR VALUES IN (1);\n"
+ . "INSERT INTO regression_vacuumdb_parted VALUES (1);\n"
+ . "ANALYZE regression_vacuumdb_part1;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing partition stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing partition stats');
+
$node->issues_sql_like(
[ 'vacuumdb', '--analyze-in-stages', '--all' ],
qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 52b91837492..f84a521fbfd 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -47,6 +47,7 @@ typedef struct vacuumingOptions
bool process_toast;
bool skip_database_stats;
char *buffer_usage_limit;
+ bool missing_only;
} vacuumingOptions;
/* object filter options */
@@ -134,6 +135,7 @@ main(int argc, char *argv[])
{"no-process-toast", no_argument, NULL, 11},
{"no-process-main", no_argument, NULL, 12},
{"buffer-usage-limit", required_argument, NULL, 13},
+ {"missing-only", no_argument, NULL, 14},
{NULL, 0, NULL, 0}
};
@@ -281,6 +283,9 @@ main(int argc, char *argv[])
case 13:
vacopts.buffer_usage_limit = escape_quotes(optarg);
break;
+ case 14:
+ vacopts.missing_only = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -366,6 +371,11 @@ main(int argc, char *argv[])
pg_fatal("cannot use the \"%s\" option with the \"%s\" option",
"buffer-usage-limit", "full");
+ /* Prohibit --missing-only without --analyze-only or --analyze-in-stages */
+ if (vacopts.missing_only && !vacopts.analyze_only)
+ pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
+ "missing-only", "analyze-only", "analyze-in-stages");
+
/* fill cparams except for dbname, which is set below */
cparams.pghost = host;
cparams.pgport = port;
@@ -615,6 +625,13 @@ vacuum_one_database(ConnParams *cparams,
"--buffer-usage-limit", "16");
}
+ if (vacopts->missing_only && PQserverVersion(conn) < 150000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--missing-only", "15");
+ }
+
/* skip_database_stats is used automatically if server supports it */
vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
@@ -839,6 +856,7 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
" FROM pg_catalog.pg_class c\n"
" JOIN pg_catalog.pg_namespace ns"
" ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+ " CROSS JOIN LATERAL (SELECT c.relkind IN ('p', 'I')) as p (inherited)\n"
" LEFT JOIN pg_catalog.pg_class t"
" ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
@@ -922,6 +940,79 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
vacopts->min_mxid_age);
}
+ if (vacopts->missing_only)
+ {
+ appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+ /* regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* expression indexes */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_index i\n"
+ " CROSS JOIN LATERAL pg_catalog.unnest(i.indkey) WITH ORDINALITY u (attnum, ord)\n"
+ " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND i.indexprs IS NOT NULL\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) i.indexrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) u.ord\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* table inheritance and regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit))\n");
+
+ /* table inheritance and extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit))\n");
+
+ appendPQExpBufferStr(&catalog_query, " )\n");
+ }
+
/*
* Execute the catalog query. We use the default search_path for this
* query for consistency with table lookups done elsewhere by the user.
@@ -1244,6 +1335,7 @@ help(const char *progname)
printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --min-mxid-age=MXID_AGE minimum multixact ID age of tables to vacuum\n"));
printf(_(" --min-xid-age=XID_AGE minimum transaction ID age of tables to vacuum\n"));
+ printf(_(" --missing-only only analyze relations with missing statistics\n"));
printf(_(" --no-index-cleanup don't remove index entries that point to dead tuples\n"));
printf(_(" --no-process-main skip the main relation\n"));
printf(_(" --no-process-toast skip the TOAST table associated with the table to vacuum\n"));
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b105cba05a6..ff8e04d3a03 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2820,6 +2820,33 @@ sub issues_sql_like
=pod
+=item $node->issues_sql_unlike(cmd, unexpected_sql, test_name)
+
+Run a command on the node, then verify that $unexpected_sql does not appear in
+the server log file.
+
+=cut
+
+sub issues_sql_unlike
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($self, $cmd, $unexpected_sql, $test_name) = @_;
+
+ local %ENV = $self->_get_env();
+
+ my $log_location = -s $self->logfile;
+
+ my $result = PostgreSQL::Test::Utils::run_log($cmd);
+ ok($result, "@$cmd exit code 0");
+ my $log =
+ PostgreSQL::Test::Utils::slurp_file($self->logfile, $log_location);
+ unlike($log, $unexpected_sql, "$test_name: SQL not found in server log");
+ return;
+}
+
+=pod
+
=item $node->log_content()
Returns the contents of log of the node
--
2.39.5 (Apple Git-154)
On Mon, Mar 3, 2025 at 11:21 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Mar 03, 2025 at 05:58:43PM +0700, John Naylor wrote:
True. One small thing we could do is to require "found_objs" (the double
pointer) to always be non-NULL, but that just compels some callers to
provide otherwise-unused variables. I've left the interface alone for now.
One thing to consider is that a pointer named "dummy" is self-documenting.
That's better, and if we end up with this interface, we'll want quotes
around the names so the eye can tell where the "*" belong.
I did that in v3. I also tried to break up this comment into bullet points
for better separation and logical flow.
That's much easier to follow, thanks.
--
John Naylor
Amazon Web Services
On Tue, Mar 04, 2025 at 01:05:17PM +0700, John Naylor wrote:
On Mon, Mar 3, 2025 at 11:21 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
I did that in v3. I also tried to break up this comment into bullet points
for better separation and logical flow.
That's much easier to follow, thanks.
Thanks for looking. I'll probably commit 0001 sooner than later so that we
can focus our attention on 0002.
--
nathan
On Wed, Mar 5, 2025 at 12:13 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Tue, Mar 04, 2025 at 01:05:17PM +0700, John Naylor wrote:
On Mon, Mar 3, 2025 at 11:21 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
I did that in v3. I also tried to break up this comment into bullet points
for better separation and logical flow.
That's much easier to follow, thanks.
Thanks for looking. I'll probably commit 0001 sooner than later so that we
can focus our attention on 0002.
+1
On 0002:
+ This option can only be used in conjunction with
+ <option>--analyze-only</option> and
<option>--analyze-in-stages</option>.
+ /* Prohibit --missing-only without --analyze-only or --analyze-in-stages */
+ if (vacopts.missing_only && !vacopts.analyze_only)
+ pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
+ "missing-only", "analyze-only", "analyze-in-stages");
The first is slightly ambiguous, so maybe "or" is better throughout.
+ " CROSS JOIN LATERAL (SELECT c.relkind IN ('p', 'I')) as p (inherited)\n"
Looking elsewhere in this file, I think we prefer something like
"(c.relkind OPERATOR(pg_catalog.=) ANY (array["
CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
CppAsString2(RELKIND_PARTITIONED_INDEX) "]) as p (inherited)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
IIUC correctly, pg_statistic doesn't store stats on itself, so this
causes the query result to always contain pg_statistic -- does that
get removed elsewhere?
--
John Naylor
Amazon Web Services
On Thu, Mar 06, 2025 at 06:30:59PM +0700, John Naylor wrote:
+ This option can only be used in conjunction with
+ <option>--analyze-only</option> and
+ <option>--analyze-in-stages</option>.
+ /* Prohibit --missing-only without --analyze-only or --analyze-in-stages */
+ if (vacopts.missing_only && !vacopts.analyze_only)
+ pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
+ "missing-only", "analyze-only", "analyze-in-stages");
The first is slightly ambiguous, so maybe "or" is better throughout.
Agreed.
+ " CROSS JOIN LATERAL (SELECT c.relkind IN ('p', 'I')) as p (inherited)\n"
Looking elsewhere in this file, I think we prefer something like
"(c.relkind OPERATOR(pg_catalog.=) ANY (array["
CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
CppAsString2(RELKIND_PARTITIONED_INDEX) "]) as p (inherited)\n"
Fixed.
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
IIUC correctly, pg_statistic doesn't store stats on itself, so this
causes the query result to always contain pg_statistic -- does that
get removed elsewhere?
Good catch. I think the current behavior is to call ANALYZE on
pg_statistic, too, but that should be mostly harmless (analyze_rel()
refuses to process it). I suppose we could try to avoid returning
pg_statistic from the catalog query, but we don't bother doing that for any
other vacuumdb modes, so I'm tempted to leave it alone.
--
nathan
Attachments:
v4-0001-vacuumdb-Save-catalog-query-results-for-analyze-i.patch (text/plain; charset=us-ascii)
From e7e2c9750131458e00f735352f63d69da73aae8b Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Mon, 3 Mar 2025 09:43:38 -0600
Subject: [PATCH v4 1/2] vacuumdb: Save catalog query results for
--analyze-in-stages.
Presently, each call to vacuum_one_database() for each stage of
--analyze-in-stages mode performs the catalog query to retrieve the
list of tables to process. A proposed follow-up commit would add a
"missing only" feature to --analyze-in-stages, which requires us to
save the results of the catalog query (since tables without
statistics would have them after the first stage). This commit
adds this ability via a new parameter for vacuum_one_database()
that specifies either a previously-retrieved list to process or a
place to store the results of the catalog query for later use.
This commit also makes use of this new parameter for
--analyze-in-stages.
The trade-offs of this approach are increased memory usage and less
responsiveness to concurrent catalog changes in later stages,
neither of which is expected to bother anyone.
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
src/bin/scripts/vacuumdb.c | 335 ++++++++++++++++++++++---------------
1 file changed, 199 insertions(+), 136 deletions(-)
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 982bf070be6..52b91837492 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -62,10 +62,16 @@ typedef enum
static VacObjFilter objfilter = OBJFILTER_NONE;
+static SimpleStringList *retrieve_objects(PGconn *conn,
+ vacuumingOptions *vacopts,
+ SimpleStringList *objects,
+ bool echo);
+
static void vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet);
@@ -400,12 +406,13 @@ main(int argc, char *argv[])
if (analyze_in_stages)
{
int stage;
+ SimpleStringList *found_objs = NULL;
for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
{
vacuum_one_database(&cparams, &vacopts,
stage,
- &objects,
+ &objects, &found_objs,
concurrentCons,
progname, echo, quiet);
}
@@ -413,7 +420,7 @@ main(int argc, char *argv[])
else
vacuum_one_database(&cparams, &vacopts,
ANALYZE_NO_STAGE,
- &objects,
+ &objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -461,8 +468,36 @@ escape_quotes(const char *src)
/*
* vacuum_one_database
*
- * Process tables in the given database. If the 'objects' list is empty,
- * process all tables in the database.
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ * of objects to process, as returned by a previous call to
+ * vacuum_one_database().
+ *
+ * a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ * once-dereferenced double pointer) are not NULL, this list takes
+ * priority, and anything specified in "objects" is ignored.
+ *
+ * b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ * (the once-dereferenced double pointer) _is_ NULL, the "objects"
+ * parameter takes priority, and the results of the catalog query
+ * described in (2) are stored in "found_objs".
+ *
+ * c) If "found_objs" (the double pointer) is NULL, the "objects"
+ * parameter again takes priority, and the results of the catalog query
+ * are not saved.
+ *
+ * 2) The "objects" parameter is a user-specified list of objects to process.
+ * When (1b) or (1c) applies, this function performs a catalog query to
+ * retrieve a fully qualified list of objects to process, as described
+ * below.
+ *
+ * a) If "objects" is not NULL, the catalog query gathers only the objects
+ * listed in "objects".
+ *
+ * b) If "objects" is NULL, all tables in the database are gathered.
*
* Note that this function is only concerned with running exactly one stage
* when in analyze-in-stages mode; caller must iterate on us if necessary.
@@ -475,22 +510,18 @@ vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet)
{
PQExpBufferData sql;
- PQExpBufferData buf;
- PQExpBufferData catalog_query;
- PGresult *res;
PGconn *conn;
SimpleStringListCell *cell;
ParallelSlotArray *sa;
- SimpleStringList dbtables = {NULL, NULL};
- int i;
- int ntups;
+ int ntups = 0;
bool failed = false;
- bool objects_listed = false;
const char *initcmd;
+ SimpleStringList *ret = NULL;
const char *stage_commands[] = {
"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
"SET default_statistics_target=10; RESET vacuum_cost_delay;",
@@ -599,19 +630,155 @@ vacuum_one_database(ConnParams *cparams,
}
/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path
- * (which could be unsafe), everything in this query MUST be fully
- * qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their
- * optional set of columns. This is used to match any provided column
- * lists with the generated qualified identifiers and to filter for the
- * tables provided via --table. If a listed table does not exist, the
- * catalog query will fail.
+ * If the caller provided the results of a previous catalog query, just
+ * use that. Otherwise, run the catalog query ourselves and set the
+ * return variable if provided.
+ */
+ if (found_objs && *found_objs)
+ ret = *found_objs;
+ else
+ {
+ ret = retrieve_objects(conn, vacopts, objects, echo);
+ if (found_objs)
+ *found_objs = ret;
+ }
+
+ /*
+ * Count the number of objects in the catalog query result. If there are
+ * none, we are done.
+ */
+ for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+ ntups++;
+
+ if (ntups == 0)
+ {
+ PQfinish(conn);
+ return;
+ }
+
+ /*
+ * Ensure concurrentCons is sane. If there are more connections than
+ * vacuumable relations, we don't need to use them all.
*/
+ if (concurrentCons > ntups)
+ concurrentCons = ntups;
+ if (concurrentCons <= 0)
+ concurrentCons = 1;
+
+ /*
+ * All slots need to be prepared to run the appropriate analyze stage, if
+ * caller requested that mode. We have to prepare the initial connection
+ * ourselves before setting up the slots.
+ */
+ if (stage == ANALYZE_NO_STAGE)
+ initcmd = NULL;
+ else
+ {
+ initcmd = stage_commands[stage];
+ executeCommand(conn, initcmd, echo);
+ }
+
+ /*
+ * Setup the database connections. We reuse the connection we already have
+ * for the first slot. If not in parallel mode, the first slot in the
+ * array contains the connection.
+ */
+ sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+ ParallelSlotsAdoptConn(sa, conn);
+
+ initPQExpBuffer(&sql);
+
+ cell = ret->head;
+ do
+ {
+ const char *tabname = cell->val;
+ ParallelSlot *free_slot;
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ free_slot = ParallelSlotsGetIdle(sa, NULL);
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+ vacopts, tabname);
+
+ /*
+ * Execute the vacuum. All errors are handled in processQueryResult
+ * through ParallelSlotsGetIdle.
+ */
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, sql.data,
+ echo, tabname);
+
+ cell = cell->next;
+ } while (cell != NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+ if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
+ {
+ const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+ ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ failed = true;
+ }
+
+finish:
+ ParallelSlotsTerminate(sa);
+ pg_free(sa);
+
+ termPQExpBuffer(&sql);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns. This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table. If a listed table does not exist, the catalog query will fail.
+ */
+static SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+ SimpleStringList *objects, bool echo)
+{
+ PQExpBufferData buf;
+ PQExpBufferData catalog_query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+ bool objects_listed = false;
+
initPQExpBuffer(&catalog_query);
for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
{
@@ -765,23 +932,12 @@ vacuum_one_database(ConnParams *cparams,
termPQExpBuffer(&catalog_query);
PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
- /*
- * If no rows are returned, there are no matching tables, so we are done.
- */
- ntups = PQntuples(res);
- if (ntups == 0)
- {
- PQclear(res);
- PQfinish(conn);
- return;
- }
-
/*
* Build qualified identifiers for each table, including the column list
* if given.
*/
initPQExpBuffer(&buf);
- for (i = 0; i < ntups; i++)
+ for (int i = 0; i < PQntuples(res); i++)
{
appendPQExpBufferStr(&buf,
fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
@@ -791,110 +947,13 @@ vacuum_one_database(ConnParams *cparams,
if (objects_listed && !PQgetisnull(res, i, 2))
appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
- simple_string_list_append(&dbtables, buf.data);
+ simple_string_list_append(found_objs, buf.data);
resetPQExpBuffer(&buf);
}
termPQExpBuffer(&buf);
PQclear(res);
- /*
- * Ensure concurrentCons is sane. If there are more connections than
- * vacuumable relations, we don't need to use them all.
- */
- if (concurrentCons > ntups)
- concurrentCons = ntups;
- if (concurrentCons <= 0)
- concurrentCons = 1;
-
- /*
- * All slots need to be prepared to run the appropriate analyze stage, if
- * caller requested that mode. We have to prepare the initial connection
- * ourselves before setting up the slots.
- */
- if (stage == ANALYZE_NO_STAGE)
- initcmd = NULL;
- else
- {
- initcmd = stage_commands[stage];
- executeCommand(conn, initcmd, echo);
- }
-
- /*
- * Setup the database connections. We reuse the connection we already have
- * for the first slot. If not in parallel mode, the first slot in the
- * array contains the connection.
- */
- sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
- ParallelSlotsAdoptConn(sa, conn);
-
- initPQExpBuffer(&sql);
-
- cell = dbtables.head;
- do
- {
- const char *tabname = cell->val;
- ParallelSlot *free_slot;
-
- if (CancelRequested)
- {
- failed = true;
- goto finish;
- }
-
- free_slot = ParallelSlotsGetIdle(sa, NULL);
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
- vacopts, tabname);
-
- /*
- * Execute the vacuum. All errors are handled in processQueryResult
- * through ParallelSlotsGetIdle.
- */
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, sql.data,
- echo, tabname);
-
- cell = cell->next;
- } while (cell != NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- {
- failed = true;
- goto finish;
- }
-
- /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
- if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
- {
- const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
- ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- failed = true;
- }
-
-finish:
- ParallelSlotsTerminate(sa);
- pg_free(sa);
-
- termPQExpBuffer(&sql);
-
- if (failed)
- exit(1);
+ return found_objs;
}
/*
@@ -925,6 +984,10 @@ vacuum_all_databases(ConnParams *cparams,
if (analyze_in_stages)
{
+ SimpleStringList **found_objs;
+
+ found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
/*
* When analyzing all databases in stages, we analyze them all in the
* fastest stage first, so that initial statistics become available
@@ -941,7 +1004,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
stage,
- objects,
+ objects, &found_objs[i],
concurrentCons,
progname, echo, quiet);
}
@@ -955,7 +1018,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
ANALYZE_NO_STAGE,
- objects,
+ objects, NULL,
concurrentCons,
progname, echo, quiet);
}
--
2.39.5 (Apple Git-154)
v4-0002-vacuumdb-Add-option-for-analyzing-only-relations-.patch (text/plain; charset=us-ascii)
From 3d01299daafb513fc0cd18d1e17ef9398306bdcb Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Thu, 6 Mar 2025 15:19:54 -0600
Subject: [PATCH v4 2/2] vacuumdb: Add option for analyzing only relations
missing stats.
This commit adds a new --missing-only option that can be used in
conjunction with --analyze-only or --analyze-in-stages. When this
option is specified, vacuumdb will generate ANALYZE commands for a
relation if it is missing any statistics it should ordinarily have.
For example, if a table has statistics for one column but not
another, we will analyze the whole table. A similar principle
applies to extended statistics, expression indexes, and table
inheritance.
Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
doc/src/sgml/ref/vacuumdb.sgml | 16 ++++
src/bin/scripts/t/102_vacuumdb_stages.pl | 60 +++++++++++++++
src/bin/scripts/vacuumdb.c | 94 ++++++++++++++++++++++++
src/test/perl/PostgreSQL/Test/Cluster.pm | 27 +++++++
4 files changed, 197 insertions(+)
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 66fccb30a2d..91369bf1ffe 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -277,6 +277,22 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--missing-only</option></term>
+ <listitem>
+ <para>
+ Only analyze relations that are missing statistics for a column, index
+ expression, or extended statistics object. This option prevents
+ <application>vacuumdb</application> from deleting existing statistics
+ so that the query optimizer's choices do not become transiently worse.
+ </para>
+ <para>
+ This option can only be used in conjunction with
+ <option>--analyze-only</option> or <option>--analyze-in-stages</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
<term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
diff --git a/src/bin/scripts/t/102_vacuumdb_stages.pl b/src/bin/scripts/t/102_vacuumdb_stages.pl
index 984c8d06de6..b216fb0c2c6 100644
--- a/src/bin/scripts/t/102_vacuumdb_stages.pl
+++ b/src/bin/scripts/t/102_vacuumdb_stages.pl
@@ -21,6 +21,66 @@ $node->issues_sql_like(
.*statement:\ ANALYZE/sx,
'analyze three times');
+$node->safe_psql('postgres',
+ 'CREATE TABLE regression_vacuumdb_test AS select generate_series(1, 10) a, generate_series(2, 11) b;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing stats');
+
+$node->safe_psql('postgres',
+ 'CREATE INDEX regression_vacuumdb_test_idx ON regression_vacuumdb_test (mod(a, 2));');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing index expression stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing index expression stats');
+
+$node->safe_psql('postgres',
+ 'CREATE STATISTICS regression_vacuumdb_test_stat ON a, b FROM regression_vacuumdb_test;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing extended stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing extended stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_child (a INT) INHERITS (regression_vacuumdb_test);\n"
+ . "INSERT INTO regression_vacuumdb_child VALUES (1, 2);\n"
+ . "ANALYZE regression_vacuumdb_child;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing inherited stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing inherited stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_parted (a INT) PARTITION BY LIST (a);\n"
+ . "CREATE TABLE regression_vacuumdb_part1 PARTITION OF regression_vacuumdb_parted FOR VALUES IN (1);\n"
+ . "INSERT INTO regression_vacuumdb_parted VALUES (1);\n"
+ . "ANALYZE regression_vacuumdb_part1;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing partition stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing partition stats');
+
$node->issues_sql_like(
[ 'vacuumdb', '--analyze-in-stages', '--all' ],
qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 52b91837492..b279bcd6f3c 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -47,6 +47,7 @@ typedef struct vacuumingOptions
bool process_toast;
bool skip_database_stats;
char *buffer_usage_limit;
+ bool missing_only;
} vacuumingOptions;
/* object filter options */
@@ -134,6 +135,7 @@ main(int argc, char *argv[])
{"no-process-toast", no_argument, NULL, 11},
{"no-process-main", no_argument, NULL, 12},
{"buffer-usage-limit", required_argument, NULL, 13},
+ {"missing-only", no_argument, NULL, 14},
{NULL, 0, NULL, 0}
};
@@ -281,6 +283,9 @@ main(int argc, char *argv[])
case 13:
vacopts.buffer_usage_limit = escape_quotes(optarg);
break;
+ case 14:
+ vacopts.missing_only = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -366,6 +371,11 @@ main(int argc, char *argv[])
pg_fatal("cannot use the \"%s\" option with the \"%s\" option",
"buffer-usage-limit", "full");
+ /* Prohibit --missing-only without --analyze-only or --analyze-in-stages */
+ if (vacopts.missing_only && !vacopts.analyze_only)
+ pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
+ "missing-only", "analyze-only", "analyze-in-stages");
+
/* fill cparams except for dbname, which is set below */
cparams.pghost = host;
cparams.pgport = port;
@@ -615,6 +625,13 @@ vacuum_one_database(ConnParams *cparams,
"--buffer-usage-limit", "16");
}
+ if (vacopts->missing_only && PQserverVersion(conn) < 150000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--missing-only", "15");
+ }
+
/* skip_database_stats is used automatically if server supports it */
vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
@@ -839,6 +856,9 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
" FROM pg_catalog.pg_class c\n"
" JOIN pg_catalog.pg_namespace ns"
" ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+ " CROSS JOIN LATERAL (SELECT c.relkind IN ("
+ CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
+ CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
" LEFT JOIN pg_catalog.pg_class t"
" ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
@@ -922,6 +942,79 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
vacopts->min_mxid_age);
}
+ if (vacopts->missing_only)
+ {
+ appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+ /* regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* expression indexes */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_index i\n"
+ " CROSS JOIN LATERAL pg_catalog.unnest(i.indkey) WITH ORDINALITY u (attnum, ord)\n"
+ " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND i.indexprs IS NOT NULL\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) i.indexrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) u.ord\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* table inheritance and regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit))\n");
+
+ /* table inheritance and extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit))\n");
+
+ appendPQExpBufferStr(&catalog_query, " )\n");
+ }
+
/*
* Execute the catalog query. We use the default search_path for this
* query for consistency with table lookups done elsewhere by the user.
@@ -1244,6 +1337,7 @@ help(const char *progname)
printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --min-mxid-age=MXID_AGE minimum multixact ID age of tables to vacuum\n"));
printf(_(" --min-xid-age=XID_AGE minimum transaction ID age of tables to vacuum\n"));
+ printf(_(" --missing-only only analyze relations with missing statistics\n"));
printf(_(" --no-index-cleanup don't remove index entries that point to dead tuples\n"));
printf(_(" --no-process-main skip the main relation\n"));
printf(_(" --no-process-toast skip the TOAST table associated with the table to vacuum\n"));
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b105cba05a6..ff8e04d3a03 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2820,6 +2820,33 @@ sub issues_sql_like
=pod
+=item $node->issues_sql_unlike(cmd, unexpected_sql, test_name)
+
+Run a command on the node, then verify that $unexpected_sql does not appear in
+the server log file.
+
+=cut
+
+sub issues_sql_unlike
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($self, $cmd, $unexpected_sql, $test_name) = @_;
+
+ local %ENV = $self->_get_env();
+
+ my $log_location = -s $self->logfile;
+
+ my $result = PostgreSQL::Test::Utils::run_log($cmd);
+ ok($result, "@$cmd exit code 0");
+ my $log =
+ PostgreSQL::Test::Utils::slurp_file($self->logfile, $log_location);
+ unlike($log, $unexpected_sql, "$test_name: SQL not found in server log");
+ return;
+}
+
+=pod
+
=item $node->log_content()
Returns the contents of log of the node
--
2.39.5 (Apple Git-154)
On Fri, Mar 7, 2025 at 4:47 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Mar 06, 2025 at 06:30:59PM +0700, John Naylor wrote:
IIUC correctly, pg_statistic doesn't store stats on itself, so this
causes the query result to always contain pg_statistic -- does that
get removed elsewhere?
Good catch. I think the current behavior is to call ANALYZE on
pg_statistic, too, but that should be mostly harmless (analyze_rel()
refuses to process it). I suppose we could try to avoid returning
pg_statistic from the catalog query, but we don't bother doing that for any
other vacuumdb modes, so I'm tempted to leave it alone.
Okay, thanks for confirming. I have no further comments.
--
John Naylor
Amazon Web Services
On Mon, Mar 10, 2025 at 12:35:22PM +0700, John Naylor wrote:
I have no further comments.
Thanks. I'll give this a little more time for review before committing.
We'll still need to update the recommendation in pg_upgrade's
documentation. I'm going to keep that separate because the stats
import/export work is still settling.
--
nathan
On Mon, Mar 10, 2025 at 10:08:49AM -0500, Nathan Bossart wrote:
On Mon, Mar 10, 2025 at 12:35:22PM +0700, John Naylor wrote:
I have no further comments.
Thanks. I'll give this a little more time for review before committing.
I realized that we could limit the catalog query reuse to only when
--missing-only is specified, so I've updated 0001 and 0002 accordingly.
This avoids changing any existing behavior.
We'll still need to update the recommendation in pg_upgrade's
documentation. I'm going to keep that separate because the stats
import/export work is still settling.
0003 is a first attempt at this. Unless I am missing something, there's
really not much to update.
--
nathan
Attachments:
v5-0001-vacuumdb-Teach-vacuum_one_database-to-reuse-query.patch (text/plain; charset=us-ascii)
From 3840b2456e5485a591becc4cfe19dba4b1a54f09 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Tue, 11 Mar 2025 11:18:36 -0500
Subject: [PATCH v5 1/3] vacuumdb: Teach vacuum_one_database() to reuse query
results.
Presently, each call to vacuum_one_database() performs a catalog
query to retrieve the list of tables to process. A proposed
follow-up commit would add a "missing only" feature to
--analyze-in-stages, which requires us to save the results of the
catalog query (since tables without statistics would have them
after the first stage). This commit adds this ability via a new
parameter for vacuum_one_database() that specifies either a
previously-retrieved list to process or a place to store the
results of the catalog query for later use. Note that this commit
does not make use of this new parameter for anything. That is left
as a future exercise.
The trade-offs of this approach are increased memory usage and less
responsiveness to concurrent catalog changes in later stages,
neither of which is expected to bother anyone.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
src/bin/scripts/vacuumdb.c | 330 ++++++++++++++++++++++---------------
1 file changed, 194 insertions(+), 136 deletions(-)
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 982bf070be6..e28f82a0eba 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -62,10 +62,16 @@ typedef enum
static VacObjFilter objfilter = OBJFILTER_NONE;
+static SimpleStringList *retrieve_objects(PGconn *conn,
+ vacuumingOptions *vacopts,
+ SimpleStringList *objects,
+ bool echo);
+
static void vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet);
@@ -405,7 +411,7 @@ main(int argc, char *argv[])
{
vacuum_one_database(&cparams, &vacopts,
stage,
- &objects,
+ &objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -413,7 +419,7 @@ main(int argc, char *argv[])
else
vacuum_one_database(&cparams, &vacopts,
ANALYZE_NO_STAGE,
- &objects,
+ &objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -461,8 +467,36 @@ escape_quotes(const char *src)
/*
* vacuum_one_database
*
- * Process tables in the given database. If the 'objects' list is empty,
- * process all tables in the database.
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ * of objects to process, as returned by a previous call to
+ * vacuum_one_database().
+ *
+ * a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ * once-dereferenced double pointer) are not NULL, this list takes
+ * priority, and anything specified in "objects" is ignored.
+ *
+ * b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ * (the once-dereferenced double pointer) _is_ NULL, the "objects"
+ * parameter takes priority, and the results of the catalog query
+ * described in (2) are stored in "found_objs".
+ *
+ * c) If "found_objs" (the double pointer) is NULL, the "objects"
+ * parameter again takes priority, and the results of the catalog query
+ * are not saved.
+ *
+ * 2) The "objects" parameter is a user-specified list of objects to process.
+ * When (1b) or (1c) applies, this function performs a catalog query to
+ * retrieve a fully qualified list of objects to process, as described
+ * below.
+ *
+ * a) If "objects" is not NULL, the catalog query gathers only the objects
+ * listed in "objects".
+ *
+ * b) If "objects" is NULL, all tables in the database are gathered.
*
* Note that this function is only concerned with running exactly one stage
* when in analyze-in-stages mode; caller must iterate on us if necessary.
@@ -475,22 +509,18 @@ vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet)
{
PQExpBufferData sql;
- PQExpBufferData buf;
- PQExpBufferData catalog_query;
- PGresult *res;
PGconn *conn;
SimpleStringListCell *cell;
ParallelSlotArray *sa;
- SimpleStringList dbtables = {NULL, NULL};
- int i;
- int ntups;
+ int ntups = 0;
bool failed = false;
- bool objects_listed = false;
const char *initcmd;
+ SimpleStringList *ret = NULL;
const char *stage_commands[] = {
"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
"SET default_statistics_target=10; RESET vacuum_cost_delay;",
@@ -599,19 +629,155 @@ vacuum_one_database(ConnParams *cparams,
}
/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path
- * (which could be unsafe), everything in this query MUST be fully
- * qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their
- * optional set of columns. This is used to match any provided column
- * lists with the generated qualified identifiers and to filter for the
- * tables provided via --table. If a listed table does not exist, the
- * catalog query will fail.
+ * If the caller provided the results of a previous catalog query, just
+ * use that. Otherwise, run the catalog query ourselves and set the
+ * return variable if provided.
+ */
+ if (found_objs && *found_objs)
+ ret = *found_objs;
+ else
+ {
+ ret = retrieve_objects(conn, vacopts, objects, echo);
+ if (found_objs)
+ *found_objs = ret;
+ }
+
+ /*
+ * Count the number of objects in the catalog query result. If there are
+ * none, we are done.
+ */
+ for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+ ntups++;
+
+ if (ntups == 0)
+ {
+ PQfinish(conn);
+ return;
+ }
+
+ /*
+ * Ensure concurrentCons is sane. If there are more connections than
+ * vacuumable relations, we don't need to use them all.
*/
+ if (concurrentCons > ntups)
+ concurrentCons = ntups;
+ if (concurrentCons <= 0)
+ concurrentCons = 1;
+
+ /*
+ * All slots need to be prepared to run the appropriate analyze stage, if
+ * caller requested that mode. We have to prepare the initial connection
+ * ourselves before setting up the slots.
+ */
+ if (stage == ANALYZE_NO_STAGE)
+ initcmd = NULL;
+ else
+ {
+ initcmd = stage_commands[stage];
+ executeCommand(conn, initcmd, echo);
+ }
+
+ /*
+ * Setup the database connections. We reuse the connection we already have
+ * for the first slot. If not in parallel mode, the first slot in the
+ * array contains the connection.
+ */
+ sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+ ParallelSlotsAdoptConn(sa, conn);
+
+ initPQExpBuffer(&sql);
+
+ cell = ret->head;
+ do
+ {
+ const char *tabname = cell->val;
+ ParallelSlot *free_slot;
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ free_slot = ParallelSlotsGetIdle(sa, NULL);
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+ vacopts, tabname);
+
+ /*
+ * Execute the vacuum. All errors are handled in processQueryResult
+ * through ParallelSlotsGetIdle.
+ */
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, sql.data,
+ echo, tabname);
+
+ cell = cell->next;
+ } while (cell != NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+ if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
+ {
+ const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+ ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ failed = true;
+ }
+
+finish:
+ ParallelSlotsTerminate(sa);
+ pg_free(sa);
+
+ termPQExpBuffer(&sql);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns. This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table. If a listed table does not exist, the catalog query will fail.
+ */
+static SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+ SimpleStringList *objects, bool echo)
+{
+ PQExpBufferData buf;
+ PQExpBufferData catalog_query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+ bool objects_listed = false;
+
initPQExpBuffer(&catalog_query);
for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
{
@@ -765,23 +931,12 @@ vacuum_one_database(ConnParams *cparams,
termPQExpBuffer(&catalog_query);
PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
- /*
- * If no rows are returned, there are no matching tables, so we are done.
- */
- ntups = PQntuples(res);
- if (ntups == 0)
- {
- PQclear(res);
- PQfinish(conn);
- return;
- }
-
/*
* Build qualified identifiers for each table, including the column list
* if given.
*/
initPQExpBuffer(&buf);
- for (i = 0; i < ntups; i++)
+ for (int i = 0; i < PQntuples(res); i++)
{
appendPQExpBufferStr(&buf,
fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
@@ -791,110 +946,13 @@ vacuum_one_database(ConnParams *cparams,
if (objects_listed && !PQgetisnull(res, i, 2))
appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
- simple_string_list_append(&dbtables, buf.data);
+ simple_string_list_append(found_objs, buf.data);
resetPQExpBuffer(&buf);
}
termPQExpBuffer(&buf);
PQclear(res);
- /*
- * Ensure concurrentCons is sane. If there are more connections than
- * vacuumable relations, we don't need to use them all.
- */
- if (concurrentCons > ntups)
- concurrentCons = ntups;
- if (concurrentCons <= 0)
- concurrentCons = 1;
-
- /*
- * All slots need to be prepared to run the appropriate analyze stage, if
- * caller requested that mode. We have to prepare the initial connection
- * ourselves before setting up the slots.
- */
- if (stage == ANALYZE_NO_STAGE)
- initcmd = NULL;
- else
- {
- initcmd = stage_commands[stage];
- executeCommand(conn, initcmd, echo);
- }
-
- /*
- * Setup the database connections. We reuse the connection we already have
- * for the first slot. If not in parallel mode, the first slot in the
- * array contains the connection.
- */
- sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
- ParallelSlotsAdoptConn(sa, conn);
-
- initPQExpBuffer(&sql);
-
- cell = dbtables.head;
- do
- {
- const char *tabname = cell->val;
- ParallelSlot *free_slot;
-
- if (CancelRequested)
- {
- failed = true;
- goto finish;
- }
-
- free_slot = ParallelSlotsGetIdle(sa, NULL);
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
- vacopts, tabname);
-
- /*
- * Execute the vacuum. All errors are handled in processQueryResult
- * through ParallelSlotsGetIdle.
- */
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, sql.data,
- echo, tabname);
-
- cell = cell->next;
- } while (cell != NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- {
- failed = true;
- goto finish;
- }
-
- /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
- if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
- {
- const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
- ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- failed = true;
- }
-
-finish:
- ParallelSlotsTerminate(sa);
- pg_free(sa);
-
- termPQExpBuffer(&sql);
-
- if (failed)
- exit(1);
+ return found_objs;
}
/*
@@ -941,7 +999,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
stage,
- objects,
+ objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -955,7 +1013,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
ANALYZE_NO_STAGE,
- objects,
+ objects, NULL,
concurrentCons,
progname, echo, quiet);
}
--
2.39.5 (Apple Git-154)
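The "found_objs" contract in the comment above (cases 1a/1b/1c) can be illustrated with a minimal stand-alone sketch. This is not vacuumdb's actual code: `StrList`, `get_tables`, and `retrieve_objects_stub` are invented stand-ins for `SimpleStringList`, `vacuum_one_database`, and the catalog query, used only to show how the double pointer distinguishes "reuse a saved list", "query and save", and "query and discard".

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for vacuumdb's SimpleStringList (hypothetical). */
typedef struct StrList
{
    int         n;
    const char *items[8];
} StrList;

/* Stand-in for the catalog query: "retrieves" a fixed table list. */
static StrList *
retrieve_objects_stub(void)
{
    StrList *l = calloc(1, sizeof(StrList));

    l->items[l->n++] = "public.t1";
    l->items[l->n++] = "public.t2";
    return l;
}

/*
 * Mirrors vacuum_one_database()'s found_objs contract:
 *   1a) found_objs && *found_objs  -> reuse the saved list
 *   1b) found_objs && !*found_objs -> run the query, save into *found_objs
 *   1c) !found_objs                -> run the query, don't save the result
 * Returns the list that would be processed; queries_run counts how many
 * times the (simulated) catalog query actually executed.
 */
static StrList *
get_tables(StrList **found_objs, int *queries_run)
{
    StrList *ret;

    if (found_objs && *found_objs)
        ret = *found_objs;              /* case 1a: reuse */
    else
    {
        ret = retrieve_objects_stub();  /* cases 1b/1c: query */
        (*queries_run)++;
        if (found_objs)
            *found_objs = ret;          /* case 1b: save for later stages */
    }
    return ret;
}
```

With a caller that passes `&cache` for each stage, the catalog query runs exactly once and later stages see the identical list; a caller that passes NULL gets the pre-patch behavior of a fresh query per call.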
v5-0002-vacuumdb-Add-option-for-analyzing-only-relations-.patch (text/plain)
From 979d8e32494f565ce51fff8d8f1b1caf2a33cea0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Tue, 11 Mar 2025 11:32:09 -0500
Subject: [PATCH v5 2/3] vacuumdb: Add option for analyzing only relations
missing stats.
This commit adds a new --missing-only option that can be used in
conjunction with --analyze-only or --analyze-in-stages. When this
option is specified, vacuumdb will generate ANALYZE commands for a
relation if it is missing any statistics it should ordinarily have.
For example, if a table has statistics for one column but not
another, we will analyze the whole table. A similar principle
applies to extended statistics, expression indexes, and table
inheritance.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
doc/src/sgml/ref/vacuumdb.sgml | 16 ++++
src/bin/scripts/t/102_vacuumdb_stages.pl | 60 +++++++++++++
src/bin/scripts/vacuumdb.c | 106 ++++++++++++++++++++++-
src/test/perl/PostgreSQL/Test/Cluster.pm | 27 ++++++
4 files changed, 207 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 66fccb30a2d..91369bf1ffe 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -277,6 +277,22 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--missing-only</option></term>
+ <listitem>
+ <para>
+ Only analyze relations that are missing statistics for a column, index
+ expression, or extended statistics object. This option prevents
+ <application>vacuumdb</application> from deleting existing statistics
+ so that the query optimizer's choices do not become transiently worse.
+ </para>
+ <para>
+ This option can only be used in conjunction with
+ <option>--analyze-only</option> or <option>--analyze-in-stages</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
<term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
diff --git a/src/bin/scripts/t/102_vacuumdb_stages.pl b/src/bin/scripts/t/102_vacuumdb_stages.pl
index 984c8d06de6..b216fb0c2c6 100644
--- a/src/bin/scripts/t/102_vacuumdb_stages.pl
+++ b/src/bin/scripts/t/102_vacuumdb_stages.pl
@@ -21,6 +21,66 @@ $node->issues_sql_like(
.*statement:\ ANALYZE/sx,
'analyze three times');
+$node->safe_psql('postgres',
+ 'CREATE TABLE regression_vacuumdb_test AS select generate_series(1, 10) a, generate_series(2, 11) b;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing stats');
+
+$node->safe_psql('postgres',
+ 'CREATE INDEX regression_vacuumdb_test_idx ON regression_vacuumdb_test (mod(a, 2));');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing index expression stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing index expression stats');
+
+$node->safe_psql('postgres',
+ 'CREATE STATISTICS regression_vacuumdb_test_stat ON a, b FROM regression_vacuumdb_test;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing extended stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing extended stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_child (a INT) INHERITS (regression_vacuumdb_test);\n"
+ . "INSERT INTO regression_vacuumdb_child VALUES (1, 2);\n"
+ . "ANALYZE regression_vacuumdb_child;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing inherited stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing inherited stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_parted (a INT) PARTITION BY LIST (a);\n"
+ . "CREATE TABLE regression_vacuumdb_part1 PARTITION OF regression_vacuumdb_parted FOR VALUES IN (1);\n"
+ . "INSERT INTO regression_vacuumdb_parted VALUES (1);\n"
+ . "ANALYZE regression_vacuumdb_part1;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing partition stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing partition stats');
+
$node->issues_sql_like(
[ 'vacuumdb', '--analyze-in-stages', '--all' ],
qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index e28f82a0eba..28fbc376ef4 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -47,6 +47,7 @@ typedef struct vacuumingOptions
bool process_toast;
bool skip_database_stats;
char *buffer_usage_limit;
+ bool missing_only;
} vacuumingOptions;
/* object filter options */
@@ -134,6 +135,7 @@ main(int argc, char *argv[])
{"no-process-toast", no_argument, NULL, 11},
{"no-process-main", no_argument, NULL, 12},
{"buffer-usage-limit", required_argument, NULL, 13},
+ {"missing-only", no_argument, NULL, 14},
{NULL, 0, NULL, 0}
};
@@ -281,6 +283,9 @@ main(int argc, char *argv[])
case 13:
vacopts.buffer_usage_limit = escape_quotes(optarg);
break;
+ case 14:
+ vacopts.missing_only = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -366,6 +371,11 @@ main(int argc, char *argv[])
pg_fatal("cannot use the \"%s\" option with the \"%s\" option",
"buffer-usage-limit", "full");
+ /* Prohibit --missing-only without --analyze-only or --analyze-in-stages */
+ if (vacopts.missing_only && !vacopts.analyze_only)
+ pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
+ "missing-only", "analyze-only", "analyze-in-stages");
+
/* fill cparams except for dbname, which is set below */
cparams.pghost = host;
cparams.pgport = port;
@@ -406,12 +416,14 @@ main(int argc, char *argv[])
if (analyze_in_stages)
{
int stage;
+ SimpleStringList *found_objs = NULL;
for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
{
vacuum_one_database(&cparams, &vacopts,
stage,
- &objects, NULL,
+ &objects,
+ vacopts.missing_only ? &found_objs : NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -614,6 +626,13 @@ vacuum_one_database(ConnParams *cparams,
"--buffer-usage-limit", "16");
}
+ if (vacopts->missing_only && PQserverVersion(conn) < 150000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--missing-only", "15");
+ }
+
/* skip_database_stats is used automatically if server supports it */
vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
@@ -838,6 +857,9 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
" FROM pg_catalog.pg_class c\n"
" JOIN pg_catalog.pg_namespace ns"
" ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+ " CROSS JOIN LATERAL (SELECT c.relkind IN ("
+ CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
+ CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
" LEFT JOIN pg_catalog.pg_class t"
" ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
@@ -921,6 +943,79 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
vacopts->min_mxid_age);
}
+ if (vacopts->missing_only)
+ {
+ appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+ /* regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* expression indexes */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_index i\n"
+ " CROSS JOIN LATERAL pg_catalog.unnest(i.indkey) WITH ORDINALITY u (attnum, ord)\n"
+ " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND i.indexprs IS NOT NULL\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) i.indexrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) u.ord\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* table inheritance and regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit))\n");
+
+ /* table inheritance and extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit))\n");
+
+ appendPQExpBufferStr(&catalog_query, " )\n");
+ }
+
/*
* Execute the catalog query. We use the default search_path for this
* query for consistency with table lookups done elsewhere by the user.
@@ -983,6 +1078,11 @@ vacuum_all_databases(ConnParams *cparams,
if (analyze_in_stages)
{
+ SimpleStringList **found_objs = NULL;
+
+ if (vacopts->missing_only)
+ found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
/*
* When analyzing all databases in stages, we analyze them all in the
* fastest stage first, so that initial statistics become available
@@ -999,7 +1099,8 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
stage,
- objects, NULL,
+ objects,
+ vacopts->missing_only ? &found_objs[i] : NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -1239,6 +1340,7 @@ help(const char *progname)
printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --min-mxid-age=MXID_AGE minimum multixact ID age of tables to vacuum\n"));
printf(_(" --min-xid-age=XID_AGE minimum transaction ID age of tables to vacuum\n"));
+ printf(_(" --missing-only only analyze relations with missing statistics\n"));
printf(_(" --no-index-cleanup don't remove index entries that point to dead tuples\n"));
printf(_(" --no-process-main skip the main relation\n"));
printf(_(" --no-process-toast skip the TOAST table associated with the table to vacuum\n"));
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b105cba05a6..ff8e04d3a03 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2820,6 +2820,33 @@ sub issues_sql_like
=pod
+=item $node->issues_sql_unlike(cmd, unexpected_sql, test_name)
+
+Run a command on the node, then verify that $unexpected_sql does not appear in
+the server log file.
+
+=cut
+
+sub issues_sql_unlike
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($self, $cmd, $unexpected_sql, $test_name) = @_;
+
+ local %ENV = $self->_get_env();
+
+ my $log_location = -s $self->logfile;
+
+ my $result = PostgreSQL::Test::Utils::run_log($cmd);
+ ok($result, "@$cmd exit code 0");
+ my $log =
+ PostgreSQL::Test::Utils::slurp_file($self->logfile, $log_location);
+ unlike($log, $unexpected_sql, "$test_name: SQL not found in server log");
+ return;
+}
+
+=pod
+
=item $node->log_content()
Returns the contents of log of the node
--
2.39.5 (Apple Git-154)
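The payoff of 0001's caching shows up in 0002's `vacuum_all_databases()` change: with `--missing-only` and `--analyze-in-stages --all`, one list is cached per database and shared across all three stages, so the (relatively expensive) missing-stats catalog query runs once per database rather than once per stage per database. A minimal sketch of that loop shape, with invented stand-ins (`TableList`, `run_catalog_query`, the stage/db loop bodies) rather than the patch's real symbols:

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_STAGES 3            /* matches ANALYZE_NUM_STAGES */
#define NUM_DBS    2

static int catalog_queries;     /* counts simulated catalog queries */

/* Opaque stand-in for a per-database table list (hypothetical). */
typedef struct TableList
{
    int dummy;
} TableList;

static TableList *
run_catalog_query(void)
{
    catalog_queries++;
    return calloc(1, sizeof(TableList));
}

/*
 * Mirrors the 0002 loop structure in vacuum_all_databases(): stages on
 * the outside (so stage-1 stats land everywhere first), databases on
 * the inside, with one cached list per database reused by later stages.
 */
static void
analyze_all_in_stages(void)
{
    TableList *found[NUM_DBS] = {NULL};

    for (int stage = 0; stage < NUM_STAGES; stage++)
        for (int db = 0; db < NUM_DBS; db++)
        {
            if (found[db] == NULL)              /* first stage only */
                found[db] = run_catalog_query();
            /* ... issue ANALYZE commands for found[db] at this stage ... */
        }
}
```

The caching is what makes `--missing-only` correct here as well as cheap: after stage 1 the tables are no longer "missing" stats, so re-running the query in stages 2 and 3 would return an empty list and skip the larger statistics targets.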
v5-0003-Update-guidance-for-running-vacuumdb-after-pg_upg.patch (text/plain)
From b84664057699ac84e397c4237b54f91cd1bdd94e Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Tue, 11 Mar 2025 11:41:34 -0500
Subject: [PATCH v5 3/3] Update guidance for running vacuumdb after pg_upgrade.
Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-only option to only
analyze relations that are missing statistics.
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
doc/src/sgml/ref/pgupgrade.sgml | 2 +-
src/bin/pg_upgrade/check.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 9ef7a84eed0..4bc589010c1 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -807,7 +807,7 @@ psql --username=postgres --file=script.sql postgres
</para>
<para>
- Using <command>vacuumdb --all --analyze-only</command> can efficiently
+ Using <command>vacuumdb --all --analyze-only --missing-only</command> can efficiently
generate such statistics, and the use of <option>--jobs</option>
can speed it up. Option <option>--analyze-in-stages</option>
can be used to generate minimal statistics quickly.
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 88db8869b6e..2022f53f110 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -781,7 +781,7 @@ output_completion_banner(char *deletion_script_file_name)
pg_log(PG_REPORT,
"Some optimizer statistics may not have been transferred by pg_upgrade.\n"
"Once you start the new server, consider running:\n"
- " %s/vacuumdb %s--all --analyze-in-stages", new_cluster.bindir, user_specification.data);
+ " %s/vacuumdb %s--all --analyze-in-stages --missing-only", new_cluster.bindir, user_specification.data);
if (deletion_script_file_name)
pg_log(PG_REPORT,
--
2.39.5 (Apple Git-154)
On Wed, Mar 12, 2025 at 12:00 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
On Mon, Mar 10, 2025 at 10:08:49AM -0500, Nathan Bossart wrote:
On Mon, Mar 10, 2025 at 12:35:22PM +0700, John Naylor wrote:
I have no further comments.
Thanks. I'll give this a little more time for review before committing.
I realized that we could limit the catalog query reuse to only when
--missing-only is specified, so I've updated 0001 and 0002 accordingly.
This avoids changing any existing behavior.

We'll still need to update the recommendation in pg_upgrade's
documentation. I'm going to keep that separate because the stats
import/export work is still settling.

0003 is a first attempt at this. Unless I am missing something, there's
really not much to update.
The change here seems fine. My only quibble is that this sentence now
seems out of place: "Option --analyze-in-stages can be used to
generate minimal statistics quickly." I'm thinking we should make a
clearer separation for with and without the --no-statistics option
specified.
--
John Naylor
Amazon Web Services
On Wed, Mar 12, 2025 at 01:56:04PM +0700, John Naylor wrote:
The change here seems fine. My only quibble is that this sentence now
seems out of place: "Option --analyze-in-stages can be used to
generate minimal statistics quickly." I'm thinking we should make a
clearer separation for with and without the --no-statistics option
specified.
I think the recommendation stays the same regardless of whether
--no-statistics was used. --missing-only should do the right thing either
way. But I do agree that this paragraph feels a bit haphazard. I modified
it to call out the full command for both --analyze-only and
--analyze-in-stages, and I moved the note about --jobs to its own sentence
and made it clear that it is applicable to both of the suggested vacuumdb
commands.
--
nathan
Attachments:
v6-0001-vacuumdb-Teach-vacuum_one_database-to-reuse-query.patch (text/plain)
From 57329fe78042b12331852c4b8a121e73020e6a90 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Tue, 11 Mar 2025 11:18:36 -0500
Subject: [PATCH v6 1/3] vacuumdb: Teach vacuum_one_database() to reuse query
results.
Presently, each call to vacuum_one_database() performs a catalog
query to retrieve the list of tables to process. A proposed
follow-up commit would add a "missing only" feature to
--analyze-in-stages, which requires us to save the results of the
catalog query (since tables without statistics would have them
after the first stage). This commit adds this ability via a new
parameter for vacuum_one_database() that specifies either a
previously-retrieved list to process or a place to store the
results of the catalog query for later use. Note that this commit
does not make use of this new parameter for anything. That is left
as a future exercise.
The trade-offs of this approach are increased memory usage and less
responsiveness to concurrent catalog changes in later stages,
neither of which is expected to bother anyone.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
src/bin/scripts/vacuumdb.c | 330 ++++++++++++++++++++++---------------
1 file changed, 194 insertions(+), 136 deletions(-)
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 982bf070be6..e28f82a0eba 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -62,10 +62,16 @@ typedef enum
static VacObjFilter objfilter = OBJFILTER_NONE;
+static SimpleStringList *retrieve_objects(PGconn *conn,
+ vacuumingOptions *vacopts,
+ SimpleStringList *objects,
+ bool echo);
+
static void vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet);
@@ -405,7 +411,7 @@ main(int argc, char *argv[])
{
vacuum_one_database(&cparams, &vacopts,
stage,
- &objects,
+ &objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -413,7 +419,7 @@ main(int argc, char *argv[])
else
vacuum_one_database(&cparams, &vacopts,
ANALYZE_NO_STAGE,
- &objects,
+ &objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -461,8 +467,36 @@ escape_quotes(const char *src)
/*
* vacuum_one_database
*
- * Process tables in the given database. If the 'objects' list is empty,
- * process all tables in the database.
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ * of objects to process, as returned by a previous call to
+ * vacuum_one_database().
+ *
+ * a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ * once-dereferenced double pointer) are not NULL, this list takes
+ * priority, and anything specified in "objects" is ignored.
+ *
+ * b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ * (the once-dereferenced double pointer) _is_ NULL, the "objects"
+ * parameter takes priority, and the results of the catalog query
+ * described in (2) are stored in "found_objs".
+ *
+ * c) If "found_objs" (the double pointer) is NULL, the "objects"
+ * parameter again takes priority, and the results of the catalog query
+ * are not saved.
+ *
+ * 2) The "objects" parameter is a user-specified list of objects to process.
+ * When (1b) or (1c) applies, this function performs a catalog query to
+ * retrieve a fully qualified list of objects to process, as described
+ * below.
+ *
+ * a) If "objects" is not NULL, the catalog query gathers only the objects
+ * listed in "objects".
+ *
+ * b) If "objects" is NULL, all tables in the database are gathered.
*
* Note that this function is only concerned with running exactly one stage
* when in analyze-in-stages mode; caller must iterate on us if necessary.
@@ -475,22 +509,18 @@ vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet)
{
PQExpBufferData sql;
- PQExpBufferData buf;
- PQExpBufferData catalog_query;
- PGresult *res;
PGconn *conn;
SimpleStringListCell *cell;
ParallelSlotArray *sa;
- SimpleStringList dbtables = {NULL, NULL};
- int i;
- int ntups;
+ int ntups = 0;
bool failed = false;
- bool objects_listed = false;
const char *initcmd;
+ SimpleStringList *ret = NULL;
const char *stage_commands[] = {
"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
"SET default_statistics_target=10; RESET vacuum_cost_delay;",
@@ -599,19 +629,155 @@ vacuum_one_database(ConnParams *cparams,
}
/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path
- * (which could be unsafe), everything in this query MUST be fully
- * qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their
- * optional set of columns. This is used to match any provided column
- * lists with the generated qualified identifiers and to filter for the
- * tables provided via --table. If a listed table does not exist, the
- * catalog query will fail.
+ * If the caller provided the results of a previous catalog query, just
+ * use that. Otherwise, run the catalog query ourselves and set the
+ * return variable if provided.
+ */
+ if (found_objs && *found_objs)
+ ret = *found_objs;
+ else
+ {
+ ret = retrieve_objects(conn, vacopts, objects, echo);
+ if (found_objs)
+ *found_objs = ret;
+ }
+
+ /*
+ * Count the number of objects in the catalog query result. If there are
+ * none, we are done.
+ */
+ for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+ ntups++;
+
+ if (ntups == 0)
+ {
+ PQfinish(conn);
+ return;
+ }
+
+ /*
+ * Ensure concurrentCons is sane. If there are more connections than
+ * vacuumable relations, we don't need to use them all.
*/
+ if (concurrentCons > ntups)
+ concurrentCons = ntups;
+ if (concurrentCons <= 0)
+ concurrentCons = 1;
+
+ /*
+ * All slots need to be prepared to run the appropriate analyze stage, if
+ * caller requested that mode. We have to prepare the initial connection
+ * ourselves before setting up the slots.
+ */
+ if (stage == ANALYZE_NO_STAGE)
+ initcmd = NULL;
+ else
+ {
+ initcmd = stage_commands[stage];
+ executeCommand(conn, initcmd, echo);
+ }
+
+ /*
+ * Setup the database connections. We reuse the connection we already have
+ * for the first slot. If not in parallel mode, the first slot in the
+ * array contains the connection.
+ */
+ sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+ ParallelSlotsAdoptConn(sa, conn);
+
+ initPQExpBuffer(&sql);
+
+ cell = ret->head;
+ do
+ {
+ const char *tabname = cell->val;
+ ParallelSlot *free_slot;
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ free_slot = ParallelSlotsGetIdle(sa, NULL);
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+ vacopts, tabname);
+
+ /*
+ * Execute the vacuum. All errors are handled in processQueryResult
+ * through ParallelSlotsGetIdle.
+ */
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, sql.data,
+ echo, tabname);
+
+ cell = cell->next;
+ } while (cell != NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+ if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
+ {
+ const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+ ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ failed = true;
+ }
+
+finish:
+ ParallelSlotsTerminate(sa);
+ pg_free(sa);
+
+ termPQExpBuffer(&sql);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns. This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table. If a listed table does not exist, the catalog query will fail.
+ */
+static SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+ SimpleStringList *objects, bool echo)
+{
+ PQExpBufferData buf;
+ PQExpBufferData catalog_query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+ bool objects_listed = false;
+
initPQExpBuffer(&catalog_query);
for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
{
@@ -765,23 +931,12 @@ vacuum_one_database(ConnParams *cparams,
termPQExpBuffer(&catalog_query);
PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
- /*
- * If no rows are returned, there are no matching tables, so we are done.
- */
- ntups = PQntuples(res);
- if (ntups == 0)
- {
- PQclear(res);
- PQfinish(conn);
- return;
- }
-
/*
* Build qualified identifiers for each table, including the column list
* if given.
*/
initPQExpBuffer(&buf);
- for (i = 0; i < ntups; i++)
+ for (int i = 0; i < PQntuples(res); i++)
{
appendPQExpBufferStr(&buf,
fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
@@ -791,110 +946,13 @@ vacuum_one_database(ConnParams *cparams,
if (objects_listed && !PQgetisnull(res, i, 2))
appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
- simple_string_list_append(&dbtables, buf.data);
+ simple_string_list_append(found_objs, buf.data);
resetPQExpBuffer(&buf);
}
termPQExpBuffer(&buf);
PQclear(res);
- /*
- * Ensure concurrentCons is sane. If there are more connections than
- * vacuumable relations, we don't need to use them all.
- */
- if (concurrentCons > ntups)
- concurrentCons = ntups;
- if (concurrentCons <= 0)
- concurrentCons = 1;
-
- /*
- * All slots need to be prepared to run the appropriate analyze stage, if
- * caller requested that mode. We have to prepare the initial connection
- * ourselves before setting up the slots.
- */
- if (stage == ANALYZE_NO_STAGE)
- initcmd = NULL;
- else
- {
- initcmd = stage_commands[stage];
- executeCommand(conn, initcmd, echo);
- }
-
- /*
- * Setup the database connections. We reuse the connection we already have
- * for the first slot. If not in parallel mode, the first slot in the
- * array contains the connection.
- */
- sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
- ParallelSlotsAdoptConn(sa, conn);
-
- initPQExpBuffer(&sql);
-
- cell = dbtables.head;
- do
- {
- const char *tabname = cell->val;
- ParallelSlot *free_slot;
-
- if (CancelRequested)
- {
- failed = true;
- goto finish;
- }
-
- free_slot = ParallelSlotsGetIdle(sa, NULL);
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
- vacopts, tabname);
-
- /*
- * Execute the vacuum. All errors are handled in processQueryResult
- * through ParallelSlotsGetIdle.
- */
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, sql.data,
- echo, tabname);
-
- cell = cell->next;
- } while (cell != NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- {
- failed = true;
- goto finish;
- }
-
- /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
- if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
- {
- const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
- ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- failed = true;
- }
-
-finish:
- ParallelSlotsTerminate(sa);
- pg_free(sa);
-
- termPQExpBuffer(&sql);
-
- if (failed)
- exit(1);
+ return found_objs;
}
/*
@@ -941,7 +999,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
stage,
- objects,
+ objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -955,7 +1013,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
ANALYZE_NO_STAGE,
- objects,
+ objects, NULL,
concurrentCons,
progname, echo, quiet);
}
--
2.39.5 (Apple Git-154)
v6-0002-vacuumdb-Add-option-for-analyzing-only-relations-.patch (text/plain; charset=us-ascii)
From d904ca86ab0f8b79bd76a27a1ca89108f7fb119c Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Tue, 11 Mar 2025 11:32:09 -0500
Subject: [PATCH v6 2/3] vacuumdb: Add option for analyzing only relations
missing stats.
This commit adds a new --missing-only option that can be used in
conjunction with --analyze-only or --analyze-in-stages. When this
option is specified, vacuumdb will generate ANALYZE commands for a
relation if it is missing any statistics it should ordinarily have.
For example, if a table has statistics for one column but not
another, we will analyze the whole table. A similar principle
applies to extended statistics, expression indexes, and table
inheritance.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
doc/src/sgml/ref/vacuumdb.sgml | 16 ++++
src/bin/scripts/t/102_vacuumdb_stages.pl | 60 +++++++++++++
src/bin/scripts/vacuumdb.c | 106 ++++++++++++++++++++++-
src/test/perl/PostgreSQL/Test/Cluster.pm | 27 ++++++
4 files changed, 207 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 66fccb30a2d..91369bf1ffe 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -277,6 +277,22 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--missing-only</option></term>
+ <listitem>
+ <para>
+ Only analyze relations that are missing statistics for a column, index
+ expression, or extended statistics object. This option prevents
+ <application>vacuumdb</application> from deleting existing statistics
+ so that the query optimizer's choices do not become transiently worse.
+ </para>
+ <para>
+ This option can only be used in conjunction with
+ <option>--analyze-only</option> or <option>--analyze-in-stages</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
<term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
diff --git a/src/bin/scripts/t/102_vacuumdb_stages.pl b/src/bin/scripts/t/102_vacuumdb_stages.pl
index 984c8d06de6..b216fb0c2c6 100644
--- a/src/bin/scripts/t/102_vacuumdb_stages.pl
+++ b/src/bin/scripts/t/102_vacuumdb_stages.pl
@@ -21,6 +21,66 @@ $node->issues_sql_like(
.*statement:\ ANALYZE/sx,
'analyze three times');
+$node->safe_psql('postgres',
+ 'CREATE TABLE regression_vacuumdb_test AS select generate_series(1, 10) a, generate_series(2, 11) b;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing stats');
+
+$node->safe_psql('postgres',
+ 'CREATE INDEX regression_vacuumdb_test_idx ON regression_vacuumdb_test (mod(a, 2));');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing index expression stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing index expression stats');
+
+$node->safe_psql('postgres',
+ 'CREATE STATISTICS regression_vacuumdb_test_stat ON a, b FROM regression_vacuumdb_test;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing extended stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing extended stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_child (a INT) INHERITS (regression_vacuumdb_test);\n"
+ . "INSERT INTO regression_vacuumdb_child VALUES (1, 2);\n"
+ . "ANALYZE regression_vacuumdb_child;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing inherited stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing inherited stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_parted (a INT) PARTITION BY LIST (a);\n"
+ . "CREATE TABLE regression_vacuumdb_part1 PARTITION OF regression_vacuumdb_parted FOR VALUES IN (1);\n"
+ . "INSERT INTO regression_vacuumdb_parted VALUES (1);\n"
+ . "ANALYZE regression_vacuumdb_part1;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing partition stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing partition stats');
+
$node->issues_sql_like(
[ 'vacuumdb', '--analyze-in-stages', '--all' ],
qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index e28f82a0eba..28fbc376ef4 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -47,6 +47,7 @@ typedef struct vacuumingOptions
bool process_toast;
bool skip_database_stats;
char *buffer_usage_limit;
+ bool missing_only;
} vacuumingOptions;
/* object filter options */
@@ -134,6 +135,7 @@ main(int argc, char *argv[])
{"no-process-toast", no_argument, NULL, 11},
{"no-process-main", no_argument, NULL, 12},
{"buffer-usage-limit", required_argument, NULL, 13},
+ {"missing-only", no_argument, NULL, 14},
{NULL, 0, NULL, 0}
};
@@ -281,6 +283,9 @@ main(int argc, char *argv[])
case 13:
vacopts.buffer_usage_limit = escape_quotes(optarg);
break;
+ case 14:
+ vacopts.missing_only = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -366,6 +371,11 @@ main(int argc, char *argv[])
pg_fatal("cannot use the \"%s\" option with the \"%s\" option",
"buffer-usage-limit", "full");
+ /* Prohibit --missing-only without --analyze-only or --analyze-in-stages */
+ if (vacopts.missing_only && !vacopts.analyze_only)
+ pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
+ "missing-only", "analyze-only", "analyze-in-stages");
+
/* fill cparams except for dbname, which is set below */
cparams.pghost = host;
cparams.pgport = port;
@@ -406,12 +416,14 @@ main(int argc, char *argv[])
if (analyze_in_stages)
{
int stage;
+ SimpleStringList *found_objs = NULL;
for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
{
vacuum_one_database(&cparams, &vacopts,
stage,
- &objects, NULL,
+ &objects,
+ vacopts.missing_only ? &found_objs : NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -614,6 +626,13 @@ vacuum_one_database(ConnParams *cparams,
"--buffer-usage-limit", "16");
}
+ if (vacopts->missing_only && PQserverVersion(conn) < 150000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--missing-only", "15");
+ }
+
/* skip_database_stats is used automatically if server supports it */
vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
@@ -838,6 +857,9 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
" FROM pg_catalog.pg_class c\n"
" JOIN pg_catalog.pg_namespace ns"
" ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+ " CROSS JOIN LATERAL (SELECT c.relkind IN ("
+ CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
+ CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
" LEFT JOIN pg_catalog.pg_class t"
" ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
@@ -921,6 +943,79 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
vacopts->min_mxid_age);
}
+ if (vacopts->missing_only)
+ {
+ appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+ /* regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* expression indexes */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_index i\n"
+ " CROSS JOIN LATERAL pg_catalog.unnest(i.indkey) WITH ORDINALITY u (attnum, ord)\n"
+ " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND i.indexprs IS NOT NULL\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) i.indexrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) u.ord\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* table inheritance and regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit))\n");
+
+ /* table inheritance and extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit))\n");
+
+ appendPQExpBufferStr(&catalog_query, " )\n");
+ }
+
/*
* Execute the catalog query. We use the default search_path for this
* query for consistency with table lookups done elsewhere by the user.
@@ -983,6 +1078,11 @@ vacuum_all_databases(ConnParams *cparams,
if (analyze_in_stages)
{
+ SimpleStringList **found_objs = NULL;
+
+ if (vacopts->missing_only)
+ found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
/*
* When analyzing all databases in stages, we analyze them all in the
* fastest stage first, so that initial statistics become available
@@ -999,7 +1099,8 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
stage,
- objects, NULL,
+ objects,
+ vacopts->missing_only ? &found_objs[i] : NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -1239,6 +1340,7 @@ help(const char *progname)
printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --min-mxid-age=MXID_AGE minimum multixact ID age of tables to vacuum\n"));
printf(_(" --min-xid-age=XID_AGE minimum transaction ID age of tables to vacuum\n"));
+ printf(_(" --missing-only only analyze relations with missing statistics\n"));
printf(_(" --no-index-cleanup don't remove index entries that point to dead tuples\n"));
printf(_(" --no-process-main skip the main relation\n"));
printf(_(" --no-process-toast skip the TOAST table associated with the table to vacuum\n"));
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b105cba05a6..ff8e04d3a03 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2820,6 +2820,33 @@ sub issues_sql_like
=pod
+=item $node->issues_sql_unlike(cmd, unexpected_sql, test_name)
+
+Run a command on the node, then verify that $unexpected_sql does not appear in
+the server log file.
+
+=cut
+
+sub issues_sql_unlike
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($self, $cmd, $unexpected_sql, $test_name) = @_;
+
+ local %ENV = $self->_get_env();
+
+ my $log_location = -s $self->logfile;
+
+ my $result = PostgreSQL::Test::Utils::run_log($cmd);
+ ok($result, "@$cmd exit code 0");
+ my $log =
+ PostgreSQL::Test::Utils::slurp_file($self->logfile, $log_location);
+ unlike($log, $unexpected_sql, "$test_name: SQL not found in server log");
+ return;
+}
+
+=pod
+
=item $node->log_content()
Returns the contents of log of the node
--
2.39.5 (Apple Git-154)
v6-0003-Update-guidance-for-running-vacuumdb-after-pg_upg.patch (text/plain; charset=us-ascii)
From 0b8959996f39424fe63587db598f4d5ae7d94f45 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Tue, 11 Mar 2025 11:41:34 -0500
Subject: [PATCH v6 3/3] Update guidance for running vacuumdb after pg_upgrade.
Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-only option to only
analyze relations that are missing statistics.
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
doc/src/sgml/ref/pgupgrade.sgml | 9 +++++----
src/bin/pg_upgrade/check.c | 2 +-
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 9ef7a84eed0..8e3b01f963d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -807,10 +807,11 @@ psql --username=postgres --file=script.sql postgres
</para>
<para>
- Using <command>vacuumdb --all --analyze-only</command> can efficiently
- generate such statistics, and the use of <option>--jobs</option>
- can speed it up. Option <option>--analyze-in-stages</option>
- can be used to generate minimal statistics quickly.
+ Using <command>vacuumdb --all --analyze-only --missing-only</command> can
+ efficiently generate such statistics. Alternatively,
+ <command>vacuumdb --all --analyze-in-stages --missing-only</command>
+ can be used to generate minimal statistics quickly. For either command,
+ the use of <option>--jobs</option> can speed it up.
If <varname>vacuum_cost_delay</varname> is set to a non-zero
value, this can be overridden to speed up statistics generation
using <envar>PGOPTIONS</envar>, e.g., <literal>PGOPTIONS='-c
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 88db8869b6e..2022f53f110 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -781,7 +781,7 @@ output_completion_banner(char *deletion_script_file_name)
pg_log(PG_REPORT,
"Some optimizer statistics may not have been transferred by pg_upgrade.\n"
"Once you start the new server, consider running:\n"
- " %s/vacuumdb %s--all --analyze-in-stages", new_cluster.bindir, user_specification.data);
+ " %s/vacuumdb %s--all --analyze-in-stages --missing-only", new_cluster.bindir, user_specification.data);
if (deletion_script_file_name)
pg_log(PG_REPORT,
--
2.39.5 (Apple Git-154)
Out of curiosity, I generated many relations with the following command
(stolen from [0]):
do $$
begin
for i in 1..100000 loop
execute format('create table t%s (f1 int unique, f2 int unique);', i);
execute format('insert into t%s select x, x from generate_series(1,1000) x',
i);
if i % 100 = 0 then commit; end if;
end loop;
end
$$;
And then I ran a database-wide ANALYZE. Without --missing-only, vacuumdb's
catalog query took 65 ms. With --missing-only, it took 735 ms. While
that's a big jump, this query will only run once for a given vacuumdb, and
--missing-only is likely to save a lot of time elsewhere.
If no feedback or objections materialize, I'm planning to commit these
early next week.
[0]: /messages/by-id/3612876.1689443232@sss.pgh.pa.us
--
nathan
On Fri, Mar 14, 2025 at 02:05:30PM -0500, Nathan Bossart wrote:
If no feedback or objections materialize, I'm planning to commit these
early next week.
While preparing this for commit, I noticed that the expression index part
of the query was disregarding attstattarget. To fix, I've modified that
part to look at the index's pg_attribute entries.
--
nathan
Attachments:
v7-0001-vacuumdb-Teach-vacuum_one_database-to-reuse-query.patch (text/plain; charset=us-ascii)
From b242f4376433c252291850f3b899f6cb0a6b24ad Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Mon, 17 Mar 2025 16:11:19 -0500
Subject: [PATCH v7 1/3] vacuumdb: Teach vacuum_one_database() to reuse query
results.
Presently, each call to vacuum_one_database() performs a catalog
query to retrieve the list of tables to process. A proposed
follow-up commit would add a "missing only" feature to
--analyze-in-stages, which requires us to save the results of the
catalog query (since tables without statistics would have them
after the first stage). This commit adds this ability via a new
parameter for vacuum_one_database() that specifies either a
previously-retrieved list to process or a place to return the
results of the catalog query. Note that this commit does not make
use of this new parameter for anything. That is left for a
follow-up commit.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
src/bin/scripts/vacuumdb.c | 330 ++++++++++++++++++++++---------------
1 file changed, 194 insertions(+), 136 deletions(-)
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 982bf070be6..e28f82a0eba 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -62,10 +62,16 @@ typedef enum
static VacObjFilter objfilter = OBJFILTER_NONE;
+static SimpleStringList *retrieve_objects(PGconn *conn,
+ vacuumingOptions *vacopts,
+ SimpleStringList *objects,
+ bool echo);
+
static void vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet);
@@ -405,7 +411,7 @@ main(int argc, char *argv[])
{
vacuum_one_database(&cparams, &vacopts,
stage,
- &objects,
+ &objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -413,7 +419,7 @@ main(int argc, char *argv[])
else
vacuum_one_database(&cparams, &vacopts,
ANALYZE_NO_STAGE,
- &objects,
+ &objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -461,8 +467,36 @@ escape_quotes(const char *src)
/*
* vacuum_one_database
*
- * Process tables in the given database. If the 'objects' list is empty,
- * process all tables in the database.
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ * of objects to process, as returned by a previous call to
+ * vacuum_one_database().
+ *
+ * a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ * once-dereferenced double pointer) are not NULL, this list takes
+ * priority, and anything specified in "objects" is ignored.
+ *
+ * b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ * (the once-dereferenced double pointer) _is_ NULL, the "objects"
+ * parameter takes priority, and the results of the catalog query
+ * described in (2) are stored in "found_objs".
+ *
+ * c) If "found_objs" (the double pointer) is NULL, the "objects"
+ * parameter again takes priority, and the results of the catalog query
+ * are not saved.
+ *
+ * 2) The "objects" parameter is a user-specified list of objects to process.
+ * When (1b) or (1c) applies, this function performs a catalog query to
+ * retrieve a fully qualified list of objects to process, as described
+ * below.
+ *
+ * a) If "objects" is not NULL, the catalog query gathers only the objects
+ * listed in "objects".
+ *
+ * b) If "objects" is NULL, all tables in the database are gathered.
*
* Note that this function is only concerned with running exactly one stage
* when in analyze-in-stages mode; caller must iterate on us if necessary.
@@ -475,22 +509,18 @@ vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *objects,
+ SimpleStringList **found_objs,
int concurrentCons,
const char *progname, bool echo, bool quiet)
{
PQExpBufferData sql;
- PQExpBufferData buf;
- PQExpBufferData catalog_query;
- PGresult *res;
PGconn *conn;
SimpleStringListCell *cell;
ParallelSlotArray *sa;
- SimpleStringList dbtables = {NULL, NULL};
- int i;
- int ntups;
+ int ntups = 0;
bool failed = false;
- bool objects_listed = false;
const char *initcmd;
+ SimpleStringList *ret = NULL;
const char *stage_commands[] = {
"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
"SET default_statistics_target=10; RESET vacuum_cost_delay;",
@@ -599,19 +629,155 @@ vacuum_one_database(ConnParams *cparams,
}
/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path
- * (which could be unsafe), everything in this query MUST be fully
- * qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their
- * optional set of columns. This is used to match any provided column
- * lists with the generated qualified identifiers and to filter for the
- * tables provided via --table. If a listed table does not exist, the
- * catalog query will fail.
+ * If the caller provided the results of a previous catalog query, just
+ * use that. Otherwise, run the catalog query ourselves and set the
+ * return variable if provided.
+ */
+ if (found_objs && *found_objs)
+ ret = *found_objs;
+ else
+ {
+ ret = retrieve_objects(conn, vacopts, objects, echo);
+ if (found_objs)
+ *found_objs = ret;
+ }
+
+ /*
+ * Count the number of objects in the catalog query result. If there are
+ * none, we are done.
+ */
+ for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+ ntups++;
+
+ if (ntups == 0)
+ {
+ PQfinish(conn);
+ return;
+ }
+
+ /*
+ * Ensure concurrentCons is sane. If there are more connections than
+ * vacuumable relations, we don't need to use them all.
*/
+ if (concurrentCons > ntups)
+ concurrentCons = ntups;
+ if (concurrentCons <= 0)
+ concurrentCons = 1;
+
+ /*
+ * All slots need to be prepared to run the appropriate analyze stage, if
+ * caller requested that mode. We have to prepare the initial connection
+ * ourselves before setting up the slots.
+ */
+ if (stage == ANALYZE_NO_STAGE)
+ initcmd = NULL;
+ else
+ {
+ initcmd = stage_commands[stage];
+ executeCommand(conn, initcmd, echo);
+ }
+
+ /*
+ * Setup the database connections. We reuse the connection we already have
+ * for the first slot. If not in parallel mode, the first slot in the
+ * array contains the connection.
+ */
+ sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+ ParallelSlotsAdoptConn(sa, conn);
+
+ initPQExpBuffer(&sql);
+
+ cell = ret->head;
+ do
+ {
+ const char *tabname = cell->val;
+ ParallelSlot *free_slot;
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ free_slot = ParallelSlotsGetIdle(sa, NULL);
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+ vacopts, tabname);
+
+ /*
+ * Execute the vacuum. All errors are handled in processQueryResult
+ * through ParallelSlotsGetIdle.
+ */
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, sql.data,
+ echo, tabname);
+
+ cell = cell->next;
+ } while (cell != NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+ if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
+ {
+ const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+ ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ failed = true;
+ }
+
+finish:
+ ParallelSlotsTerminate(sa);
+ pg_free(sa);
+
+ termPQExpBuffer(&sql);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns. This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table. If a listed table does not exist, the catalog query will fail.
+ */
+static SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+ SimpleStringList *objects, bool echo)
+{
+ PQExpBufferData buf;
+ PQExpBufferData catalog_query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+ bool objects_listed = false;
+
initPQExpBuffer(&catalog_query);
for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
{
@@ -765,23 +931,12 @@ vacuum_one_database(ConnParams *cparams,
termPQExpBuffer(&catalog_query);
PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
- /*
- * If no rows are returned, there are no matching tables, so we are done.
- */
- ntups = PQntuples(res);
- if (ntups == 0)
- {
- PQclear(res);
- PQfinish(conn);
- return;
- }
-
/*
* Build qualified identifiers for each table, including the column list
* if given.
*/
initPQExpBuffer(&buf);
- for (i = 0; i < ntups; i++)
+ for (int i = 0; i < PQntuples(res); i++)
{
appendPQExpBufferStr(&buf,
fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
@@ -791,110 +946,13 @@ vacuum_one_database(ConnParams *cparams,
if (objects_listed && !PQgetisnull(res, i, 2))
appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
- simple_string_list_append(&dbtables, buf.data);
+ simple_string_list_append(found_objs, buf.data);
resetPQExpBuffer(&buf);
}
termPQExpBuffer(&buf);
PQclear(res);
- /*
- * Ensure concurrentCons is sane. If there are more connections than
- * vacuumable relations, we don't need to use them all.
- */
- if (concurrentCons > ntups)
- concurrentCons = ntups;
- if (concurrentCons <= 0)
- concurrentCons = 1;
-
- /*
- * All slots need to be prepared to run the appropriate analyze stage, if
- * caller requested that mode. We have to prepare the initial connection
- * ourselves before setting up the slots.
- */
- if (stage == ANALYZE_NO_STAGE)
- initcmd = NULL;
- else
- {
- initcmd = stage_commands[stage];
- executeCommand(conn, initcmd, echo);
- }
-
- /*
- * Setup the database connections. We reuse the connection we already have
- * for the first slot. If not in parallel mode, the first slot in the
- * array contains the connection.
- */
- sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
- ParallelSlotsAdoptConn(sa, conn);
-
- initPQExpBuffer(&sql);
-
- cell = dbtables.head;
- do
- {
- const char *tabname = cell->val;
- ParallelSlot *free_slot;
-
- if (CancelRequested)
- {
- failed = true;
- goto finish;
- }
-
- free_slot = ParallelSlotsGetIdle(sa, NULL);
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
- vacopts, tabname);
-
- /*
- * Execute the vacuum. All errors are handled in processQueryResult
- * through ParallelSlotsGetIdle.
- */
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, sql.data,
- echo, tabname);
-
- cell = cell->next;
- } while (cell != NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- {
- failed = true;
- goto finish;
- }
-
- /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
- if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
- {
- const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
- ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- failed = true;
- }
-
-finish:
- ParallelSlotsTerminate(sa);
- pg_free(sa);
-
- termPQExpBuffer(&sql);
-
- if (failed)
- exit(1);
+ return found_objs;
}
/*
@@ -941,7 +999,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
stage,
- objects,
+ objects, NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -955,7 +1013,7 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
ANALYZE_NO_STAGE,
- objects,
+ objects, NULL,
concurrentCons,
progname, echo, quiet);
}
--
2.39.5 (Apple Git-154)
v7-0002-vacuumdb-Add-option-for-analyzing-only-relations-.patch (text/plain; charset=us-ascii)
From 6a2cf66ec12df1532aaa3b59ba4fc2d9a46a07cb Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Tue, 11 Mar 2025 11:32:09 -0500
Subject: [PATCH v7 2/3] vacuumdb: Add option for analyzing only relations
missing stats.
This commit adds a new --missing-only option that can be used in
conjunction with --analyze-only or --analyze-in-stages. When this
option is specified, vacuumdb will generate ANALYZE commands for a
relation if it is missing any statistics it should ordinarily have.
For example, if a table has statistics for one column but not
another, we will analyze the whole table. A similar principle
applies to extended statistics, expression indexes, and table
inheritance.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
doc/src/sgml/ref/vacuumdb.sgml | 16 ++++
src/bin/scripts/t/102_vacuumdb_stages.pl | 60 +++++++++++++
src/bin/scripts/vacuumdb.c | 109 ++++++++++++++++++++++-
src/test/perl/PostgreSQL/Test/Cluster.pm | 27 ++++++
4 files changed, 210 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 66fccb30a2d..91369bf1ffe 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -277,6 +277,22 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--missing-only</option></term>
+ <listitem>
+ <para>
+ Only analyze relations that are missing statistics for a column, index
+ expression, or extended statistics object. This option prevents
+ <application>vacuumdb</application> from deleting existing statistics
+ so that the query optimizer's choices do not become transiently worse.
+ </para>
+ <para>
+ This option can only be used in conjunction with
+ <option>--analyze-only</option> or <option>--analyze-in-stages</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
<term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
diff --git a/src/bin/scripts/t/102_vacuumdb_stages.pl b/src/bin/scripts/t/102_vacuumdb_stages.pl
index 984c8d06de6..b216fb0c2c6 100644
--- a/src/bin/scripts/t/102_vacuumdb_stages.pl
+++ b/src/bin/scripts/t/102_vacuumdb_stages.pl
@@ -21,6 +21,66 @@ $node->issues_sql_like(
.*statement:\ ANALYZE/sx,
'analyze three times');
+$node->safe_psql('postgres',
+ 'CREATE TABLE regression_vacuumdb_test AS select generate_series(1, 10) a, generate_series(2, 11) b;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing stats');
+
+$node->safe_psql('postgres',
+ 'CREATE INDEX regression_vacuumdb_test_idx ON regression_vacuumdb_test (mod(a, 2));');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing index expression stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing index expression stats');
+
+$node->safe_psql('postgres',
+ 'CREATE STATISTICS regression_vacuumdb_test_stat ON a, b FROM regression_vacuumdb_test;');
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing extended stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing extended stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_child (a INT) INHERITS (regression_vacuumdb_test);\n"
+ . "INSERT INTO regression_vacuumdb_child VALUES (1, 2);\n"
+ . "ANALYZE regression_vacuumdb_child;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing inherited stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_test', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing inherited stats');
+
+$node->safe_psql('postgres',
+ "CREATE TABLE regression_vacuumdb_parted (a INT) PARTITION BY LIST (a);\n"
+ . "CREATE TABLE regression_vacuumdb_part1 PARTITION OF regression_vacuumdb_parted FOR VALUES IN (1);\n"
+ . "INSERT INTO regression_vacuumdb_parted VALUES (1);\n"
+ . "ANALYZE regression_vacuumdb_part1;\n");
+$node->issues_sql_like(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with missing partition stats');
+$node->issues_sql_unlike(
+ [ 'vacuumdb', '--analyze-in-stages', '--missing-only', '-t', 'regression_vacuumdb_parted', 'postgres' ],
+ qr/statement:\ ANALYZE/sx,
+ '--missing-only with no missing partition stats');
+
$node->issues_sql_like(
[ 'vacuumdb', '--analyze-in-stages', '--all' ],
qr/statement:\ SET\ default_statistics_target=1;\ SET\ vacuum_cost_delay=0;
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index e28f82a0eba..b7c7025cf70 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -47,6 +47,7 @@ typedef struct vacuumingOptions
bool process_toast;
bool skip_database_stats;
char *buffer_usage_limit;
+ bool missing_only;
} vacuumingOptions;
/* object filter options */
@@ -134,6 +135,7 @@ main(int argc, char *argv[])
{"no-process-toast", no_argument, NULL, 11},
{"no-process-main", no_argument, NULL, 12},
{"buffer-usage-limit", required_argument, NULL, 13},
+ {"missing-only", no_argument, NULL, 14},
{NULL, 0, NULL, 0}
};
@@ -281,6 +283,9 @@ main(int argc, char *argv[])
case 13:
vacopts.buffer_usage_limit = escape_quotes(optarg);
break;
+ case 14:
+ vacopts.missing_only = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -366,6 +371,11 @@ main(int argc, char *argv[])
pg_fatal("cannot use the \"%s\" option with the \"%s\" option",
"buffer-usage-limit", "full");
+ /* Prohibit --missing-only without --analyze-only or --analyze-in-stages */
+ if (vacopts.missing_only && !vacopts.analyze_only)
+ pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
+ "missing-only", "analyze-only", "analyze-in-stages");
+
/* fill cparams except for dbname, which is set below */
cparams.pghost = host;
cparams.pgport = port;
@@ -406,12 +416,14 @@ main(int argc, char *argv[])
if (analyze_in_stages)
{
int stage;
+ SimpleStringList *found_objs = NULL;
for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
{
vacuum_one_database(&cparams, &vacopts,
stage,
- &objects, NULL,
+ &objects,
+ vacopts.missing_only ? &found_objs : NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -614,6 +626,13 @@ vacuum_one_database(ConnParams *cparams,
"--buffer-usage-limit", "16");
}
+ if (vacopts->missing_only && PQserverVersion(conn) < 150000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--missing-only", "15");
+ }
+
/* skip_database_stats is used automatically if server supports it */
vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
@@ -838,6 +857,9 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
" FROM pg_catalog.pg_class c\n"
" JOIN pg_catalog.pg_namespace ns"
" ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+ " CROSS JOIN LATERAL (SELECT c.relkind IN ("
+ CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
+ CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
" LEFT JOIN pg_catalog.pg_class t"
" ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
@@ -921,6 +943,82 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
vacopts->min_mxid_age);
}
+ if (vacopts->missing_only)
+ {
+ appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+ /* regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* expression indexes */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " JOIN pg_index i ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::int2] OPERATOR(pg_catalog.=) 0::int2\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* table inheritance and regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit))\n");
+
+ /* table inheritance and extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND c.reltuples OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit))\n");
+
+ appendPQExpBufferStr(&catalog_query, " )\n");
+ }
+
/*
* Execute the catalog query. We use the default search_path for this
* query for consistency with table lookups done elsewhere by the user.
@@ -983,6 +1081,11 @@ vacuum_all_databases(ConnParams *cparams,
if (analyze_in_stages)
{
+ SimpleStringList **found_objs = NULL;
+
+ if (vacopts->missing_only)
+ found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
/*
* When analyzing all databases in stages, we analyze them all in the
* fastest stage first, so that initial statistics become available
@@ -999,7 +1102,8 @@ vacuum_all_databases(ConnParams *cparams,
vacuum_one_database(cparams, vacopts,
stage,
- objects, NULL,
+ objects,
+ vacopts->missing_only ? &found_objs[i] : NULL,
concurrentCons,
progname, echo, quiet);
}
@@ -1239,6 +1343,7 @@ help(const char *progname)
printf(_(" -j, --jobs=NUM use this many concurrent connections to vacuum\n"));
printf(_(" --min-mxid-age=MXID_AGE minimum multixact ID age of tables to vacuum\n"));
printf(_(" --min-xid-age=XID_AGE minimum transaction ID age of tables to vacuum\n"));
+ printf(_(" --missing-only only analyze relations with missing statistics\n"));
printf(_(" --no-index-cleanup don't remove index entries that point to dead tuples\n"));
printf(_(" --no-process-main skip the main relation\n"));
printf(_(" --no-process-toast skip the TOAST table associated with the table to vacuum\n"));
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index bab3f3d2dbe..05bd94609d4 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2849,6 +2849,33 @@ sub issues_sql_like
=pod
+=item $node->issues_sql_unlike(cmd, unexpected_sql, test_name)
+
+Run a command on the node, then verify that $unexpected_sql does not appear in
+the server log file.
+
+=cut
+
+sub issues_sql_unlike
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($self, $cmd, $unexpected_sql, $test_name) = @_;
+
+ local %ENV = $self->_get_env();
+
+ my $log_location = -s $self->logfile;
+
+ my $result = PostgreSQL::Test::Utils::run_log($cmd);
+ ok($result, "@$cmd exit code 0");
+ my $log =
+ PostgreSQL::Test::Utils::slurp_file($self->logfile, $log_location);
+ unlike($log, $unexpected_sql, "$test_name: SQL not found in server log");
+ return;
+}
+
+=pod
+
=item $node->log_content()
Returns the contents of log of the node
--
2.39.5 (Apple Git-154)
v7-0003-Update-guidance-for-running-vacuumdb-after-pg_upg.patch (text/plain; charset=us-ascii)
From ad222d6447106123924a0577ab9aeb80994603b0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Mon, 17 Mar 2025 20:07:43 -0500
Subject: [PATCH v7 3/3] Update guidance for running vacuumdb after pg_upgrade.
Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-only option to only
analyze relations that are missing statistics.
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
doc/src/sgml/ref/pgupgrade.sgml | 9 +++++----
src/bin/pg_upgrade/check.c | 2 +-
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 9ef7a84eed0..8e3b01f963d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -807,10 +807,11 @@ psql --username=postgres --file=script.sql postgres
</para>
<para>
- Using <command>vacuumdb --all --analyze-only</command> can efficiently
- generate such statistics, and the use of <option>--jobs</option>
- can speed it up. Option <option>--analyze-in-stages</option>
- can be used to generate minimal statistics quickly.
+ Using <command>vacuumdb --all --analyze-only --missing-only</command> can
+ efficiently generate such statistics. Alternatively,
+ <command>vacuumdb --all --analyze-in-stages --missing-only</command>
+ can be used to generate minimal statistics quickly. For either command,
+ the use of <option>--jobs</option> can speed it up.
If <varname>vacuum_cost_delay</varname> is set to a non-zero
value, this can be overridden to speed up statistics generation
using <envar>PGOPTIONS</envar>, e.g., <literal>PGOPTIONS='-c
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index d32fc3d88ec..9f0f219e547 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -781,7 +781,7 @@ output_completion_banner(char *deletion_script_file_name)
pg_log(PG_REPORT,
"Some optimizer statistics may not have been transferred by pg_upgrade.\n"
"Once you start the new server, consider running:\n"
- " %s/vacuumdb %s--all --analyze-in-stages", new_cluster.bindir, user_specification.data);
+ " %s/vacuumdb %s--all --analyze-in-stages --missing-only", new_cluster.bindir, user_specification.data);
if (deletion_script_file_name)
pg_log(PG_REPORT,
--
2.39.5 (Apple Git-154)
While preparing this for commit, I noticed that the expression index part
of the query was disregarding attstattarget. To fix, I've modified that
part to look at the index's pg_attribute entries.
+1, should have been there all along.
Committed with the following small changes:
* I renamed the new option to --missing-stats-only, which I felt was more
descriptive.
* I moved the new tests to the main vacuumdb test file and interspersed
some --analyze-only uses.
* I fixed a couple of other things in the new parts of the catalog query
that were not fully qualified.
--
nathan
On 3/18/25 22:37, Nathan Bossart wrote:
Committed with the following small changes:
Hi, I don't really understand this sentence in
doc/src/sgml/ref/vacuumdb.sgml:
This option prevents vacuumdb from deleting existing statistics so
that the query optimizer's choices do not become transiently worse.
I thought that the point was to avoid unnecessary post-upgrade analyzes?
On Wed, Jul 23, 2025 at 02:45:10PM +0200, Frédéric Yhuel wrote:
On 3/18/25 22:37, Nathan Bossart wrote:
Committed with the following small changes:
Hi, I don't really understand this sentence in
doc/src/sgml/ref/vacuumdb.sgml:
This option prevents vacuumdb from deleting existing statistics so that
the query optimizer's choices do not become transiently worse.
I thought that the point was to avoid unnecessary post-upgrade analyzes?
So, the full paragraph is:
+ Only analyze relations that are missing statistics for a column, index
+ expression, or extended statistics object. This option prevents
+ <application>vacuumdb</application> from deleting existing statistics
+ so that the query optimizer's choices do not become transiently worse.
What it is trying to say is that if you run vacuumdb without this
option, not only will it analyze all tables, including ones that already
have statistics, but it will also drop the existing statistics for a
brief period while it installs new ones. During that period, the
optimizer will not have any statistics for the table. Is there a clearer
way to state this?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On Sat, Jul 26, 2025, 07:23 Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Jul 23, 2025 at 02:45:10PM +0200, Frédéric Yhuel wrote:
On 3/18/25 22:37, Nathan Bossart wrote:
Committed with the following small changes:
Hi, I don't really understand this sentence in
doc/src/sgml/ref/vacuumdb.sgml:This option prevents vacuumdb from deleting existing statistics so that
the query optimizer's choices do not become transiently worse.
I thought that the point was to avoid unnecessary post-upgrade analyzes?
So, the full paragraph is:
+ Only analyze relations that are missing statistics for a column, index
+ expression, or extended statistics object. This option prevents
+ <application>vacuumdb</application> from deleting existing statistics
+ so that the query optimizer's choices do not become transiently worse.

What it is trying to say is that if you run vacuumdb without this
option, not only will it analyze all tables, including ones that already
have statistics, but it will also drop the existing statistics for a
brief period while it installs new ones. During that period, the
optimizer will not have any statistics for the table. Is there a clearer
way to state this?
Statistics are transactional. Without this option specified, are we really
removing them and committing prior to computing and saving new ones? And we
are opposed to just changing this behavior and instead prefer to add an
option that at first glance seems like everyone should use?
"If not specified, the system will analyze all statistics-capable objects in
alphabetical order. Specifying this option then limits the result to only
those objects that do not already have statistics." That may not be how
the feature strictly behaves, but that would seem to be all one would expect
it to do.
David J.
On Sat, Jul 26, 2025 at 07:47:47AM -0700, David G. Johnston wrote:
On Sat, Jul 26, 2025, 07:23 Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Jul 23, 2025 at 02:45:10PM +0200, Frédéric Yhuel wrote:
On 3/18/25 22:37, Nathan Bossart wrote:
Committed with the following small changes:
Hi, I don't really understand this sentence in
doc/src/sgml/ref/vacuumdb.sgml:
This option prevents vacuumdb from deleting existing statistics so that
the query optimizer's choices do not become transiently worse.
I thought that the point was to avoid unnecessary post-upgrade analyzes?
So, the full paragraph is:
+ Only analyze relations that are missing statistics for a column, index
+ expression, or extended statistics object. This option prevents
+ <application>vacuumdb</application> from deleting existing statistics
+ so that the query optimizer's choices do not become transiently worse.
What it is trying to say is that if you run vacuumdb without this
option, not only will it analyze all tables, including ones that already
have statistics, but it will also drop statistics on those tables for a
brief period while it installs new statistics.
During that period, the optimizer will not have any statistics for the
table. Is there a clearer way to state this?
Statistics are transactional. Without this option specified are we really
removing them and committing prior to computing and saving new ones? And we are
opposed to just changing this behavior and instead prefer to add an option that
at first glance seems like everyone should use?
Yes, I thought it was transactional too, but the doc patch suggests it
isn't, so maybe I am wrong.
"If not specified the system will analyze all statistics-capable objects in
alphabetical order. Specifying this option then limits the result to only
those objects that do not already have statistics.". That may not be how the
feature strictly behaves but that would seem to be all one would expect it to
do.
Yes, I would prefer the simpler text, if it is accurate.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
Statistics are transactional. Without this option specified are we really
removing them and committing prior to computing and saving new ones? And
we are opposed to just changing this behavior and instead prefer to add an
option that at first glance seems like everyone should use?
Yes, I thought it was transactional too, but the doc patch suggests it
isn't, so maybe I am wrong.
It is transactional, mostly.
The attribute stats for the table being analyzed and all attribute stats
for the dependent indexes that have at least one expression column, plus
extended stats objects on that table, will be replaced in one atomic
operation. The old stats were there, commit happens, now the new stats are
there.
The relation stats (pg_class) happen in-place, non-transactionally. So if
you had an analyze that got canceled, and some of the relstats had been
updated on the table or indexes, those would stay as-is.
Having said all that, these are very nitpick details that may not need to
be in our documentation.
"If not specified the system will analyze all statistics-capable objects in
alphabetical order. Specifying this option then limits the result to only
those objects that do not already have statistics.". That may not be how the
feature strictly behaves but that would seem to be all one would expect it to
do.
/me dons Hat of Pedantry (+2 against simple explanations)
"Specifying this option then limits the result to only tables that are
missing an attribute statistic, have an index that is missing an attribute
statistic, or have an extended statistics object that is itself missing
object statistics".
/me removes hat
"Specifying this option then limits the result to only those tables that do
not already have complete statistics."
The above phrase is asking the word "complete" to do a lot of heavy
lifting, covering the dependent indexes and extended statistics objects.
Specifying "table" makes it clear that any deficit in statistics results in
the table getting analyzed, not just the single index or extended object.
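The selection rule above can be sketched in code. This is a hypothetical illustration, not vacuumdb's actual implementation (which determines the table list via catalog queries); the data structures and field names here are invented for the example:

```python
# Conceptual sketch of the --missing-stats-only selection rule: a table
# qualifies for ANALYZE if ANY of its statistics are missing, whether on
# a plain column, an expression column of a dependent index, or an
# extended statistics object. Illustrative only; not vacuumdb's code.

def needs_analyze(table):
    """Return True if the table should be analyzed under --missing-stats-only."""
    # Missing statistics for any plain column?
    if any(not col["has_stats"] for col in table["columns"]):
        return True
    # Missing statistics for any expression column of a dependent index?
    if any(not expr["has_stats"]
           for idx in table["indexes"]
           for expr in idx["expression_columns"]):
        return True
    # Any extended statistics object whose data has not been computed?
    if any(not ext["has_data"] for ext in table["extended_stats"]):
        return True
    return False
```

Note that a deficit anywhere selects the whole table for ANALYZE, not just the single index or extended statistics object with the missing data.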
On 7/26/25 21:38, Corey Huinker wrote:
It is transactional, mostly.
The attribute stats for the table being analyzed and all attribute stats
for the dependent indexes that have at least one expression column, plus
extended stats objects on that table, will be replaced in one atomic
operation. The old stats were there, commit happens, now the new stats
are there.
The relation stats (pg_class) happen in-place, non-transactionally. So
if you had an analyze that got canceled, and some of the relstats had
been updated on the table or indexes, those would stay as-is.
Then it seems very unlikely that the query optimizer's choices become
transiently worse because of that, doesn't it? and this shouldn't be
used to justify this option, IMHO.
How about this? "This option is primarily intended to be used in
post-upgrade scripts to quickly generate minimal optimizer statistics"
Having said all that, these are very nitpick details that may not need
to be in our documentation.
+1
I've always supposed that the statistics were completely transactional,
I think most users do, and the difference with reality seems small enough.
/me dons Hat of Pedantry (+2 against simple explanations)
"Specifying this option then limits the result to only tables that are
missing an attribute statistic, have an index that is missing an
attribute statistic, or have an extended statistics object that is
itself missing object statistics".
/me removes hat
"Specifying this option then limits the result to only those tables that
do not already have complete statistics."
The above phrase is asking the word "complete" to do a lot of heavy
lifting, covering the dependent indexes and extended statistics objects.
Specifying "table" makes it clear that any deficit in statistics results
in the table getting analyzed, not just the single index or extended object.
FWIW I much prefer the last, simple phrase.
On Sun, Jul 27, 2025 at 12:46:44PM +0200, Frédéric Yhuel wrote:
Then it seems very unlikely that the query optimizer's choices become
transiently worse because of that, doesn't it? and this shouldn't be used to
justify this option, IMHO.
I can't remember who wrote this line, but it was borrowed from the
--analyze-in-stages description. The point is that if you use
--analyze-in-stages without --missing-stats-only, there will be a period
where existing statistics will be replaced with ones generated with lower
statistics targets. Obviously, this wording isn't clear enough. We might
need to either remove that sentence or add "When used in conjunction with
--analyze-in-stages..."
--
nathan
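The interaction described above can be sketched as follows. The stage commands mirror those documented for --analyze-in-stages (targets 1, 10, then the server default); the `skip` set is a simplified stand-in for --missing-stats-only, which in reality re-derives the table list from the catalogs, so treat this as a conceptual sketch rather than vacuumdb's implementation:

```python
# Conceptual sketch of --analyze-in-stages: three ANALYZE passes with
# increasing statistics targets, so minimal statistics are available
# quickly and are then refined. Without --missing-stats-only, tables
# that already have full-target statistics get them replaced by the
# low-target stages for a while; with it, those tables are skipped.

STAGE_SETUP = [
    "SET default_statistics_target = 1; SET vacuum_cost_delay = 0;",
    "SET default_statistics_target = 10; RESET vacuum_cost_delay;",
    "RESET default_statistics_target;",
]

def analyze_in_stages(tables, skip=()):
    """Yield (setup_sql, analyze_sql) pairs for each stage.

    `skip` stands in for --missing-stats-only: tables that already have
    statistics are left out, so their existing statistics are never
    overwritten by the low-target stages."""
    for setup in STAGE_SETUP:
        for t in tables:
            if t not in skip:
                yield setup, f"ANALYZE {t};"
```

For example, `list(analyze_in_stages(["t1", "t2"], skip={"t2"}))` produces one ANALYZE of t1 per stage and none for t2, leaving t2's pre-existing statistics untouched.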
On 7/28/25 16:47, Nathan Bossart wrote:
I can't remember who wrote this line, but it was borrowed from the
--analyze-in-stages description. The point is that if you use
--analyze-in-stages without --missing-stats-only, there will be a period
where existing statistics will be replaced with ones generated with lower
statistics targets.
Aha, it makes sense now, thank you!
Obviously, this wording isn't clear enough. We might
need to either remove that sentence or add "When used in conjunction with
--analyze-in-stages..."
I vote for the second option.
On Monday, July 28, 2025, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
On 7/28/25 16:47, Nathan Bossart wrote:
I can't remember who wrote this line, but it was borrowed from the
--analyze-in-stages description. The point is that if you use
--analyze-in-stages without --missing-stats-only, there will be a period
where existing statistics will be replaced with ones generated with lower
statistics targets.
Aha, it makes sense now, thank you!
Obviously, this wording isn't clear enough. We might
need to either remove that sentence or add "When used in conjunction with
--analyze-in-stages..."
I vote for the second option.
Makes sense. This does beg the question - what happens if a column is left
with a lower statistics target than what would be applied during an
analyze, but one is present? I don’t see where the statistics target is
saved anywhere. Can we start recording that piece of data and teach
analyze in stages to just never go backwards - reporting any it had to skip
to adhere to that rule. Seems like a better policy than missing-only.
David J.
On Mon, Jul 28, 2025 at 09:22:29AM -0700, David G. Johnston wrote:
Makes sense. This does beg the question - what happens if a column is left
with a lower statistics target than what would be applied during an
analyze, but one is present? I don't see where the statistics target is
saved anywhere. Can we start recording that piece of data and teach
analyze in stages to just never go backwards - reporting any it had to skip
to adhere to that rule. Seems like a better policy than missing-only.
That would involve changing the behavior of an existing option, which has
thus far been strongly opposed [0], for what strikes me as a reasonably
uncommon situation. Plus, --missing-stats-only may be useful in other
situations.
[0]: /messages/by-id/Z6KKHX9PZkB19lAK@nathan
--
nathan
On Mon, Jul 28, 2025 at 09:22:29AM -0700, David G. Johnston wrote:
On Monday, July 28, 2025, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
On 7/28/25 16:47, Nathan Bossart wrote:
Obviously, this wording isn't clear enough. We might need to either
remove that sentence or add "When used in conjunction with
--analyze-in-stages..."
I vote for the second option.
Makes sense.
It seems like there is some support for adding "When used in conjunction
with --analyze-in-stages..." to the beginning of the sentence. I'll give
it another day or two for any further discussion before committing.
--
nathan
On Mon, Jul 28, 2025 at 01:46:34PM -0500, Nathan Bossart wrote:
On Mon, Jul 28, 2025 at 09:22:29AM -0700, David G. Johnston wrote:
On Monday, July 28, 2025, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
On 7/28/25 16:47, Nathan Bossart wrote:
Obviously, this wording isn't clear enough. We might need to either
remove that sentence or add "When used in conjunction with
--analyze-in-stages..."
I vote for the second option.
Makes sense.
It seems like there is some support for adding "When used in conjunction
with --analyze-in-stages..." to the beginning of the sentence. I'll give
it another day or two for any further discussion before committing.
Here is what I have staged for commit.
--
nathan
Attachments:
v1-0001-doc-Adjust-documentation-for-vacuumdb-missing-sta.patch (text/plain)
From 82117acd6132351c368ec2670c4ba7f481d69ff9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Wed, 30 Jul 2025 12:13:30 -0500
Subject: [PATCH v1 1/1] doc: Adjust documentation for vacuumdb
--missing-stats-only.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The sentence in question gave readers the impression that vacuumdb
deletes existing statistics for a period of time while analyzing,
but it's actually meant to convey that --analyze-in-stages will
temporarily replace existing statistics with ones generated with
lower statistics targets.
Reported-by: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Reviewed-by: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/4b94ca16-7a6d-4581-b2aa-4ea79dbc082a%40dalibo.com
Backpatch-through: 18
---
doc/src/sgml/ref/vacuumdb.sgml | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index b0680a61814..c7d9dca17b8 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -282,9 +282,11 @@ PostgreSQL documentation
<listitem>
<para>
Only analyze relations that are missing statistics for a column, index
- expression, or extended statistics object. This option prevents
- <application>vacuumdb</application> from deleting existing statistics
- so that the query optimizer's choices do not become transiently worse.
+ expression, or extended statistics object. When used with
+ <option>--analyze-in-stages</option>, this option prevents
+ <application>vacuumdb</application> from temporarily replacing existing
+ statistics with ones generated with lower statistics targets, thus
+ avoiding transiently worse query optimizer choices.
</para>
<para>
This option can only be used in conjunction with
--
2.39.5 (Apple Git-154)
It seems like there is some support for adding "When used in conjunction
with --analyze-in-stages..." to the beginning of the sentence. I'll give
it another day or two for any further discussion before committing.
Here is what I have staged for commit.
+1
On Wed, 2025-07-30 at 12:21 -0500, Nathan Bossart wrote:
Here is what I have staged for commit.
That's more clear to me. I also like that it shows that the options
work well together, because that was not obvious before.
Regards,
Jeff Davis