Optional skipping of unchanged relations during ANALYZE?

Started by VASUKI M, 4 days ago, 23 messages
#1VASUKI M
vasukianand0119@gmail.com

Hi Hackers,

I’m planning to work on a small improvement around ANALYZE behavior and
wanted to ask the community for guidance before proceeding.

Currently, when ANALYZE is run over many relations, it analyzes all
eligible tables even if some of them have not changed since their last
ANALYZE. In environments with many mostly-static tables, this can lead
to repeated work with little benefit.

I’m considering working on an optional mode where ANALYZE would skip
relations that have not been modified since their last analyze, based
on existing pg_stat counters (for example, mod_since_analyze = 0).

Before moving forward, I’d like to understand:

--whether this aligns with PostgreSQL’s statistics and planner design,

--if there are reasons ANALYZE should always re-run even for unchanged
relations,

--and whether such behavior would be acceptable if it were strictly
opt-in.

Any feedback, concerns, or pointers would be very helpful.

Thanks,
Vasuki M
C-DAC,Chennai

#2Christoph Berg
myon@debian.org
In reply to: VASUKI M (#1)
Re: Optional skipping of unchanged relations during ANALYZE?

Re: VASUKI M

I’m considering working on an optional mode where ANALYZE would skip
relations that have not been modified since their last analyze, based
on existing pg_stat counters (for example, mod_since_analyze = 0).

Make sure that doesn't skip tables that were never analyzed before.

Christoph

#3VASUKI M
vasukianand0119@gmail.com
In reply to: Christoph Berg (#2)
Re: Optional skipping of unchanged relations during ANALYZE?

Thanks for pointing that out.

On Tue, Jan 20, 2026 at 4:16 PM Christoph Berg <myon@debian.org> wrote:

Re: VASUKI M

I’m considering working on an optional mode where ANALYZE would skip
relations that have not been modified since their last analyze, based
on existing pg_stat counters (for example, mod_since_analyze = 0).

Make sure that doesn't skip tables that were never analyzed before.

Yes, the intention is that SMART ANALYZE would not skip relations that have
never been analyzed before.
The skip decision is based on pg_stat entries, so relations without
existing statistics will still be analyzed normally.

I’ll make sure this behavior is clear and covered when I post the patch.

Thanks,
Vasuki M
C-DAC,Chennai

#4Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: VASUKI M (#1)
Re: Optional skipping of unchanged relations during ANALYZE?

Hi everyone,

On 20.01.2026 13:24, VASUKI M wrote:

I’m planning to work on a small improvement around ANALYZE behavior and
wanted to ask the community for guidance before proceeding.

Thanks for working on this — it indeed looks like it could reduce the
time spent executing ANALYZE.

Currently, when ANALYZE is run over many relations, it analyzes all
eligible tables even if some of them have not changed since their last
ANALYZE. In environments with many mostly-static tables, this can lead
to repeated work with little benefit.

I’m considering working on an optional mode where ANALYZE would skip
relations that have not been modified since their last analyze, based
on existing pg_stat counters (for example, mod_since_analyze = 0).

We should consider n_mod_since_analyze as well.

Before moving forward, I’d like to understand:

--whether this aligns with PostgreSQL’s statistics and planner design,

--if there are reasons ANALYZE should always re-run even for unchanged
relations,

--and whether such behavior would be acceptable if it were strictly
opt-in.

Any feedback, concerns, or pointers would be very helpful.

One concern that comes to mind is changes in statistics targets. For
example, statistics may have been collected with
default_statistics_target = 100, and later either
default_statistics_target or a per-column statistics target is increased
(e.g., to 200).

As far as I know, we currently do not track which statistics target was
used when the existing statistics were collected. If someone knows a
reliable way to determine this, please correct me.

If we cannot determine that, we would need to decide whether such
relations should still be skipped

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

#5Robert Treat
xzilla@users.sourceforge.net
In reply to: VASUKI M (#1)
Re: Optional skipping of unchanged relations during ANALYZE?

On Tue, Jan 20, 2026 at 5:24 AM VASUKI M <vasukianand0119@gmail.com> wrote:

Hi Hackers,

I’m planning to work on a small improvement around ANALYZE behavior and
wanted to ask the community for guidance before proceeding.

Currently, when ANALYZE is run over many relations, it analyzes all
eligible tables even if some of them have not changed since their last
ANALYZE. In environments with many mostly-static tables, this can lead
to repeated work with little benefit.

I’m considering working on an optional mode where ANALYZE would skip
relations that have not been modified since their last analyze, based
on existing pg_stat counters (for example, mod_since_analyze = 0).

Before moving forward, I’d like to understand:

--whether this aligns with PostgreSQL’s statistics and planner design,

I think it makes sense generally, and one could maybe argue that it
should be the default behavior; have you done any research into why it
doesn't behave that way already?

--if there are reasons ANALYZE should always re-run even for unchanged
relations,

Given ANALYZE does a random sample, on rare occasions it can be
valuable to re-run analyze to get a better sample than whatever
statistics were obtained previously, even in the case the data itself
does not change. I suppose more likely scenarios would be modification
of default_statistics_target either at server or table level
(adding/removing), but the point is there are scenarios where you
might want to rerun it, so we do need to support both behaviors.

--and whether such behavior would be acceptable if it were strictly
opt-in.

Given my above, it does have to be something that can be turned on or
off, so even if we don't know which is the best default behavior, it
makes sense to start by doing it in a way that is optional.

Robert Treat
https://xzilla.net

#6VASUKI M
vasukianand0119@gmail.com
In reply to: Robert Treat (#5)
Re: Optional skipping of unchanged relations during ANALYZE?

Hi all,

Thanks a lot for the thoughtful feedback.

The points about statistics targets and random sampling make sense. In
particular, I agree that even without data changes, re-running ANALYZE
can still be beneficial (for example after increasing
default_statistics_target or per-column targets, or simply to obtain a
different sample).

Given that, my intention is to keep this strictly as an opt-in
behavior, so that existing semantics are unchanged unless the user
explicitly requests it. In the current prototype, tables that have
never been analyzed before are not skipped, and SMART only considers
relations that already have statistics.

Regarding statistics targets, since PostgreSQL does not currently track
which target was used to collect existing statistics, SMART ANALYZE
would not attempt to account for target changes. I plan to document
this limitation clearly so users understand the trade-off when opting
into this mode (for now; I will look into it later).

I’ll take this feedback into account while cleaning up the patch and
documentation, and will follow up with a v1 proposal once ready.

Thanks again for the guidance.

Regards,
Vasuki M
C-DAC,Chennai

#7David Rowley
dgrowleyml@gmail.com
In reply to: VASUKI M (#3)
Re: Optional skipping of unchanged relations during ANALYZE?

On Wed, 21 Jan 2026 at 00:02, VASUKI M <vasukianand0119@gmail.com> wrote:

On Tue, Jan 20, 2026 at 4:16 PM Christoph Berg <myon@debian.org> wrote:

Make sure that doesn't skip tables that were never analyzed before.

Yes, the intention is that SMART ANALYZE would not skip relations that have never been analyzed before.
The skip decision is based on pg_stat entries, so relations without existing statistics will still be analyzed normally.

If doing this, you would also need to make special consideration for
partitioned tables, as n_mod_since_analyze won't change for those
directly, but it might have changed for their partitions.
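
One way to picture the partitioned-table case is the sketch below. It is a simplified, self-contained model rather than PostgreSQL code: the function name and the plain array of per-partition counters are assumptions made for illustration, standing in for the n_mod_since_analyze values of the leaf partitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: partition_mods holds the modification counter of
 * each leaf partition. The parent's own counter is not consulted, since
 * it does not change directly.
 */
bool
partitioned_parent_changed(const int64_t *partition_mods, size_t npartitions)
{
	for (size_t i = 0; i < npartitions; i++)
	{
		/* Counters are never negative. */
		assert(partition_mods[i] >= 0);

		/* A single modified partition is enough to require analysis. */
		if (partition_mods[i] > 0)
			return true;
	}

	/* No partition reported changes. */
	return false;
}
```

Under this model a partitioned parent would be a skip candidate only when the function returns false for all of its partitions.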

David

#8VASUKI M
vasukianand0119@gmail.com
In reply to: David Rowley (#7)
Re: Optional skipping of unchanged relations during ANALYZE?

Hi David,

Thanks for calling this out — yes, I agree this is an important case.

Partitioned tables are already something I’m considering separately, since
the parent’s n_mod_since_analyze does not reflect changes made in the
partitions. The intention is not to skip analysis of partitions just because
the partitioned parent itself shows no modifications.

For now, my approach is deliberately limited to using the statistics that
are already available via pg_stat and making skip decisions only where those
statistics are meaningful and reliable.

That also means that for the initial version, I’m not trying to introduce
special handling for cases like foreign tables or system catalogs beyond
what existing statistics already provide. Where statistics are missing,
unclear, or potentially misleading, the conservative behavior would be to
fall back to running ANALYZE as usual.

Thanks again for the feedback.

Regards,
Vasuki M
C-DAC,Chennai.

On Wed, Jan 21, 2026 at 12:06 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 21 Jan 2026 at 00:02, VASUKI M <vasukianand0119@gmail.com> wrote:

On Tue, Jan 20, 2026 at 4:16 PM Christoph Berg <myon@debian.org> wrote:

Make sure that doesn't skip tables that were never analyzed before.

Yes, the intention is that SMART ANALYZE would not skip relations that have never been analyzed before.
The skip decision is based on pg_stat entries, so relations without existing statistics will still be analyzed normally.

If doing this, you would also need to make special consideration for
partitioned tables, as n_mod_since_analyze won't change for those
directly, but it might have changed for their partitions.

David

#9Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: VASUKI M (#6)
Re: Optional skipping of unchanged relations during ANALYZE?

Another concern with skipping ANALYZE on unchanged tables is extended
statistics.

If CREATE/ALTER STATISTICS is executed, it would still be desirable for
ANALYZE to collect the new statistics, including the extended ones, even
if the table data itself has not changed.

What do you think about this?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

#10VASUKI M
vasukianand0119@gmail.com
In reply to: Ilia Evdokimov (#9)
1 attachment(s)
Re: Optional skipping of unchanged relations during ANALYZE?

Hi Hackers,

As discussed in the recent thread, I’m sharing an initial v1 patch
introducing an opt-in SMART mode for ANALYZE.

When ANALYZE (SMART) is specified, relations that have not been modified
since their last analyze are skipped, based on existing pg_stat counters
(n_mod_since_analyze = 0). Relations without existing statistics are
still analyzed normally. The default ANALYZE behavior remains unchanged.

The primary goal of this patch is to reduce unnecessary work when
running ANALYZE over many mostly-static tables, while keeping the
behavior strictly opt-in.

Scope of this v1 patch:
- Uses existing pg_stat statistics only
- Does not skip relations that were never analyzed before
- Includes regression tests demonstrating that only modified tables are
re-analyzed
- Partitioned tables, inheritance, foreign tables, extended statistics and
other edge cases are intentionally not handled yet; I plan to look into
those in follow-up work based on feedback

Example usage / how to observe behavior:

SET client_min_messages = debug1;

ANALYZE (SMART);
ANALYZE (SMART, VERBOSE);

ANALYZE (SMART) table1;
ANALYZE (SMART) table1, table2;
VACUUM(SMART);

Thanks for your time and review.

Best regards,
Vasuki M
C-DAC,Chennai

Attachments:

v1-0001-ANALYZE-add-optional-smart-mode-to-skip-unchanged-relations.patch (text/x-patch; charset=US-ASCII)
From 6fa990921c3a4e956bbbbaf61563ef639c21b240 Mon Sep 17 00:00:00 2001
From: Vasuki M <vasukianand0119@gmail.com>
Date: Wed, 21 Jan 2026 14:39:43 +0530
Subject: [PATCH v1] ANALYZE: add optional SMART mode to skip unchanged
 relations

Introduce an opt-in SMART option for ANALYZE that skips relations which
have not been modified since their last analyze, based on pg_stat
counters (n_mod_since_analyze = 0).

When SMART is specified, relations with no recorded modifications since
the previous ANALYZE are skipped, while relations without existing
statistics are still analyzed normally. The default ANALYZE behavior is
unchanged.

This can reduce unnecessary work when analyzing databases with many
mostly-static tables.

Regression tests are included.
---
 src/backend/commands/analyze.c              | 25 ++++++++++
 src/backend/commands/vacuum.c               | 15 +++++-
 src/include/commands/vacuum.h               |  2 +-
 src/test/regress/expected/analyze_smart.out | 51 +++++++++++++++++++++
 src/test/regress/parallel_schedule          |  1 +
 src/test/regress/sql/analyze_smart.sql      | 38 +++++++++++++++
 6 files changed, 130 insertions(+), 2 deletions(-)
 create mode 100644 src/test/regress/expected/analyze_smart.out
 create mode 100644 src/test/regress/sql/analyze_smart.sql

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index a4834241..a4d445d9 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -45,6 +45,7 @@
 #include "storage/bufmgr.h"
 #include "storage/procarray.h"
 #include "utils/attoptcache.h"
+#include "utils/relcache.h"
 #include "utils/datum.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
@@ -140,6 +141,26 @@ analyze_rel(Oid relid, RangeVar *relation,
 	onerel = vacuum_open_relation(relid, relation, params.options & ~(VACOPT_VACUUM),
 								  params.log_analyze_min_duration >= 0,
 								  ShareUpdateExclusiveLock);
+	/* SMART ANALYZE: skip unchanged relations */
+	if ((params.options & VACOPT_SMART_ANALYZE) &&
+		onerel->rd_rel->relkind == RELKIND_RELATION)
+	{
+		PgStat_StatTabEntry *tabstat;
+
+		tabstat = pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
+
+		if (tabstat && tabstat->mod_since_analyze == 0)
+		{
+
+			elog(DEBUG1,
+				"SMART ANALYZE: skipping relation \"%s\" (OID %u), no modifications since last analyze",
+				RelationGetRelationName(onerel),
+				RelationGetRelid(onerel));
+
+			table_close(onerel, ShareUpdateExclusiveLock);
+			return;
+		}
+	}
 
 	/* leave if relation could not be opened or locked */
 	if (!onerel)
@@ -314,6 +335,10 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
+	elog(DEBUG1, "ANALYZE processing relation \"%s\" (OID %u)",
+		RelationGetRelationName(onerel),
+		RelationGetRelid(onerel));
+
 	verbose = (params.options & VACOPT_VERBOSE) != 0;
 	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
 							  params.log_analyze_min_duration >= 0));
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index aa4fbec1..8fd7016f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -165,6 +165,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	BufferAccessStrategy bstrategy = NULL;
 	bool		verbose = false;
 	bool		skip_locked = false;
+	bool            smart = false;
 	bool		analyze = false;
 	bool		freeze = false;
 	bool		full = false;
@@ -229,6 +230,9 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 
 			ring_size = result;
 		}
+		else if (strcmp(opt->defname, "smart") == 0)
+			smart = defGetBoolean(opt);
+
 		else if (!vacstmt->is_vacuumcmd)
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -306,6 +310,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		(verbose ? VACOPT_VERBOSE : 0) |
 		(skip_locked ? VACOPT_SKIP_LOCKED : 0) |
 		(analyze ? VACOPT_ANALYZE : 0) |
+		(smart ? VACOPT_SMART_ANALYZE : 0) |
 		(freeze ? VACOPT_FREEZE : 0) |
 		(full ? VACOPT_FULL : 0) |
 		(disable_page_skipping ? VACOPT_DISABLE_PAGE_SKIPPING : 0) |
@@ -315,7 +320,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		(only_database_stats ? VACOPT_ONLY_DATABASE_STATS : 0);
 
 	/* sanity checks on options */
-	Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE));
+	Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE | VACOPT_SMART_ANALYZE));
 	Assert((params.options & VACOPT_VACUUM) ||
 		   !(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
 
@@ -351,6 +356,14 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		}
 	}
 
+	/*
+	 * SMART is only meaningful with ANALYZE.
+	 */
+	if ((params.options & VACOPT_SMART_ANALYZE) &&
+		!(params.options & VACOPT_ANALYZE))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("SMART option requires ANALYZE")));
 
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index e885a4b9..08533ec7 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -188,7 +188,7 @@ typedef struct VacAttrStats
 #define VACOPT_DISABLE_PAGE_SKIPPING 0x100	/* don't skip any pages */
 #define VACOPT_SKIP_DATABASE_STATS 0x200	/* skip vac_update_datfrozenxid() */
 #define VACOPT_ONLY_DATABASE_STATS 0x400	/* only vac_update_datfrozenxid() */
-
+#define VACOPT_SMART_ANALYZE   0x00010000  /* skip unchanged relations during ANALYZE */
 /*
  * Values used by index_cleanup and truncate params.
  *
diff --git a/src/test/regress/expected/analyze_smart.out b/src/test/regress/expected/analyze_smart.out
new file mode 100644
index 00000000..2ccec3b2
--- /dev/null
+++ b/src/test/regress/expected/analyze_smart.out
@@ -0,0 +1,51 @@
+--
+-- SMART ANALYZE regression test
+--
+CREATE TABLE sa1 (id int);
+CREATE TABLE sa2 (id int);
+-- Initial analyze so stats exist
+ANALYZE;
+-- Modify only sa1
+INSERT INTO sa1 VALUES (1);
+-- Make sure stats snapshot is fresh
+SELECT pg_stat_clear_snapshot();
+ pg_stat_clear_snapshot 
+------------------------
+ 
+(1 row)
+
+-- Check modifications
+SELECT relname, n_mod_since_analyze
+FROM pg_stat_user_tables
+WHERE relname IN ('sa1', 'sa2')
+ORDER BY relname;
+ relname | n_mod_since_analyze 
+---------+---------------------
+ sa1     |                   0
+ sa2     |                   0
+(2 rows)
+
+-- Run SMART ANALYZE on both tables
+ANALYZE (SMART) sa1, sa2;
+-- Refresh stats again
+SELECT pg_stat_clear_snapshot();
+ pg_stat_clear_snapshot 
+------------------------
+ 
+(1 row)
+
+-- Verify only sa1 was analyzed
+SELECT
+    relname,
+    n_mod_since_analyze = 0 AS reset_after_smart_analyze
+FROM pg_stat_user_tables
+WHERE relname IN ('sa1', 'sa2')
+ORDER BY relname;
+ relname | reset_after_smart_analyze 
+---------+---------------------------
+ sa1     | t
+ sa2     | t
+(2 rows)
+
+DROP TABLE sa1;
+DROP TABLE sa2;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 905f9bca..c77194dd 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -90,6 +90,7 @@ test: rules psql psql_crosstab psql_pipeline amutils stats_ext collate.linux.utf
 test: select_parallel
 test: write_parallel
 test: vacuum_parallel
+test: analyze_smart
 
 # Run this alone, because concurrent DROP TABLE would make non-superuser
 # "ANALYZE;" fail with "relation with OID $n does not exist".
diff --git a/src/test/regress/sql/analyze_smart.sql b/src/test/regress/sql/analyze_smart.sql
new file mode 100644
index 00000000..993c035b
--- /dev/null
+++ b/src/test/regress/sql/analyze_smart.sql
@@ -0,0 +1,38 @@
+--
+-- SMART ANALYZE regression test
+--
+
+CREATE TABLE sa1 (id int);
+CREATE TABLE sa2 (id int);
+
+-- Initial analyze so stats exist
+ANALYZE;
+
+-- Modify only sa1
+INSERT INTO sa1 VALUES (1);
+
+-- Make sure stats snapshot is fresh
+SELECT pg_stat_clear_snapshot();
+
+-- Check modifications
+SELECT relname, n_mod_since_analyze
+FROM pg_stat_user_tables
+WHERE relname IN ('sa1', 'sa2')
+ORDER BY relname;
+
+-- Run SMART ANALYZE on both tables
+ANALYZE (SMART) sa1, sa2;
+
+-- Refresh stats again
+SELECT pg_stat_clear_snapshot();
+
+-- Verify only sa1 was analyzed
+SELECT
+    relname,
+    n_mod_since_analyze = 0 AS reset_after_smart_analyze
+FROM pg_stat_user_tables
+WHERE relname IN ('sa1', 'sa2')
+ORDER BY relname;
+
+DROP TABLE sa1;
+DROP TABLE sa2;
-- 
2.43.0

#11Christoph Berg
myon@debian.org
In reply to: VASUKI M (#10)
Re: Optional skipping of unchanged relations during ANALYZE?

Re: VASUKI M

VACUUM(SMART);

IMHO it was a historical mistake to combine VACUUM and ANALYZE into a
single command. We should not add any more options on that
combination. If people want to pass options to ANALYZE, they should
call ANALYZE and not VACUUM.

Christoph

#12VASUKI M
vasukianand0119@gmail.com
In reply to: Christoph Berg (#11)
Re: Optional skipping of unchanged relations during ANALYZE?

Sorry, I forgot to mention that it will produce the error 'SMART option
requires ANALYZE'.

It's just a checking command :)
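
The option check behind that error can be modelled as in the following sketch. It is a standalone approximation, not the patch itself: the OPT_* names and their bit values are placeholders chosen for illustration, and the function returns false where the patch would raise an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define OPT_VACUUM			0x01
#define OPT_ANALYZE			0x02
#define OPT_SMART_ANALYZE	0x10000

/* Returns true if the option combination would be accepted. */
bool
smart_option_is_valid(uint32_t options)
{
	/* At least one of these bits must be set for any command. */
	assert(options & (OPT_VACUUM | OPT_ANALYZE | OPT_SMART_ANALYZE));

	/* SMART without ANALYZE is rejected, with or without VACUUM. */
	if ((options & OPT_SMART_ANALYZE) && !(options & OPT_ANALYZE))
		return false;

	return true;
}
```

In this model, VACUUM (SMART) corresponds to OPT_VACUUM | OPT_SMART_ANALYZE and is rejected, while ANALYZE (SMART) corresponds to OPT_ANALYZE | OPT_SMART_ANALYZE and is accepted.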

-Vasu

On Wed, Jan 21, 2026 at 3:14 PM Christoph Berg <myon@debian.org> wrote:

Re: VASUKI M

VACUUM(SMART);

IMHO it was a historical mistake to combine VACUUM and ANALYZE into a
single command. We should not add any more options on that
combination. If people want to pass options to ANALYZE, they should
call ANALYZE and not VACUUM.

Christoph

#13Christoph Berg
myon@debian.org
In reply to: VASUKI M (#12)
Re: Optional skipping of unchanged relations during ANALYZE?

Re: VASUKI M

Sorry, I forgot to mention that it will produce the error 'SMART option
requires ANALYZE'.

SMART is also a terribly non-descriptive name. How about CHANGED_ONLY?

Christoph

#14VASUKI M
vasukianand0119@gmail.com
In reply to: Christoph Berg (#13)
Re: Optional skipping of unchanged relations during ANALYZE?

On Wed, Jan 21, 2026 at 3:21 PM Christoph Berg <myon@debian.org> wrote:

SMART is also a terribly non-descriptive name. How about CHANGED_ONLY?

Yeah, I agree. As of now I am focusing on the concept and workflow; I will
change the name in the next version of the patch.

Regards,
Vasuki M
C-DAC,Chennai.

#15Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: VASUKI M (#14)
Re: Optional skipping of unchanged relations during ANALYZE?

On 21.01.2026 12:56, VASUKI M wrote:

On Wed, Jan 21, 2026 at 3:21 PM Christoph Berg <myon@debian.org> wrote:

SMART is also a terribly non-descriptive name. How about CHANGED_ONLY?

Yeah, I agree. As of now I am focusing on the concept and workflow; I will
change the name in the next version of the patch.

Regards,
Vasuki M
C-DAC,Chennai.

So do I

It seems to me that the condition for relations that have never had
statistics collected might be incorrect. If I'm reading this correctly,
shouldn't this be checking 'tabstat->mod_since_analyze > 0' instead of
'tabstat->mod_since_analyze == 0'? I tested it with a simple query:

CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 FROM generate_series(1, 1000000) i;
ANALYZE (SMART) t;
SELECT COUNT(*) FROM pg_stats WHERE tablename = 't';
 count
-------
     0
(1 row)

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

#16VASUKI M
vasukianand0119@gmail.com
In reply to: Ilia Evdokimov (#15)
1 attachment(s)
Re: Optional skipping of unchanged relations during ANALYZE?

Hi Ilia,

On Wed, Jan 21, 2026 at 4:19 PM Ilia Evdokimov <
ilya.evdokimov@tantorlabs.com> wrote:

On 21.01.2026 12:56, VASUKI M wrote:

On Wed, Jan 21, 2026 at 3:21 PM Christoph Berg <myon@debian.org> wrote:

SMART is also a terribly non-descriptive name. How about CHANGED_ONLY?

Yeah, I agree. As of now I am focusing on the concept and workflow; I will
change the name in the next version of the patch.

Regards,
Vasuki M
C-DAC,Chennai.

So do I

It seems to me that the condition for relations that have never had
statistics collected might be incorrect. If I'm reading this correctly,
shouldn't this be checking 'tabstat->mod_since_analyze > 0' instead of
'tabstat->mod_since_analyze == 0'? I tested it with a simple query:

CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 FROM generate_series(1, 1000000) i;
ANALYZE (SMART) t;
SELECT COUNT(*) FROM pg_stats WHERE tablename = 't';
 count
-------
     0
(1 row)

This passes now :)

As discussed in the recent thread, I am sharing a revised v2 patch that
introduces an optional SMART mode for ANALYZE.

When ANALYZE (SMART) is specified, relations are skipped if:
- they have been analyzed before (either manually or via autovacuum),
and
- they have not been modified since their last analyze
(n_mod_since_analyze = 0, based on pg_stat statistics).

Relations that have never been analyzed before are always analyzed
normally. The default ANALYZE behavior remains unchanged unless SMART
is explicitly requested.
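
The skip rule described above can be sketched as a small standalone predicate. This is a simplified model and not the patch code: the TableStats struct and the function name are stand-ins invented for illustration, with fields named after the counters the patch reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-table statistics entry. */
typedef struct TableStats
{
	int64_t		last_analyze_time;		/* 0 = never manually analyzed */
	int64_t		last_autoanalyze_time;	/* 0 = never auto-analyzed */
	int64_t		mod_since_analyze;		/* changes since last analyze */
} TableStats;

/* Returns true only when the relation may be skipped. */
bool
smart_analyze_can_skip(const TableStats *stats)
{
	/* No statistics entry at all: analyze normally. */
	if (stats == NULL)
		return false;

	/* Counters are never negative. */
	assert(stats->mod_since_analyze >= 0);

	/* Never analyzed, manually or by autovacuum: analyze normally. */
	if (stats->last_analyze_time == 0 && stats->last_autoanalyze_time == 0)
		return false;

	/* Analyzed before: skip only if nothing changed since then. */
	return stats->mod_since_analyze == 0;
}
```

Any case where the entry is missing or the table has no recorded analyze falls through to a normal ANALYZE, which matches the conservative behavior described earlier in the thread.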

The motivation is to reduce unnecessary ANALYZE work in databases with
a large number of mostly-static tables, while keeping the behavior
strictly opt-in.

Changes and clarifications in v2:
- Tables that have never been analyzed are never skipped
(checked via last_analyze_time / last_autoanalyze_time)
- Skip decisions rely only on pg_stat_user_tables counters
- The skip condition is n_mod_since_analyze == 0
- Regression tests are added to demonstrate:
  - SMART ANALYZE does not skip never-analyzed tables
  - Only modified tables are re-analyzed

This patch intentionally limits its scope to regular relations and
existing pg_stat statistics only. Partitioned tables, inheritance,
foreign tables, extended statistics, and statistics target changes are
not handled yet and can be considered in follow-up work based on
feedback.

The patch applies cleanly on current master and passes:
make distclean
./configure
make -j$(nproc)
make install
make check

See this:

analyze_test=# create table sa6 (id int);
CREATE TABLE
Time: 3.917 ms
analyze_test=# analyze(smart) sa6;
DEBUG: ANALYZE processing relation "sa6" (OID 131324)
ANALYZE
Time: 0.585 ms
analyze_test=# SELECT count(*) > 0 AS stats_created
FROM pg_stats
WHERE tablename = 'sa6';
 stats_created
---------------
 f
(1 row)

Time: 0.894 ms
analyze_test=# SELECT relname,
last_analyze,
n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'sa6';
 relname |           last_analyze           | n_mod_since_analyze
---------+----------------------------------+---------------------
 sa6     | 2026-01-22 10:35:23.005045+05:30 |                   0
(1 row)

The empty table has no statistics to show because pg_stats holds
column-level statistics. These are created only when rows exist; with 0 rows
there is nothing to sample and no most-common values, so there is no data
distribution to record.

But when values are inserted:

analyze_test=# CREATE TABLE sa4 (i int);
CREATE TABLE
Time: 10.290 ms
analyze_test=# INSERT INTO sa4 SELECT generate_series(1,10);
INSERT 0 10
Time: 45.373 ms
analyze_test=# analyze(smart) sa4;
DEBUG: ANALYZE processing relation "sa4" (OID 131310)
ANALYZE
Time: 47.771 ms
analyze_test=# SELECT count(*) > 0 AS stats_created
FROM pg_stats
WHERE tablename = 'sa4';
 stats_created
---------------
 t
(1 row)

Time: 0.945 ms

I would appreciate feedback on the overall approach.

Thanks for your time and review.

--
Best regards,
Vasuki M
C-DAC,Chennai

Attachments:

v2-0001-ANALYZE-Introduce-an-opt-in-SMART-option.patch (text/x-patch; charset=US-ASCII)
From 2f3be0eb8754ad1b684d27625bd59183a5511832 Mon Sep 17 00:00:00 2001
From: Vasuki M <vasukianand0119@gmail.com>
Date: Thu, 22 Jan 2026 11:33:10 +0530
Subject: [PATCH] Introduce an opt-in SMART option for ANALYZE that skips
 relations which have not been modified since their last analyze, based on
 pg_stat counters.

A relation is skipped only if:
- it has been analyzed before (manual or auto-analyze), and
- n_mod_since_analyze == 0

Relations that have never been analyzed are always analyzed normally.
The default ANALYZE behavior is unchanged unless SMART is explicitly
specified.

This can reduce unnecessary ANALYZE work in databases with many
mostly-static tables.

Regression tests are included to verify that:
- SMART ANALYZE does not skip never-analyzed tables
- only modified tables are re-analyzed when SMART is used
---
 src/backend/commands/analyze.c              | 30 +++++++++
 src/backend/commands/vacuum.c               | 15 ++++-
 src/include/commands/vacuum.h               |  2 +-
 src/test/regress/expected/analyze_smart.out | 71 +++++++++++++++++++++
 src/test/regress/parallel_schedule          |  1 +
 src/test/regress/sql/analyze_smart.sql      | 58 +++++++++++++++++
 6 files changed, 175 insertions(+), 2 deletions(-)
 create mode 100644 src/test/regress/expected/analyze_smart.out
 create mode 100644 src/test/regress/sql/analyze_smart.sql

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index a4834241..536c6209 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -45,6 +45,7 @@
 #include "storage/bufmgr.h"
 #include "storage/procarray.h"
 #include "utils/attoptcache.h"
+#include "utils/relcache.h"
 #include "utils/datum.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
@@ -140,6 +141,31 @@ analyze_rel(Oid relid, RangeVar *relation,
 	onerel = vacuum_open_relation(relid, relation, params.options & ~(VACOPT_VACUUM),
 								  params.log_analyze_min_duration >= 0,
 								  ShareUpdateExclusiveLock);
+	/* SMART ANALYZE: skip unchanged relations that have been analyzed before and
+	 * have not changed since the last analyze.
+	 */
+	if ((params.options & VACOPT_SMART_ANALYZE) &&
+		onerel->rd_rel->relkind == RELKIND_RELATION)
+	{
+		PgStat_StatTabEntry *tabstat;
+
+		tabstat = pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
+
+		if (tabstat != NULL &&
+			(tabstat->last_analyze_time != 0 ||
+			tabstat->last_autoanalyze_time != 0) &&
+			tabstat->mod_since_analyze == 0)
+		{
+
+			elog(DEBUG1,
+				"SMART ANALYZE: skipping relation \"%s\" (OID %u), no modifications since last analyze",
+				RelationGetRelationName(onerel),
+				RelationGetRelid(onerel));
+
+			table_close(onerel, ShareUpdateExclusiveLock);
+			return;
+		}
+	}
 
 	/* leave if relation could not be opened or locked */
 	if (!onerel)
@@ -314,6 +340,10 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 
+	elog(DEBUG1, "ANALYZE processing relation \"%s\" (OID %u)",
+		RelationGetRelationName(onerel),
+		RelationGetRelid(onerel));
+
 	verbose = (params.options & VACOPT_VERBOSE) != 0;
 	instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
 							  params.log_analyze_min_duration >= 0));
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 03932f45..3e838476 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -165,6 +165,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	BufferAccessStrategy bstrategy = NULL;
 	bool		verbose = false;
 	bool		skip_locked = false;
+	bool            smart = false;
 	bool		analyze = false;
 	bool		freeze = false;
 	bool		full = false;
@@ -229,6 +230,9 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 
 			ring_size = result;
 		}
+		else if (strcmp(opt->defname, "smart") == 0)
+			smart = defGetBoolean(opt);
+
 		else if (!vacstmt->is_vacuumcmd)
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -306,6 +310,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		(verbose ? VACOPT_VERBOSE : 0) |
 		(skip_locked ? VACOPT_SKIP_LOCKED : 0) |
 		(analyze ? VACOPT_ANALYZE : 0) |
+		(smart ? VACOPT_SMART_ANALYZE : 0) |
 		(freeze ? VACOPT_FREEZE : 0) |
 		(full ? VACOPT_FULL : 0) |
 		(disable_page_skipping ? VACOPT_DISABLE_PAGE_SKIPPING : 0) |
@@ -315,7 +320,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		(only_database_stats ? VACOPT_ONLY_DATABASE_STATS : 0);
 
 	/* sanity checks on options */
-	Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE));
+	Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE | VACOPT_SMART_ANALYZE));
 	Assert((params.options & VACOPT_VACUUM) ||
 		   !(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
 
@@ -351,6 +356,14 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		}
 	}
 
+	/*
+	 * SMART is only meaningful with ANALYZE.
+	 */
+	if ((params.options & VACOPT_SMART_ANALYZE) &&
+		!(params.options & VACOPT_ANALYZE))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("SMART option requires ANALYZE")));
 
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index e885a4b9..08533ec7 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -188,7 +188,7 @@ typedef struct VacAttrStats
 #define VACOPT_DISABLE_PAGE_SKIPPING 0x100	/* don't skip any pages */
 #define VACOPT_SKIP_DATABASE_STATS 0x200	/* skip vac_update_datfrozenxid() */
 #define VACOPT_ONLY_DATABASE_STATS 0x400	/* only vac_update_datfrozenxid() */
-
+#define VACOPT_SMART_ANALYZE   0x00010000  /* skip unchanged relations during ANALYZE */
 /*
  * Values used by index_cleanup and truncate params.
  *
diff --git a/src/test/regress/expected/analyze_smart.out b/src/test/regress/expected/analyze_smart.out
new file mode 100644
index 00000000..4ae555e1
--- /dev/null
+++ b/src/test/regress/expected/analyze_smart.out
@@ -0,0 +1,71 @@
+--
+-- SMART ANALYZE should not skip tables that were never analyzed
+--
+CREATE TABLE sa_never (id int);
+-- Run SMART ANALYZE directly (no prior ANALYZE)
+ANALYZE (SMART) sa_never;
+-- Verify SMART ANALYZE ran or not
+SELECT
+    last_analyze IS NOT NULL AS analyzed,
+    n_mod_since_analyze
+FROM pg_stat_user_tables
+WHERE relname = 'sa_never';
+ analyzed | n_mod_since_analyze 
+----------+---------------------
+ t        |                   0
+(1 row)
+
+--
+-- SMART ANALYZE regression test
+--
+CREATE TABLE sa1 (id int);
+CREATE TABLE sa2 (id int);
+-- Initial analyze so stats exist
+ANALYZE;
+-- Modify only sa1
+INSERT INTO sa1 VALUES (1);
+-- Make sure stats snapshot is fresh
+SELECT pg_stat_clear_snapshot();
+ pg_stat_clear_snapshot 
+------------------------
+ 
+(1 row)
+
+-- Check modifications
+SELECT relname, n_mod_since_analyze
+FROM pg_stat_user_tables
+WHERE relname IN ('sa1', 'sa2')
+ORDER BY relname;
+ relname | n_mod_since_analyze 
+---------+---------------------
+ sa1     |                   0
+ sa2     |                   0
+(2 rows)
+
+-- Run SMART ANALYZE on both tables
+ANALYZE (SMART) sa1, sa2;
+-- Refresh stats again
+SELECT pg_stat_clear_snapshot();
+ pg_stat_clear_snapshot 
+------------------------
+ 
+(1 row)
+
+-- Verify post-conditions:
+-- sa1 has n_mod_since_analyze = 0 because it was analyzed
+-- sa2 has n_mod_since_analyze = 0 because it was skipped and unchanged
+SELECT
+    relname,
+    n_mod_since_analyze = 0 AS reset_after_smart_analyze
+FROM pg_stat_user_tables
+WHERE relname IN ('sa1', 'sa2')
+ORDER BY relname;
+ relname | reset_after_smart_analyze 
+---------+---------------------------
+ sa1     | t
+ sa2     | t
+(2 rows)
+
+DROP TABLE sa1;
+DROP TABLE sa2;
+DROP TABLE sa_never;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 021d57f6..f3379fc5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -90,6 +90,7 @@ test: rules psql psql_crosstab psql_pipeline amutils stats_ext collate.linux.utf
 test: select_parallel
 test: write_parallel
 test: vacuum_parallel
+test: analyze_smart
 
 # Run this alone, because concurrent DROP TABLE would make non-superuser
 # "ANALYZE;" fail with "relation with OID $n does not exist".
diff --git a/src/test/regress/sql/analyze_smart.sql b/src/test/regress/sql/analyze_smart.sql
new file mode 100644
index 00000000..f7e6733c
--- /dev/null
+++ b/src/test/regress/sql/analyze_smart.sql
@@ -0,0 +1,58 @@
+--
+-- SMART ANALYZE should not skip tables that were never analyzed
+--
+
+CREATE TABLE sa_never (id int);
+
+-- Run SMART ANALYZE directly (no prior ANALYZE)
+ANALYZE (SMART) sa_never;
+
+-- Verify SMART ANALYZE ran or not
+SELECT
+    last_analyze IS NOT NULL AS analyzed,
+    n_mod_since_analyze
+FROM pg_stat_user_tables
+WHERE relname = 'sa_never';
+
+
+--
+-- SMART ANALYZE regression test
+--
+
+CREATE TABLE sa1 (id int);
+CREATE TABLE sa2 (id int);
+
+-- Initial analyze so stats exist
+ANALYZE;
+
+-- Modify only sa1
+INSERT INTO sa1 VALUES (1);
+
+-- Make sure stats snapshot is fresh
+SELECT pg_stat_clear_snapshot();
+
+-- Check modifications
+SELECT relname, n_mod_since_analyze
+FROM pg_stat_user_tables
+WHERE relname IN ('sa1', 'sa2')
+ORDER BY relname;
+
+-- Run SMART ANALYZE on both tables
+ANALYZE (SMART) sa1, sa2;
+
+-- Refresh stats again
+SELECT pg_stat_clear_snapshot();
+
+-- Verify post-conditions:
+-- sa1 has n_mod_since_analyze = 0 because it was analyzed
+-- sa2 has n_mod_since_analyze = 0 because it was skipped and unchanged
+SELECT
+    relname,
+    n_mod_since_analyze = 0 AS reset_after_smart_analyze
+FROM pg_stat_user_tables
+WHERE relname IN ('sa1', 'sa2')
+ORDER BY relname;
+
+DROP TABLE sa1;
+DROP TABLE sa2;
+DROP TABLE sa_never;
-- 
2.43.0

#17Robert Treat
xzilla@users.sourceforge.net
In reply to: Christoph Berg (#11)
Re: Optional skipping of unchanged relations during ANALYZE?

On Wed, Jan 21, 2026 at 4:44 AM Christoph Berg <myon@debian.org> wrote:

Re: VASUKI M

VACUUM(SMART);

IMHO it was a historical mistake to combine VACUUM and ANALYZE into a
single command. We should not add any more options on that
combination. If people want to pass options to ANALYZE, they should
call ANALYZE and not VACUUM.

I don't know if I go that far, but if you are saying that you don't
think "smart analyze" should be an option for vacuum runs, I can get
on board with that. We don't really know what any given vacuum run will
do with regards to the table, but if we are shuffling data / storage
around, it probably makes sense to update statistics info along the
way.

Robert Treat
https://xzilla.net

#18Sami Imseih
samimseih@gmail.com
In reply to: VASUKI M (#16)
Re: Optional skipping of unchanged relations during ANALYZE?

Hi,

I would appreciate feedback on the overall approach.

I did not read through the patch in detail but by looking at the commit
message:

"A relation is skipped only if:
- it has been analyzed before (manual or auto-analyze), and
- n_mod_since_analyze == 0

Relations that have never been analyzed are always analyzed normally.
The default ANALYZE behavior is unchanged unless SMART is explicitly
specified.
"

I can't help but think that this SMART option is not as smart as it
should be to actually be valuable.

I agree that we should never skip a table that has never been
analyzed. My concern is that n_mod_since_analyze == 0 is not very
useful. What if I modify 1 tuple? Does that really justify an ANALYZE
run on the table? Shouldn't the decision be driven by some threshold
calculation, similar to how autoanalyze makes the decision?
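
For reference, the threshold calculation that autoanalyze uses can be sketched in standalone C as below. This is only an illustration and not code from the patch: the function name changed_enough_for_analyze is hypothetical, and the base threshold and scale factor are plain arguments standing in for autovacuum_analyze_threshold and autovacuum_analyze_scale_factor (defaults 50 and 0.1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: decide whether a relation has
 * changed enough to be worth re-analyzing, using the same shape of formula
 * as autoanalyze:
 *
 *     threshold = base_threshold + scale_factor * reltuples
 */
bool
changed_enough_for_analyze(int64_t mod_since_analyze,
                           double reltuples,
                           int base_threshold,
                           double scale_factor)
{
    double      threshold;

    /* Both knobs are non-negative settings; anything else is a caller bug. */
    assert(base_threshold >= 0 && scale_factor >= 0.0);

    /* A negative reltuples means "unknown"; treat it as an empty table. */
    if (reltuples < 0)
        reltuples = 0;

    threshold = (double) base_threshold + scale_factor * reltuples;

    return (double) mod_since_analyze > threshold;
}
```

With the default settings, a 1000-row table would need more than 150 modified tuples before this check fires, so a single modified tuple would not trigger a re-analyze.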

--
Sami Imseih
Amazon Web Services (AWS)

#19Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Sami Imseih (#18)
Re: Optional skipping of unchanged relations during ANALYZE?

I spent some more time thinking about this new option.

On 22.01.2026 23:18, Sami Imseih wrote:

I can't help but think that this SMART option is not as smart as it
should be to actually be valuable.

I agree that we should never skip a table that has never been
analyzed. My concern is that n_mod_since_analyze == 0 is not very
useful.

IMO, for the purpose of ensuring that we never skip relations that have
never been analyzed, checking last_analyze / last_autoanalyze being NULL
seems sufficient and reliable.

What if I modify 1 tuple? Does that really justify an ANALYZE run on
the table? Shouldn't the decision be driven by some threshold
calculation, similar to how autoanalyze makes the decision?

The primary purpose of ANALYZE is to allow users to explicitly rebuild
statistics when they believe it is necessary. When a user specifies
particular tables or columns (e.g., ANALYZE table; or ANALYZE
table(i, j);), I would not expect them to use this new option. In that
case, the intent is usually to force statistics to be recollected.

However, the situation looks different when ANALYZE is run across the
entire database (i.e., plain ANALYZE;). In that context, having an
option to skip relations that are known not to have changed since their
last analyze seems useful, as it avoids doing work that is clearly
unnecessary. That said, I think we still need to be precise about
what exactly "relations that have not changed" means in this context, in
order to understand where statistics would and would not be rebuilt. In
particular, relying solely on n_mod_since_analyze == 0 does not seem
sufficient, as we have already discussed several cases where ANALYZE may
still be required even without direct data modifications (e.g.
partitioned tables, inheritance, foreign tables, extended statistics, etc.).

About thresholds: I’m not convinced they make much sense for manual
ANALYZE. Autovacuum already exists to decide when statistics need to
be refreshed based on thresholds, and if those conditions are met, it
will run automatically. I’m not sure there is much value in duplicating
that logic for explicit ANALYZE commands.

What do you think?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

#20Sami Imseih
samimseih@gmail.com
In reply to: Ilia Evdokimov (#19)
Re: Optional skipping of unchanged relations during ANALYZE?

I can't help but think that this SMART option is not as smart as it
should be to actually be valuable.

I agree that we should never skip a table that has never been
analyzed. My concern is that n_mod_since_analyze == 0 is not very
useful.

IMO, for the purpose of ensuring that we never skip relations that have never been analyzed,
checking last_analyze / last_autoanalyze being NULL seems sufficient and reliable.

edba754f052 introduced --missing-stats-only for vacuumdb. Although
this was intended for pg_upgrade, the commit message does note that
"it might be useful in other situations". Perhaps this is one of
those situations.

So, instead of a smart mode, maybe we should be thinking about an
ANALYZE (missing_stats_only) option that follows what is done in
vacuumdb; and will skip tables that don't need to be analyzed.
Ultimately vacuumdb can just use this option.

The criteria for tables missing stats is more comprehensive than a simple
last_analyze / last_autoanalyze being NULL.

A followup commit 984d7165dd also mentions:

"
For v19, perhaps we could introduce a simple, inexpensive way to
discover which relations are missing statistics, such as a system
function or view with similar privilege requirements to ANALYZE.
Unfortunately, it is far too late for anything like that in v18.
"

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)

#21VASUKI M
vasukianand0119@gmail.com
In reply to: Sami Imseih (#20)
Re: Optional skipping of unchanged relations during ANALYZE?

Hi all,

Thanks a lot for the detailed feedback; this has been very
helpful. I am replying to all the mails in one.

A few clarifications on intent and scope, and how this relates to the
points raised:

Autovacuum overlap
I agree there is some conceptual overlap with autovacuum’s analyze decision
logic. The intent here is not to replace or duplicate autovacuum
heuristics, but to reduce clearly redundant work during explicit ANALYZE
runs (especially plain ANALYZE; across the whole database). Autovacuum
already handles threshold-based decisions well; this option is meant to be
a lightweight, explicit opt-in for manual ANALYZE usage.

Thresholds vs n_mod_since_analyze
I agree that n_mod_since_analyze == 0 is a very simple condition and not
“smart” in the general sense. That is intentional for now. This option is
not trying to answer when statistics should be refreshed optimally, but
only to skip relations that are known to be unchanged since the last
analyze. If even a single tuple is modified, SMART ANALYZE will still
re-run, preserving conservative behavior.

Tables never analyzed
As Christoph and Ilia pointed out earlier, skipping tables that were never
analyzed would be incorrect. The current logic explicitly avoids that by
requiring last_analyze or last_autoanalyze to be present before skipping.
Tables without prior statistics are always analyzed.
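
The skip condition described above can be sketched as a standalone C predicate. This is a simplified model for discussion rather than the patch itself: TabStatSketch is a hypothetical stand-in that mirrors only the three fields the patch reads (last_analyze_time, last_autoanalyze_time, mod_since_analyze), and treating a missing stats entry as "do not skip" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-table stats entry; not the real struct. */
typedef struct TabStatSketch
{
    int64_t     last_analyze_time;      /* 0 means never manually analyzed */
    int64_t     last_autoanalyze_time;  /* 0 means never auto-analyzed */
    int64_t     mod_since_analyze;      /* tuples changed since last analyze */
} TabStatSketch;

/* Return true only when it is safe to skip ANALYZE for this relation. */
bool
can_skip_analyze(const TabStatSketch *tabstat)
{
    /* Assumption: no stats entry means we know nothing, so analyze. */
    if (tabstat == NULL)
        return false;

    /* Never analyzed, manually or by autovacuum: always analyze. */
    if (tabstat->last_analyze_time == 0 &&
        tabstat->last_autoanalyze_time == 0)
        return false;

    /* Analyzed before: skip only if nothing was modified since then. */
    return tabstat->mod_since_analyze == 0;
}

/* The three cases discussed in the thread, written as sanity checks. */
void
can_skip_analyze_examples(void)
{
    TabStatSketch never = {.last_analyze_time = 0, .last_autoanalyze_time = 0,
                           .mod_since_analyze = 0};
    TabStatSketch unchanged = {.last_analyze_time = 100,
                               .last_autoanalyze_time = 0,
                               .mod_since_analyze = 0};
    TabStatSketch modified = {.last_analyze_time = 0,
                              .last_autoanalyze_time = 100,
                              .mod_since_analyze = 1};

    assert(!can_skip_analyze(&never));      /* never analyzed: analyze */
    assert(can_skip_analyze(&unchanged));   /* analyzed, unchanged: skip */
    assert(!can_skip_analyze(&modified));   /* one tuple changed: analyze */
}
```

Note that a single modified tuple is enough to make the predicate return false, which is the conservative behavior described under "Thresholds vs n_mod_since_analyze".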

Relation to vacuumdb --missing-stats-only
I agree this is related but slightly different in intent.
--missing-stats-only answers “does this table have any statistics at all?”,
while SMART ANALYZE answers “has this table changed since the last
statistics collection?”. Both seem useful, but they target different use
cases. I see SMART ANALYZE primarily as a performance optimization for
repeated manual ANALYZE runs on mostly-static schemas.

Extended statistics / partitions / inheritance
These are valid concerns. The current patch intentionally does not attempt
to handle extended statistics, partitioned tables, inheritance, foreign
tables, etc. I wanted to start with a minimal, explicit, and conservative
behavior for regular relations only. I agree these areas need careful
consideration before extending the logic further, and I plan to look into
them based on feedback.

VACUUM vs ANALYZE
I also agree with the concern about adding more options to VACUUM. The
current patch focuses on ANALYZE usage; I’m not proposing this as a VACUUM
option.

Naming
As Sami said, SMART is not as smart as it should be, so I will change
the name in later patches, based on your and others' opinions, once a
name is agreed on.

Based on feedback, I’m happy to revise direction, naming, or scope before
taking this further.

Thanks again for the thoughtful discussion — really appreciate the guidance.

Best regards,
Vasuki M
C-DAC,Chennai.

#22Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: VASUKI M (#21)
Re: Optional skipping of unchanged relations during ANALYZE?

On 23.01.2026 09:33, VASUKI M wrote:

Relation to vacuumdb --missing-stats-only
I agree this is related but slightly different in intent.
--missing-stats-only answers “does this table have any statistics at
all?”, while SMART ANALYZE answers “has this table changed since the
last statistics collection?”. Both seem useful, but they target
different use cases. I see SMART ANALYZE primarily as a performance
optimization for repeated manual ANALYZE runs on mostly-static schemas.

LGTM. Thanks to Sami for pointing this out.

It seems reasonable to start by introducing an option for plain ANALYZE
(without specifying tables or columns) that follows the same idea as
vacuumdb --missing-stats-only. While this flag was originally introduced
primarily to support pg_upgrade workflows, exposing similar
functionality at the ANALYZE level also seems useful on its own. That
would give us a clear and well-defined first step. At the SQL level, a
name such as ANALYZE (MISSING_STATS_ONLY) would be a good fit and remain
consistent with the vacuumdb option.

Thoughts?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

#23Sami Imseih
samimseih@gmail.com
In reply to: VASUKI M (#21)
Re: Optional skipping of unchanged relations during ANALYZE?

Thanks for the detailed summary!

It is important to point out that this feature is trying to do 2 distinct
things in 1 command: run ANALYZE when either one of these conditions
is true:

1/ Table has not been analyzed yet.
2/ Table has been modified.

Thanks a lot for the detailed feedback; this has been very helpful. I am replying to all the mails in one.

A few clarifications on intent and scope, and how this relates to the points raised:

Autovacuum overlap
I agree there is some conceptual overlap with autovacuum’s analyze decision logic.
The intent here is not to replace or duplicate autovacuum heuristics, but to reduce

Yes, I agree with this.

I agree that n_mod_since_analyze == 0 is a very simple condition
and not “smart” in the general sense. That is intentional for now.
This option is not trying to answer when statistics should be refreshed optimally,
but only to skip relations that are known to be unchanged since the last analyze.
If even a single tuple is modified, SMART ANALYZE will still re-run, preserving
conservative behavior.

Yes, this is my concern. Why would I want to analyze if 1 row or a negligible
number of rows is modified? I understand that this feature is trying to
keep the decision-making very simple, but I think it's too simple to actually
be helpful in addressing the wasted effort of an ANALYZE command.

Tables never analyzed
As Christoph and Ilia pointed out earlier, skipping tables that were never analyzed would be incorrect.
The current logic explicitly avoids that by requiring last_analyze or last_autoanalyze to be present
before skipping. Tables without prior statistics are always analyzed.

I agree with this, but I think it's more than just tables that have
not been analyzed. What if a new column is added after the last
(auto)analyze? Would we not want to trigger an analyze in that case?

Relation to vacuumdb --missing-stats-only
I agree this is related but slightly different in intent. --missing-stats-only
answers “does this table have any statistics at all?”, while SMART ANALYZE
answers “has this table changed since the last statistics collection?”. Both seem
useful, but they target different use cases. I see SMART ANALYZE primarily
as a performance optimization for repeated manual ANALYZE runs on mostly-static schemas.

SMART ANALYZE is trying to answer 2 questions: "which table does not
have any statistics at all?" and "has this table changed since the
last statistics collection?", right?

So, maybe they need to be 2 separate options.

Although, as Sami said, SMART is not as smart as it should be,
I will change the name accordingly in later patches

Yup, I am not too fond of SMART in the name. Also, the name itself
is vague. SKIP_LOCKED and BUFFER_USAGE_LIMIT, on the other
hand, tell you exactly what they're used for.

--
Sami Imseih
Amazon Web Services (AWS)