New criteria for autovacuum
Hi hackers,
Some time ago I investigated a slow query performance case for one of our customers
and noticed that an index-only scan had made a lot of heap fetches:
-> Index Only Scan using ix_client_objects_vendor_object_id on client_objects client_objects_1 (cost=0.56..2.78 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=208081)
   Index Cond: (vendor_object_id = vendor_objects.id)
   Heap Fetches: 208081
   Buffers: shared hit=1092452 read=156034
So almost every index entry requires a visibility check, and the index-only scan
is effectively a normal index scan.
This certainly has a bad impact on performance.
I do not know what happened in this particular case: why the pages were not
marked as all-visible, and why an index-only scan plan was still chosen by the optimizer.
But the problem can be quite easily reproduced. We can just populate a table
with some data while some other transaction with an assigned XID is active,
and then explicitly vacuum this table or wait until autovacuum does it.
At this point the table has no more dead or inserted tuples, so autovacuum
will not be called for it again. But the heap pages of this table are still not
marked as all-visible, and they will never be marked as all-visible unless the
table is updated or explicitly vacuumed.
This is why I think it may be useful to add more columns to the
pg_stat_all_tables and pg_stat_all_indexes views providing information about
heap visibility checks performed by index-only scans, and, in addition to the
number of dead/inserted tuples, to use the number of such visibility checks as
a criterion for triggering autovacuum on a particular table.
The proposed patch is attached.
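To illustrate, with the patch applied the new counters and per-table settings would be used roughly like this (column and parameter names are taken from the patch itself; the trigger condition mirrors the existing insert threshold, i.e. checks since last vacuum > threshold + scale_factor * reltuples):
```
-- heap visibility checks performed by index-only scans, in total and since
-- the last vacuum (columns added by the patch)
select relname, n_tup_check, n_check_since_vacuum
from pg_stat_all_tables
where relname = 't';

-- the new criterion can also be tuned per table, like the existing thresholds
alter table t set (autovacuum_vacuum_check_threshold = 1000,
                   autovacuum_vacuum_check_scale_factor = 0.2);
```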
I am not quite happy with the test: it is intended to check whether autovacuum
is really triggered by this new criterion, but it depends on the autovacuum
activation frequency and may take several seconds.
I will be glad to receive any feedback.
Attachments:
check_autovacuum-v1.patch (text/plain)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fea683cb49c..f8dc1d04deb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8929,6 +8929,28 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-autovacuum-vacuum-check-threshold" xreflabel="autovacuum_vacuum_check_threshold">
+ <term><varname>autovacuum_vacuum_check_threshold</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>autovacuum_vacuum_check_threshold</varname></primary>
+ <secondary>configuration parameter</secondary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the number of heap tuple visibility checks by index-only scan needed to trigger a
+ <command>VACUUM</command> in any one table.
+ The default is 1000 tuples. If -1 is specified, autovacuum will not
+ trigger a <command>VACUUM</command> operation on any tables based on
+ the number of visibility checks.
+ This parameter can only be set in the <filename>postgresql.conf</filename>
+ file or on the server command line;
+ but the setting can be overridden for individual tables by
+ changing table storage parameters.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-autovacuum-analyze-threshold" xreflabel="autovacuum_analyze_threshold">
<term><varname>autovacuum_analyze_threshold</varname> (<type>integer</type>)
<indexterm>
@@ -8990,6 +9012,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-autovacuum-vacuum-check-scale-factor" xreflabel="autovacuum_vacuum_check_scale_factor">
+ <term><varname>autovacuum_vacuum_check_scale_factor</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>autovacuum_vacuum_check_scale_factor</varname></primary>
+ <secondary>configuration parameter</secondary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a fraction of the table size to add to
+ <varname>autovacuum_vacuum_check_threshold</varname>
+ when deciding whether to trigger a <command>VACUUM</command>.
+ The default is 0.2 (20% of table size).
+ This parameter can only be set in the <filename>postgresql.conf</filename>
+ file or on the server command line;
+ but the setting can be overridden for individual tables by
+ changing table storage parameters.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-autovacuum-analyze-scale-factor" xreflabel="autovacuum_analyze_scale_factor">
<term><varname>autovacuum_analyze_scale_factor</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 46c1dce222d..f93cf1671ed 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1869,6 +1869,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_max_threshold)},
{"autovacuum_vacuum_insert_threshold", RELOPT_TYPE_INT,
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_ins_threshold)},
+ {"autovacuum_vacuum_check_threshold", RELOPT_TYPE_INT,
+ offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_check_threshold)},
{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, analyze_threshold)},
{"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT,
@@ -1895,6 +1897,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_scale_factor)},
{"autovacuum_vacuum_insert_scale_factor", RELOPT_TYPE_REAL,
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_ins_scale_factor)},
+ {"autovacuum_vacuum_check_scale_factor", RELOPT_TYPE_REAL,
+ offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_check_scale_factor)},
{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, analyze_scale_factor)},
{"user_catalog_table", RELOPT_TYPE_BOOL,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 31d269b7ee0..dfaf8e51ac6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -700,7 +700,9 @@ CREATE VIEW pg_stat_all_tables AS
pg_stat_get_total_vacuum_time(C.oid) AS total_vacuum_time,
pg_stat_get_total_autovacuum_time(C.oid) AS total_autovacuum_time,
pg_stat_get_total_analyze_time(C.oid) AS total_analyze_time,
- pg_stat_get_total_autoanalyze_time(C.oid) AS total_autoanalyze_time
+ pg_stat_get_total_autoanalyze_time(C.oid) AS total_autoanalyze_time,
+ pg_stat_get_check_since_vacuum(C.oid) AS n_check_since_vacuum,
+ pg_stat_get_tuples_checked(C.oid) AS n_tup_check
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
@@ -721,7 +723,8 @@ CREATE VIEW pg_stat_xact_all_tables AS
pg_stat_get_xact_tuples_updated(C.oid) AS n_tup_upd,
pg_stat_get_xact_tuples_deleted(C.oid) AS n_tup_del,
pg_stat_get_xact_tuples_hot_updated(C.oid) AS n_tup_hot_upd,
- pg_stat_get_xact_tuples_newpage_updated(C.oid) AS n_tup_newpage_upd
+ pg_stat_get_xact_tuples_newpage_updated(C.oid) AS n_tup_newpage_upd,
+ pg_stat_get_xact_tuples_checked(C.oid) AS n_tup_check
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
@@ -802,7 +805,8 @@ CREATE VIEW pg_stat_all_indexes AS
pg_stat_get_numscans(I.oid) AS idx_scan,
pg_stat_get_lastscan(I.oid) AS last_idx_scan,
pg_stat_get_tuples_returned(I.oid) AS idx_tup_read,
- pg_stat_get_tuples_fetched(I.oid) AS idx_tup_fetch
+ pg_stat_get_tuples_fetched(I.oid) AS idx_tup_fetch,
+ pg_stat_get_tuples_checked(I.oid) AS idx_tup_check
FROM pg_class C JOIN
pg_index X ON C.oid = X.indrelid JOIN
pg_class I ON I.oid = X.indexrelid
@@ -1084,7 +1088,8 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_sessions_killed(D.oid) AS sessions_killed,
pg_stat_get_db_parallel_workers_to_launch(D.oid) as parallel_workers_to_launch,
pg_stat_get_db_parallel_workers_launched(D.oid) as parallel_workers_launched,
- pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
+ pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset,
+ pg_stat_get_db_tuples_checked(D.oid) AS tup_checked
FROM (
SELECT 0 AS oid, NULL::name AS datname
UNION ALL
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..321da2bcda3 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -29,6 +29,7 @@
* ExecIndexOnlyScanInitializeWorker attach to DSM info in parallel worker
*/
#include "postgres.h"
+#include "pgstat.h"
#include "access/genam.h"
#include "access/relscan.h"
@@ -166,6 +167,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
* Rats, we have to visit the heap to check visibility.
*/
InstrCountTuples2(node, 1);
+ pgstat_count_heap_check(scandesc->heapRelation);
+ pgstat_count_heap_check(scandesc->indexRelation);
if (!index_fetch_heap(scandesc, node->ioss_TableSlot))
continue; /* no visible tuple, try next index entry */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2513a8ef8a6..05b92bf330b 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -125,6 +125,8 @@ int autovacuum_vac_max_thresh;
double autovacuum_vac_scale;
int autovacuum_vac_ins_thresh;
double autovacuum_vac_ins_scale;
+int autovacuum_vac_check_thresh;
+double autovacuum_vac_check_scale;
int autovacuum_anl_thresh;
double autovacuum_anl_scale;
int autovacuum_freeze_max_age;
@@ -2945,19 +2947,23 @@ relation_needs_vacanalyze(Oid relid,
int vac_base_thresh,
vac_max_thresh,
vac_ins_base_thresh,
+ vac_check_base_thresh,
anl_base_thresh;
float4 vac_scale_factor,
vac_ins_scale_factor,
+ vac_check_scale_factor,
anl_scale_factor;
/* thresholds calculated from above constants */
float4 vacthresh,
vacinsthresh,
+ vaccheckthresh,
anlthresh;
/* number of vacuum (resp. analyze) tuples at this time */
float4 vactuples,
instuples,
+ checktuples,
anltuples;
/* freeze parameters */
@@ -2999,6 +3005,15 @@ relation_needs_vacanalyze(Oid relid,
? relopts->vacuum_ins_threshold
: autovacuum_vac_ins_thresh;
+ vac_check_scale_factor = (relopts && relopts->vacuum_check_scale_factor >= 0)
+ ? relopts->vacuum_check_scale_factor
+ : autovacuum_vac_check_scale;
+
+ /* -1 is used to disable check vacuums */
+ vac_check_base_thresh = (relopts && relopts->vacuum_check_threshold >= -1)
+ ? relopts->vacuum_check_threshold
+ : autovacuum_vac_check_thresh;
+
anl_scale_factor = (relopts && relopts->analyze_scale_factor >= 0)
? relopts->analyze_scale_factor
: autovacuum_anl_scale;
@@ -3032,7 +3047,7 @@ relation_needs_vacanalyze(Oid relid,
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
force_vacuum = MultiXactIdIsValid(relminmxid) &&
- MultiXactIdPrecedes(relminmxid, multiForceLimit);
+ MultiXactIdPrecedes(relminmxid, multiForceLimit);
}
*wraparound = force_vacuum;
@@ -3061,6 +3076,7 @@ relation_needs_vacanalyze(Oid relid,
vactuples = tabentry->dead_tuples;
instuples = tabentry->ins_since_vacuum;
anltuples = tabentry->mod_since_analyze;
+ checktuples = tabentry->check_since_vacuum;
/* If the table hasn't yet been vacuumed, take reltuples as zero */
if (reltuples < 0)
@@ -3089,6 +3105,7 @@ relation_needs_vacanalyze(Oid relid,
vacinsthresh = (float4) vac_ins_base_thresh +
vac_ins_scale_factor * reltuples * pcnt_unfrozen;
+ vaccheckthresh = (float4) vac_check_base_thresh + vac_check_scale_factor * reltuples;
anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
/*
@@ -3096,18 +3113,27 @@ relation_needs_vacanalyze(Oid relid,
* reset, because if that happens, the last vacuum and analyze counts
* will be reset too.
*/
- if (vac_ins_base_thresh >= 0)
- elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: %.0f (threshold %.0f), anl: %.0f (threshold %.0f)",
+ if (vac_ins_base_thresh >= 0 && vac_check_base_thresh >= 0)
+ elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: %.0f (threshold %.0f), anl: %.0f (threshold %.0f), check: %.0f (threshold %.0f)",
+ NameStr(classForm->relname),
+ vactuples, vacthresh, instuples, vacinsthresh, anltuples, anlthresh, checktuples, vaccheckthresh);
+ else if (vac_ins_base_thresh >= 0)
+ elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: %.0f (threshold %.0f), anl: %.0f (threshold %.0f), check: (disabled)",
NameStr(classForm->relname),
vactuples, vacthresh, instuples, vacinsthresh, anltuples, anlthresh);
+ else if (vac_check_base_thresh >= 0)
+ elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: (disabled), anl: %.0f (threshold %.0f), check: %.0f (threshold %.0f)",
+ NameStr(classForm->relname),
+ vactuples, vacthresh, anltuples, anlthresh, checktuples, vaccheckthresh);
else
- elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: (disabled), anl: %.0f (threshold %.0f)",
+ elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: (disabled), anl: %.0f (threshold %.0f), check: (disabled)",
NameStr(classForm->relname),
vactuples, vacthresh, anltuples, anlthresh);
/* Determine if this table needs vacuum or analyze. */
*dovacuum = force_vacuum || (vactuples > vacthresh) ||
- (vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
+ (vac_ins_base_thresh >= 0 && instuples > vacinsthresh) ||
+ (vac_check_base_thresh >= 0 && checktuples > vaccheckthresh);
*doanalyze = (anltuples > anlthresh);
}
else
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index fbaf8364117..9adfbf4f87d 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -449,6 +449,7 @@ pgstat_database_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
PGSTAT_ACCUM_DBCOUNT(tuples_inserted);
PGSTAT_ACCUM_DBCOUNT(tuples_updated);
PGSTAT_ACCUM_DBCOUNT(tuples_deleted);
+ PGSTAT_ACCUM_DBCOUNT(tuples_checked);
/* last_autovac_time is reported immediately */
Assert(pendingent->last_autovac_time == 0);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index d64595a165c..1cca4e8c4f6 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -246,6 +246,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
* stragglers.
*/
tabentry->ins_since_vacuum = 0;
+ tabentry->check_since_vacuum = 0;
if (AmAutoVacuumWorkerProcess())
{
@@ -852,6 +853,7 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
tabentry->tuples_inserted += lstats->counts.tuples_inserted;
tabentry->tuples_updated += lstats->counts.tuples_updated;
tabentry->tuples_deleted += lstats->counts.tuples_deleted;
+ tabentry->tuples_checked += lstats->counts.tuples_checked;
tabentry->tuples_hot_updated += lstats->counts.tuples_hot_updated;
tabentry->tuples_newpage_updated += lstats->counts.tuples_newpage_updated;
@@ -863,12 +865,14 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
tabentry->live_tuples = 0;
tabentry->dead_tuples = 0;
tabentry->ins_since_vacuum = 0;
+ tabentry->check_since_vacuum = 0;
}
tabentry->live_tuples += lstats->counts.delta_live_tuples;
tabentry->dead_tuples += lstats->counts.delta_dead_tuples;
tabentry->mod_since_analyze += lstats->counts.changed_tuples;
tabentry->ins_since_vacuum += lstats->counts.tuples_inserted;
+ tabentry->check_since_vacuum += lstats->counts.tuples_checked;
tabentry->blocks_fetched += lstats->counts.blocks_fetched;
tabentry->blocks_hit += lstats->counts.blocks_hit;
@@ -886,6 +890,7 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
dbentry->tuples_inserted += lstats->counts.tuples_inserted;
dbentry->tuples_updated += lstats->counts.tuples_updated;
dbentry->tuples_deleted += lstats->counts.tuples_deleted;
+ dbentry->tuples_checked += lstats->counts.tuples_checked;
dbentry->blocks_fetched += lstats->counts.blocks_fetched;
dbentry->blocks_hit += lstats->counts.blocks_hit;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 97af7c6554f..d5ae28ccdb6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -73,6 +73,9 @@ PG_STAT_GET_RELENTRY_INT64(dead_tuples)
/* pg_stat_get_ins_since_vacuum */
PG_STAT_GET_RELENTRY_INT64(ins_since_vacuum)
+/* pg_stat_get_check_since_vacuum */
+PG_STAT_GET_RELENTRY_INT64(check_since_vacuum)
+
/* pg_stat_get_live_tuples */
PG_STAT_GET_RELENTRY_INT64(live_tuples)
@@ -103,6 +106,9 @@ PG_STAT_GET_RELENTRY_INT64(tuples_returned)
/* pg_stat_get_tuples_updated */
PG_STAT_GET_RELENTRY_INT64(tuples_updated)
+/* pg_stat_get_tuples_checked */
+PG_STAT_GET_RELENTRY_INT64(tuples_checked)
+
/* pg_stat_get_vacuum_count */
PG_STAT_GET_RELENTRY_INT64(vacuum_count)
@@ -1091,6 +1097,9 @@ PG_STAT_GET_DBENTRY_INT64(tuples_returned)
/* pg_stat_get_db_tuples_updated */
PG_STAT_GET_DBENTRY_INT64(tuples_updated)
+/* pg_stat_get_db_tuples_checked */
+PG_STAT_GET_DBENTRY_INT64(tuples_checked)
+
/* pg_stat_get_db_xact_commit */
PG_STAT_GET_DBENTRY_INT64(xact_commit)
@@ -1774,6 +1783,9 @@ PG_STAT_GET_XACT_RELENTRY_INT64(tuples_hot_updated)
/* pg_stat_get_xact_tuples_newpage_updated */
PG_STAT_GET_XACT_RELENTRY_INT64(tuples_newpage_updated)
+/* pg_stat_get_xact_tuples_checked */
+PG_STAT_GET_XACT_RELENTRY_INT64(tuples_checked)
+
/* pg_stat_get_xact_blocks_fetched */
PG_STAT_GET_XACT_RELENTRY_INT64(blocks_fetched)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 4eaeca89f2c..567dc2c6fa9 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3514,6 +3514,15 @@ struct config_int ConfigureNamesInt[] =
60, 1, INT_MAX / 1000,
NULL, NULL, NULL
},
+ {
+ {"autovacuum_vacuum_check_threshold", PGC_SIGHUP, VACUUM_AUTOVACUUM,
+ gettext_noop("Minimum number of heap tuple checks by index-only scan prior to vacuum, or -1 to disable check vacuums."),
+ NULL
+ },
+ &autovacuum_vac_check_thresh,
+ 1000, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
{
{"autovacuum_vacuum_threshold", PGC_SIGHUP, VACUUM_AUTOVACUUM,
gettext_noop("Minimum number of tuple updates or deletes prior to vacuum."),
@@ -4057,6 +4066,16 @@ struct config_real ConfigureNamesReal[] =
NULL, NULL, NULL
},
+ {
+ {"autovacuum_vacuum_check_scale_factor", PGC_SIGHUP, VACUUM_AUTOVACUUM,
+ gettext_noop("Number of heap tuple checks by index-only scan prior to vacuum as a fraction of reltuples."),
+ NULL
+ },
+ &autovacuum_vac_check_scale,
+ 0.2, 0.0, 100.0,
+ NULL, NULL, NULL
+ },
+
{
{"autovacuum_vacuum_cost_delay", PGC_SIGHUP, VACUUM_AUTOVACUUM,
gettext_noop("Vacuum cost delay in milliseconds, for autovacuum."),
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8b68b16d79d..6123321b6ee 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5511,6 +5511,10 @@
proname => 'pg_stat_get_tuples_deleted', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_tuples_deleted' },
+{ oid => '4551', descr => 'statistics: number of heap tuples checked by index-only scan',
+ proname => 'pg_stat_get_tuples_checked', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_tuples_checked' },
{ oid => '1972', descr => 'statistics: number of tuples hot updated',
proname => 'pg_stat_get_tuples_hot_updated', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
@@ -5538,6 +5542,11 @@
proname => 'pg_stat_get_ins_since_vacuum', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_ins_since_vacuum' },
+{ oid => '4554',
+ descr => 'statistics: number of heap tuples checked by index-only scan since last vacuum',
+ proname => 'pg_stat_get_check_since_vacuum', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_check_since_vacuum' },
{ oid => '1934', descr => 'statistics: number of blocks fetched',
proname => 'pg_stat_get_blocks_fetched', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
@@ -5766,6 +5775,10 @@
proname => 'pg_stat_get_db_tuples_deleted', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_tuples_deleted' },
+{ oid => '4552', descr => 'statistics: number of heap tuples checked by index-only scan for database',
+ proname => 'pg_stat_get_db_tuples_checked', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_tuples_checked' },
{ oid => '3065',
descr => 'statistics: recovery conflicts in database caused by drop tablespace',
proname => 'pg_stat_get_db_conflict_tablespace', provolatile => 's',
@@ -6062,6 +6075,11 @@
proname => 'pg_stat_get_xact_tuples_newpage_updated', provolatile => 'v',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_xact_tuples_newpage_updated' },
+{ oid => '4553',
+ descr => 'statistics: number of tuples checked by index-only scan in current transaction',
+ proname => 'pg_stat_get_xact_tuples_checked', provolatile => 'v',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_xact_tuples_checked' },
{ oid => '3044',
descr => 'statistics: number of blocks fetched in current transaction',
proname => 'pg_stat_get_xact_blocks_fetched', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 378f2f2c2ba..7c11360bb6e 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -139,6 +139,7 @@ typedef struct PgStat_TableCounts
PgStat_Counter tuples_returned;
PgStat_Counter tuples_fetched;
+ PgStat_Counter tuples_checked;
PgStat_Counter tuples_inserted;
PgStat_Counter tuples_updated;
@@ -350,6 +351,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter tuples_inserted;
PgStat_Counter tuples_updated;
PgStat_Counter tuples_deleted;
+ PgStat_Counter tuples_checked;
TimestampTz last_autovac_time;
PgStat_Counter conflict_tablespace;
PgStat_Counter conflict_lock;
@@ -425,6 +427,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter tuples_returned;
PgStat_Counter tuples_fetched;
+ PgStat_Counter tuples_checked;
PgStat_Counter tuples_inserted;
PgStat_Counter tuples_updated;
@@ -436,6 +439,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter dead_tuples;
PgStat_Counter mod_since_analyze;
PgStat_Counter ins_since_vacuum;
+ PgStat_Counter check_since_vacuum;
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
@@ -691,6 +695,11 @@ extern void pgstat_report_analyze(Relation rel,
if (pgstat_should_count_relation(rel)) \
(rel)->pgstat_info->counts.tuples_fetched++; \
} while (0)
+#define pgstat_count_heap_check(rel) \
+ do { \
+ if (pgstat_should_count_relation(rel)) \
+ (rel)->pgstat_info->counts.tuples_checked++; \
+ } while (0)
#define pgstat_count_index_scan(rel) \
do { \
if (pgstat_should_count_relation(rel)) \
diff --git a/src/include/postmaster/autovacuum.h b/src/include/postmaster/autovacuum.h
index e8135f41a1c..76af3b5bce7 100644
--- a/src/include/postmaster/autovacuum.h
+++ b/src/include/postmaster/autovacuum.h
@@ -37,6 +37,8 @@ extern PGDLLIMPORT int autovacuum_vac_max_thresh;
extern PGDLLIMPORT double autovacuum_vac_scale;
extern PGDLLIMPORT int autovacuum_vac_ins_thresh;
extern PGDLLIMPORT double autovacuum_vac_ins_scale;
+extern PGDLLIMPORT int autovacuum_vac_check_thresh;
+extern PGDLLIMPORT double autovacuum_vac_check_scale;
extern PGDLLIMPORT int autovacuum_anl_thresh;
extern PGDLLIMPORT double autovacuum_anl_scale;
extern PGDLLIMPORT int autovacuum_freeze_max_age;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d94fddd7cef..d750aa0fb91 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -311,6 +311,7 @@ typedef struct AutoVacOpts
int vacuum_threshold;
int vacuum_max_threshold;
int vacuum_ins_threshold;
+ int vacuum_check_threshold;
int analyze_threshold;
int vacuum_cost_limit;
int freeze_min_age;
@@ -323,6 +324,7 @@ typedef struct AutoVacOpts
float8 vacuum_cost_delay;
float8 vacuum_scale_factor;
float8 vacuum_ins_scale_factor;
+ float8 vacuum_check_scale_factor;
float8 analyze_scale_factor;
} AutoVacOpts;
diff --git a/src/test/regress/expected/check_autovacuum.out b/src/test/regress/expected/check_autovacuum.out
index 429f80a9a08..aa0ba5b355d 100644
--- a/src/test/regress/expected/check_autovacuum.out
+++ b/src/test/regress/expected/check_autovacuum.out
@@ -26,6 +26,14 @@ begin
end loop;
end;
$$ language plpgsql;
+-- Make autovacuum run more frequently
+alter system set autovacuum_naptime=2;
+select pg_reload_conf();
+ pg_reload_conf
+----------------
+ t
+(1 row)
+
-- Force index-only scan
set enable_bitmapscan=off;
set enable_seqscan=off;
@@ -35,15 +43,15 @@ begin;
-- Just force assignment XID to transaction
create table t2(x integer);
prepare transaction 'pt1';
-insert into t values (generate_series(1,1000000), 0);
+insert into t values (generate_series(1,100000), 0);
-- vacuum should't mark pages as all visible because of prepared xact
vacuum t;
commit prepared 'pt1';
--- With default autovacuum tuning we need to check more than 2k of tuples to trigger autovacuum
-select explain_to_json('select sum(pk) from t where pk between 100000 and 400000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
+-- With default autovacuum tuning we need to check more than 20k of tuples to trigger autovacuum
+select explain_to_json('select sum(pk) from t where pk between 10000 and 40000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
heap_fetches
--------------
- 300001
+ 30001
(1 row)
select pg_sleep(1); -- let statistic be updated
@@ -53,13 +61,7 @@ select pg_sleep(1); -- let statistic be updated
(1 row)
call wait_autovacuum_completion('t');
-select autovacuum_count,last_autovacuum,n_check_since_vacuum,n_tup_check from pg_stat_all_tables where relname='t';
- autovacuum_count | last_autovacuum | n_check_since_vacuum | n_tup_check
-------------------+-------------------------------------+----------------------+-------------
- 1 | Tue Apr 01 08:47:51.957169 2025 PDT | 0 | 300001
-(1 row)
-
-select explain_to_json('select sum(pk) from t where pk between 100000 and 400000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
+select explain_to_json('select sum(pk) from t where pk between 10000 and 40000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
heap_fetches
--------------
0
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 47478969135..88910eb724e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1773,7 +1773,8 @@ pg_stat_all_indexes| SELECT c.oid AS relid,
pg_stat_get_numscans(i.oid) AS idx_scan,
pg_stat_get_lastscan(i.oid) AS last_idx_scan,
pg_stat_get_tuples_returned(i.oid) AS idx_tup_read,
- pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch
+ pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch,
+ pg_stat_get_tuples_checked(i.oid) AS idx_tup_check
FROM (((pg_class c
JOIN pg_index x ON ((c.oid = x.indrelid)))
JOIN pg_class i ON ((i.oid = x.indexrelid)))
@@ -1808,7 +1809,9 @@ pg_stat_all_tables| SELECT c.oid AS relid,
pg_stat_get_total_vacuum_time(c.oid) AS total_vacuum_time,
pg_stat_get_total_autovacuum_time(c.oid) AS total_autovacuum_time,
pg_stat_get_total_analyze_time(c.oid) AS total_analyze_time,
- pg_stat_get_total_autoanalyze_time(c.oid) AS total_autoanalyze_time
+ pg_stat_get_total_autoanalyze_time(c.oid) AS total_autoanalyze_time,
+ pg_stat_get_check_since_vacuum(c.oid) AS n_check_since_vacuum,
+ pg_stat_get_tuples_checked(c.oid) AS n_tup_check
FROM ((pg_class c
LEFT JOIN pg_index i ON ((c.oid = i.indrelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
@@ -1869,7 +1872,8 @@ pg_stat_database| SELECT oid AS datid,
pg_stat_get_db_sessions_killed(oid) AS sessions_killed,
pg_stat_get_db_parallel_workers_to_launch(oid) AS parallel_workers_to_launch,
pg_stat_get_db_parallel_workers_launched(oid) AS parallel_workers_launched,
- pg_stat_get_db_stat_reset_time(oid) AS stats_reset
+ pg_stat_get_db_stat_reset_time(oid) AS stats_reset,
+ pg_stat_get_db_tuples_checked(oid) AS tup_checked
FROM ( SELECT 0 AS oid,
NULL::name AS datname
UNION ALL
@@ -2169,7 +2173,8 @@ pg_stat_sys_indexes| SELECT relid,
idx_scan,
last_idx_scan,
idx_tup_read,
- idx_tup_fetch
+ idx_tup_fetch,
+ idx_tup_check
FROM pg_stat_all_indexes
WHERE ((schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (schemaname ~ '^pg_toast'::text));
pg_stat_sys_tables| SELECT relid,
@@ -2201,7 +2206,9 @@ pg_stat_sys_tables| SELECT relid,
total_vacuum_time,
total_autovacuum_time,
total_analyze_time,
- total_autoanalyze_time
+ total_autoanalyze_time,
+ n_check_since_vacuum,
+ n_tup_check
FROM pg_stat_all_tables
WHERE ((schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (schemaname ~ '^pg_toast'::text));
pg_stat_user_functions| SELECT p.oid AS funcid,
@@ -2221,7 +2228,8 @@ pg_stat_user_indexes| SELECT relid,
idx_scan,
last_idx_scan,
idx_tup_read,
- idx_tup_fetch
+ idx_tup_fetch,
+ idx_tup_check
FROM pg_stat_all_indexes
WHERE ((schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (schemaname !~ '^pg_toast'::text));
pg_stat_user_tables| SELECT relid,
@@ -2253,7 +2261,9 @@ pg_stat_user_tables| SELECT relid,
total_vacuum_time,
total_autovacuum_time,
total_analyze_time,
- total_autoanalyze_time
+ total_autoanalyze_time,
+ n_check_since_vacuum,
+ n_tup_check
FROM pg_stat_all_tables
WHERE ((schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (schemaname !~ '^pg_toast'::text));
pg_stat_wal| SELECT wal_records,
@@ -2290,7 +2300,8 @@ pg_stat_xact_all_tables| SELECT c.oid AS relid,
pg_stat_get_xact_tuples_updated(c.oid) AS n_tup_upd,
pg_stat_get_xact_tuples_deleted(c.oid) AS n_tup_del,
pg_stat_get_xact_tuples_hot_updated(c.oid) AS n_tup_hot_upd,
- pg_stat_get_xact_tuples_newpage_updated(c.oid) AS n_tup_newpage_upd
+ pg_stat_get_xact_tuples_newpage_updated(c.oid) AS n_tup_newpage_upd,
+ pg_stat_get_xact_tuples_checked(c.oid) AS n_tup_check
FROM ((pg_class c
LEFT JOIN pg_index i ON ((c.oid = i.indrelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
@@ -2307,7 +2318,8 @@ pg_stat_xact_sys_tables| SELECT relid,
n_tup_upd,
n_tup_del,
n_tup_hot_upd,
- n_tup_newpage_upd
+ n_tup_newpage_upd,
+ n_tup_check
FROM pg_stat_xact_all_tables
WHERE ((schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (schemaname ~ '^pg_toast'::text));
pg_stat_xact_user_functions| SELECT p.oid AS funcid,
@@ -2330,7 +2342,8 @@ pg_stat_xact_user_tables| SELECT relid,
n_tup_upd,
n_tup_del,
n_tup_hot_upd,
- n_tup_newpage_upd
+ n_tup_newpage_upd,
+ n_tup_check
FROM pg_stat_xact_all_tables
WHERE ((schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (schemaname !~ '^pg_toast'::text));
pg_statio_all_indexes| SELECT c.oid AS relid,
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 0a35f2f8f6a..b9f61655d66 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -133,6 +133,9 @@ test: event_trigger_login
# this test also uses event triggers, so likewise run it by itself
test: fast_default
+# this test checks behaviour of autovacuum so run it standalone
+test: check_autovacuum
+
# run tablespace test at the end because it drops the tablespace created during
# setup that other tests may use.
test: tablespace
diff --git a/src/test/regress/sql/check_autovacuum.sql b/src/test/regress/sql/check_autovacuum.sql
index 37dc74208db..8c83081ea89 100644
--- a/src/test/regress/sql/check_autovacuum.sql
+++ b/src/test/regress/sql/check_autovacuum.sql
@@ -1,21 +1,37 @@
-create function wait_autovacuum_completion(table_name text) returns void AS $$
+create function explain_to_json(text) returns jsonb language plpgsql as
+$$
declare
- autovacuum_counter_before integer;
- autovacuum_counter_after integer;
- n_heap_checkes_since_last_autovacuum integer;
+ js text;
+ heap_fetches integer;
+begin
+ for js in execute 'EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF, BUFFERS ON, FORMAT JSON)' || $1 loop
+ return js::jsonb;
+ end loop;
+end;
+$$;
+
+create or replace procedure wait_autovacuum_completion(table_name text) AS $$
+declare
+ checks_since_last_autovacuum integer;
+ total_checks integer;
begin
- select autovacuum_count from pg_stat_all_tables where relname=table_name into autovacuum_counter_before;
-- default autovacuum naptime is 1 minute
- for i in 1..60 loop
- select autovacuum_count,n_check_since_vacuum from pg_stat_all_tables where relname=table_name into autovacuum_counter_after,n_heap_checkes_since_last_autovacuum;
- if autovacuum_counter_before != autovacuum_counter_after OR n_heap_checkes_since_last_autovacuum = 0 then
+ loop
+ select n_check_since_vacuum,n_tup_check from pg_stat_all_tables where relname=table_name
+ into checks_since_last_autovacuum,total_checks;
+ if checks_since_last_autovacuum = 0 and total_checks != 0 then
exit;
end if;
- perform pg_sleep_for('1 seconds');
+ perform pg_sleep_for('1 second');
+ rollback;
end loop;
end;
$$ language plpgsql;
+-- Make autovacuum run more frequently
+alter system set autovacuum_naptime=2;
+select pg_reload_conf();
+
-- Force index-only scan
set enable_bitmapscan=off;
set enable_seqscan=off;
@@ -28,17 +44,16 @@ begin;
create table t2(x integer);
prepare transaction 'pt1';
-insert into t values (generate_series(1,1000000), 0);
+insert into t values (generate_series(1,100000), 0);
-- vacuum should't mark pages as all visible because of prepared xact
vacuum t;
commit prepared 'pt1';
--- With default autovacuum tuning we need to check more than 2k of tuples to trigger autovacuum
-select sum(pk) from t where pk between 100000 and 400000;
-select n_tup_check from pg_stat_all_tables where relname='t';
+-- With default autovacuum tuning we need to check more than 20k of tuples to trigger autovacuum
+select explain_to_json('select sum(pk) from t where pk between 10000 and 40000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
-select wait_autovacuum_completion('t');
+select pg_sleep(1); -- let statistic be updated
+call wait_autovacuum_completion('t');
-select sum(pk) from t where pk between 100000 and 400000;
-select n_tup_check from pg_stat_all_tables where relname='t';
+select explain_to_json('select sum(pk) from t where pk between 10000 and 40000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
Hi Konstantin,
But the problem can be quite easily reproduced. We can just populate table with some data with some other transaction with assigned XID active.
Then explicitly vacuum this tables or wait until autovacuum does it.
At this moment table has no more dead or inserted tuples so autovacuum will not be called for it. But heap pages of this table are still not marked as all-visible.
And will never be marked as all-visible unless table is updated or is explicitly vacuumed.
I decided to experiment with the scenario you are describing. For
those who like to have exact steps to reproduce the issue, as I do,
here they are:
Session 1:
```
CREATE TABLE humidity(
ts TIMESTAMP NOT NULL,
city TEXT NOT NULL,
humidity INT NOT NULL);
CREATE INDEX humidity_idx ON humidity (ts, city) INCLUDE (humidity);
INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2010-01-01' :: TIMESTAMP,
'2020-12-31', '1 hour') AS ts,
unnest(array['Moscow', 'Berlin']) AS city;
```
Session 2:
```
BEGIN;
INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2022-01-01' :: TIMESTAMP,
'2025-12-31', '1 hour') AS ts,
unnest(array['Moscow', 'Berlin']) AS city;
-- no COMMIT
```
Session 1:
```
VACUUM humidity;
```
Session 2:
```
COMMIT;
```
Session 1:
```
EXPLAIN (ANALYZE, BUFFERS ON)
SELECT humidity FROM humidity
WHERE ts >= '2022-01-01' AND ts < '2022-05-02' AND city = 'Moscow';
```
The result is:
```
Index Only Scan using humidity_idx on humidity (cost=0.42..58.75 rows=67 width=4) (actual time=0.060..7.490 rows=2904.00 loops=1)
  Index Cond: ((ts >= '2022-01-01 00:00:00'::timestamp without time zone) AND (ts < '2022-05-02 00:00:00'::timestamp without time zone) AND (city = 'Moscow'::text))
Heap Fetches: 2904
Index Searches: 1
Buffers: shared hit=123
Planning:
Buffers: shared hit=10
Planning Time: 0.840 ms
Execution Time: 7.964 ms
```
... and it is claimed that autovacuum will never be triggered in order
to set hint bits, assuming we never modify the table again.
I have mixed feelings about addressing this. Although this behavior is
somewhat counterintuitive, if the user has a read-only lookup table
he/she can always execute VACUUM manually. In order to relieve him of
this unbearable burden we are going to need to introduce some overhead
that will affect all the users (not to mention people maintaining the
code). This would be convenient for sure but I'm not entirely sure if
it's worth it.
Thoughts?
--
Best regards,
Aleksander Alekseev
Hi,
... and it is claimed that autovacuum will never be triggered in order
to set hint bits, assuming we never modify the table again.
Actually I waited a bit and got a better EXPLAIN:
```
Index Only Scan using humidity_idx on humidity (cost=0.42..181.36 rows=1970 width=4) (actual time=0.372..16.869 rows=2904.00 loops=1)
  Index Cond: ((ts >= '2022-01-01 00:00:00'::timestamp without time zone) AND (ts < '2022-05-02 00:00:00'::timestamp without time zone) AND (city = 'Moscow'::text))
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=44
Planning Time: 0.520 ms
Execution Time: 17.980 ms
(7 rows)
```
This happens when CHECKPOINT starts:
```
2025-04-03 18:36:23.435 MSK [209952] LOG: checkpoint starting: time
```
Interestingly it takes unusually long for my toy database:
```
2025-04-03 18:40:53.082 MSK [209952] LOG: checkpoint complete: wrote
3522 buffers (5.4%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
removed, 5 recycled; write=269.463 s, sync=0.029 s, total=269.647 s;
sync files=32, longest=0.004 s, average=0.001 s; distance=68489 kB,
estimate=68489 kB; lsn=0/F4F3870, redo lsn=0/F167DD0
```
There is nothing in between these two lines.
To my humble knowledge, CHECKPOINT shouldn't set hint bits and shouldn't
take that long. At this point I don't know what's going on.
This is `master` branch, b82e7eddb023.
--
Best regards,
Aleksander Alekseev
On 03/04/2025 6:29 pm, Aleksander Alekseev wrote:
I have mixed feelings about addressing this. Although this behavior is
somewhat counterintuitive, if the user has a read-only lookup table
he/she can always execute VACUUM manually. In order to relieve him of
this unbearable burden we are going to need to introduce some overhead
that will affect all the users (not to mention people maintaining the
code). This would be convenient for sure but I'm not entirely sure if
it's worth it.
The overhead seems to be minimal (an update of one statistics field when an
index-only scan has to fetch from the heap), and information about the
visibility checks performed by IOS seems useful in any case, even if it is not
used by autovacuum.
I am not sure how frequent this scenario is (an index-only scan having to
perform extra heap checks, or just not being chosen because of the VM state),
but if it can be prevented without much effort or overhead, then why not
implement it?
Interestingly it takes unusually long for my toy database:
There is nothing in between these two lines.
To my humble knowledge, CHECKOINT shouldn't set hint bits and should
take that long. At this point I don't know what's going on.
From what I can tell in your example, you ran the manual vacuum (session 1)
while you had an open transaction (session 2), so vacuum could not remove
the dead tuples or update the visibility map. Once you committed session 2,
autovacuum came in and did its job after autovacuum_naptime passed
(default 1 minute). Checkpoint does not update the visibility map; only
vacuum does.
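One way to confirm that, assuming the contrib module pg_visibility is available, is to look at how much of the table the visibility map currently covers:
```
create extension if not exists pg_visibility;
-- number of pages currently marked all-visible / all-frozen in the VM
select * from pg_visibility_map_summary('humidity');
```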
IMO, I don't think we need this patch for vacuum; simply making sure
autovacuum runs more frequently on a table that is often accessed via
index-only scans is already a way to deal with this, e.g. by lowering
autovacuum_vacuum_scale_factor, as sketched below. Maybe others have
a different opinion?
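As a rough illustration of that kind of per-table tuning (the table name and values are made up for the example):
```
-- vacuum after ~1% of the table's rows have been inserted, or ~2% updated/deleted,
-- instead of the 20% defaults
alter table client_objects set (
    autovacuum_vacuum_insert_scale_factor = 0.01,
    autovacuum_vacuum_scale_factor = 0.02
);
```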
PostgreSQL 13 also introduced autovacuum_vacuum_insert_threshold and
autovacuum_vacuum_insert_scale_factor to deal with append-only tables,
which previously only saw their first vacuum for wraparound prevention
(200 million transactions by default); that made index-only scans slow
because of an outdated visibility map, and also made the wraparound vacuum
more disruptive.
As far as extra metrics go for the scenario in which an index-only scan must
visit the table, pg_stat_all_indexes and pg_stat_all_tables do have an
idx_tup_fetch counter which increases any time an index scan visits the table,
i.e. an index-only scan with heap fetches or a regular index scan. I think
having a counter specifically for heap fetches due to index-only scans could
be valuable.
--
Sami Imseih
Amazon Web Services (AWS)
On Thu, Apr 3, 2025 at 4:37 PM Sami Imseih <samimseih@gmail.com> wrote:
From what I can tell in your example, you ran the manual vacuum ( session 1)
while you had an open transaction (session 2), so vacuum could not remove
the dead tuples or update the visibility map. Once you committed session 2,
autovacuum came in and did its job after the autovacuum_naptime passed
(default 1 minute). Checkpoint does not update the visibility map, only
vacuum does.
IMO, I don't think we need this patch for vacuum, as simply making sure
autovacuum runs more frequently on the table that is accessed via
index-only scans often is a way to deal with this already, i.e
lowering autovacuum_vacuum_scale_factor. Maybe others have
a different opinion?
Yea, I mean if you have a table that isn't meeting the threshold for
being vacuumed by autovacuum based on inserted/modified tuples that
you are querying a lot, then you should probably change the autovacuum
table storage options for that table.
I don't fully understand if the example provided here is in that
situation (i.e. autovac not being triggered because they were below
the thresholds).
Historically, we haven't updated the VM when doing on-access pruning
because we wanted vacuum to have to vacuum those pages and freeze them
before an aggressive vacuum was required. This release, in
052026c9b903, I added functionality to vacuum to scan more all-visible
pages during regular vacuums. I think this enables us to update the VM
during on-access pruning. This is something I plan to work on in 19.
It seems like it would alleviate situations like this.
As far as extra metrics go for the scenario in which and index only scan must
visit a table, pg_stat_all_indexes and pg_stat_all_tables do have a
idx_tup_fetch
counter which increases anytime an index scan visits the table, i.e.
index-only scan
with heap fetches or a regular index scan. I think having a counter
specifically for
heap fetches due to index-only scans could be valuable.
What do you imagine using the heap fetches during index-only scans counter for?
- Melanie
I think this enables us to update the VM
during on-access pruning. This is something I plan to work on in 19.
It seems like it would alleviate situations like this.
IMO, index-only scans hitting the heap have always caught users off guard,
especially because scan performance fluctuates between good and bad.
Reducing this performance volatility is a great improvement.
As far as extra metrics go for the scenario in which and index only scan must
visit a table, pg_stat_all_indexes and pg_stat_all_tables do have a
idx_tup_fetch
counter which increases anytime an index scan visits the table, i.e.
index-only scan
with heap fetches or a regular index scan. I think having a counter
specifically for
heap fetches due to index-only scans could be valuable.
What do you imagine using the heap fetches during index-only scans counter for?
A user can only determine that they are using index-only scans through EXPLAIN,
which limits visibility. It would be beneficial to provide additional metrics
to support autovacuum/vacuum tuning for workloads that rely on index-only scans
and require them to be as efficient as possible.
Currently, pg_stat_all_tables/indexes displays the number of index scans,
but it would also be helpful to show the number of index-only scans
specifically, along with their heap fetches. This would help in identifying the
"heap fetches" issue and in confirming whether vacuum tuning had any impact on
reducing them. The existing idx_tup_fetch counter combines heap fetches from
normal and index-only scans, so it is useful but not perfect for this task,
as shown below.
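For reference, a sketch of what can be observed today with the existing counters (the table name is illustrative):
```
-- idx_tup_fetch counts live heap tuples fetched via any kind of index scan,
-- so heap fetches caused by index-only scans are mixed in with regular ones
select indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
from pg_stat_all_indexes
where relname = 'humidity';
```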
--
Sami Imseih
Amazon Web Services (AWS)
On 03/04/2025 6:50 pm, Aleksander Alekseev wrote:
Hi,
... and it is claimed that autovacuum will never be triggered in order
to set hint bits, assuming we never modify the table again.
Actually I waited a bit and got a better EXPLAIN:
```
Index Only Scan using humidity_idx on humidity (cost=0.42..181.36 rows=1970 width=4) (actual time=0.372..16.869 rows=2904.00 loops=1)
  Index Cond: ((ts >= '2022-01-01 00:00:00'::timestamp without time zone) AND (ts < '2022-05-02 00:00:00'::timestamp without time zone) AND (city = 'Moscow'::text))
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=44
Planning Time: 0.520 ms
Execution Time: 17.980 ms
(7 rows)
```
This happens when CHECKPOINT starts:
```
2025-04-03 18:36:23.435 MSK [209952] LOG: checkpoint starting: time
```
Interestingly it takes unusually long for my toy database:
```
2025-04-03 18:40:53.082 MSK [209952] LOG: checkpoint complete: wrote
3522 buffers (5.4%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
removed, 5 recycled; write=269.463 s, sync=0.029 s, total=269.647 s;
sync files=32, longest=0.004 s, average=0.001 s; distance=68489 kB,
estimate=68489 kB; lsn=0/F4F3870, redo lsn=0/F167DD0
```
There is nothing in between these two lines.
To my humble knowledge, CHECKPOINT shouldn't set hint bits and shouldn't
take that long. At this point I don't know what's going on.
This is `master` branch, b82e7eddb023.
A checkpoint does not set hint bits and does not update the VM;
it just writes dirty pages to disk.
My patch includes a simple test reproducing the problem. You can check
that the VM is never updated in this case and that an explicit checkpoint
doesn't solve the issue.
Certainly a manual vacuum will help. But the user would somehow have to
notice that the index-only scan is not used, or is used but performs a large
number of heap fetches, and understand that the problem is an out-of-date
VM and that a vacuum is needed to fix it.
In your example both sessions are inserting data into the table. A vacuum
performed in one session doesn't take into account records created by an
uncommitted transaction in another session.
So I do not think the plan in your case improved because of the checkpoint.
Most likely autovacuum was just triggered for this table and updated the VM,
which can be checked as shown below.
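An easy way to check whether that is what happened, using columns that already exist in pg_stat_all_tables:
```
-- if autovacuum ran, last_autovacuum is set, autovacuum_count is incremented,
-- and n_ins_since_vacuum drops back to 0
select relname, last_autovacuum, autovacuum_count, n_ins_since_vacuum
from pg_stat_all_tables
where relname = 'humidity';
```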
What is needed to reproduce the problem?
1. A table populated with data.
2. The presence of a transaction with an assigned XID which prevents vacuum
from marking pages of this table as all-visible.
3. A vacuum or autovacuum that processed this table (eliminating dead tuples
and resetting the number of inserted tuples since the last vacuum).
After these 3 steps autovacuum will never be called for this table (at
least until a forced vacuum caused by wraparound), and IOS will either not
be used or be very inefficient for this table.
On Fri, Apr 4, 2025 at 1:53 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
What is needed to reproduce the problem?
1. Table with populated data
2. Presence of transaction with assigned XID which prevents vacuum from
marking pages of this table as all visible
3. Vacuum or autovacuum processed this table (to eliminate dead tuple
and reset number of inserted tuples since last vacuum).
After this 3 steps autovacuum will never be called for this table (at
least until forced vacuum caused by wraparound).
And IOS will not be used or be very inefficient fot this table.
ISTM, this is more an issue of ins_since_vacuum being reset to 0 in
pgstat_report_vacuum() even though those inserted tuples weren't
necessarily frozen and their pages not set all-visible. I don't know
exactly how we could modify that logic, but insert-triggered vacuums
are meant to set pages all-visible and freeze tuples, and if they
don't do that, it doesn't quite make sense to zero out the counter
that could trigger another one.
That being said, long-running transactions are a problem for
autovacuum in general. Even if you track this stat you are proposing
about heap fetches by index only scans, you won't know if the long
running transaction is over and thus if it makes sense to try and
trigger an autovacuum for that table again anyway.
- Melanie
On Fri, Apr 4, 2025 at 12:11 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
That being said, long-running transactions are a problem for
autovacuum in general. Even if you track this stat you are proposing
about heap fetches by index only scans, you won't know if the long
running transaction is over and thus if it makes sense to try and
trigger an autovacuum for that table again anyway.
This. It would be really useful to have some kind of a system for
figuring out when -- in terms of XIDs -- we ought to vacuum which
table. I think that's a hard problem, but it would help a lot of
people.
I do not think the approach in the proposed patch is correct at all.
The proposed new check would have exactly the same problem as the
existing one -- this could easily trigger vacuuming at a time when the
relevant tuples can't yet be made all-visible, in which case we'd just
do a lot of VACUUM work for nothing. That's already a problem with
autovacuum in some scenarios, and I bet this would be way worse.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 04/04/2025 7:10 pm, Melanie Plageman wrote:
On Fri, Apr 4, 2025 at 1:53 AM Konstantin Knizhnik<knizhnik@garret.ru> wrote:
What is needed to reproduce the problem?
1. Table with populated data
2. Presence of transaction with assigned XID which prevents vacuum from
marking pages of this table as all visible
3. Vacuum or autovacuum processed this table (to eliminate dead tuple
and reset number of inserted tuples since last vacuum).
After this 3 steps autovacuum will never be called for this table (at
least until forced vacuum caused by wraparound).
And IOS will not be used or be very inefficient fot this table.
ISTM, this is more an issue of ins_since_vacuum being reset to 0 in
pstat_report_vacuum() even though those inserted tuples weren't
necessarily frozen and their pages not set all-visible. I don't know
exactly how we could modify that logic, but insert-triggered vacuums
are meant to set pages all-visible and freeze tuples, and if they
don't do that, it doesn't quite make sense to zero out the counter
that could trigger another one.
That being said, long-running transactions are a problem for
autovacuum in general. Even if you track this stat you are proposing
about heap fetches by index only scans, you won't know if the long
running transaction is over and thus if it makes sense to try and
trigger an autovacuum for that table again anyway.
- Melanie
From a logical point of view I agree with you: taking the number of inserted
tuples into account makes sense if it allows pages to be marked as all-visible.
So `ins_since_vacuum` would be better renamed to
`ins_all_visible_since_vacuum` and count only all-visible tuples. If a newly
inserted tuple is not visible to all, then it should not be accounted for in
the statistics and should not trigger autovacuum. But I have no idea how to
efficiently maintain such a counter: we would have to keep track of the XIDs
of all recently inserted tuples and, on each transaction commit, determine
which of them have become all-visible.
And your suggestion to just not reset `ins_since_vacuum` until all of them
become all-visible may be easier to implement, but under a permanent workload
it can lead to a situation where `ins_since_vacuum` is never reset and the
table is vacuumed on every vacuum iteration, which may significantly degrade
performance, even without long-living transactions.
So I still think that maintaining a count of heap visibility checks is the
best alternative. It is quite easy to implement, adds almost no overhead,
and this information indicates the efficiency of index-only scans, so it
seems useful even if not used by autovacuum.
Yes, long-living transactions and vacuum are "antagonists". If there is a
long-living transaction, then forcing autovacuum because of the
visibility-check criterion can also (as in the case of not resetting the
`ins_since_vacuum` counter) degrade performance because of too frequent and
useless autovacuum runs for the table. But there is a big difference: with
the `checks_since_vacuum` criterion we trigger the next autovacuum only when
this counter exceeds the threshold again, which should not happen quickly
because the counter is reset after each vacuum. This is unlike the
`ins_since_vacuum` counter, which you suggested not resetting until the pages
are marked as all-visible by vacuum; in the latter case autovacuum would be
invoked for the table every `autovacuum_naptime` seconds.
On Fri, Apr 4, 2025 at 3:27 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
From logical point of view I agree with you: taken in account number of inserted tuples makes sense if it allows to mark page as all-visible.
So `ins_since_vacuum` should be better renamed to `ins_all_visible_since_vacuum` and count only all-visible tuples. If newly inserted tuple is not visible to all, then it should not be accounted in statistic and trigger autovacuum. But I have completely no idea of how to efficiently maintain such counter: we should keep track of xids of all recently inserted tuples and on each transaction commit determine which one of them become all-visible.
And your suggestion just to not reset `ins_since_vacuum` until all of them becomes all-visible may be easier to implement, but under permanent workload it can lead to situation when `ins_since_vacuum` is never reset and at each vacuum iteration cause vacuuming of the table. Which may cause significant degrade of performance. Even without long-living transactions.
Right, you definitely can't just never reset it to 0. As it is
currently defined, the behavior is correct, that is how many tuples
were inserted since the last vacuum.
As for your proposed patch, I agree with Robert that triggering
vacuums using the stat you proposed just adds to the problem we know
we already have with wasted vacuuming work due to long-running
transactions.
A more targeted solution to your specific problem would be to update
the visibility map on access. Then, the first time you have to fetch
that heap page, you could mark it all-visible (assuming the long
running transaction has ended) and then the next index-only scan
wouldn't have to do the same heap fetch. It doesn't add any overhead
in the case that the long running transaction has not ended, unlike
trying to trigger another autovacuum.
- Melanie
From logical point of view I agree with you: taken in account number of inserted tuples makes sense if it allows to mark page as all-visible.
So `ins_since_vacuum` should be better renamed to `ins_all_visible_since_vacuum` and count only all-visible tuples.
If newly inserted tuple is not visible to all, then it should not be accounted in statistic and trigger autovacuum.
cumulative stats are not updated for an in-flight transaction, so
effectively, these rows
are not accounted for by vacuum.
I did just realize, however, that it is strange that a rolled-back
session will accumulate n_tup_ins and n_ins_since_vacuum:
# session 1
test=# create table t (id int);
CREATE TABLE
test=# begin;
BEGIN
test=*# insert into t values (1);
INSERT 0 1
test=*# insert into t values (1);
INSERT 0 1
test=*# insert into t values (1);
INSERT 0 1
# session 2
test=# select n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, n_ins_since_vacuum
       from pg_stat_all_tables where relname = 't';
 n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+--------------------
         0 |         0 |         0 |          0 |          0 |                  0
(1 row)
# session 1
ROLLBACK
# session 2
test=# select n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, n_ins_since_vacuum
       from pg_stat_all_tables where relname = 't';
 n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+--------------------
         3 |         0 |         0 |          0 |          3 |                  3
(1 row)
ROLLBACKs should not be the norm, but this looks like a bug to me
as it may trigger vacuum based on insert threshold more often.
Thoughts?
So I still think that maintaining a count of heap visibility checks is the best alternative.
It is quite easy to implement, adds almost no overhead, and this information indicates the efficiency
of index-only scans. So it seems useful even if it is not used by autovacuum.
I agree. It could help a user who wants to schedule manual vacuums for
tables that need very efficient index-only scans, or to tune autovacuum
per table for those kinds of tables. I actually think we should add a
few columns, as mentioned here [0].
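As a sketch of that per-table tuning with options that exist today (the table name below is a placeholder, and the counter column mentioned in the last comment is hypothetical, since this thread has not settled on a name):

-- Make insert-driven autovacuum more aggressive for a table whose
-- index-only scans must stay cheap (placeholder table name).
ALTER TABLE ios_heavy_table
    SET (autovacuum_vacuum_insert_scale_factor = 0.01,
         autovacuum_vacuum_insert_threshold = 1000);

-- Or schedule an off-peak manual pass, which also sets visibility-map bits:
VACUUM (VERBOSE) ios_heavy_table;

-- With the proposed counter (hypothetical column name), the same decision
-- could be driven by something like:
--   SELECT relname FROM pg_stat_all_tables WHERE n_heap_check_since_vacuum > 100000;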
That being said, long-running transactions are a problem for
autovacuum in general. Even if you track this stat you are proposing
about heap fetches by index only scans, you won't know if the long
running transaction is over and thus if it makes sense to try and
trigger an autovacuum for that table again anyway.
This. It would be really useful to have some kind of a system for
figuring out when -- in terms of XIDs -- we ought to vacuum which
table. I think that's a hard problem, but it would help a lot of
people.
hmm, isn't that what anti-wraparound autovac does? or, I may
have missed the point here completely
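Whatever the right design turns out to be, the "has the long-running transaction finished yet" part can at least be checked by hand today; the sessions at the top of this query are the ones holding back the xmin horizon that keeps pages from becoming all-visible (a sketch for a reasonably recent release):

SELECT pid, state, xact_start, backend_xmin,
       age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;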
[0]: /messages/by-id/CAA5RZ0t1U38qtVAmg3epjh5RBbpT4VRB_Myfp0oGm_73w-UNRA@mail.gmail.com
--
Sami Imseih
Amazon Web Services (AWS)
On 04/04/2025 10:41 pm, Melanie Plageman wrote:
On Fri, Apr 4, 2025 at 3:27 PM Konstantin Knizhnik<knizhnik@garret.ru> wrote:
From a logical point of view I agree with you: taking into account the number of inserted tuples makes sense if it allows pages to be marked as all-visible.
So `ins_since_vacuum` would better be renamed to `ins_all_visible_since_vacuum` and count only all-visible tuples. If a newly inserted tuple is not yet visible to all, it should not be accounted for in the statistics and should not trigger autovacuum. But I have no idea how to efficiently maintain such a counter: we would have to keep track of the XIDs of all recently inserted tuples and, on each transaction commit, determine which of them have become all-visible. Your suggestion to simply not reset `ins_since_vacuum` until all of them become all-visible may be easier to implement, but under a permanent workload it can lead to a situation where `ins_since_vacuum` is never reset and the table is vacuumed at every autovacuum iteration. That could significantly degrade performance, even without long-running transactions.
Right, you definitely can't just never reset it to 0. As it is
currently defined, the behavior is correct, that is how many tuples
were inserted since the last vacuum.
As for your proposed patch, I agree with Robert that triggering
vacuums using the stat you proposed just adds to the problem we know
we already have with wasted vacuuming work due to long-running
transactions.
If we already have a problem with wasted vacuuming, then IMHO it should
be addressed (for example by keeping last_autovacuum_xid_horizon). I do
not think that the proposed patch adds anything new to the problem. Yes,
it will trigger autovacuum after every N visibility checks, and if there
is still some long-running transaction, then this vacuum is useless. But
the same is true for the criterion based on the number of dead tuples.
And from my point of view, additional autovacuum runs are less critical
than missed autovacuum runs (when the table contains garbage or the VM
is not updated, but vacuum is not launched).
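For what it's worth, whether a vacuum run actually managed to set visibility-map bits can be observed directly with the contrib pg_visibility extension (a sketch, using the demo table t from the earlier example):

CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- How many of the table's heap pages are currently marked all-visible in the VM?
SELECT count(*) FILTER (WHERE all_visible) AS vm_all_visible,
       count(*) AS heap_pages
FROM pg_visibility_map('t');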
A more targeted solution to your specific problem would be to update
the visibility map on access. Then, the first time you have to fetch
that heap page, you could mark it all-visible (assuming the long
running transaction has ended) and then the next index-only scan
wouldn't have to do the same heap fetch. It doesn't add any overhead
in the case that the long running transaction has not ended, unlike
trying to trigger another autovacuum.
I really considered this alternative when thinking about a solution to
the problem. It is more consistent with the hint-bit approach.
I declined it in favor of this solution for the following reasons:
1. Index-only scan holds a read-only lock on the heap page. In order to
update it, we need to upgrade this lock to exclusive.
2. We need to check visibility for all elements on the page (actually do
something like `heap_page_is_all_visible`), but if there is a large
number of elements on the page this can be quite expensive, and I am
afraid it could slow down index-only scans. Yes, only in the "slow
case", when the scan has to access the heap to perform a visibility
check, but it still may not be acceptable. Also, it is not clear how to
mark the page as already checked; otherwise we will have to repeat this
check for every TID referring to this page.
3. `heap_page_is_all_visible` is local to vacuumlazy.c. So to use it in
index-only scan we either have to make it global or copy and paste its
code. Just removing "static" is not possible, because it uses the local
`LVRelState`, so some refactoring is needed in any case.
4. We need to WAL-log the VM page and the heap page when setting the
all-visible bit, which is a quite expensive operation. Doing it inside
an index-only scan can significantly increase the time of a SELECT.
Certainly Postgres is not a real-time DBMS, but it is still better to
provide predictable query execution time. This is why I think it is
better to do such work in the background (in vacuum).
On Sat, Apr 5, 2025 at 2:02 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
A more targeted solution to your specific problem would be to update
the visibility map on access. Then, the first time you have to fetch
that heap page, you could mark it all-visible (assuming the long
running transaction has ended) and then the next index-only scan
wouldn't have to do the same heap fetch. It doesn't add any overhead
in the case that the long running transaction has not ended, unlike
trying to trigger another autovacuum.
I really considered this alternative when thinking about a solution to the problem. It is more consistent with the hint-bit approach.
I declined it in favor of this solution for the following reasons:
1. Index-only scan holds a read-only lock on the heap page. In order to update it, we need to upgrade this lock to exclusive.
2. We need to check visibility for all elements on the page (actually do something like `heap_page_is_all_visible`), but if there is a large number of elements on the page this can be quite expensive, and I am afraid it could slow down index-only scans. Yes, only in the "slow case", when the scan has to access the heap to perform a visibility check, but it still may not be acceptable. Also, it is not clear how to mark the page as already checked; otherwise we will have to repeat this check for every TID referring to this page.
3. `heap_page_is_all_visible` is local to vacuumlazy.c. So to use it in index-only scan we either have to make it global or copy and paste its code. Just removing "static" is not possible, because it uses the local `LVRelState`, so some refactoring is needed in any case.
4. We need to WAL-log the VM page and the heap page when setting the all-visible bit, which is a quite expensive operation. Doing it inside an index-only scan can significantly increase the time of a SELECT. Certainly Postgres is not a real-time DBMS, but it is still better to provide predictable query execution time. This is why I think it is better to do such work in the background (in vacuum).
I wasn't thinking about adding new VM-setting functionality to
index-only scan in particular. heapam_index_fetch_tuple() already calls
heap_page_prune_opt() which will do pruning under certain conditions.
I was thinking that we start updating the VM after pruning in the
on-access case too (not just when pruning is invoked by vacuum).
If you look at the callers of heap_page_prune_opt(), it includes
bitmap heap scan and also heap_prepare_pagescan() which is invoked as
part of sequential scans and other operations.
- Melanie
I wasn't thinking about adding new VM-setting functionality to
index-only scan in particular. heapam_index_fetch_tuple() already calls
heap_page_prune_opt() which will do pruning under certain conditions.
I was thinking that we start updating the VM after pruning in the
on-access case too (not just when pruning is invoked by vacuum).
I think this would need to be careful when the workload is constantly updating
and reading the same pages, as we may end up in a continuous cycle of
updates clearing the visibility map and selects setting it again.
That may degrade the select workloads, maybe?
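The clearing half of that cycle is easy to demonstrate: updating any row on an all-visible page clears the page's bit again (a sketch, continuing with the demo table t, assuming no concurrent old snapshots and the pg_visibility extension from above):

INSERT INTO t VALUES (1);
VACUUM t;                                           -- block 0 becomes all-visible
SELECT all_visible FROM pg_visibility_map('t', 0);  -- expected: t
UPDATE t SET id = id;                               -- new row version clears the bit
SELECT all_visible FROM pg_visibility_map('t', 0);  -- expected: f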
--
Sami Imseih
Amazon Web Services (AWS)