Report oldest xmin source when autovacuum cannot remove tuples
Hi hackers,
I am proposing to add the reason for the oldest xmin to VACUUM logs.
This feature would be useful for identifying why dead tuples cannot be
removed, thereby helping to diagnose and prevent table bloat.
The current logs only indicate that dead tuples could not be reclaimed
due to the oldest xmin, but they do not reveal the underlying reason.
To identify the cause, it is necessary to query multiple views:
pg_stat_activity (for active transactions), pg_prepared_xacts (for
prepared statements), pg_replication_slots (for replication slots),
and pg_stat_replication (for hot standby feedback). However, because
the data in these views is volatile, it is difficult to retroactively
determine what was holding the oldest xmin at the specific time the
log message was generated.
This PoC patch addresses this problem. The implementation now outputs
the reason for the oldest xmin and, where applicable, the backend PID.
This information was originally discarded when calculating the oldest
xmin horizon, and the computation required to retrieve these reasons
is considered reasonable.
The patch is attached. What do you think?
--
Best regards,
Shinya Kato
NTT OSS Center
Attachments:
v1-0001-Report-oldest-xmin-source-when-autovacuum-cannot-.patchapplication/octet-stream; name=v1-0001-Report-oldest-xmin-source-when-autovacuum-cannot-.patchDownload
From ef0ce0d3fa45d639ae9d08a5550163b08c80c21f Mon Sep 17 00:00:00 2001
From: Shinya Kato <shinya11.kato@gmail.com>
Date: Wed, 29 Oct 2025 21:04:52 +0900
Subject: [PATCH v1] Report oldest xmin source when autovacuum cannot remove
tuples
Autovacuum logging now tells which backend or mechanism kept OldestXmin
from advancing when dead tuples remain, improving diagnosis of bloating
and wraparound hazards. ProcArray now records an OldestXminSource for
each computed horizon, and VACUUM retrieves it through the new
GetOldestNonRemovableTransactionIdWithReason() helper. The log output
names active transactions (with pid), hot standby feedback, prepared
transactions, replication slots, or otherwise labels the cause as
"other".
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by:
Discussion: https://postgr.es/m/
---
src/backend/access/heap/vacuumlazy.c | 37 ++++
src/backend/commands/vacuum.c | 4 +-
src/backend/storage/ipc/procarray.c | 165 +++++++++++++---
src/include/commands/vacuum.h | 4 +
src/include/storage/procarray.h | 21 ++
src/test/modules/test_misc/meson.build | 1 +
.../t/010_autovacuum_oldest_xmin_reason.pl | 186 ++++++++++++++++++
7 files changed, 388 insertions(+), 30 deletions(-)
create mode 100644 src/test/modules/test_misc/t/010_autovacuum_oldest_xmin_reason.pl
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d2b031fdd06..e4e7c6ee674 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1047,6 +1047,43 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
appendStringInfo(&buf,
_("removable cutoff: %u, which was %d XIDs old when operation ended\n"),
vacrel->cutoffs.OldestXmin, diff);
+ /*
+ * If there are dead tuples not yet removable, report what
+ * determined the OldestXmin horizon to aid diagnostics.
+ */
+ if (vacrel->recently_dead_tuples > 0)
+ {
+ int pid = vacrel->cutoffs.oldest_xmin_info.backend_pid;
+ bool include_pid = pid > 0;
+
+ switch (vacrel->cutoffs.oldest_xmin_info.source)
+ {
+ case OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION:
+ msgfmt = include_pid ?
+ _("oldest xmin source: active transaction (pid=%d)\n") :
+ _("oldest xmin source: active transaction\n");
+ break;
+ case OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK:
+ msgfmt = include_pid ?
+ _("oldest xmin source: hot standby feedback (pid=%d)\n") :
+ _("oldest xmin source: hot standby feedback\n");
+ break;
+ case OLDESTXMIN_SOURCE_PREPARED_TRANSACTION:
+ msgfmt = _("oldest xmin source: prepared transaction\n");
+ break;
+ case OLDESTXMIN_SOURCE_REPLICATION_SLOT:
+ msgfmt = _("oldest xmin source: replication slot\n");
+ break;
+ default:
+ msgfmt = _("oldest xmin source: other\n");
+ break;
+ }
+
+ if (include_pid)
+ appendStringInfo(&buf, msgfmt, pid);
+ else
+ appendStringInfoString(&buf, msgfmt);
+ }
if (frozenxid_updated)
{
diff = (int32) (vacrel->NewRelfrozenXid -
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ed03e3bd50d..1a7ce91f6f6 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1130,7 +1130,9 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams params,
* that only one vacuum process can be working on a particular table at
* any time, and that each vacuum is always an independent transaction.
*/
- cutoffs->OldestXmin = GetOldestNonRemovableTransactionId(rel);
+ cutoffs->OldestXmin =
+ GetOldestNonRemovableTransactionIdWithReason(rel,
+ &cutoffs->oldest_xmin_info);
Assert(TransactionIdIsNormal(cutoffs->OldestXmin));
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..5fe266234b1 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -241,6 +241,12 @@ typedef struct ComputeXidHorizonsResult
* session's temporary tables.
*/
TransactionId temp_oldest_nonremovable;
+
+ /* Identify what forced each of the horizons above. */
+ OldestXminInfo shared_oldest_nonremovable_info;
+ OldestXminInfo catalog_oldest_nonremovable_info;
+ OldestXminInfo data_oldest_nonremovable_info;
+ OldestXminInfo temp_oldest_nonremovable_info;
} ComputeXidHorizonsResult;
/*
@@ -1622,6 +1628,46 @@ TransactionIdIsInProgress(TransactionId xid)
return false;
}
+/*
+ * Store horizon provenance in *info if caller requested it.
+ *
+ * Callers pass NULL when they are not interested in tracking the source.
+ */
+static inline void
+OldestXminInfoSet(OldestXminInfo *info, OldestXminSource source,
+ int backend_pid)
+{
+ if (info == NULL)
+ return;
+
+ info->source = source;
+ info->backend_pid = backend_pid;
+}
+
+/*
+ * Update a tracked OldestXmin horizon with a candidate xid and source.
+ *
+ * If the candidate is older than *target, adopt it and remember why.
+ */
+static inline void
+UpdateOldestXmin(TransactionId *target, OldestXminInfo *info,
+ TransactionId candidate, OldestXminSource source,
+ int backend_pid)
+{
+ TransactionId old;
+ TransactionId new_horizon;
+
+ if (!TransactionIdIsValid(candidate))
+ return;
+
+ old = *target;
+ new_horizon = TransactionIdOlder(old, candidate);
+ if (TransactionIdEquals(new_horizon, old))
+ return;
+
+ *target = new_horizon;
+ OldestXminInfoSet(info, source, backend_pid);
+}
/*
* Determine XID horizons.
@@ -1689,6 +1735,8 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
/* inferred after ProcArrayLock is released */
h->catalog_oldest_nonremovable = InvalidTransactionId;
+ OldestXminInfoSet(&h->catalog_oldest_nonremovable_info,
+ OLDESTXMIN_SOURCE_OTHER, 0);
LWLockAcquire(ProcArrayLock, LW_SHARED);
@@ -1710,6 +1758,10 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
h->oldest_considered_running = initial;
h->shared_oldest_nonremovable = initial;
h->data_oldest_nonremovable = initial;
+ OldestXminInfoSet(&h->shared_oldest_nonremovable_info,
+ OLDESTXMIN_SOURCE_OTHER, 0);
+ OldestXminInfoSet(&h->data_oldest_nonremovable_info,
+ OLDESTXMIN_SOURCE_OTHER, 0);
/*
* Only modifications made by this backend affect the horizon for
@@ -1724,9 +1776,17 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
* latestCompletedXid.
*/
if (TransactionIdIsValid(MyProc->xid))
+ {
h->temp_oldest_nonremovable = MyProc->xid;
+ OldestXminInfoSet(&h->temp_oldest_nonremovable_info,
+ OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION, MyProcPid);
+ }
else
+ {
h->temp_oldest_nonremovable = initial;
+ OldestXminInfoSet(&h->temp_oldest_nonremovable_info,
+ OLDESTXMIN_SOURCE_OTHER, 0);
+ }
}
/*
@@ -1744,6 +1804,8 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
int8 statusFlags = ProcGlobal->statusFlags[index];
TransactionId xid;
TransactionId xmin;
+ OldestXminSource candidate_source;
+ int candidate_pid;
/* Fetch xid just once - see GetNewTransactionId */
xid = UINT32_ACCESS_ONCE(other_xids[index]);
@@ -1768,8 +1830,8 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
* backends are protected even without this check, it can't hurt to
* include them here as well..
*/
- h->oldest_considered_running =
- TransactionIdOlder(h->oldest_considered_running, xmin);
+ UpdateOldestXmin(&h->oldest_considered_running, NULL, xmin,
+ OLDESTXMIN_SOURCE_OTHER, 0);
/*
* Skip over backends either vacuuming (which is ok with rows being
@@ -1780,8 +1842,17 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
continue;
/* shared tables need to take backends in all databases into account */
- h->shared_oldest_nonremovable =
- TransactionIdOlder(h->shared_oldest_nonremovable, xmin);
+ candidate_pid = proc->pid;
+ if (proc->pid == 0)
+ candidate_source = OLDESTXMIN_SOURCE_PREPARED_TRANSACTION;
+ else if (statusFlags & PROC_AFFECTS_ALL_HORIZONS)
+ candidate_source = OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK;
+ else
+ candidate_source = OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION;
+
+ UpdateOldestXmin(&h->shared_oldest_nonremovable,
+ &h->shared_oldest_nonremovable_info,
+ xmin, candidate_source, candidate_pid);
/*
* Normally sessions in other databases are ignored for anything but
@@ -1807,8 +1878,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
(statusFlags & PROC_AFFECTS_ALL_HORIZONS) ||
in_recovery)
{
- h->data_oldest_nonremovable =
- TransactionIdOlder(h->data_oldest_nonremovable, xmin);
+ UpdateOldestXmin(&h->data_oldest_nonremovable,
+ &h->data_oldest_nonremovable_info,
+ xmin, candidate_source, candidate_pid);
}
}
@@ -1827,12 +1899,14 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
if (in_recovery)
{
- h->oldest_considered_running =
- TransactionIdOlder(h->oldest_considered_running, kaxmin);
- h->shared_oldest_nonremovable =
- TransactionIdOlder(h->shared_oldest_nonremovable, kaxmin);
- h->data_oldest_nonremovable =
- TransactionIdOlder(h->data_oldest_nonremovable, kaxmin);
+ UpdateOldestXmin(&h->oldest_considered_running, NULL,
+ kaxmin, OLDESTXMIN_SOURCE_OTHER, 0);
+ UpdateOldestXmin(&h->shared_oldest_nonremovable,
+ &h->shared_oldest_nonremovable_info,
+ kaxmin, OLDESTXMIN_SOURCE_OTHER, 0);
+ UpdateOldestXmin(&h->data_oldest_nonremovable,
+ &h->data_oldest_nonremovable_info,
+ kaxmin, OLDESTXMIN_SOURCE_OTHER, 0);
/* temp relations cannot be accessed in recovery */
}
@@ -1844,10 +1918,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
/*
* Check whether there are replication slots requiring an older xmin.
*/
- h->shared_oldest_nonremovable =
- TransactionIdOlder(h->shared_oldest_nonremovable, h->slot_xmin);
- h->data_oldest_nonremovable =
- TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);
+ UpdateOldestXmin(&h->shared_oldest_nonremovable,
+ &h->shared_oldest_nonremovable_info,
+ h->slot_xmin, OLDESTXMIN_SOURCE_REPLICATION_SLOT, 0);
+ UpdateOldestXmin(&h->data_oldest_nonremovable,
+ &h->data_oldest_nonremovable_info,
+ h->slot_xmin, OLDESTXMIN_SOURCE_REPLICATION_SLOT, 0);
/*
* The only difference between catalog / data horizons is that the slot's
@@ -1857,13 +1933,16 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
* that also can contain catalogs.
*/
h->shared_oldest_nonremovable_raw = h->shared_oldest_nonremovable;
- h->shared_oldest_nonremovable =
- TransactionIdOlder(h->shared_oldest_nonremovable,
- h->slot_catalog_xmin);
+ UpdateOldestXmin(&h->shared_oldest_nonremovable,
+ &h->shared_oldest_nonremovable_info,
+ h->slot_catalog_xmin,
+ OLDESTXMIN_SOURCE_REPLICATION_SLOT, 0);
h->catalog_oldest_nonremovable = h->data_oldest_nonremovable;
- h->catalog_oldest_nonremovable =
- TransactionIdOlder(h->catalog_oldest_nonremovable,
- h->slot_catalog_xmin);
+ h->catalog_oldest_nonremovable_info = h->data_oldest_nonremovable_info;
+ UpdateOldestXmin(&h->catalog_oldest_nonremovable,
+ &h->catalog_oldest_nonremovable_info,
+ h->slot_catalog_xmin,
+ OLDESTXMIN_SOURCE_REPLICATION_SLOT, 0);
/*
* It's possible that slots backed up the horizons further than
@@ -1951,25 +2030,53 @@ GlobalVisHorizonKindForRel(Relation rel)
*/
TransactionId
GetOldestNonRemovableTransactionId(Relation rel)
+{
+ /* Delegate to the WithReason variant to avoid duplicated logic */
+ return GetOldestNonRemovableTransactionIdWithReason(rel, NULL);
+}
+
+/*
+ * Return horizon like GetOldestNonRemovableTransactionId(), and also classify
+ * the reason that determined that horizon at the time of computation.
+ */
+TransactionId
+GetOldestNonRemovableTransactionIdWithReason(Relation rel,
+ OldestXminInfo *out_info)
{
ComputeXidHorizonsResult horizons;
+ TransactionId target = InvalidTransactionId;
+ GlobalVisHorizonKind kind;
+ const OldestXminInfo *source_info = NULL;
+
+ if (out_info != NULL)
+ OldestXminInfoSet(out_info, OLDESTXMIN_SOURCE_OTHER, 0);
ComputeXidHorizons(&horizons);
- switch (GlobalVisHorizonKindForRel(rel))
+ kind = GlobalVisHorizonKindForRel(rel);
+ switch (kind)
{
case VISHORIZON_SHARED:
- return horizons.shared_oldest_nonremovable;
+ target = horizons.shared_oldest_nonremovable;
+ source_info = &horizons.shared_oldest_nonremovable_info;
+ break;
case VISHORIZON_CATALOG:
- return horizons.catalog_oldest_nonremovable;
+ target = horizons.catalog_oldest_nonremovable;
+ source_info = &horizons.catalog_oldest_nonremovable_info;
+ break;
case VISHORIZON_DATA:
- return horizons.data_oldest_nonremovable;
+ target = horizons.data_oldest_nonremovable;
+ source_info = &horizons.data_oldest_nonremovable_info;
+ break;
case VISHORIZON_TEMP:
- return horizons.temp_oldest_nonremovable;
+ target = horizons.temp_oldest_nonremovable;
+ source_info = &horizons.temp_oldest_nonremovable_info;
+ break;
}
- /* just to prevent compiler warnings */
- return InvalidTransactionId;
+ if (out_info != NULL && TransactionIdIsValid(target) && source_info != NULL)
+ *out_info = *source_info;
+ return target;
}
/*
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 1f3290c7fbf..025c22eabd2 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -24,6 +24,7 @@
#include "parser/parse_node.h"
#include "storage/buf.h"
#include "storage/lock.h"
+#include "storage/procarray.h"
#include "utils/relcache.h"
/*
@@ -288,6 +289,9 @@ struct VacuumCutoffs
*/
TransactionId FreezeLimit;
MultiXactId MultiXactCutoff;
+
+ /* What decided OldestXmin at acquisition time */
+ OldestXminInfo oldest_xmin_info;
};
/*
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 2f4ae06c279..fb459f9b6c9 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -20,6 +20,25 @@
#include "utils/snapshot.h"
+/*
+ * Identifies what determined a relation's OldestXmin horizon.
+ * Used by autovacuum to report why dead tuples were not removable.
+ */
+typedef enum OldestXminSource
+{
+ OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION,
+ OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK,
+ OLDESTXMIN_SOURCE_PREPARED_TRANSACTION,
+ OLDESTXMIN_SOURCE_REPLICATION_SLOT,
+ OLDESTXMIN_SOURCE_OTHER
+} OldestXminSource;
+
+typedef struct OldestXminInfo
+{
+ OldestXminSource source;
+ int backend_pid;
+} OldestXminInfo;
+
extern Size ProcArrayShmemSize(void);
extern void ProcArrayShmemInit(void);
extern void ProcArrayAdd(PGPROC *proc);
@@ -54,6 +73,8 @@ extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
+extern TransactionId GetOldestNonRemovableTransactionIdWithReason(Relation rel,
+ OldestXminInfo *info);
extern TransactionId GetOldestTransactionIdConsideredRunning(void);
extern TransactionId GetOldestActiveTransactionId(bool inCommitOnly,
bool allDbs);
diff --git a/src/test/modules/test_misc/meson.build b/src/test/modules/test_misc/meson.build
index f258bf1ccd9..b42fed91184 100644
--- a/src/test/modules/test_misc/meson.build
+++ b/src/test/modules/test_misc/meson.build
@@ -18,6 +18,7 @@ tests += {
't/007_catcache_inval.pl',
't/008_replslot_single_user.pl',
't/009_log_temp_files.pl',
+ 't/010_autovacuum_oldest_xmin_reason.pl',
],
},
}
diff --git a/src/test/modules/test_misc/t/010_autovacuum_oldest_xmin_reason.pl b/src/test/modules/test_misc/t/010_autovacuum_oldest_xmin_reason.pl
new file mode 100644
index 00000000000..5818f143182
--- /dev/null
+++ b/src/test/modules/test_misc/t/010_autovacuum_oldest_xmin_reason.pl
@@ -0,0 +1,186 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# Validate that autovacuum logs explain why dead tuples could not be removed.
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Set up a cluster
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', qq[
+max_prepared_transactions = 5
+autovacuum_naptime = 1s
+autovacuum_vacuum_cost_delay = 0
+log_autovacuum_min_duration = 0
+]);
+$node->start;
+
+# Helper to wait for autovacuum to complete
+sub wait_for_autovacuum
+{
+ my ($relname) = @_;
+ my $vacuumed = $node->poll_query_until(
+ 'postgres', qq[
+ SELECT last_autovacuum IS NOT NULL
+ FROM pg_stat_user_tables
+ WHERE relname = '$relname';
+ ], 't');
+ ok($vacuumed, "autovacuum completed on $relname")
+ or BAIL_OUT("timed out waiting for autovacuum on $relname");
+}
+
+#
+# Active transaction
+#
+my $active_table = 'av_oldestxmin_active';
+$node->safe_psql('postgres', qq[
+CREATE TABLE $active_table(id int)
+ WITH (autovacuum_vacuum_scale_factor = 0,
+ autovacuum_vacuum_threshold = 1);
+INSERT INTO $active_table SELECT generate_series(1, 10);
+]);
+
+my $offset = -s $node->logfile;
+
+my $blocker = $node->background_psql('postgres');
+$blocker->query_safe(qq[
+BEGIN;
+SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+SELECT * FROM $active_table;
+]);
+
+my $blocker_pid = $blocker->query_safe('SELECT pg_backend_pid();');
+chomp($blocker_pid);
+
+$node->safe_psql('postgres', "DELETE FROM $active_table;");
+
+wait_for_autovacuum($active_table);
+
+ok( $node->wait_for_log(
+ qr/automatic vacuum of table "\Qpostgres.public.$active_table\E":.*oldest xmin source: active transaction \(pid=$blocker_pid\)/s,
+ $offset),
+ 'autovacuum reported active transaction as oldest xmin source');
+
+$blocker->query_safe('ROLLBACK;');
+$blocker->quit;
+$node->safe_psql('postgres', "DROP TABLE $active_table;");
+
+#
+# Prepared transaction
+#
+my $prepared_table = 'av_oldestxmin_prepared';
+$node->safe_psql('postgres', qq[
+CREATE TABLE $prepared_table(id int)
+ WITH (autovacuum_vacuum_scale_factor = 0,
+ autovacuum_vacuum_threshold = 1);
+INSERT INTO $prepared_table SELECT generate_series(1, 10);
+]);
+
+$node->safe_psql('postgres', qq[
+BEGIN;
+PREPARE TRANSACTION 'gx_autovac_xmin';
+]);
+
+$offset = -s $node->logfile;
+
+$node->safe_psql('postgres', "DELETE FROM $prepared_table;");
+
+wait_for_autovacuum($prepared_table);
+
+ok( $node->wait_for_log(
+ qr/automatic vacuum of table "\Qpostgres.public.$prepared_table\E":.*oldest xmin source: prepared transaction/s,
+ $offset),
+ 'autovacuum reported prepared transaction as oldest xmin source');
+
+$node->safe_psql('postgres', "ROLLBACK PREPARED 'gx_autovac_xmin';");
+$node->safe_psql('postgres', "DROP TABLE $prepared_table;");
+
+#
+# Replication slot
+#
+my $slot_table = 'av_oldestxmin_slot';
+$node->safe_psql('postgres', qq[
+CREATE TABLE $slot_table(id int);
+]);
+
+$node->safe_psql('postgres',q[
+SELECT pg_create_logical_replication_slot('autovac_oldestxmin_slot', 'test_decoding');
+]);
+$node->safe_psql('postgres', "DROP TABLE $slot_table;");
+
+my $slotstderr = '';
+$node->psql('postgres',
+ 'VACUUM (VERBOSE) pg_catalog.pg_class;',
+ stderr => \$slotstderr
+);
+like(
+ $slotstderr,
+ qr/oldest xmin source: replication slot/,
+ 'VACUUM VERBOSE reported replication slot as oldest xmin source');
+
+$node->safe_psql('postgres', q[
+SELECT pg_drop_replication_slot('autovac_oldestxmin_slot');
+]);
+
+#
+# Hot standby feedback
+#
+my $hs_table = 'av_oldestxmin_hotstandby';
+$node->safe_psql('postgres', qq[
+CREATE TABLE $hs_table(id int)
+ WITH (autovacuum_vacuum_scale_factor = 0,
+ autovacuum_vacuum_threshold = 1);
+INSERT INTO $hs_table SELECT generate_series(1, 10);
+]);
+
+# Set up a hot standby
+$node->backup('oldestxmin_hotstandby_bkp');
+my $standby = PostgreSQL::Test::Cluster->new('oldestxmin_standby');
+$standby->init_from_backup($node, 'oldestxmin_hotstandby_bkp',
+ has_streaming => 1);
+$standby->append_conf('postgresql.conf', qq[
+hot_standby_feedback = on
+wal_receiver_status_interval = 1s
+]);
+$standby->start;
+$node->wait_for_catchup($standby, 'replay', $node->lsn('flush'));
+
+my $standby_reader = $standby->background_psql('postgres');
+$standby_reader->query_safe('BEGIN;');
+
+$node->poll_query_until('postgres', q[
+SELECT bool_or(backend_xmin IS NOT NULL)
+FROM pg_stat_replication;
+], 't');
+
+my $hs_pid = $node->safe_psql('postgres', q[
+SELECT pid FROM pg_stat_replication
+WHERE backend_xmin IS NOT NULL
+ORDER BY pid
+LIMIT 1;
+]);
+chomp($hs_pid);
+
+$offset = -s $node->logfile;
+
+$node->safe_psql('postgres', "DELETE FROM $hs_table;");
+
+wait_for_autovacuum($hs_table);
+
+ok( $node->wait_for_log(
+ qr/oldest xmin source: hot standby feedback \(pid=$hs_pid\)/,
+ $offset),
+ 'autovacuum reported hot standby feedback as oldest xmin source');
+
+$standby_reader->query_safe('ROLLBACK;');
+$standby_reader->quit;
+$node->safe_psql('postgres', "DROP TABLE $hs_table;");
+
+$standby->stop;
+$node->stop;
+done_testing();
--
2.47.3
HI
Thank you for your path ,This path is extremely helpful.
+/* + * Identifies what determined a relation's OldestXmin horizon. + * Used by autovacuum to report why dead tuples were not removable. + */ +typedef enum OldestXminSource +{ + OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION, + OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK, + OLDESTXMIN_SOURCE_PREPARED_TRANSACTION, + OLDESTXMIN_SOURCE_REPLICATION_SLOT, + OLDESTXMIN_SOURCE_OTHER +} OldestXminSource; + +typedef struct OldestXminInfo +{ + OldestXminSource source; + int backend_pid; +} OldestXminInfo;
I have a question for like this
one session
begin;
select * from table_a
not commit or not closed session
It is in OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION type ?
Thank
On Fri, Oct 31, 2025 at 2:32 PM Shinya Kato <shinya11.kato@gmail.com> wrote:
Show quoted text
Hi hackers,
I am proposing to add the reason for the oldest xmin to VACUUM logs.
This feature would be useful for identifying why dead tuples cannot be
removed, thereby helping to diagnose and prevent table bloat.The current logs only indicate that dead tuples could not be reclaimed
due to the oldest xmin, but they do not reveal the underlying reason.
To identify the cause, it is necessary to query multiple views:
pg_stat_activity (for active transactions), pg_prepared_xacts (for
prepared statements), pg_replication_slots (for replication slots),
and pg_stat_replication (for hot standby feedback). However, because
the data in these views is volatile, it is difficult to retroactively
determine what was holding the oldest xmin at the specific time the
log message was generated.This PoC patch addresses this problem. The implementation now outputs
the reason for the oldest xmin and, where applicable, the backend PID.
This information was originally discarded when calculating the oldest
xmin horizon, and the computation required to retrieve these reasons
is considered reasonable.The patch is attached. What do you think?
--
Best regards,
Shinya Kato
NTT OSS Center
On Fri, Oct 31, 2025 at 5:01 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
HI
Thank you for your path ,This path is extremely helpful.
Thank you!
+/* + * Identifies what determined a relation's OldestXmin horizon. + * Used by autovacuum to report why dead tuples were not removable. + */ +typedef enum OldestXminSource +{ + OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION, + OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK, + OLDESTXMIN_SOURCE_PREPARED_TRANSACTION, + OLDESTXMIN_SOURCE_REPLICATION_SLOT, + OLDESTXMIN_SOURCE_OTHER +} OldestXminSource; + +typedef struct OldestXminInfo +{ + OldestXminSource source; + int backend_pid; +} OldestXminInfo;I have a question for like this
one session
begin;
select * from table_a
not commit or not closed session
It is in OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION type ?
Exactly. Looking at 010_autovacuum_oldest_xmin_reason.pl should make
it clear which logs are output in which cases. (I just noticed there
seems to be a case where the test fails. I need to fix that.)
--
Best regards,
Shinya Kato
NTT OSS Center
On Fri, Oct 31, 2025 at 3:32 PM Shinya Kato <shinya11.kato@gmail.com> wrote:
Hi hackers,
I am proposing to add the reason for the oldest xmin to VACUUM logs.
This feature would be useful for identifying why dead tuples cannot be
removed, thereby helping to diagnose and prevent table bloat.
+1
I like this idea. Thanks for working on this!
This PoC patch addresses this problem. The implementation now outputs
the reason for the oldest xmin and, where applicable, the backend PID.
This information was originally discarded when calculating the oldest
xmin horizon, and the computation required to retrieve these reasons
is considered reasonable.The patch is attached. What do you think?
According to cfbot, the 010_autovacuum_oldest_xmin_reason regression test
passes on some platforms but fails on others (see [1]https://cirrus-ci.com/task/6063548834512896), so it doesn't
appear stable.
When I set up a primary and standby with hot_standby_feedback enabled,
then created an old prepared transaction expected to prevent dead tuples
from being vacuumed, VACUUM VERBOSE reported "hot standby feedback"
instead of "prepared transaction" as the oldest xmin source. This isn't a bug
since both xmins are the same in this case. But it may be confusing?
Would it be better to report "prepared transaction" in such cases?
+ case OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION:
+ msgfmt = include_pid ?
+ _("oldest xmin source: active transaction (pid=%d)\n") :
+ _("oldest xmin source: active transaction\n");
+ break;
+ case OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK:
+ msgfmt = include_pid ?
+ _("oldest xmin source: hot standby feedback (pid=%d)\n") :
+ _("oldest xmin source: hot standby feedback\n");
In these two cases, the PID should always be non-zero, so the message
formats without (pid=%d) might not be necessary.
+ /* Identify what forced each of the horizons above. */
+ OldestXminInfo shared_oldest_nonremovable_info;
+ OldestXminInfo catalog_oldest_nonremovable_info;
+ OldestXminInfo data_oldest_nonremovable_info;
+ OldestXminInfo temp_oldest_nonremovable_info;
It might be good to add a comment explaining why we track
sources only for these four oldest xmins, and not for others
like oldest_considered_running.
+ TransactionId old;
+ TransactionId new_horizon;
+
+ if (!TransactionIdIsValid(candidate))
+ return;
The TransactionIdIsValid(candidate) check may be redundant,
since TransactionIdOlder(old, candidate) already performs
the same validation.
- switch (GlobalVisHorizonKindForRel(rel))
+ kind = GlobalVisHorizonKindForRel(rel);
+ switch (kind)
This change doesn't seem necessary.
Regards,
[1]: https://cirrus-ci.com/task/6063548834512896
--
Fujii Masao
Hi
Thank you for your response , In this context, that session state is idle
in transaction .If we could further distinguish between active sessions
that are still running and idle sessions, that would be clearer.
Thanks
On Tue, Nov 4, 2025 at 11:59 PM Fujii Masao <masao.fujii@gmail.com> wrote:
Show quoted text
On Fri, Oct 31, 2025 at 3:32 PM Shinya Kato <shinya11.kato@gmail.com>
wrote:Hi hackers,
I am proposing to add the reason for the oldest xmin to VACUUM logs.
This feature would be useful for identifying why dead tuples cannot be
removed, thereby helping to diagnose and prevent table bloat.+1
I like this idea. Thanks for working on this!
This PoC patch addresses this problem. The implementation now outputs
the reason for the oldest xmin and, where applicable, the backend PID.
This information was originally discarded when calculating the oldest
xmin horizon, and the computation required to retrieve these reasons
is considered reasonable.The patch is attached. What do you think?
According to cfbot, the 010_autovacuum_oldest_xmin_reason regression test
passes on some platforms but fails on others (see [1]), so it doesn't
appear stable.When I set up a primary and standby with hot_standby_feedback enabled,
then created an old prepared transaction expected to prevent dead tuples
from being vacuumed, VACUUM VERBOSE reported "hot standby feedback"
instead of "prepared transaction" as the oldest xmin source. This isn't a
bug
since both xmins are the same in this case. But it may be confusing?
Would it be better to report "prepared transaction" in such cases?+ case OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION: + msgfmt = include_pid ? + _("oldest xmin source: active transaction (pid=%d)\n") : + _("oldest xmin source: active transaction\n"); + break; + case OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK: + msgfmt = include_pid ? + _("oldest xmin source: hot standby feedback (pid=%d)\n") : + _("oldest xmin source: hot standby feedback\n");In these two cases, the PID should always be non-zero, so the message
formats without (pid=%d) might not be necessary.+ /* Identify what forced each of the horizons above. */ + OldestXminInfo shared_oldest_nonremovable_info; + OldestXminInfo catalog_oldest_nonremovable_info; + OldestXminInfo data_oldest_nonremovable_info; + OldestXminInfo temp_oldest_nonremovable_info;It might be good to add a comment explaining why we track
sources only for these four oldest xmins, and not for others
like oldest_considered_running.+ TransactionId old; + TransactionId new_horizon; + + if (!TransactionIdIsValid(candidate)) + return;The TransactionIdIsValid(candidate) check may be redundant,
since TransactionIdOlder(old, candidate) already performs
the same validation.- switch (GlobalVisHorizonKindForRel(rel)) + kind = GlobalVisHorizonKindForRel(rel); + switch (kind)This change doesn't seem necessary.
Regards,
[1] https://cirrus-ci.com/task/6063548834512896
--
Fujii Masao
Thanks for starting this thread! This is a very useful
feature that users will find beneficial to easily narrow
down the reason the xmin horizon is being held back,
and take action.
Adding this information to the vacuum logging is useful, but
I can see this information being exposed in a view as well in
the future.
I have a few comments:
A few minor ones:
1/ pid should be declared as "pid_t"
2/ last value of an enum should be have a traling comma
+typedef enum OldestXminSource
+{
+ OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION,
+ OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK,
+ OLDESTXMIN_SOURCE_PREPARED_TRANSACTION,
+ OLDESTXMIN_SOURCE_REPLICATION_SLOT,
+ OLDESTXMIN_SOURCE_OTHER
+} OldestXminSource;
More importantly:
3/ As mentioned earlier in the thread, the "idle-in-transaction"
transactions is not being reported correctly, particularly for write
tansactions. I think that is an important missing case. The reason
for this is the cutoff xmin is not being looked up against the current
list of xid's, so we are not blaming the correct pid.
4/
Thinking about point 3 above, I began to wonder if this
whole thing can be simplified with inspiration. Looking at the
existing BackendXidGetPid(), I think it can.
Based on BackendXidGetPid(), I tried a new routine called
BackendXidFindCutOffReason() which can take in the cutoff xmin,
passed in by vacuum and can walk though the proc array and
determine the reason. We don't need to touch ComputeXidHorizons()
to make this work, it seems to me. This comes with an additional
walk though the procarray holding a shared lock, but I don't think
this will be an issue.
Attached is a rough sketch of BackendXidFindCutOffReason()
For now, I just added NOTICE messages which will log with
VACUUM (verbose) for testing.
This takes what you are doing in v1 inside ComputeXidHorizons()
into a new routine. I think this is a cleaner approach.
5/ Also, I think we should also include tests for serializable
transactions
What do you think?
--
Sami Imseih
Amazon Web Services (AWS)
Attachments:
0001-sketch-of-cutoff-reasons.txttext/plain; charset=US-ASCII; name=0001-sketch-of-cutoff-reasons.txtDownload
From 53915bc1fd06790fc112cb2ac9e4b9caa742cf92 Mon Sep 17 00:00:00 2001
From: Sami Imseih <simseih@amazon.com>
Date: Fri, 14 Nov 2025 18:15:25 -0600
Subject: [PATCH 1/1] sketch of cutoff reasons
---
src/backend/access/heap/vacuumlazy.c | 5 +++
src/backend/storage/ipc/procarray.c | 60 ++++++++++++++++++++++++++++
src/include/storage/procarray.h | 1 +
3 files changed, 66 insertions(+)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index deb9a3dc0d1..df1df6b0733 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -152,6 +152,7 @@
#include "storage/bufmgr.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
+#include "storage/procarray.h"
#include "storage/read_stream.h"
#include "utils/lsyscache.h"
#include "utils/pg_rusage.h"
@@ -1047,6 +1048,10 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
appendStringInfo(&buf,
_("removable cutoff: %u, which was %d XIDs old when operation ended\n"),
vacrel->cutoffs.OldestXmin, diff);
+
+ if (vacrel->recently_dead_tuples > 0)
+ BackendXidFindCutOffReason(vacrel->cutoffs.OldestXmin);
+
if (frozenxid_updated)
{
diff = (int32) (vacrel->NewRelfrozenXid -
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..45dfe8a9be2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -3244,6 +3244,66 @@ BackendXidGetPid(TransactionId xid)
return result;
}
+void
+BackendXidFindCutOffReason(TransactionId xid)
+{
+ ProcArrayStruct *arrayP = procArray;
+ TransactionId *other_xids = ProcGlobal->xids;
+ bool found_reason = false;
+
+ Assert(xid != InvalidTransactionId);
+
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+
+ elog(NOTICE, ">>>>>> looking up reason for %d", xid);
+
+ for (int index = 0; index < arrayP->numProcs; index++)
+ {
+ int pgprocno = arrayP->pgprocnos[index];
+ PGPROC *proc = &allProcs[pgprocno];
+
+ /* Case 1: xid matches the session's backend XID */
+ if (other_xids[index] == xid)
+ {
+ if (proc->pid == 0)
+ /* with a prepared transaction */
+ elog(NOTICE, ">>>>>> prepared transaction proc->statusFlags %u", proc->statusFlags);
+ else
+ /* or a write transaction */
+ elog(NOTICE, ">>>>>> xid: transaction BackendXidGetPid = %d proc->statusFlags %u", proc->pid, proc->statusFlags);
+
+ found_reason = true;
+ break;
+ }
+
+ /* Case 2: xid matches xmin */
+ if (proc->xmin == xid)
+ {
+ /* or affects horizons, which is due to hot_standby_feedback */
+ if (proc->statusFlags & PROC_AFFECTS_ALL_HORIZONS)
+ {
+ elog(NOTICE, ">>>>>> hot_standby_feedback == pid of walreceiver %d proc->statusFlags %u", proc->pid, proc->statusFlags);
+ found_reason = true;
+ break;
+ }
+
+ /* or a read-only transaction */
+ elog(NOTICE, ">>>>>> xmin: transaction BackendXidGetPid = %d proc->statusFlags = %u", proc->pid, proc->statusFlags);
+ found_reason = true;
+ break;
+ }
+ }
+
+ /*
+ * we failed to find reason, so it's likely a logical replication slot, or
+ * some other reason
+ */
+ if (!found_reason)
+ elog(NOTICE, ">>>>>> other reasons, including logical replication slot");
+
+ LWLockRelease(ProcArrayLock);
+}
+
/*
* IsBackendPid -- is a given pid a running backend
*
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 2f4ae06c279..b0cdeedb848 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -71,6 +71,7 @@ extern void ProcNumberGetTransactionIds(int procNumber, TransactionId *xid,
extern PGPROC *BackendPidGetProc(int pid);
extern PGPROC *BackendPidGetProcWithLock(int pid);
extern int BackendXidGetPid(TransactionId xid);
+extern void BackendXidFindCutOffReason(TransactionId xid);
extern bool IsBackendPid(int pid);
extern VirtualTransactionId *GetCurrentVirtualXIDs(TransactionId limitXmin,
--
2.50.1 (Apple Git-155)
Hi Sami
Thinking about point 3 above, I began to wonder if this
whole thing can be simplified with inspiration. Looking at the
existing BackendXidGetPid(), I think it can.
Based on BackendXidGetPid(), I tried a new routine called
BackendXidFindCutOffReason() which can take in the cutoff xmin,
passed in by vacuum and can walk though the proc array and
determine the reason. We don't need to touch ComputeXidHorizons()
to make this work, it seems to me. This comes with an additional
walk though the procarray holding a shared lock, but I don't think
this will be an issue.
Attached is a rough sketch of BackendXidFindCutOffReason()
For now, I just added NOTICE messages which will log with
VACUUM (verbose) for testing.
I like your idea , I think we also could consider introducing a GUC
parameter in the future, which would terminate sessions blocking vacuum
operations when the table's age reaches vacuum_failsafe_age.
Thanks
On Sat, Nov 15, 2025 at 8:25 AM Sami Imseih <samimseih@gmail.com> wrote:
Show quoted text
Thanks for starting this thread! This is a very useful
feature that users will find beneficial to easily narrow
down the reason the xmin horizon is being held back,
and take action.Adding this information to the vacuum logging is useful, but
I can see this information being exposed in a view as well in
the future.I have a few comments:
A few minor ones:
1/ pid should be declared as "pid_t"
2/ last value of an enum should be have a traling comma +typedef enum OldestXminSource +{ + OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION, + OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK, + OLDESTXMIN_SOURCE_PREPARED_TRANSACTION, + OLDESTXMIN_SOURCE_REPLICATION_SLOT, + OLDESTXMIN_SOURCE_OTHER +} OldestXminSource;More importantly:
3/ As mentioned earlier in the thread, the "idle-in-transaction"
transactions is not being reported correctly, particularly for write
tansactions. I think that is an important missing case. The reason
for this is the cutoff xmin is not being looked up against the current
list of xid's, so we are not blaming the correct pid.4/
Thinking about point 3 above, I began to wonder if this
whole thing can be simplified with inspiration. Looking at the
existing BackendXidGetPid(), I think it can.Based on BackendXidGetPid(), I tried a new routine called
BackendXidFindCutOffReason() which can take in the cutoff xmin,
passed in by vacuum and can walk though the proc array and
determine the reason. We don't need to touch ComputeXidHorizons()
to make this work, it seems to me. This comes with an additional
walk though the procarray holding a shared lock, but I don't think
this will be an issue.Attached is a rough sketch of BackendXidFindCutOffReason()
For now, I just added NOTICE messages which will log with
VACUUM (verbose) for testing.This takes what you are doing in v1 inside ComputeXidHorizons()
into a new routine. I think this is a cleaner approach.5/ Also, I think we should also include tests for serializable
transactionsWhat do you think?
--
Sami Imseih
Amazon Web Services (AWS)
On Sat, Nov 15, 2025 at 5:56 AM Sami Imseih <samimseih@gmail.com> wrote:
Thanks for starting this thread! This is a very useful
feature that users will find beneficial to easily narrow
down the reason the xmin horizon is being held back,
and take action.
+1 for the idea. In BackendXidFindCutOffReason() you have directly
reported using NOTICE I believe that is just to show the idea and you
are planning to append this to the main message? Apart from that we
are looping the whole pgprocarray, however it is only done when we are
vacuuming with verbose mode so might not be that bad.
--
Regards,
Dilip Kumar
Google
Thanks for starting this thread! This is a very useful
feature that users will find beneficial to easily narrow
down the reason the xmin horizon is being held back,
and take action.+1 for the idea. In BackendXidFindCutOffReason() you have directly
reported using NOTICE I believe that is just to show the idea and you
are planning to append this to the main message?
Yes, the NOTICE is just for demonstration. Some more work is needed
to integrate the output into the vacuum log.
Apart from that we are looping the whole pgprocarray, however it is
only done when we are vacuuming with verbose mode so might not
be that bad.
The extra procarray loop will occur during VACUUM VERBOSE
or when a vacuum exceeds log_autovacuum_min_duration, 10
minutes by default.
I do think however, we should be more selective when to
do this work. We should only care about emitting this
information in autovacuum logging if the cutoffs->OldestXmid
did not advance between consecutive vacuums. This will
mean we we will need to track the last cutoff value in relation stats
(pg_stat_user_tables), but I think having this value in
stats will be useful on its own actually, as users can use
it to track tables with stalled cutoffs even if they don't
have sufficient autovacuum logging.
What do you think?
--
Sami Imseih
Amazon Web Services (AWS)
Hi,
On 2025-10-31 15:31:40 +0900, Shinya Kato wrote:
The patch is attached. What do you think?
The added tests never seem to pass in CI:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F6188
Greetings,
Andres Freund
Thank you all for the review comments, and sorry for the late reply.
I will address the review comments in order.
On Sat, Nov 15, 2025 at 9:25 AM Sami Imseih <samimseih@gmail.com> wrote:
More importantly:
3/ As mentioned earlier in the thread, the "idle-in-transaction"
transactions is not being reported correctly, particularly for write
tansactions. I think that is an important missing case. The reason
for this is the cutoff xmin is not being looked up against the current
list of xid's, so we are not blaming the correct pid.4/
Thinking about point 3 above, I began to wonder if this
whole thing can be simplified with inspiration. Looking at the
existing BackendXidGetPid(), I think it can.Based on BackendXidGetPid(), I tried a new routine called
BackendXidFindCutOffReason() which can take in the cutoff xmin,
passed in by vacuum and can walk though the proc array and
determine the reason. We don't need to touch ComputeXidHorizons()
to make this work, it seems to me. This comes with an additional
walk though the procarray holding a shared lock, but I don't think
this will be an issue.Attached is a rough sketch of BackendXidFindCutOffReason()
For now, I just added NOTICE messages which will log with
VACUUM (verbose) for testing.
Thanks for the revised proposal! Your approach is clear and makes the
code easier to read. However, I’m hesitant to proceed with this idea
for the following reasons:
- The original proposal extends ComputeXidHorizons(), which is always
calculated, so there is almost no additional overhead.
- Your proposal incurs additional cost. Furthermore, the time lag
between the execution of ComputeXidHorizons() and
BackendXidFindCutOffReason() could lead to inaccurate logging.
- I don't believe it is necessary to distinguish between active
transactions and "idle in transaction." These states can change
rapidly, and as long as we have the PID, we can check the current
status via pg_stat_activity.
- Your comment made me realize that it might be appropriate to expose
the oldest xmin in the pg_stat_{all,user,sys}_tables views, rather
than just logging it. In that case, we would need to calculate the
oldest xmin horizon every time. This might be a topic for a separate
thread, but we could consider adding columns such as:
- pg_stat_{all,user,sys}_tables.last_vacuum_oldest_xmin (xid)
- pg_stat_{all,user,sys}_tables.last_vacuum_oldest_xmin_source (text)
--
Best regards,
Shinya Kato
NTT OSS Center
Thank you all for the review comments, and sorry for the late reply.
I will address the review comments in order.On Sat, Nov 15, 2025 at 9:25 AM Sami Imseih <samimseih@gmail.com> wrote:
More importantly:
3/ As mentioned earlier in the thread, the "idle-in-transaction"
transactions is not being reported correctly, particularly for write
tansactions. I think that is an important missing case. The reason
for this is the cutoff xmin is not being looked up against the current
list of xid's, so we are not blaming the correct pid.4/
Thinking about point 3 above, I began to wonder if this
whole thing can be simplified with inspiration. Looking at the
existing BackendXidGetPid(), I think it can.Based on BackendXidGetPid(), I tried a new routine called
BackendXidFindCutOffReason() which can take in the cutoff xmin,
passed in by vacuum and can walk though the proc array and
determine the reason. We don't need to touch ComputeXidHorizons()
to make this work, it seems to me. This comes with an additional
walk though the procarray holding a shared lock, but I don't think
this will be an issue.Attached is a rough sketch of BackendXidFindCutOffReason()
For now, I just added NOTICE messages which will log with
VACUUM (verbose) for testing.Thanks for the revised proposal! Your approach is clear and makes the
code easier to read.
My approach is focused on correctness rather than simplicity.
The current logic in ComputeXidHorizons can report the wrong PID for
blocking transactions. For example:
1. start a pgbench with 5 clients for some time ( i.e. 5 minutes )
```
pgbench -i -s50
pgbench -c5 -T300
````
2. start a long running transaction that consumes an XID
```
postgres=# begin;
BEGIN
postgres=*# SELECT txid_current(), pg_backend_pid();
txid_current | pg_backend_pid
--------------+----------------
3665231 | 266601
(1 row)
```
3. run a vacuum
```
postgres=# vacuum verbose pgbench_accounts ;
INFO: vacuuming "postgres.public.pgbench_accounts"
INFO: finished vacuuming "postgres.public.pgbench_accounts": index scans: 0
pages: 0 removed, 59833 remain, 59043 scanned (98.68% of total), 0
eagerly scanned
tuples: 0 removed, 1045578 remain, 35425 are dead but not yet removable
removable cutoff: 4301694, which was 35981 XIDs old when operation ended
oldest xmin source: active transaction (pid=267064)
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0
were all-visible)
index scan bypassed: 405 pages from table (0.68% of total) have 1494
dead item identifiers
avg read rate: 1320.425 MB/s, avg write rate: 2.404 MB/s
buffer usage: 67745 hits, 49974 reads, 91 dirtied
WAL usage: 1 records, 0 full page images, 299 bytes, 0 full page image
bytes, 0 buffers full
system usage: CPU: user: 0.22 s, system: 0.06 s, elapsed: 0.29 s
VACUUM
```
VACUUM reports the oldest-XID source PID as 267090 , but the correct
PID is 267064. This happens because the ComputeXidHorizons loop picks
the first PID it encounters with the cutoff XID, even if other backends
have the same XID. There’s no reliable way within that loop to identify
the actual blocking transaction.
```
postgres=# SELECT pid, datname, usename, state, backend_xmin,
backend_xid, substr(query, 1, 20) as query
FROM pg_stat_activity;
pid | datname | usename | state | backend_xmin |
backend_xid | query
--------+----------+----------+---------------------+--------------+-------------+----------------------
267064 | postgres | postgres | active | 4301694 |
4351291 | UPDATE pgbench_branc
267069 | postgres | postgres | active | 4301694 |
4351332 | UPDATE pgbench_telle
267067 | postgres | postgres | active | 4301694 |
4351299 | UPDATE pgbench_branc
267070 | postgres | postgres | active | 4301694 |
4351279 | UPDATE pgbench_branc
267068 | postgres | postgres | active | |
4351325 | UPDATE pgbench_telle
267066 | postgres | postgres | active | 4301694 |
4351327 | UPDATE pgbench_branc
267065 | postgres | postgres | active | 4301694 |
4351292 | UPDATE pgbench_branc
267077 | postgres | postgres | active | 4301694 |
4351303 | UPDATE pgbench_branc
266606 | postgres | postgres | active | 4301694 |
| SELECT pid, datname,
267071 | postgres | postgres | active | |
| BEGIN;
267072 | postgres | postgres | active | 4301694 |
4351300 | UPDATE pgbench_telle
267073 | postgres | postgres | active | 4301694 |
4351258 | UPDATE pgbench_branc
267075 | postgres | postgres | idle | |
| END;
267074 | postgres | postgres | active | 4301694 |
4351319 | UPDATE pgbench_branc
267076 | postgres | postgres | active | |
4351248 | END;
267084 | postgres | postgres | active | 4301694 |
4351330 | UPDATE pgbench_telle
267078 | postgres | postgres | active | 4301694 |
4351260 | UPDATE pgbench_branc
267082 | postgres | postgres | active | |
4351309 | END;
267081 | postgres | postgres | active | |
4351270 | UPDATE pgbench_branc
267083 | postgres | postgres | active | |
4351313 | END;
267080 | postgres | postgres | active | 4301694 |
4351311 | UPDATE pgbench_branc
267079 | postgres | postgres | active | 4301694 |
4351318 | UPDATE pgbench_branc
267086 | postgres | postgres | active | 4301694 |
4351335 | UPDATE pgbench_branc
267085 | postgres | postgres | active | |
| BEGIN;
267090 | postgres | postgres | idle in transaction | |
4301694 | SELECT txid_current( ************
```
- Your proposal incurs additional cost. Furthermore, the time lag
between the execution of ComputeXidHorizons() and
BackendXidFindCutOffReason() could lead to inaccurate logging.
While scanning the proc array adds some overhead, it could be limited
to cases where multiple VACUUMs are stuck on the same cutoff XID, but
we will need to track the last cutoff-xmin to make that possible.
Alternatively, this information might be better exposed in a new system
view, showing the "PID", "XID," and "reason" a transaction is blocking VACUUM.
This approach is more proactive as a DBA can continuously monitor for
blocking reasons and take action before it becomes an issue.
--
Sami Imseih
Amazon Web Services (AWS)