[PATCH v1] Add pg_stat_multixact view for multixact membership usage monitoring

Started by Andrew Johnson7 months ago6 messages
#1Andrew Johnson
andrewj@metronome.com
1 attachment(s)

Hello hackers,

I'd like to propose adding a new view named "pg_stat_multixact" to
expose multixact member usage. This addresses a major monitoring gap
that ultimately led to a production outage at Metronome [1]https://metronome.com/blog/root-cause-analysis-postgresql-multixact-member-exhaustion-incidents-may-2025.

Problem
Multixact membership exhaustion is an edge case that can cause write
lockouts, but there's no visibility into membership space usage.
Without any direct telemetry from the database, we're essentially
flying blind. It is possible to estimate multixact membership usage
through scanning the filesystem, but there are several drawbacks to
that method that Naga Appani outlined in a previous thread [2]/messages/by-id/CALdSSPi3Gh08NtcCn44uVeUAYGOT74sU6uei_06qUTa5rMK43g@mail.gmail.com.

This complements Peter Geoghegan's recent thread about vacuum failsafe
improvements [3]/messages/by-id/CAH2-WzmLPWJk3gbAxy8dHY+A-Juz_6uGwfe6DkE8B5-dTDvLcw@mail.gmail.com, where Sami Imseih noted "exposing the members
count... will be a good idea as well" [4]/messages/by-id/CAA5RZ0u43s4YbR=0mJ0_k3VGWjchJHhYnCoaZVzeLd3ccZtwhQ@mail.gmail.com.

Solution
- New view (pg_stat_multixact) with the columns "members" (bigint) and
"update_timestamp" (timestamptz).
- Updates member count and timestamp during multixact allocation and
freeze threshold checks.

I've attached a patch that:
- Implements this view using pgstat patterns.
- Includes isolation tests.
- Includes documentation changes to monitoring.sgml.

I have also:
- Tested initdb works
- Ran make check-world with --enable-tap-tests to ensure all tests pass

I'm aiming to get this into the upcoming CommitFest. I would
appreciate your thoughts on this proposal and attached patch.

[1]: https://metronome.com/blog/root-cause-analysis-postgresql-multixact-member-exhaustion-incidents-may-2025
[2]: /messages/by-id/CALdSSPi3Gh08NtcCn44uVeUAYGOT74sU6uei_06qUTa5rMK43g@mail.gmail.com
[3]: /messages/by-id/CAH2-WzmLPWJk3gbAxy8dHY+A-Juz_6uGwfe6DkE8B5-dTDvLcw@mail.gmail.com
[4]: /messages/by-id/CAA5RZ0u43s4YbR=0mJ0_k3VGWjchJHhYnCoaZVzeLd3ccZtwhQ@mail.gmail.com

--
Respectfully,

Andrew Johnson
Software Engineer
Metronome, Inc.

Attachments:

v1-0001-Adding-pg_stat_muiltixact-view-to-allow-membershi.patchapplication/octet-stream; name=v1-0001-Adding-pg_stat_muiltixact-view-to-allow-membershi.patchDownload
From e3f8642bcbbf76887dadc7b0b12bf7b2e309c547 Mon Sep 17 00:00:00 2001
From: Andrew Johnson <andrewj@metronome.com>
Date: Tue, 10 Jun 2025 09:34:01 -0500
Subject: [PATCH v1] Adding pg_stat_muiltixact view to allow membership usage
 telemetry

---
 doc/src/sgml/monitoring.sgml                  |  63 ++++++++-
 src/backend/access/transam/multixact.c        |   9 ++
 src/backend/catalog/system_views.sql          |   5 +
 src/backend/utils/activity/Makefile           |   1 +
 src/backend/utils/activity/meson.build        |   1 +
 src/backend/utils/activity/pgstat.c           |  18 +++
 src/backend/utils/activity/pgstat_multixact.c | 133 ++++++++++++++++++
 src/backend/utils/adt/pgstatfuncs.c           |  15 ++
 src/include/catalog/pg_proc.dat               |   8 ++
 src/include/pgstat.h                          |  19 +++
 src/include/utils/pgstat_internal.h           |  19 ++-
 src/include/utils/pgstat_kind.h               |   3 +-
 .../pg-stat-multixact-member-count.out        |  50 +++++++
 src/test/isolation/isolation_schedule         |   1 +
 .../specs/pg-stat-multixact-member-count.spec |  92 ++++++++++++
 src/test/regress/expected/rules.out           |   2 +
 16 files changed, 435 insertions(+), 4 deletions(-)
 create mode 100644 src/backend/utils/activity/pgstat_multixact.c
 create mode 100644 src/test/isolation/expected/pg-stat-multixact-member-count.out
 create mode 100644 src/test/isolation/specs/pg-stat-multixact-member-count.spec

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4265a22d4de..5aebe83c275 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -278,8 +278,9 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
    shared memory statistics) in the views
    <structname>pg_stat_xact_all_tables</structname>,
    <structname>pg_stat_xact_sys_tables</structname>,
-   <structname>pg_stat_xact_user_tables</structname>, and
-   <structname>pg_stat_xact_user_functions</structname>.  These numbers do not act as
+   <structname>pg_stat_xact_user_tables</structname>,
+   <structname>pg_stat_xact_user_functions</structname> and
+   <structname>pg_stat_multixact</structname>.  These numbers do not act as
    stated above; instead they update continuously throughout the transaction.
   </para>
 
@@ -493,6 +494,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_multixact</structname><indexterm><primary>pg_stat_multixact</primary></indexterm></entry>
+      <entry>One row only, showing statistics about multixact membership consumption. See
+       <link linkend="monitoring-pg-stat-multixact-view">
+       <structname>pg_stat_multixact</structname></link> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
       <entry>One row per replication slot, showing statistics about the
@@ -3243,6 +3252,56 @@ description | Waiting for a newly initialized WAL file to reach durable storage
   </para>
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-multixact-view">
+   <title><structname>pg_stat_multixact</structname></title>
+
+   <indexterm>
+     <primary>pg_stat_multixact</primary>
+   </indexterm>
+
+   <para>
+     The <structname>pg_stat_multixact</structname> view will always have
+     a single row, containing data about multixact membership consumption
+     of the cluster.
+   </para>
+
+   <table id="pg-stat-multixact-view" xreflabel="pg_stat_multixact">
+     <title><structname>pg_stat_multixact</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>update_timestamp</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which the statistic was last updated.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>members</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of multixact members in use.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+   </table>
+ </sect2>
+
  <sect2 id="monitoring-pg-stat-wal-view">
    <title><structname>pg_stat_wal</structname></title>
 
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 3c06ac45532..524306dfbc9 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1262,6 +1262,12 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 
 	MultiXactState->nextOffset += nmembers;
 
+	/*
+	 * Record multixact membership space telemetry while we have the lock. We
+	 * do not use the saved variables above because they are stale.
+	 */
+	pgstat_update_multixact_stats(MultiXactState->nextOffset - MultiXactState->oldestOffset);
+
 	LWLockRelease(MultiXactGenLock);
 
 	debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
@@ -2987,6 +2993,9 @@ MultiXactMemberFreezeThreshold(void)
 	if (!ReadMultiXactCounts(&multixacts, &members))
 		return 0;
 
+	/* Record the number of multixact members. */
+	pgstat_update_multixact_stats(members);
+
 	/* If member space utilization is low, no special action is required. */
 	if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
 		return autovacuum_multixact_freeze_max_age;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 08f780a2e63..1cc1df4aa10 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1201,6 +1201,11 @@ CREATE VIEW pg_stat_wal AS
         w.stats_reset
     FROM pg_stat_get_wal() w;
 
+CREATE VIEW pg_stat_multixact AS
+    SELECT
+        pg_stat_get_multixact_update_timestamp() AS update_timestamp,
+        pg_stat_get_multixact_members() AS members;
+
 CREATE VIEW pg_stat_progress_analyze AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index 9c2443e1ecd..5dba48a0169 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -26,6 +26,7 @@ OBJS = \
 	pgstat_database.o \
 	pgstat_function.o \
 	pgstat_io.o \
+	pgstat_multixact.o \
 	pgstat_relation.o \
 	pgstat_replslot.o \
 	pgstat_shmem.o \
diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build
index d8e56b49c24..e5a47d8ef05 100644
--- a/src/backend/utils/activity/meson.build
+++ b/src/backend/utils/activity/meson.build
@@ -11,6 +11,7 @@ backend_sources += files(
   'pgstat_database.c',
   'pgstat_function.c',
   'pgstat_io.c',
+  'pgstat_multixact.c',
   'pgstat_relation.c',
   'pgstat_replslot.c',
   'pgstat_shmem.c',
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 8b57845e870..40a80a82f5a 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -479,6 +479,24 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_all_cb = pgstat_wal_reset_all_cb,
 		.snapshot_cb = pgstat_wal_snapshot_cb,
 	},
+
+	[PGSTAT_KIND_MULTIXACT] = {
+		.name = "multixact",
+
+		.fixed_amount = true,
+		.write_to_file = true,
+
+		.snapshot_ctl_off = offsetof(PgStat_Snapshot, multixact),
+		.shared_ctl_off = offsetof(PgStat_ShmemControl, multixact),
+		.shared_data_off = offsetof(PgStatShared_MultiXact, stats),
+		.shared_data_len = sizeof(((PgStatShared_MultiXact *) 0)->stats),
+
+		.init_shmem_cb = pgstat_multixact_init_shmem_cb,
+		.flush_static_cb = pgstat_flush_multixact_cb,
+		.have_static_pending_cb = pgstat_multixact_have_pending_cb,
+		.reset_all_cb = pgstat_multixact_reset_all_cb,
+		.snapshot_cb = pgstat_multixact_snapshot_cb,
+	},
 };
 
 /*
diff --git a/src/backend/utils/activity/pgstat_multixact.c b/src/backend/utils/activity/pgstat_multixact.c
new file mode 100644
index 00000000000..cc44ea6aad8
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_multixact.c
@@ -0,0 +1,133 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_multixact.c
+ *	  Implementation of multixact statistics.
+ *
+ * This file contains the implementation of multixact statistics. It is kept
+ * separate from pgstat.c to enforce the line between the statistics access /
+ * storage implementation and the details about individual types of
+ * statistics.
+ *
+ * Copyright (c) 2001-2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_multixact.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "utils/pgstat_internal.h"
+#include "utils/timestamp.h"
+
+static bool have_multixact_stats = false;
+
+/*
+ * pgstat_fetch_stat_multixact()
+ *
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * a pointer to the multixact statistics struct.
+ */
+PgStat_MultiXactStats *
+pgstat_fetch_stat_multixact(void)
+{
+	pgstat_snapshot_fixed(PGSTAT_KIND_MULTIXACT);
+
+	return &pgStatLocal.snapshot.multixact;
+}
+
+bool
+pgstat_multixact_have_pending_cb(void)
+{
+	return have_multixact_stats;
+}
+
+void
+pgstat_update_multixact_stats(uint32 nmembers)
+{
+	PgStat_MultiXactStats * local_stats;
+	TimestampTz now;
+
+	now = GetCurrentTimestamp();
+	local_stats = &pgStatLocal.snapshot.multixact;
+	local_stats->num_members_used = nmembers;
+	local_stats->stat_update_timestamp = now;
+	have_multixact_stats = true;
+}
+
+/*
+ * Report multixact statistics. If nowait is true, then this function
+ * will return if the lock is currently being held.
+ *
+ * This function returns true if the lock could not be acquired. Otherwise, false.
+ */
+bool
+pgstat_flush_multixact_cb(bool nowait)
+{
+	PgStatShared_MultiXact	*stats_shmem;
+	LWLock					*shared_lock;
+
+	stats_shmem = &pgStatLocal.shmem->multixact;
+	shared_lock = &pgStatLocal.shmem->multixact.lock;
+
+	// Since this statistic is a gauge, we have to check timestamps
+	// of the shared statistic and the local statistic. If ours is
+	// larger, then we can overwrite the shared statistic.
+	if (!nowait)
+		LWLockAcquire(shared_lock, LW_EXCLUSIVE);
+	else if (!LWLockConditionalAcquire(shared_lock, LW_EXCLUSIVE))
+		return true;
+
+	if (pgStatLocal.snapshot.multixact.stat_update_timestamp <= stats_shmem->stats.stat_update_timestamp)
+	{
+		// Return if our stats are <= the latest update.
+		LWLockRelease(shared_lock);
+		have_multixact_stats = false;
+		return false;
+	}
+
+	// Update multixact member usage and latest timestamp.
+	stats_shmem->stats.num_members_used = pgStatLocal.snapshot.multixact.num_members_used;
+	stats_shmem->stats.stat_update_timestamp = pgStatLocal.snapshot.multixact.stat_update_timestamp;
+	have_multixact_stats = false;
+
+	LWLockRelease(shared_lock);
+	return false;
+}
+
+void
+pgstat_multixact_init_shmem_cb(void *stats)
+{
+	PgStatShared_MultiXact *stats_shmem;
+
+	stats_shmem = (PgStatShared_MultiXact *) stats;
+	LWLockInitialize(&stats_shmem->lock, LWTRANCHE_PGSTATS_DATA);
+}
+
+void
+pgstat_multixact_snapshot_cb(void)
+{
+	PgStat_MultiXactStats *local_snapshot;
+	PgStatShared_MultiXact *stats_shmem;
+
+	local_snapshot = &pgStatLocal.snapshot.multixact;
+	stats_shmem = &pgStatLocal.shmem->multixact;
+
+	LWLockAcquire(&stats_shmem->lock, LW_SHARED);
+	if (local_snapshot->stat_update_timestamp < stats_shmem->stats.stat_update_timestamp)
+	{
+		local_snapshot->stat_update_timestamp = stats_shmem->stats.stat_update_timestamp;
+		local_snapshot->num_members_used = stats_shmem->stats.num_members_used;
+	}
+	LWLockRelease(&stats_shmem->lock);
+}
+
+void
+pgstat_multixact_reset_all_cb(TimestampTz ts)
+{
+	PgStatShared_MultiXact *stats_shmem;
+
+	stats_shmem = &pgStatLocal.shmem->multixact;
+	LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE);
+	memset(&stats_shmem->stats, 0, sizeof(stats_shmem->stats));
+	LWLockRelease(&stats_shmem->lock);
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 1c12ddbae49..dc310a588c7 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1304,6 +1304,18 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_alloc);
 }
 
+Datum
+pg_stat_get_multixact_members(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_multixact()->num_members_used);
+}
+
+Datum
+pg_stat_get_multixact_update_timestamp(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_multixact()->stat_update_timestamp);
+}
+
 /*
 * When adding a new column to the pg_stat_io view and the
 * pg_stat_get_backend_io() function, add a new enum value here above
@@ -1883,6 +1895,7 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 		XLogPrefetchResetStats();
 		pgstat_reset_of_kind(PGSTAT_KIND_SLRU);
 		pgstat_reset_of_kind(PGSTAT_KIND_WAL);
+		pgstat_reset_of_kind(PGSTAT_KIND_MULTIXACT);
 
 		PG_RETURN_VOID();
 	}
@@ -1903,6 +1916,8 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 		pgstat_reset_of_kind(PGSTAT_KIND_SLRU);
 	else if (strcmp(target, "wal") == 0)
 		pgstat_reset_of_kind(PGSTAT_KIND_WAL);
+	else if (strcmp(target, "multixact") == 0)
+		pgstat_reset_of_kind(PGSTAT_KIND_MULTIXACT);
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d3d28a263fa..216420a9788 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5991,6 +5991,14 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '9999', descr => 'statistics: number of multixact members in use',
+  proname => 'pg_stat_get_multixact_members', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_multixact_members' },
+{ oid => '9998', descr => 'statistics: timestamp of the last time the multixact members count was updated',
+  proname => 'pg_stat_get_multixact_update_timestamp', provolatile => 's',
+  proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
+  prosrc => 'pg_stat_get_multixact_update_timestamp' },
+
 { oid => '6214', descr => 'statistics: per backend type IO statistics',
   proname => 'pg_stat_get_io', prorows => '30', proretset => 't',
   provolatile => 'v', proparallel => 'r', prorettype => 'record',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 378f2f2c2ba..c635fa9f840 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -265,6 +265,19 @@ typedef struct PgStat_CheckpointerStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_CheckpointerStats;
 
+/* --------
+ * PgStat_MultiXactStats		MultiXact stats
+ *
+ * This struct should contain only actual event counters, because we make use
+ * of pg_memory_is_all_zeros() to detect whether there are any stats updates
+ * to apply.
+ * ---------
+ */
+typedef struct PgStat_MultiXactStats
+{
+	PgStat_Counter num_members_used;
+	TimestampTz stat_update_timestamp;
+} PgStat_MultiXactStats;
 
 /*
  * Types related to counting IO operations
@@ -579,6 +592,12 @@ extern void pgstat_report_checkpointer(void);
 extern PgStat_CheckpointerStats *pgstat_fetch_stat_checkpointer(void);
 
 
+/*
+ * Functions in pgstat_multixact.c
+ */
+extern PgStat_MultiXactStats *pgstat_fetch_stat_multixact(void);
+extern void pgstat_update_multixact_stats(uint32 nmembers);
+
 /*
  * Functions in pgstat_io.c
  */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index d5557e6e998..2e0edd6e4c9 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -417,7 +417,12 @@ typedef struct PgStatShared_Wal
 	PgStat_WalStats stats;
 } PgStatShared_Wal;
 
-
+typedef struct PgStatShared_MultiXact
+{
+	/* lock protects ->stats */
+	LWLock		lock;
+	PgStat_MultiXactStats stats;
+} PgStatShared_MultiXact;
 
 /* ----------
  * Types and definitions for different kinds of variable-amount stats.
@@ -501,6 +506,7 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_IO io;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
+	PgStatShared_MultiXact multixact;
 
 	/*
 	 * Custom stats data with fixed-numbered objects, indexed by (PgStat_Kind
@@ -535,6 +541,8 @@ typedef struct PgStat_Snapshot
 
 	PgStat_WalStats wal;
 
+	PgStat_MultiXactStats multixact;
+
 	/*
 	 * Data in snapshot for custom fixed-numbered statistics, indexed by
 	 * (PgStat_Kind - PGSTAT_KIND_CUSTOM_MIN).  Each entry is allocated in
@@ -757,6 +765,15 @@ extern void pgstat_wal_reset_all_cb(TimestampTz ts);
 extern void pgstat_wal_snapshot_cb(void);
 
 
+/*
+ * Functions in pgstat_multixact.c
+ */
+extern void pgstat_multixact_init_shmem_cb(void *stats);
+extern bool pgstat_multixact_have_pending_cb(void);
+extern void pgstat_multixact_snapshot_cb(void);
+extern void pgstat_multixact_reset_all_cb(TimestampTz ts);
+extern bool pgstat_flush_multixact_cb(bool nowait);
+
 /*
  * Functions in pgstat_subscription.c
  */
diff --git a/src/include/utils/pgstat_kind.h b/src/include/utils/pgstat_kind.h
index f44169fd5a3..2a99187c104 100644
--- a/src/include/utils/pgstat_kind.h
+++ b/src/include/utils/pgstat_kind.h
@@ -38,9 +38,10 @@
 #define PGSTAT_KIND_IO	10
 #define PGSTAT_KIND_SLRU	11
 #define PGSTAT_KIND_WAL	12
+#define PGSTAT_KIND_MULTIXACT	13
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
-#define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
+#define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_MULTIXACT
 #define PGSTAT_KIND_BUILTIN_SIZE (PGSTAT_KIND_BUILTIN_MAX + 1)
 
 /* Custom stats kinds */
diff --git a/src/test/isolation/expected/pg-stat-multixact-member-count.out b/src/test/isolation/expected/pg-stat-multixact-member-count.out
new file mode 100644
index 00000000000..6151e2dc896
--- /dev/null
+++ b/src/test/isolation/expected/pg-stat-multixact-member-count.out
@@ -0,0 +1,50 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s1_stat s1_lock s2_lock s2_stat s1_commit s2_commit s3_state s3_commit
+step s1_stat: 
+	INSERT INTO pg_stat_multixact_count_state VALUES (1, (SELECT members FROM pg_stat_multixact));
+
+step s1_lock: 
+	SELECT * FROM pg_stat_multixact_count_check FOR SHARE;
+
+value
+-----
+    1
+(1 row)
+
+step s2_lock: 
+	SELECT * FROM pg_stat_multixact_count_check FOR SHARE;
+
+value
+-----
+    1
+(1 row)
+
+step s2_stat: 
+	INSERT INTO pg_stat_multixact_count_state VALUES (2, (SELECT members FROM pg_stat_multixact));
+
+step s1_commit: 
+	COMMIT;
+
+step s2_commit: 
+	COMMIT;
+
+step s3_state: 
+	WITH session_1_count AS (
+		SELECT member_count FROM pg_stat_multixact_count_state WHERE session = 1
+	),
+	session_2_count AS (
+		SELECT member_count FROM pg_stat_multixact_count_state WHERE session = 2
+	)
+	SELECT s2.member_count - s1.member_count AS diff
+	FROM session_1_count s1
+	CROSS JOIN session_2_count s2;
+
+diff
+----
+   2
+(1 row)
+
+step s3_commit: 
+	COMMIT;
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index e3c669a29c7..8441c320e78 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -116,3 +116,4 @@ test: serializable-parallel-2
 test: serializable-parallel-3
 test: matview-write-skew
 test: lock-nowait
+test: pg-stat-multixact-member-count
diff --git a/src/test/isolation/specs/pg-stat-multixact-member-count.spec b/src/test/isolation/specs/pg-stat-multixact-member-count.spec
new file mode 100644
index 00000000000..3445a8b5a98
--- /dev/null
+++ b/src/test/isolation/specs/pg-stat-multixact-member-count.spec
@@ -0,0 +1,92 @@
+# Ensure that the view pg_stat_multixact accurately reflects the
+# number of multixact members currently in use.
+setup
+{
+  CREATE TABLE pg_stat_multixact_count_check (
+	value int
+  );
+
+  CREATE TABLE pg_stat_multixact_count_state (
+	session int,
+	member_count int
+  );
+
+  INSERT INTO pg_stat_multixact_count_check VALUES (1);
+}
+
+teardown
+{
+  DROP TABLE pg_stat_multixact_count_check;
+  DROP TABLE pg_stat_multixact_count_state;
+}
+
+session s1
+
+setup
+{
+	BEGIN;
+}
+
+step s1_stat
+{
+	INSERT INTO pg_stat_multixact_count_state VALUES (1, (SELECT members FROM pg_stat_multixact));
+}
+
+step s1_lock
+{
+	SELECT * FROM pg_stat_multixact_count_check FOR SHARE;
+}
+
+step s1_commit
+{
+	COMMIT;
+}
+
+session s2
+
+setup
+{
+	BEGIN;
+}
+
+step s2_lock
+{
+	SELECT * FROM pg_stat_multixact_count_check FOR SHARE;
+}
+
+step s2_stat
+{
+	INSERT INTO pg_stat_multixact_count_state VALUES (2, (SELECT members FROM pg_stat_multixact));
+}
+
+step s2_commit
+{
+	COMMIT;
+}
+
+session s3
+
+setup
+{
+	BEGIN;
+}
+
+step s3_state
+{
+	WITH session_1_count AS (
+		SELECT member_count FROM pg_stat_multixact_count_state WHERE session = 1
+	),
+	session_2_count AS (
+		SELECT member_count FROM pg_stat_multixact_count_state WHERE session = 2
+	)
+	SELECT s2.member_count - s1.member_count AS diff
+	FROM session_1_count s1
+	CROSS JOIN session_2_count s2;
+}
+
+step s3_commit
+{
+	COMMIT;
+}
+
+permutation s1_stat s1_lock s2_lock s2_stat s1_commit s2_commit s3_state s3_commit
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6cf828ca8d0..0363bc58548 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1934,6 +1934,8 @@ pg_stat_io| SELECT backend_type,
     fsync_time,
     stats_reset
    FROM pg_stat_get_io() b(backend_type, object, context, reads, read_bytes, read_time, writes, write_bytes, write_time, writebacks, writeback_time, extends, extend_bytes, extend_time, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
+pg_stat_multixact| SELECT pg_stat_get_multixact_update_timestamp() AS update_timestamp,
+    pg_stat_get_multixact_members() AS members;
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
     d.datname,
-- 
2.39.5 (Apple Git-154)

#2Naga Appani
nagnrik@gmail.com
In reply to: Andrew Johnson (#1)
Re: [PATCH v1] Add pg_stat_multixact view for multixact membership usage monitoring

On Tue, Jun 10, 2025 at 9:40 AM Andrew Johnson <andrewj@metronome.com> wrote:

Hello hackers,

I'd like to propose adding a new view named "pg_stat_multixact" to
expose multixact member usage. This addresses a major monitoring gap
that ultimately led to a production outage at Metronome [1].

Problem
Multixact membership exhaustion is an edge case that can cause write
lockouts, but there's no visibility into membership space usage.
Without any direct telemetry from the database, we're essentially
flying blind. It is possible to estimate multixact membership usage
through scanning the filesystem, but there are several drawbacks to
that method that Naga Appani outlined in a previous thread [2].

This complements Peter Geoghegan's recent thread about vacuum failsafe
improvements [3], where Sami Imseih noted "exposing the members
count... will be a good idea as well" [4].

Solution
- New view (pg_stat_multixact) with the columns "members" (bigint) and
"update_timestamp" (timestamptz).
- Updates member count and timestamp during multixact allocation and
freeze threshold checks.

I've attached a patch that:
- Implements this view using pgstat patterns.
- Includes isolation tests.
- Includes documentation changes to monitoring.sgml.

Hi Andrew,

Thanks for referencing my earlier proposal and for working to improve
observability around MultiXact usage, it’s great to see more attention
on this area.

After quickly reviewing your patch, I wanted to share a few thoughts
on the overall approach.

I shared a patch [0]/messages/by-id/CA+QeY+DTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg@mail.gmail.com that adds a SQL-callable function exposing the
same counters via ReadMultiXactCounts() without complexity. Since
these values are global, not aggregatable per backend or over time,
and not meaningfully resettable, introducing new statistics
infrastructure may be more than what’s needed unless there's an
additional use case I’m overlooking.

A lightweight function seems better aligned with the nature of these
metrics and the operational use cases they serve, particularly for
historical/ongoing diagnostics and periodic monitoring.

[0]: /messages/by-id/CA+QeY+DTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg@mail.gmail.com

Best regards,
Naga Appani

Show quoted text

I have also:
- Tested initdb works
- Ran make check-world with --enable-tap-tests to ensure all tests pass

I'm aiming to get this into the upcoming CommitFest. I would
appreciate your thoughts on this proposal and attached patch.

[1] https://metronome.com/blog/root-cause-analysis-postgresql-multixact-member-exhaustion-incidents-may-2025
[2] /messages/by-id/CALdSSPi3Gh08NtcCn44uVeUAYGOT74sU6uei_06qUTa5rMK43g@mail.gmail.com
[3] /messages/by-id/CAH2-WzmLPWJk3gbAxy8dHY+A-Juz_6uGwfe6DkE8B5-dTDvLcw@mail.gmail.com
[4] /messages/by-id/CAA5RZ0u43s4YbR=0mJ0_k3VGWjchJHhYnCoaZVzeLd3ccZtwhQ@mail.gmail.com

--
Respectfully,

Andrew Johnson
Software Engineer
Metronome, Inc.

#3Andrew Johnson
andrewj@metronome.com
In reply to: Naga Appani (#2)
Re: [PATCH v1] Add pg_stat_multixact view for multixact membership usage monitoring

Hi Naga,

Thank you for the thoughtful feedback and for driving attention to
this issue. I appreciate you taking the time to review my patch.

You raise some good points about the trade-offs between a lightweight
function and the pgstat infrastructure. I actually think both
approaches have merit for different use cases, and they could
potentially coexist to serve the community better.

I shared a patch [0] that adds a SQL-callable function exposing the same counters via ReadMultiXactCounts() without complexity...introducing new statistics infrastructure may be more than what's needed unless there's an additional use case I'm overlooking...A lightweight function seems better aligned with the nature of these metrics and the operational use cases they serve, particularly for historical/ongoing diagnostics and periodic monitoring.

I reviewed your patch in depth and I believe the pgstat approach I
took offers some advantages for continuous monitoring scenarios:

1. Performance under monitoring load: Many production environments,
including Metronome's, will poll these statistics frequently for
alerting. Using a direct call to pg_get_multixact_count() ->
ReadMultiXactCounts() acquires LWLocks, which could create significant
contention when multiple monitoring systems are polling frequently. In
high-throughput environments, this could become a bottleneck. The
pgstat view reads from shared memory snapshots without additional lock
acquisition, making it essentially free since we only update the
pgstat structure while we have the lock in the first place.

2. Consistency with existing patterns: PostgreSQL currently uses the
pgstat infrastructure for similar global, clusterwide metrics like
pg_stat_wal, pg_stat_wal_receiver, pg_stat_archiver, pg_stat_bgwriter,
and pg_stat_checkpointer. The multixact member count fits this same
pattern of cluster-wide resource monitoring.

3. Automatic updates: The stats update during natural multixact
operations (allocation, freeze threshold checks), providing current
data without requiring explicit polling of the underlying counters.

Your function approach has clear benefits for ad-hoc diagnostics and
simpler operational queries where call frequency is low. I also note
that your patch tracks both multixacts and members, which provides
valuable additional context.

I've also included isolation tests that verify the view accurately
reflects multixact member allocation, which helps ensure correctness
of the monitoring data.

Given our production experience with multixact membership exhaustion
at Metronome, both approaches would solve the core observability
problem.

I'm happy to keep discussing what the best approach for the community
is. It's great that more light is being shed on this particular issue.

[0]: /messages/by-id/CA+QeY+DTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg@mail.gmail.com

--
Respectfully,

Andrew Johnson
Software Engineer
Metronome, Inc.

#4Naga Appani
nagnrik@gmail.com
In reply to: Andrew Johnson (#3)
Re: [PATCH v1] Add pg_stat_multixact view for multixact membership usage monitoring

On Tue, Jun 10, 2025 at 11:40 AM Andrew Johnson <andrewj@metronome.com> wrote:

Hi Naga,

Thank you for the thoughtful feedback and for driving attention to
this issue. I appreciate you taking the time to review my patch.

You raise some good points about the trade-offs between a lightweight
function and the pgstat infrastructure. I actually think both
approaches have merit for different use cases, and they could
potentially coexist to serve the community better.

I shared a patch [0] that adds a SQL-callable function exposing the same counters via ReadMultiXactCounts() without complexity...introducing new statistics infrastructure may be more than what's needed unless there's an additional use case I'm overlooking...A lightweight function seems better aligned with the nature of these metrics and the operational use cases they serve, particularly for historical/ongoing diagnostics and periodic monitoring.

I reviewed your patch in depth and I believe the pgstat approach I
took offers some advantages for continuous monitoring scenarios:

1. Performance under monitoring load: Many production environments,
including Metronome's, will poll these statistics frequently for
alerting. Using a direct call to pg_get_multixact_count() ->
ReadMultiXactCounts() acquires LWLocks, which could create significant
contention when multiple monitoring systems are polling frequently. In
high-throughput environments, this could become a bottleneck. The
pgstat view reads from shared memory snapshots without additional lock
acquisition, making it essentially free since we only update the
pgstat structure while we have the lock in the first place.

2. Consistency with existing patterns: PostgreSQL currently uses the
pgstat infrastructure for similar global, clusterwide metrics like
pg_stat_wal, pg_stat_wal_receiver, pg_stat_archiver, pg_stat_bgwriter,
and pg_stat_checkpointer. The multixact member count fits this same
pattern of cluster-wide resource monitoring.

3. Automatic updates: The stats update during natural multixact
operations (allocation, freeze threshold checks), providing current
data without requiring explicit polling of the underlying counters.

Your function approach has clear benefits for ad-hoc diagnostics and
simpler operational queries where call frequency is low. I also note
that your patch tracks both multixacts and members, which provides
valuable additional context.

I've also included isolation tests that verify the view accurately
reflects multixact member allocation, which helps ensure correctness
of the monitoring data.

Given our production experience with multixact membership exhaustion
at Metronome, both approaches would solve the core observability
problem.

I'm happy to keep discussing what the best approach for the community
is. It's great that more light is being shed on this particular issue.

[0] /messages/by-id/CA+QeY+DTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg@mail.gmail.com

--
Respectfully,

Andrew Johnson
Software Engineer
Metronome, Inc.

Hi Andrew,

Thanks again for the thoughtful discussion and reviewing my patch.

I would like to directly address the concern around LWLock contention
and potential performance overhead from pg_get_multixact_count().

To evaluate this, I ran repeated invocations of both
pg_get_multixact_count() and pg_stat_multixact under sustained system
load. At the time of testing, the system showed a significantly high
load average with notable LWLock contention:

----------------------------------------------------------------------------------
$ uptime
00:23:30 up 2:08, 4 users, load average: 103.18, 117.93, 116.77
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
postgres=# SELECT
a.wait_event_type,
COUNT(*) AS count
FROM pg_stat_activity a
WHERE a.wait_event_type IS NOT NULL
GROUP BY a.wait_event_type
ORDER BY count DESC;
wait_event_type | count
-----------------+-------
LWLock | 1978
Client | 1014
Timeout | 21
Activity | 6
(4 rows)
----------------------------------------------------------------------------------

To simulate realistic monitoring intervals — typically in the 100ms to
1s range — I used pgbench with -R 10 (10 queries per second) over a
60-second duration for both the function and the view. Here are the
full results:

----------------------------------------------------------------------------------
$ pgbench -n -T 60 -R 10 -f <(echo "SELECT * FROM
pg_get_multixact_count();") -h 127.0.0.1 -p 5593 -U postgres postgres
pgbench (18beta1)
transaction type: /dev/fd/63
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 579
number of failed transactions: 0 (0.000%)
latency average = 0.892 ms
latency stddev = 1.667 ms
rate limit schedule lag: avg 0.160 (max 8.789) ms
initial connection time = 7.106 ms
tps = 9.655158 (without initial connection time)
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
$ pgbench -n -T 60 -R 10 -f <(echo "SELECT * FROM pg_stat_multixact;")
-h 127.0.0.1 -p 5593 -U postgres postgres
pgbench (18beta1)
transaction type: /dev/fd/63
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 589
number of failed transactions: 0 (0.000%)
latency average = 0.378 ms
latency stddev = 0.273 ms
rate limit schedule lag: avg 0.137 (max 4.347) ms
initial connection time = 4.516 ms
tps = 9.846161 (without initial connection time)
----------------------------------------------------------------------------------

Even under heavy LWLock contention, both approaches maintained stable
performance. pg_get_multixact_count() executed comfortably under 1 ms
on average, and schedule lag remained low, demonstrating that it can
be safely used in periodic monitoring setups.

I also compared the live counter from the function to the stats
snapshot from the view:

----------------------------------------------------------------------------------
postgres=# SELECT
'from_pg_get_multixact_count' AS source,
members
FROM
pg_get_multixact_count ()
UNION ALL
SELECT
'from_pg_stat_multixact' AS source,
members
FROM
pg_stat_multixact;
source | members
-----------------------------+-----------
from_pg_get_multixact_count | 839438187
from_pg_stat_multixact | 839438011
(2 rows)
----------------------------------------------------------------------------------

While the values are quite close, I think it’s worth highlighting that
pg_get_multixact_count() returns real-time state by calling
ReadMultiXactCounts() at query time, whereas pg_stat_multixact reports
values from the statistics collector’s last sampling cycle. Although
the collector also calls ReadMultiXactCounts() internally, its updates
are asynchronous and may lag — either due to the
stats_fetch_consistency setting [1]https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS, or in environments with high load
or memory pressure where stats updates may be delayed.

This distinction was part of the motivation behind my earlier proposal
[0]: /messages/by-id/CA+QeY+DTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg@mail.gmail.com
ReadMultiXactCounts(). Since these counters are global, not
aggregatable per backend, and don’t reset meaningfully, it seemed
reasonable to expose them without adding new statistics
infrastructure.

I understand the appeal of following the pg_stat_* view pattern for
consistency and passive observability, and I think both approaches
could certainly coexist. The view based design offers a familiar
interface for long-term monitoring, while the function can complement
it in scenarios where timely visibility is helpful — such as
monitoring multixact activity more closely during periods of elevated
usage or investigating potential pressure before it escalates.

While the function and view can coexist, I believe the function
already addresses the majority of practical needs efficiently, without
introducing noticeable contention — even on heavily loaded systems —
and without adding the complexity of statistics infrastructure.

I would be happy to continue the discussion or help refine the direction.

[0]: /messages/by-id/CA+QeY+DTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg@mail.gmail.com
[1]: https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS

Best regards,
Naga

#5Andrew Johnson
andrewj@metronome.com
In reply to: Naga Appani (#4)
1 attachment(s)
Re: [PATCH v1] Add pg_stat_multixact view for multixact membership usage monitoring

Hi Naga,

Thank you for your thoughtful response and for providing those benchmarks.

I'd like to address your concerns about the potential for statistical
lag with clarifications and empirical data:

its updates are asynchronous and may lag — either due to the stats_fetch_consistency setting

When operators configure `stats_fetch_consistency` to keep statistical
data static during transactions, this is by design - it gives them
explicit control over the consistency/freshness tradeoff. As the
documentation notes, `pg_stat_clear_snapshot()` provides up-to-date
visibility when needed [0]https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS.

[stats] may lag...in environments with high load or memory pressure

This is a valid concern. In order to investigate the extent of
potential statistical lag, I conducted a test to quantify the amount
of lag between `pg_stat_multixact` and `pg_get_multixact_count()`.

=Testing Methodology=
==Build Configuration==
```
configure --enable-cassert=no --enable-debug=no
```

==Test Schema (Multixact-Intensive Workload)==
```
DROP TABLE IF EXISTS orders CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS products CASCADE;

CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name TEXT
);

CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
name TEXT
);

CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(customer_id),
product_id INTEGER REFERENCES products(product_id),
order_date TIMESTAMP DEFAULT NOW(),
amount DECIMAL(10,2)
);

INSERT INTO customers (name) VALUES ('Customer 1'), ('Customer 2');
INSERT INTO products (name) VALUES ('Product 1'), ('Product 2');

INSERT INTO orders (customer_id, product_id, amount)
SELECT
(RANDOM() + 1)::INTEGER,
(RANDOM() + 1)::INTEGER,
RANDOM() * 100
FROM generate_series(1, 100000);

CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_product ON orders(product_id);

ANALYZE;
```

==Load Generation (Terminal 1)==
```
pgbench -n -c 100 -j 100 -T 300 -f =(cat << 'EOF'
BEGIN;
SELECT o.*, c.name, p.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.customer_id = 1
FOR SHARE;
COMMIT;
EOF
) postgres
```

==Lag Measurement (Terminal 2)==
```
psql postgres << 'EOF'
\timing on
DO $$
DECLARE
func_result BIGINT;
view_result BIGINT;
i INTEGER;
diff BIGINT;
timestamp_ms BIGINT;
BEGIN
RAISE NOTICE 'timestamp_ms,sample,function_members,view_members,diff,view_status';

FOR i IN 1..3000 LOOP
timestamp_ms := EXTRACT(EPOCH FROM clock_timestamp()) * 1000;

SELECT members INTO func_result FROM pg_get_multixact_count();

PERFORM pg_stat_clear_snapshot();
SELECT members INTO view_result FROM pg_stat_multixact;

diff := view_result - func_result;

RAISE NOTICE '%,%,%,%,%,%',
timestamp_ms,
i,
func_result,
view_result,
diff,
CASE WHEN diff > 0 THEN 'AHEAD'
WHEN diff < 0 THEN 'BEHIND'
ELSE 'EQUAL' END;

PERFORM pg_sleep(0.1);
END LOOP;
END $$;
EOF
```

==Results==

From 2,744 samples collected during 279 seconds of sustained 100-client load:

Lag Statistics:
- P95 Percentage Difference: 0.95%
- P50 Percentage Difference: 0.12%
- Mean Percentage Difference: 0.25%
- Max Percentage Difference: 1.30%

I've attached the results of my test to this email as a CSV file named
`lag_test_results.csv`.

From the data, the percentage differences between `pg_stat_multixact`
and `pg_get_multixact_count()` generally stay below 1% under heavy
load, maxing out at ~1.3%. The two statistics converge very quickly at
the end of the test when the heavy load is removed.

The data suggests that even under extreme load designed to maximize
multixact allocation, the operational lag remains marginal.

To address your other concerns:

Since these counters are global, not aggregatable per backend

I understand this perspective, though I'd suggest that PostgreSQL
already uses the pgstats pattern for similar global metrics, such as:

- pg_stat_wal
- pg_stat_bgwriter
- pg_stat_checkpointer
- pg_stat_archiver

These all share similar characteristics with multixact member counts:
global scope and used for cluster-wide resource monitoring.

it seemed reasonable to expose them without adding new statistics infrastructure.
adding the complexity of statistics infrastructure.

The pgstats approach follows a well-established design principle:
ensuring that monitoring queries never impact the performance of the
monitored subsystem, provide reasonably up-to-date statistics, and
offer a predictable interface for future developers to extend when
adding related metrics. Given the marginal statistical lag, I believe
it's beneficial to avoid even the theoretical possibility of lock
contention.

and don’t reset meaningfully,

You're right that these metrics don't reset in the traditional sense.
However, this characteristic doesn't necessarily disqualify the
approach given the reasons mentioned above.

I appreciate your feedback and believe both proposals address vital
observability gaps. I'm open to collaborating on a path forward that
meaningfully addresses the problem and serves the broader needs of the
Postgres community.

[0]: https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS

--
Respectfully,

Andrew Johnson
Software Engineer
Metronome, Inc

Attachments:

lag_test_results.csvtext/csv; charset=US-ASCII; name=lag_test_results.csvDownload
#6Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andrew Johnson (#5)
Re: [PATCH v1] Add pg_stat_multixact view for multixact membership usage monitoring

On Sun, Jun 15, 2025 at 6:21 AM Andrew Johnson <andrewj@metronome.com> wrote:

To address your other concerns:

Since these counters are global, not aggregatable per backend

I understand this perspective, though I'd suggest that PostgreSQL
already uses the pgstats pattern for similar global metrics, such as:

- pg_stat_wal
- pg_stat_bgwriter
- pg_stat_checkpointer
- pg_stat_archiver

These all share similar characteristics with multixact member counts:
global scope and used for cluster-wide resource monitoring.

All these are cumulative stats, which need period collection and
accumulation of statistics which doesn't naturally happen in their
respective subsystems. multixact members are already in an
"accumulated" state in MultiXactState. I don't think we need a
separate mechanism just to copy those accumulated stats as your patch
proposes. It's an additional overhead which is not required. However,
there's some advantage of avoiding acquiring MultiXactGenLock for
reporting the multixact stats. However your approach too requires a
lock for updating and reading stats. The stats are updated
periodically, irrespective of whether someone is reading it or not. So
there's some wastage of CPU cycles if the stats is read less
frequently than it's updated. Naga's approach doesn't have this
drawback and also he has shown that the overhead of acquiring
MultiXactGenLock isn't noticeable. In that sense his implementation
looks closer to what we do in pg_stat_get_wal_senders() - which also
feeds another stats view pg_stat_replication.

I am leaning towards Naga's approach, which is simpler. But there
might be others who feel otherwise.

One drawback of Naga's approach is that it doesn't report the space
consumed on the disk, which is more important than the member count.
If you approach allows to do that easily, that will be an added
advantage.

--
Best Wishes,
Ashutosh Bapat