[Proposal] Expose internal MultiXact member count function for efficient monitoring
Hi,
I would like to propose exposing an internal PostgreSQL function called
ReadMultiXactCounts() to allow for efficient monitoring of MultiXact member
usage. This function provides an accurate, real-time view of MultiXact
activity by directly retrieving the actual member count, rather than
relying on storage-based calculations.
*Current Challenges: *The existing approach we are currently using to
estimate MultiXact member usage has several drawbacks:
- *Filesystem scanning overhead: *These functions recursively scan the
pg_multixact directory, iterating over potentially thousands or millions
of files, and retrieving file sizes using stat() calls, which introduces
significant I/O overhead.
- *Potential performance bottleneck:* On systems with high transaction
throughput generating large numbers of MultiXact members, the
filesystem-based approach scales poorly due to the latency of stat() calls,
especially on network-based filesystems like RDS/Aurora.
- *Not a real-time or memory-efficient solution:* The current approach
does not provide a direct, in-memory view of MultiXact activity.
*Proposed Solution*The internal ReadMultiXactCounts() function, implemented
in multixact.c, directly calculates the number of MultiXact members by
reading live state from shared memory. This approach avoids the performance
issues of the current filesystem-based estimation methods.
By exposing ReadMultiXactCounts() for external use, we can provide
PostgreSQL users with an efficient way to monitor MultiXact member usage.
This could be particularly useful for integrating with tools like Amazon
RDS Performance Insights and Amazon CloudWatch to provide enhanced database
insights and proactive managed monitoring for users.
The performance comparison between the current and proposed approaches
shows a significant improvement, with the proposed solution taking only a
fraction of a millisecond to retrieve the MultiXact member count, compared
to tens or hundreds of milliseconds for the current filesystem-based
approach.
Following is the comparison of performance between calculating storage of
MultiXact members directory and retrieving the count of members.
Implementation Used size MultiXact members (approx) Time taken (ms) Time
factor (vs Baseline)
EC2 community (RDS version of pg_ls_multixactdir) 8642 MB 1.8 billion 96.879
1.00
Linux du command 8642 MB 1.8 billion 96 NA
Proposal (ReadMultiXactCounts) N/A 1.99 billion 0.167 580 times faster
I believe exposing ReadMultiXactCounts() would be a valuable addition to
the PostgreSQL ecosystem, providing users with a more reliable and
efficient way to monitor MultiXact usage. Appreciate your feedback or
discussion on this proposal.
Please let me know if this approach is acceptable, so I’ll go ahead and
submit a patch.
Thank you!
Best regards,
Naga Appani
Postgres Database Engineer
Amazon Web Services
On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
Hi,
I would like to propose exposing an internal PostgreSQL function called
ReadMultiXactCounts() to allow for efficient monitoring of MultiXact
member usage. This function provides an accurate, real-time view of
MultiXact activity by directly retrieving the actual member count, rather
than relying on storage-based calculations.*Current Challenges: *The existing approach we are currently using to
estimate MultiXact member usage has several drawbacks:- *Filesystem scanning overhead: *These functions recursively scan the
pg_multixact directory, iterating over potentially thousands or
millions of files, and retrieving file sizes using stat() calls, which
introduces significant I/O overhead.
- *Potential performance bottleneck:* On systems with high transaction
throughput generating large numbers of MultiXact members, the
filesystem-based approach scales poorly due to the latency of stat() calls,
especially on network-based filesystems like RDS/Aurora.
- *Not a real-time or memory-efficient solution:* The current approach
does not provide a direct, in-memory view of MultiXact activity.*Proposed Solution*The internal ReadMultiXactCounts() function,
implemented in multixact.c, directly calculates the number of MultiXact
members by reading live state from shared memory. This approach avoids the
performance issues of the current filesystem-based estimation methods.
................
...............
My apologies for re-posting. This is my first time writing to the hackers
list, and I accidentally used HTML formatting. Below is the original
request in plain text:
**************************************************************************************************************************************************************
I would like to propose exposing an internal PostgreSQL function called
ReadMultiXactCounts()[1]https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2925-L2948 to allow for efficient monitoring of MultiXact
member usage. This function provides an accurate, real-time view of
MultiXact activity by directly retrieving the actual member count, rather
than relying on storage-based calculations.
================
Current Challenges
================
The existing approach we are currently using to estimate MultiXact member
usage has several drawbacks:
- Filesystem scanning overhead: These functions recursively scan the
pg_multixact directory, iterating over potentially thousands or millions of
files, and retrieving file sizes using stat() calls, which introduces
significant I/O overhead.
- Potential performance bottleneck: On systems with high transaction
throughput generating large numbers of MultiXact members, the
filesystem-based approach scales poorly due to the latency of stat() calls,
especially on network-based filesystems like RDS/Aurora.
- Not a real-time or memory-efficient solution: The current approach does
not provide a direct, in-memory view of MultiXact activity.
=================
Proposal
=================
The internal ReadMultiXactCounts() function, implemented in multixact.c,
directly calculates the number of MultiXact members by reading live state
from shared memory. This approach avoids the performance issues of the
current filesystem-based estimation methods.
By exposing ReadMultiXactCounts() for external use, we can provide
PostgreSQL users with an efficient way to monitor MultiXact member usage.
This could be particularly useful for integrating with tools like Amazon
RDS Performance Insights and Amazon CloudWatch to provide enhanced database
insights and proactive managed monitoring for users.
=========================
Performance comparison
=========================
The performance comparison between the current and proposed approaches
shows a significant improvement, with the proposed solution taking only a
fraction of a millisecond to retrieve the MultiXact member count, compared
to tens or hundreds of milliseconds for the current filesystem-based
approach. And as more members are generated, the gap widens.
Following is the comparison of performance between calculating storage of
MultiXact members directory and retrieving the count of members.
Implementation | Used size |
MultiXact members
----------------------------------------------------+-------------+------------------
EC2 community (RDS version of pg_ls_multixactdir) | 8642 MB | 1.8
billion
Linux du command | 8642 MB | 1.8
billion
Proposal (ReadMultiXactCounts) | 8642 MB | 1.8
billion
============================================================================================
Sample runs
============================================================================================
Using "du -h"
--------------------
postgres=# \! time du -h /rdsdbdata/db/17.4/data/pg_multixact/members
13G /rdsdbdata/db/17.4/data/pg_multixact/members
real 0m0.285s <============================= time taken
user 0m0.050s <============================= time taken
sys 0m0.140s
Using RDS's pg_ls_multixactdir ():
------------------------------------------------------------
postgres=# SELECT
pg_size_pretty(coalesce(sum(size), 0)) AS members_size
FROM
pg_ls_multixactdir ()
WHERE
name LIKE 'pg_multixact/members%';
members_size
--------------
13 GB
(1 row)
Time: 226.533 ms <============================= time taken
Using proposed function:
----------------------------------------
postgres=# SELECT to_char(pg_get_multixact_members_count(),
'999,999,999,999') AS members_count;
members_count
------------------
2,745,823,171
(1 row)
Time: 0.142 ms <============================= time taken
============================================================================================
I believe exposing ReadMultiXactCounts() would be a valuable addition to
the PostgreSQL ecosystem, providing users with a more reliable and
efficient way to monitor MultiXact usage. Appreciate your feedback or
discussion on this proposal.
Please let me know if this approach is acceptable, so I’ll go ahead and
submit a patch.
Thank you!
References:
[1]: https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2925-L2948
https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2925-L2948
Show quoted text
Thank you!
Best regards,
Naga Appani
Postgres Database Engineer
Amazon Web Services
On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:
On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
Hi,
Hi
=================
Proposal
=================
The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.
This proposal looks sane. It is indeed helpful to keep an eye out for
multixact usage in systems that are heavily loaded.
By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.
Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.
Let's give it a try!
--
Best regards,
Kirill Reshke
On Tue, Mar 11, 2025 at 4:48 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:
On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
Hi,
Hi
=================
Proposal
=================
The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.This proposal looks sane. It is indeed helpful to keep an eye out for
multixact usage in systems that are heavily loaded.By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.
Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.
Let's give it a try!
Hi,
As a follow-up, I’m submitting a patch that introduces a SQL-callable
function to retrieve MultiXact usage metrics. Although the motivation
has been discussed earlier in this thread, I’m including a brief recap
below to provide context for the patch itself.
While wraparound due to MultiXacts (MXID) is less frequent than XID
wraparound, it can still lead to aggressive/wraparound vacuum behavior
or downtime in certain workloads — especially those involving foreign
keys, shared row locks, or long-lived transactions. Currently, users
have no SQL-level visibility into MultiXact member consumption, which
makes it hard to proactively respond before issues arise. The only
workaround today involves scanning the pg_multixact/members directory
on disk, current workaround uses stat() calls over potentially
millions of small segment files, adds I/O overhead, and is unsuitable
for periodic monitoring or integration into observability platforms.
Unlike the approach originally proposed or discussed in this thread,
this patch does not expose the internal ReadMultiXactCounts() function
directly. Instead, it wraps it internally (without changing its
visibility) to make the data available via a new SQL function.
This patch adds:
pg_get_multixact_count()
It returns a composite of:
- multixacts: number of MultiXact IDs that currently exist
- members: number of MultiXact member entries currently exist
Implementation
--------------
- Defined in multixact.c
- Calls ReadMultiXactCounts()
- Returns a composite record (multixacts, members)
- Includes documentation
Use cases
---------
This function enables users to:
- Monitor member usage to anticipate aggressive vacuum and avoid wraparound risk
- Track long-lived workloads that accumulate MultiXacts
- Power lightweight monitoring/diagnostics tools without scanning the filesystem
- Log and analyze MultiXact growth over time
Sample output
-------------
multixacts | members
------------+------------
182371396 | 2826221174
(1 row)
Performance comparison
----------------------
While performance is not the primary motivation for this patch, it
becomes important in monitoring scenarios where frequent polling is
expected. The proposed function executes in sub-millisecond time and
avoids any filesystem I/O, making it well-suited for lightweight,
periodic monitoring.
Implementation | Used size | MultiXact members
| Time (ms) | Relative cost
-------------------------------------+-----------+-------------------+-----------+----------------
Community (pg_ls_multixactdir) | 8642 MB | 1.8 billion |
96.879 | 1.00 (baseline)
Linux (du command) | 8642 MB | 1.8 billion |
96 | 1.00
Proposal (ReadMultiXactCounts-based) | N/A | 1.99 billion |
0.167 | ~580x faster
Documentation
-------------
- A new section is added to func.sgml to group multixact-related functions
- A reference to this new function is included in the "Multixacts and
Wraparound" subsection of maintenance.sgml
To keep related functions grouped together, we can consider moving
mxid_age() into the new section as well unless there are objections to
relocating it from the current section.
This patch aims to fill a long-standing observability gap.
Patch attached.
Best regards,
Naga Appani
Postgres Database Engineer
Amazon Web Services
Show quoted text
--
Best regards,
Kirill Reshke
Attachments:
v1-0001-Add-pg_get_multixact_count-function-and-related-d.patchapplication/octet-stream; name=v1-0001-Add-pg_get_multixact_count-function-and-related-d.patchDownload
From 0f9bff594eccf2f7aea288e73f8c147edf884857 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Fri, 6 Jun 2025 05:18:15 +0000
Subject: [PATCH v1] Add pg_get_multixact_count function and related
Add pg_get_multixact_count() SQL function for monitoring MultiXact usage
PostgreSQL exposes mxid_age() to track MultiXact ID wraparound risk,
but there is currently no SQL-accessible way to monitor MultiXact member
consumption, which can independently trigger aggressive vacuuming or
wraparound protection. The only workaround today involves scanning the
pg_multixact/members directory, which is I/O intensive and unsuitable
for monitoring tools.
This patch adds pg_get_multixact_count(), a SQL-callable function that
returns a composite record with two fields:
- multixacts: number of MultiXact IDs that currently exist
- members: number of MultiXact member entries that currently exist
The function calls ReadMultiXactCounts() and returns the values using the
standard record output convention.
Documentation has been added to func.sgml under a new section titled
"MultiXact Information Functions", with a cross-reference in maintenance.sgml
to help users track MultiXact usage relative to autovacuum thresholds.
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by:
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
doc/src/sgml/func.sgml | 58 ++++++++++++++++++++++++++
doc/src/sgml/maintenance.sgml | 8 ++++
src/backend/access/transam/multixact.c | 30 +++++++++++++
src/include/catalog/pg_proc.dat | 15 +++++++
4 files changed, 111 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c67688cbf5f..bdb64f83c23 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28496,6 +28496,64 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
</sect2>
+ <sect2 id="functions-info-multixact-information">
+ <title>MultiXact Information Functions</title>
+
+ <para>
+ The function shown in <xref linkend="functions-multixact-information"/>
+ exposes internal MultiXact counters used by
+ <productname>PostgreSQL</productname>'s locking and transaction management subsystems.
+ It is primarily intended for monitoring and diagnostic purposes, such as analyzing
+ MultiXact consumption patterns or anticipating wraparound-related maintenance.
+ </para>
+
+ <table id="functions-multixact-information">
+ <title>MultiXact Information Functions</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="func_table_entry">
+ <para role="func_signature">Function</para>
+ <para>Description</para>
+ </entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry role="func_table_entry">
+ <para role="func_signature">
+ <indexterm><primary>pg_get_multixact_count</primary></indexterm>
+ <function>pg_get_multixact_count</function> ()
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns a record with the fields <structfield>multixacts</structfield> and <structfield>members</structfield>:
+ <itemizedlist>
+ <listitem>
+ <para><structfield>multixacts</structfield>: Number of MultiXacts assigned.
+ PostgreSQL initiates aggressive autovacuum when this value grows beyond the threshold
+ defined by <varname>autovacuum_multixact_freeze_max_age</varname>, which is based on
+ the age of <literal>datminmxid</literal>. For more details, see
+ <ulink url="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND">
+ Routine Vacuuming: Multixact Wraparound</ulink>.</para>
+ </listitem>
+ <listitem>
+ <para><structfield>members</structfield>: Number of MultiXact member entries created.
+ These are stored in files under the <filename>pg_multixact/members</filename> subdirectory.
+ Wraparound occurs after approximately 4.29 billion entries (~20 GiB). PostgreSQL initiates
+ aggressive autovacuum when the number of members created exceeds approximately 2.145 billion
+ or when storage consumption in <filename>pg_multixact/members</filename> approaches 10 GiB.</para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect2>
+
</sect1>
<sect1 id="functions-admin">
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 600e4b3f2f3..a445d1b061c 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -818,6 +818,14 @@ HINT: Execute a database-wide VACUUM in that database.
area can grow up to about 20GB before reaching wraparound.
</para>
+ <para>
+ The <function><link linkend="functions-multixact-information">pg_get_multixact_count</link></function>
+ function provides a way to check how many multixacts and member entries have been allocated. This can
+ be useful for identifying unusual multixact activity, monitoring progress toward wraparound, anticipating
+ system-wide aggressive autovacuum as usage approaches critical thresholds, or verifying whether autovacuum
+ is keeping up with demand.
+ </para>
+
<para>
Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
system will begin to emit warning messages when the database's oldest MXIDs reach forty
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 3c06ac45532..ed29746eaa9 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -3585,3 +3585,33 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
{
return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
}
+
+/*
+ * Returns the current count of multixact members and multixact IDs
+ */
+PG_FUNCTION_INFO_V1(pg_get_multixact_count);
+
+Datum
+pg_get_multixact_count(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[2];
+ bool nulls[2] = {false, false};
+ MultiXactOffset members;
+ uint32 multixacts;
+ HeapTuple tuple;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errmsg("return type must be a row type")));
+
+ if (!ReadMultiXactCounts(&multixacts, &members))
+ ereport(ERROR,
+ (errmsg("could not read multixact counts")));
+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = UInt32GetDatum(members);
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d3d28a263fa..09115ad1b35 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12556,4 +12556,19 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Returns current counts of multixact members and multixact IDs
+{
+ oid => '9001',
+ descr => 'get current multixact member and multixact ID counts',
+ proname => 'pg_get_multixact_count',
+ prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int4,int8}',
+ proargmodes => '{o,o}',
+ proargnames => '{multixacts,members}',
+ provolatile => 'v',
+ proparallel => 's',
+ prosrc => 'pg_get_multixact_count'
+},
+
]
--
2.47.1
On Tue, Jun 10, 2025 at 7:50 PM Naga Appani <nagnrik@gmail.com> wrote:
On Tue, Mar 11, 2025 at 4:48 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:
On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:
Hi,
Hi
=================
Proposal
=================
The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.This proposal looks sane. It is indeed helpful to keep an eye out for
multixact usage in systems that are heavily loaded.By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.
Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.
Let's give it a try!
Hi,
As a follow-up, I’m submitting a patch that introduces a SQL-callable
function to retrieve MultiXact usage metrics. Although the motivation
has been discussed earlier in this thread, I’m including a brief recap
below to provide context for the patch itself.While wraparound due to MultiXacts (MXID) is less frequent than XID
wraparound, it can still lead to aggressive/wraparound vacuum behavior
or downtime in certain workloads — especially those involving foreign
keys, shared row locks, or long-lived transactions. Currently, users
have no SQL-level visibility into MultiXact member consumption, which
makes it hard to proactively respond before issues arise.
I see mxid_age() will just give mxid consumption but not members
consumption. So just that function is not enough.
Sample output
-------------
multixacts | members
------------+------------
182371396 | 2826221174
(1 row)Performance comparison
----------------------
While performance is not the primary motivation for this patch, it
becomes important in monitoring scenarios where frequent polling is
expected. The proposed function executes in sub-millisecond time and
avoids any filesystem I/O, making it well-suited for lightweight,
periodic monitoring.Implementation | Used size | MultiXact members
| Time (ms) | Relative cost
-------------------------------------+-----------+-------------------+-----------+----------------
Community (pg_ls_multixactdir) | 8642 MB | 1.8 billion |
96.879 | 1.00 (baseline)
Linux (du command) | 8642 MB | 1.8 billion |
96 | 1.00
Proposal (ReadMultiXactCounts-based) | N/A | 1.99 billion |
0.167 | ~580x fasterDocumentation
-------------
- A new section is added to func.sgml to group multixact-related functions
- A reference to this new function is included in the "Multixacts and
Wraparound" subsection of maintenance.sgmlTo keep related functions grouped together, we can consider moving
mxid_age() into the new section as well unless there are objections to
relocating it from the current section.
In [1]/messages/by-id/aF8b_fp_9Va58vB9@nathan, we decided to document pg_get_multixact_member() in section
"Transaction ID and Snapshot Information Functions". I think the
discussion in the email thread applies to this function as well.
+ <sect2 id="functions-info-multixact-information">
+ <title>MultiXact Information Functions</title>
+
+ <entry role="func_table_entry">
+ <para role="func_signature">
+ <indexterm><primary>pg_get_multixact_count</primary></indexterm>
+ <function>pg_get_multixact_count</function> ()
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns a record with the fields
<structfield>multixacts</structfield> and
<structfield>members</structfield>:
+ <itemizedlist>
+ <listitem>
+ <para><structfield>multixacts</structfield>: Number of
MultiXacts assigned.
+ PostgreSQL initiates aggressive autovacuum when this
value grows beyond the threshold
+ defined by
<varname>autovacuum_multixact_freeze_max_age</varname>, which is based
on
+ the age of <literal>datminmxid</literal>. For more details, see
+ <ulink
url="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND">
+ Routine Vacuuming: Multixact Wraparound</ulink>.</para>
+ </listitem>
+ <listitem>
+ <para><structfield>members</structfield>: Number of
MultiXact member entries created.
+ These are stored in files under the
<filename>pg_multixact/members</filename> subdirectory.
+ Wraparound occurs after approximately 4.29 billion
entries (~20 GiB). PostgreSQL initiates
+ aggressive autovacuum when the number of members created
exceeds approximately 2.145 billion
+ or when storage consumption in
<filename>pg_multixact/members</filename> approaches 10 GiB.</para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </entry>
The description here doesn't follow the format of the other functions
in this section. We usually explain the inputs and outputs of the
function but not how to use the outputs. In this case, you might want
to just refer to Multixact Wraparound section under Routine Vacuuming
chapter rather than describing the autovacuum behaviour. You can do
that by inserting <xref linkend="vacuum-for-multixact-wraparound"/>
instead of a full URL. These links are appropriately resolved when
creating HTML to version specific links. The URL you have used will
always point to "Current" version.
+ <para>
+ The <function><link
linkend="functions-multixact-information">pg_get_multixact_count</link></function>
+ function provides a way to check how many multixacts and member
entries have been allocated. This can
+ be useful for identifying unusual multixact activity, monitoring
progress toward wraparound, anticipating
+ system-wide aggressive autovacuum as usage approaches critical
thresholds, or verifying whether autovacuum
+ is keeping up with demand.
+ </para>
+
This is the right place to go in details of how the function can be
used; not the function documentation itself. I am yet to make up
whether we need the whole description. I think the first line is
enough and goes well with the rest of the section.
+
+ if (!ReadMultiXactCounts(&multixacts, &members))
+ ereport(ERROR,
+ (errmsg("could not read multixact counts")));
Throwing an error causes the surrounding transaction to abort, so it
should be avoided in a monitoring/reporting function if possible. In
this case for example, we could throw a warning instead or report NULL
values.
If ReadMultiXactCounts() returns false,
MultiXactMemberFreezeThreshold() returns 0, which will cause the
autovacuum to be more aggressive. I think it will be good to highlight
that in the function description since that's one of the objectives of
this function: to know when the autovacuum is going to be more
aggressive.
+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = UInt32GetDatum(members);
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
+}
In PG14+, the transaction wraparound is triggered if the size of the
directory exceeds 10GB. This function does not help monitoring that
condition. So a user will need to use du or pg_ls_multixactdir()
anyway, which defeats the purpose of this function being more
efficient than those methods. Am I correct? Can we also report the
size of the directory in this function?
The patch needs tests.
[1]: /messages/by-id/aF8b_fp_9Va58vB9@nathan
--
Best Wishes,
Ashutosh Bapat
On Fri, Jul 25, 2025 at 04:27:37PM +0530, Ashutosh Bapat wrote:
In [1], we decided to document pg_get_multixact_member() in section
"Transaction ID and Snapshot Information Functions". I think the
discussion in the email thread applies to this function as well.
Yep, let's be consistent.
Throwing an error causes the surrounding transaction to abort, so it
should be avoided in a monitoring/reporting function if possible. In
this case for example, we could throw a warning instead or report NULL
values.
Most likely returning NULL is the best thing we can do, as a safe
fallback.
The patch needs tests.
Indeed.
May I also suggest a split of the multixact SQL functions into a
separate file, a src/backend/utils/adt/multixactfuncs.c? The existing
pg_get_multixact_members() relies on GetMultiXactIdMembers(),
available in multixact.h. The new function pg_get_multixact_count()
relies on ReadMultiXactCounts(), which would mean adding it in
multixact.h. Even if we finish without an agreement about the SQL
function and the end, publishing ReadMultiXactCounts() would give an
access to the internals to external code.
+PG_FUNCTION_INFO_V1(pg_get_multixact_count);
There should be no need for that, pg_proc.dat handling the
declaration AFAIK.
FWIW, these functions are always kind of hard to use for the end-user
without proper documentation. You may want to add an example of how
one can use it for monitoring in the docs.
--
Michael
On Mon, Jul 28, 2025 at 9:52 AM Michael Paquier <michael@paquier.xyz> wrote:
May I also suggest a split of the multixact SQL functions into a
separate file, a src/backend/utils/adt/multixactfuncs.c? The existing
pg_get_multixact_members() relies on GetMultiXactIdMembers(),
available in multixact.h. The new function pg_get_multixact_count()
relies on ReadMultiXactCounts(), which would mean adding it in
multixact.h. Even if we finish without an agreement about the SQL
function and the end, publishing ReadMultiXactCounts() would give an
access to the internals to external code.+PG_FUNCTION_INFO_V1(pg_get_multixact_count);
There should be no need for that, pg_proc.dat handling the
declaration AFAIK.FWIW, these functions are always kind of hard to use for the end-user
without proper documentation. You may want to add an example of how
one can use it for monitoring in the docs.
+1.
Let's say if the user knows that the counts are so high that a
wraparound is imminent, but vacuuming isn't solving the problem, they
would like to know which transactions are holding it back.
pg_get_multixact_members() can be used to get the members of the
oldest multixact if it's reported and then the user can deal with
those transactions. However, the oldest multixact is not reported
anywhere, AFAIK. It's also part of MultiXactState, so can be extracted
via ReadMultiXactCounts(). We could report it through
pg_get_multixact_counts - after renaming it and ReadMultiXactCounts to
pg_get_multixact_stats() and ReadMultiXactStats() respectively. Or we
could write another function to do so. But it comes handy using query
like below
#select oldestmultixact,
pg_get_multixact_members(oldestmultixact::text::xid) from
pg_get_multixact_count();
oldestmultixact | pg_get_multixact_members
------------------+--------------------------
1 | (757,sh)
1 | (768,sh)
(2 rows)
Here's a quick patch implementing the same. Please feel free to
incorporate and refine it in your patch if you like it.
--
Best Wishes,
Ashutosh Bapat
Attachments:
oldest_multixact.patch.txttext/plain; charset=US-ASCII; name=oldest_multixact.patch.txtDownload
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 6b3f38a2fd6..8b31b57140d 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2863,17 +2863,16 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
* exist. Return false if unable to determine.
*/
static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
+ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members, MultiXactId *oldestMultiXactId)
{
MultiXactOffset nextOffset;
MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
MultiXactId nextMultiXactId;
bool oldestOffsetKnown;
LWLockAcquire(MultiXactGenLock, LW_SHARED);
nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
+ *oldestMultiXactId = MultiXactState->oldestMultiXactId;
nextMultiXactId = MultiXactState->nextMXact;
oldestOffset = MultiXactState->oldestOffset;
oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
@@ -2883,7 +2882,7 @@ ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
return false;
*members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
+ *multixacts = nextMultiXactId - *oldestMultiXactId;
return true;
}
@@ -2922,9 +2921,10 @@ MultiXactMemberFreezeThreshold(void)
uint32 victim_multixacts;
double fraction;
int result;
+ MultiXactId oldestMultiXactId;
/* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
+ if (!ReadMultiXactCounts(&multixacts, &members, &oldestMultiXactId))
return 0;
/* If member space utilization is low, no special action is required. */
@@ -3503,22 +3503,24 @@ Datum
pg_get_multixact_count(PG_FUNCTION_ARGS)
{
TupleDesc tupdesc;
- Datum values[2];
- bool nulls[2] = {false, false};
+ Datum values[3];
+ bool nulls[3] = {false, false, false};
MultiXactOffset members;
uint32 multixacts;
HeapTuple tuple;
+ MultiXactId oldestMultiXactId;
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
ereport(ERROR,
(errmsg("return type must be a row type")));
- if (!ReadMultiXactCounts(&multixacts, &members))
+ if (!ReadMultiXactCounts(&multixacts, &members, &oldestMultiXactId))
ereport(ERROR,
(errmsg("could not read multixact counts")));
values[0] = UInt32GetDatum(multixacts);
values[1] = UInt32GetDatum(members);
+ values[2] = UInt32GetDatum(oldestMultiXactId);
tuple = heap_form_tuple(tupdesc, values, nulls);
PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0e511dd3473..2be0cd35e12 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12579,13 +12579,13 @@
# Returns current counts of multixact members and multixact IDs
{
oid => '9001',
- descr => 'get current multixact member and multixact ID counts',
+ descr => 'get current multixact member and multixact ID counts and oldest multixact',
proname => 'pg_get_multixact_count',
prorettype => 'record',
proargtypes => '',
- proallargtypes => '{int4,int8}',
- proargmodes => '{o,o}',
- proargnames => '{multixacts,members}',
+ proallargtypes => '{int4,int8,int4}',
+ proargmodes => '{o,o,o}',
+ proargnames => '{multixacts,members,oldestmultixact}',
provolatile => 'v',
proparallel => 's',
prosrc => 'pg_get_multixact_count'
Hi Ashutosh, Michael,
Thanks for the detailed reviews. I have incorporated the feedback;
please find attached v2 and my responses inline below.
On Fri, Jul 25, 2025 at 5:57 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
In [1], we decided to document pg_get_multixact_member() in section
"Transaction ID and Snapshot Information Functions". I think the
discussion in the email thread applies to this function as well.
Done -- the function is now documented under “Transaction ID and
Snapshot Information Functions” for consistency.
The description here doesn't follow the format of the other functions
in this section.
Updated the description in func.sgml to match the style of other
functions. Extended usage guidance is now in maintenance.sgml.
Throwing an error causes the surrounding transaction to abort, so it
should be avoided in a monitoring/reporting function if possible.
The function now returns NULL instead of throwing an error when counts
can’t be read.
If ReadMultiXactCounts() returns false, MultiXactMemberFreezeThreshold() returns 0...
Noted -- the docs now mention that the function can be used to
anticipate more aggressive autovacuum behavior in such cases.
In PG14+, the transaction wraparound is triggered if the size of the
directory exceeds 10GB. This function does not help monitoring that
condition. So a user will need to use du or pg_ls_multixactdir()
anyway, which defeats the purpose of this function being more
efficient than those methods. Am I correct? Can we also report the
size of the directory in this function?
Correct, that is the intent of the function. The members count
returned by this function already provides the necessary information
to determine the directory size, since each member entry has a fixed
size. The constants and formulas in [0]https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156 and discussed in [1]/messages/by-id/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg@mail.gmail.com show that
each group stores four bytes of flags plus four TransactionIds (20
bytes total), yielding 409 groups per 8 KB page and a fixed
members‑to‑bytes ratio. This means ~2 billion members corresponds to
~10 GB (aggressive autovacuum threshold) and ~4 billion members
corresponds to ~20 GB (wraparound).
Since the function already provides the member count, including the
physical size in its output would duplicate information and add no
extra benefit.
The patch needs tests.
Added an isolation test to cover initial state, MultiXact creation,
counts, and oldest MultiXact reporting.
On Mon, Jul 28, 2025 at 1:00 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Let's say if the user knows that the counts are so high that a
wraparound is imminent, but vacuuming isn't solving the problem...
Here's a quick patch implementing the same. Please feel free to
incorporate and refine it in your patch if you like it.
Thank you for sharing the patch. I have incorporated it into this
version with minor adjustments, and it fits well with the overall
design of the function.
On Mon, Jul 28, 2025 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote:
Yep, let's be consistent.
Done -- placed in “Transaction ID and Snapshot Information Functions”
for consistency.
Most likely returning NULL is the best thing we can do, as a safe fallback.
Implemented -- the function now returns NULL if counts can’t be read.
The patch needs tests.
Isolation tests have been added as described above.
May I also suggest a split of the multixact SQL functions into a
separate file, a src/backend/utils/adt/multixactfuncs.c?
I agree that would be cleaner, but I’d prefer to keep the
implementation in multixact.c for now to maintain focus on this patch
and revisit the refactoring later.
+PG_FUNCTION_INFO_V1(pg_get_multixact_count);
Removed -- now handled entirely by pg_proc.dat.
...You may want to add an example of how one can use it for monitoring in the docs.
I’ve added a usage example with sample output in the docs. If you had
a different kind of demo in mind (e.g., creating multixacts manually
and showing the output), please let me know.
References:
[0]: https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156
[1]: /messages/by-id/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg@mail.gmail.com
Best regards,
Naga Appani
Attachments:
v2-0001-Add-pg_get_multixact_stats-SQL-function-for-monit.patchapplication/octet-stream; name=v2-0001-Add-pg_get_multixact_stats-SQL-function-for-monit.patchDownload
From 180ef463808023fa117595ce554326028724cf43 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Mon, 4 Aug 2025 03:17:28 +0000
Subject: [PATCH v2] Add pg_get_multixact_stats() SQL function for monitoring
MultiXact usage
This patch adds pg_get_multixact_stats(), a SQL-callable function that returns
MultiXact statistics to aid in monitoring wraparound risk and vacuum behavior.
It reports:
multixacts: the number of MultiXact IDs created since the oldest one still needed
members: the number of MultiXact member entries that currently exist
oldest_multixact: the oldest MultiXact ID still needed by any database
The function modifies ReadMultiXactCounts() to expose the oldestMultiXactId and
returns all three values in a composite record. This allows users to monitor
MultiXact usage and identify potential wraparound issues, particularly useful
when combined with pg_get_multixact_members() to investigate specific MultiXacts.
Usage:
SELECT * FROM pg_get_multixact_stats();
Documentation is added to:
- "Transaction ID and Snapshot Information Functions" section in func.sgml
- "Multixacts and Wraparound" section in maintenance.sgml
(routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND)
Isolation tests are added to verify:
- Initial state with zero MultiXacts
- MultiXact creation with overlapping shared locks
- Correct counting of MultiXacts and members
- Proper tracking of oldest MultiXact ID
Author: Naga Appani <nagnifk@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CAM2BeoX%2BRasKfG6W8w4qYZZz4BnhyEQMA_y5cEDnKEY_z8o9Czg%40mail.gmail.com
---
doc/src/sgml/func.sgml | 32 ++++++++
doc/src/sgml/maintenance.sgml | 37 ++++++++-
src/backend/access/transam/multixact.c | 79 ++++++++++++++-----
src/include/catalog/pg_proc.dat | 14 ++++
.../isolation/expected/multixact_stats.out | 59 ++++++++++++++
src/test/isolation/specs/multixact_stats.spec | 35 ++++++++
6 files changed, 233 insertions(+), 23 deletions(-)
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..0255a51cdad 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27732,6 +27732,38 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry">
+ <para role="func_signature">
+ <indexterm><primary>pg_get_multixact_stats</primary></indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>multixacts</parameter> <type>integer</type>,
+ <parameter>members</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>integer</type> )
+ </para>
+
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>multixacts</literal> is the number of multixact IDs assigned,
+ <literal>members</literal> is the number of multixact member entries created,
+ and <literal>oldest_multixact</literal> is the oldest multixact ID still in use.
+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
+ </para>
+
+ <para>
+ <literal>SELECT * FROM pg_get_multixact_stats();</literal>
+<programlisting>
+ multixacts | members | oldest_multixact
+------------+-------------+------------------
+ 182371396 | 2826221174 | 754321
+</programlisting>
+ </para>
+ </entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..d6bd305b0b0 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,12 +813,41 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if either
+ the storage occupied by multixact members exceeds about 10GB or the number
+ of members created exceeds approximately 2 billion entries, aggressive vacuum
scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
+ have the oldest multixact-age. Both of these kinds of aggressive
scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ area can grow up to about 20GB or approximately 4 billion entries before
+ reaching wraparound.
+ </para>
+
+ <para>
+ The <function>pg_get_multixact_stats()</function> function provides a way
+ to monitor multixact allocation and usage patterns in real time. By exposing
+ the age of the oldest multixact ID, number of member entries, and the oldest multixact ID still in use, it helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Monitor progress toward wraparound thresholds that trigger aggressive
+ autovacuum (approximately 2 billion members or 10GB storage)
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Verify whether autovacuum is effectively managing multixact cleanup
+ before reaching critical thresholds
+ </simpara>
+ </listitem>
+ </orderedlist>
+ See <xref linkend="functions-info-snapshot"/> for details.
</para>
<para>
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 3cb09c3d598..59e8fc17b7f 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2863,28 +2863,27 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
* exist. Return false if unable to determine.
*/
static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
+ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members, MultiXactId *oldestMultiXactId)
{
- MultiXactOffset nextOffset;
- MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
- MultiXactId nextMultiXactId;
- bool oldestOffsetKnown;
+ MultiXactOffset nextOffset;
+ MultiXactOffset oldestOffset;
+ MultiXactId nextMultiXactId;
+ bool oldestOffsetKnown;
- LWLockAcquire(MultiXactGenLock, LW_SHARED);
- nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
- nextMultiXactId = MultiXactState->nextMXact;
- oldestOffset = MultiXactState->oldestOffset;
- oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
- LWLockRelease(MultiXactGenLock);
+ LWLockAcquire(MultiXactGenLock, LW_SHARED);
+ nextOffset = MultiXactState->nextOffset;
+ *oldestMultiXactId = MultiXactState->oldestMultiXactId; /* Use the parameter directly */
+ nextMultiXactId = MultiXactState->nextMXact;
+ oldestOffset = MultiXactState->oldestOffset;
+ oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
+ LWLockRelease(MultiXactGenLock);
- if (!oldestOffsetKnown)
- return false;
+ if (!oldestOffsetKnown)
+ return false;
- *members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
- return true;
+ *members = nextOffset - oldestOffset;
+ *multixacts = nextMultiXactId - *oldestMultiXactId; /* Use the parameter */
+ return true;
}
/*
@@ -2922,9 +2921,10 @@ MultiXactMemberFreezeThreshold(void)
uint32 victim_multixacts;
double fraction;
int result;
+ MultiXactId oldestMultiXactId;
/* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
+ if (!ReadMultiXactCounts(&multixacts, &members, &oldestMultiXactId))
return 0;
/* If member space utilization is low, no special action is required. */
@@ -3493,3 +3493,44 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
{
return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * SQL-callable function to retrieve MultiXact statistics.
+ *
+ * Returns a composite row containing:
+ * - total number of MultiXact IDs created since startup,
+ * - total number of MultiXact members created,
+ * - the oldest existing MultiXact ID.
+ *
+ * This is primarily useful for monitoring MultiXact usage and ensuring
+ * appropriate wraparound protection.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[3];
+ bool nulls[3] = {false, false, false};
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ HeapTuple tuple;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errmsg("return type must be a row type")));
+
+ if (!ReadMultiXactCounts(&multixacts, &members, &oldestMultiXactId))
+ PG_RETURN_NULL();
+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = UInt32GetDatum(members);
+ values[2] = UInt32GetDatum(oldestMultiXactId);
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+
+ PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
+}
+
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3ee8fed7e53..756ba39425c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12576,4 +12576,18 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+{
+ oid => '9001',
+ descr => 'get current multixact member and multixact ID counts and oldest multixact',
+ proname => 'pg_get_multixact_stats',
+ prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int4,int8,int4}',
+ proargmodes => '{o,o,o}',
+ proargnames => '{multixacts,members,oldest_multixact}',
+ provolatile => 'v',
+ proparallel => 's',
+ prosrc => 'pg_get_multixact_stats'
+},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..54e3238d727
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,59 @@
+Parsed test spec with 3 sessions
+
+starting permutation: stats_init check begin1 lock1 begin2 lock2 check commit1 commit2 check
+step stats_init:
+ CREATE TEMP TABLE stats_before AS
+ SELECT multixacts, members, oldest_multixact FROM pg_get_multixact_stats();
+
+step check:
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+
+multixacts|members|oldest_multixact
+----------+-------+----------------
+ 0| 0| 1
+(1 row)
+
+step begin1: BEGIN;
+step lock1: SELECT * FROM multixact_test WHERE id = 1 FOR SHARE;
+id|val
+--+---
+ 1| 10
+(1 row)
+
+step begin2: BEGIN;
+step lock2: SELECT * FROM multixact_test WHERE id = 1 FOR SHARE;
+id|val
+--+---
+ 1| 10
+(1 row)
+
+step check:
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+
+multixacts|members|oldest_multixact
+----------+-------+----------------
+ 1| 3| 1
+(1 row)
+
+step commit1: COMMIT;
+step commit2: COMMIT;
+step check:
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+
+multixacts|members|oldest_multixact
+----------+-------+----------------
+ 1| 3| 1
+(1 row)
+
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..53fcad38c54
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,35 @@
+setup
+{
+ CREATE TABLE multixact_test(id int PRIMARY KEY, val int);
+ INSERT INTO multixact_test VALUES (1, 10);
+}
+
+teardown
+{
+ DROP TABLE multixact_test;
+}
+
+session s1
+step begin1 { BEGIN; }
+step lock1 { SELECT * FROM multixact_test WHERE id = 1 FOR SHARE; }
+step commit1 { COMMIT; }
+
+session s2
+step begin2 { BEGIN; }
+step lock2 { SELECT * FROM multixact_test WHERE id = 1 FOR SHARE; }
+step commit2 { COMMIT; }
+
+session s3
+step stats_init {
+ CREATE TEMP TABLE stats_before AS
+ SELECT multixacts, members, oldest_multixact FROM pg_get_multixact_stats();
+}
+step check {
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+permutation stats_init check begin1 lock1 begin2 lock2 check commit1 commit2 check
--
2.47.3
On Mon, Aug 4, 2025 at 1:16 AM Naga Appani <nagnrik@gmail.com> wrote:
Hi Ashutosh, Michael,
Thanks for the detailed reviews. I have incorporated the feedback;
please find attached v2 and my responses inline below.On Fri, Jul 25, 2025 at 5:57 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:In [1], we decided to document pg_get_multixact_member() in section
"Transaction ID and Snapshot Information Functions". I think the
discussion in the email thread applies to this function as well.Done -- the function is now documented under “Transaction ID and
Snapshot Information Functions” for consistency.The description here doesn't follow the format of the other functions
in this section.Updated the description in func.sgml to match the style of other
functions. Extended usage guidance is now in maintenance.sgml.Throwing an error causes the surrounding transaction to abort, so it
should be avoided in a monitoring/reporting function if possible.The function now returns NULL instead of throwing an error when counts
can’t be read.If ReadMultiXactCounts() returns false, MultiXactMemberFreezeThreshold() returns 0...
Noted -- the docs now mention that the function can be used to
anticipate more aggressive autovacuum behavior in such cases.In PG14+, the transaction wraparound is triggered if the size of the
directory exceeds 10GB. This function does not help monitoring that
condition. So a user will need to use du or pg_ls_multixactdir()
anyway, which defeats the purpose of this function being more
efficient than those methods. Am I correct? Can we also report the
size of the directory in this function?Correct, that is the intent of the function. The members count
returned by this function already provides the necessary information
to determine the directory size, since each member entry has a fixed
size. The constants and formulas in [0] and discussed in [1] show that
each group stores four bytes of flags plus four TransactionIds (20
bytes total), yielding 409 groups per 8 KB page and a fixed
members‑to‑bytes ratio. This means ~2 billion members corresponds to
~10 GB (aggressive autovacuum threshold) and ~4 billion members
corresponds to ~20 GB (wraparound).Since the function already provides the member count, including the
physical size in its output would duplicate information and add no
extra benefit.The patch needs tests.
Added an isolation test to cover initial state, MultiXact creation,
counts, and oldest MultiXact reporting.On Mon, Jul 28, 2025 at 1:00 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:Let's say if the user knows that the counts are so high that a
wraparound is imminent, but vacuuming isn't solving the problem...
Here's a quick patch implementing the same. Please feel free to
incorporate and refine it in your patch if you like it.Thank you for sharing the patch. I have incorporated it into this
version with minor adjustments, and it fits well with the overall
design of the function.On Mon, Jul 28, 2025 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote:
Yep, let's be consistent.
Done -- placed in “Transaction ID and Snapshot Information Functions”
for consistency.Most likely returning NULL is the best thing we can do, as a safe fallback.
Implemented -- the function now returns NULL if counts can’t be read.
The patch needs tests.
Isolation tests have been added as described above.
May I also suggest a split of the multixact SQL functions into a
separate file, a src/backend/utils/adt/multixactfuncs.c?I agree that would be cleaner, but I’d prefer to keep the
implementation in multixact.c for now to maintain focus on this patch
and revisit the refactoring later.+PG_FUNCTION_INFO_V1(pg_get_multixact_count);
Removed -- now handled entirely by pg_proc.dat.
...You may want to add an example of how one can use it for monitoring in the docs.
I’ve added a usage example with sample output in the docs. If you had
a different kind of demo in mind (e.g., creating multixacts manually
and showing the output), please let me know.References:
[0] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156
[1] /messages/by-id/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg@mail.gmail.comBest regards,
Naga Appani
Following up on my v2 from yesterday — the recent commit [0]https://github.com/postgres/postgres/commit/4e23c9ef65accde7eb3e56aa28d50ae5cf79b64b changed
the directory layout, which broke the patch (v2). This v3 updates the
code to work with the new structure and also fixes some formatting
issues I noticed while revisiting the changes.
The rest of the patch remains the same as v2, which incorporated
feedback from Ashutosh and Michael (see my previous email for
details).
Please find v3 attached.
References:
[0]: https://github.com/postgres/postgres/commit/4e23c9ef65accde7eb3e56aa28d50ae5cf79b64b
Best regards,
Naga Appani
Attachments:
v3-0001-Add-pg_get_multixact_stats-SQL-function-for-monit.patchapplication/octet-stream; name=v3-0001-Add-pg_get_multixact_stats-SQL-function-for-monit.patchDownload
From 5b3341fa9759f131c5e368b6debcb8f208c96a12 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Mon, 4 Aug 2025 21:02:26 +0000
Subject: [PATCH v2] Add pg_get_multixact_stats() SQL function for monitoring
multixact usage
This patch adds pg_get_multixact_stats(), a SQL-callable function that returns
multixact statistics to aid in monitoring wraparound risk and vacuum behavior.
It reports:
multixacts: the number of multixact IDs created since the oldest one still needed
members: the number of multixact member entries that currently exist
oldest_multixact: the oldest multixact ID still needed by any database
The function modifies ReadMultiXactCounts() to expose the oldestMultiXactId and
returns all three values in a composite record. This allows users to monitor
multixact usage and identify potential wraparound issues, particularly useful
when combined with pg_get_multixact_members() to investigate specific multixacts.
Usage:
SELECT * FROM pg_get_multixact_stats();
Documentation is added to:
- "Transaction ID and Snapshot Information Functions" section in func.sgml
- "Multixacts and Wraparound" section in maintenance.sgml
(routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND)
Isolation tests are added to verify:
- Initial state with zero multixacts
- multixact creation with overlapping shared locks
- Correct counting of multixacts and members
- Proper tracking of oldest multixact ID
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CAM2BeoX%2BRasKfG6W8w4qYZZz4BnhyEQMA_y5cEDnKEY_z8o9Czg%40mail.gmail.com
---
doc/src/sgml/func/func-info.sgml | 32 ++++++++++
doc/src/sgml/maintenance.sgml | 37 ++++++++++--
src/backend/access/transam/multixact.c | 50 ++++++++++++++--
src/include/catalog/pg_proc.dat | 15 +++++
.../isolation/expected/multixact_stats.out | 59 +++++++++++++++++++
src/test/isolation/specs/multixact_stats.spec | 35 +++++++++++
6 files changed, 219 insertions(+), 9 deletions(-)
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index b507bfaf64b..75ce2c6e403 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2967,6 +2967,38 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry">
+ <para role="func_signature">
+ <indexterm><primary>pg_get_multixact_stats</primary></indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>multixacts</parameter> <type>integer</type>,
+ <parameter>members</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>integer</type> )
+ </para>
+
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>multixacts</literal> is the number of multixact IDs assigned,
+ <literal>members</literal> is the number of multixact member entries created,
+ and <literal>oldest_multixact</literal> is the oldest MultiXact ID still in use.
+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
+ </para>
+
+ <para>
+ <literal>SELECT * FROM pg_get_multixact_stats();</literal>
+<programlisting>
+ multixacts | members | oldest_multixact
+------------+-------------+------------------
+ 182371396 | 2826221174 | 754321
+</programlisting>
+ </para>
+ </entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..d6bd305b0b0 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,12 +813,41 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if either
+ the storage occupied by multixact members exceeds about 10GB or the number
+ of members created exceeds approximately 2 billion entries, aggressive vacuum
scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
+ have the oldest multixact-age. Both of these kinds of aggressive
scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ area can grow up to about 20GB or approximately 4 billion entries before
+ reaching wraparound.
+ </para>
+
+ <para>
+ The <function>pg_get_multixact_stats()</function> function provides a way
+ to monitor multixact allocation and usage patterns in real time. By exposing
+ counts of multixacts, member entries, and the oldest multixact ID, it helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Monitor progress toward wraparound thresholds that trigger aggressive
+ autovacuum (approximately 2 billion members or 10GB storage)
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Verify whether autovacuum is effectively managing multixact cleanup
+ before reaching critical thresholds
+ </simpara>
+ </listitem>
+ </orderedlist>
+ See <xref linkend="functions-info-snapshot"/> for details.
</para>
<para>
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 3cb09c3d598..ec7ff959416 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2863,17 +2863,16 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
* exist. Return false if unable to determine.
*/
static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
+ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members, MultiXactId *oldestMultiXactId)
{
MultiXactOffset nextOffset;
MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
MultiXactId nextMultiXactId;
bool oldestOffsetKnown;
LWLockAcquire(MultiXactGenLock, LW_SHARED);
nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
+ *oldestMultiXactId = MultiXactState->oldestMultiXactId;
nextMultiXactId = MultiXactState->nextMXact;
oldestOffset = MultiXactState->oldestOffset;
oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
@@ -2883,7 +2882,7 @@ ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
return false;
*members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
+ *multixacts = nextMultiXactId - *oldestMultiXactId;
return true;
}
@@ -2922,9 +2921,10 @@ MultiXactMemberFreezeThreshold(void)
uint32 victim_multixacts;
double fraction;
int result;
+ MultiXactId oldestMultiXactId;
/* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
+ if (!ReadMultiXactCounts(&multixacts, &members, &oldestMultiXactId))
return 0;
/* If member space utilization is low, no special action is required. */
@@ -3493,3 +3493,43 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
{
return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * SQL-callable function to retrieve MultiXact statistics.
+ *
+ * Returns a composite row containing:
+ * - total number of MultiXact IDs created since startup,
+ * - total number of MultiXact members created,
+ * - the oldest existing MultiXact ID.
+ *
+ * This is primarily useful for monitoring MultiXact usage and ensuring
+ * appropriate wraparound protection.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[3];
+ bool nulls[3] = {false, false, false};
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ HeapTuple tuple;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errmsg("return type must be a row type")));
+
+ if (!ReadMultiXactCounts(&multixacts, &members, &oldestMultiXactId))
+ PG_RETURN_NULL();
+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = UInt32GetDatum(members);
+ values[2] = UInt32GetDatum(oldestMultiXactId);
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+
+ PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da1ace..9d9e28c2770 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12576,4 +12576,19 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# MultiXact stats
+{
+ oid => '9001',
+ descr => 'get current multixact member and multixact ID counts and oldest multixact',
+ proname => 'pg_get_multixact_stats',
+ prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int4,int8,int4}',
+ proargmodes => '{o,o,o}',
+ proargnames => '{multixacts,members,oldest_multixact}',
+ provolatile => 'v',
+ proparallel => 's',
+ prosrc => 'pg_get_multixact_stats'
+},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..54e3238d727
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,59 @@
+Parsed test spec with 3 sessions
+
+starting permutation: stats_init check begin1 lock1 begin2 lock2 check commit1 commit2 check
+step stats_init:
+ CREATE TEMP TABLE stats_before AS
+ SELECT multixacts, members, oldest_multixact FROM pg_get_multixact_stats();
+
+step check:
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+
+multixacts|members|oldest_multixact
+----------+-------+----------------
+ 0| 0| 1
+(1 row)
+
+step begin1: BEGIN;
+step lock1: SELECT * FROM multixact_test WHERE id = 1 FOR SHARE;
+id|val
+--+---
+ 1| 10
+(1 row)
+
+step begin2: BEGIN;
+step lock2: SELECT * FROM multixact_test WHERE id = 1 FOR SHARE;
+id|val
+--+---
+ 1| 10
+(1 row)
+
+step check:
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+
+multixacts|members|oldest_multixact
+----------+-------+----------------
+ 1| 3| 1
+(1 row)
+
+step commit1: COMMIT;
+step commit2: COMMIT;
+step check:
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+
+multixacts|members|oldest_multixact
+----------+-------+----------------
+ 1| 3| 1
+(1 row)
+
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..53fcad38c54
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,35 @@
+setup
+{
+ CREATE TABLE multixact_test(id int PRIMARY KEY, val int);
+ INSERT INTO multixact_test VALUES (1, 10);
+}
+
+teardown
+{
+ DROP TABLE multixact_test;
+}
+
+session s1
+step begin1 { BEGIN; }
+step lock1 { SELECT * FROM multixact_test WHERE id = 1 FOR SHARE; }
+step commit1 { COMMIT; }
+
+session s2
+step begin2 { BEGIN; }
+step lock2 { SELECT * FROM multixact_test WHERE id = 1 FOR SHARE; }
+step commit2 { COMMIT; }
+
+session s3
+step stats_init {
+ CREATE TEMP TABLE stats_before AS
+ SELECT multixacts, members, oldest_multixact FROM pg_get_multixact_stats();
+}
+step check {
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+permutation stats_init check begin1 lock1 begin2 lock2 check commit1 commit2 check
--
2.47.3
On Mon, Aug 04, 2025 at 04:51:30PM -0500, Naga Appani wrote:
The rest of the patch remains the same as v2, which incorporated
feedback from Ashutosh and Michael (see my previous email for
details).Please find v3 attached.
I am reading again what you have here, and I really think that we
should move the SQL function parts of multixact.c into their own new
file, exposing ReadMultiXactCounts() in multixact.h, because I also
suspect that this can become really useful for extensions that aim at
doing things similar to your proposal in terms of data monitoring for
autovacuum wraparound. This means two refactoring patches:
- One to expose the new routine in multixact.h.
- One to move the existing SQL code to its new file.
ReadMultiXactCounts() is also incorrectly named with your proposal to
expose oldestMultiXactId in the information returned to the caller,
where the key point is to make sure that the information retrieved is
consistent across a single LWLock acquisition. So perhaps this should
be named GetMultiXactInformation() or something similar?
The top of ReadMultiXactCounts() (or whatever its new name) should
also document the information returned across a single call. It looks
inconsistent to return oldestMultiXactId if the oldestOffsetKnown is
false. What about oldestOffset itself? Should it be returned for
consistency with the counts and oldestMultiXactId?
--
Michael
On Mon, Aug 4, 2025 at 11:46 AM Naga Appani <nagnrik@gmail.com> wrote:
In PG14+, the transaction wraparound is triggered if the size of the
directory exceeds 10GB. This function does not help monitoring that
condition. So a user will need to use du or pg_ls_multixactdir()
anyway, which defeats the purpose of this function being more
efficient than those methods. Am I correct? Can we also report the
size of the directory in this function?Correct, that is the intent of the function. The members count
returned by this function already provides the necessary information
to determine the directory size, since each member entry has a fixed
size. The constants and formulas in [0] and discussed in [1] show that
each group stores four bytes of flags plus four TransactionIds (20
bytes total), yielding 409 groups per 8 KB page and a fixed
members‑to‑bytes ratio. This means ~2 billion members corresponds to
~10 GB (aggressive autovacuum threshold) and ~4 billion members
corresponds to ~20 GB (wraparound).
Would it be better to do that math in the function and output the
result? Users may not be able to read and understand the PostgreSQL
code or pgsql-hackers threads Or the constants may change across
versions. It will be more convenient for users if they get the output
from the function itself.
On Fri, Aug 8, 2025 at 6:05 AM Michael Paquier <michael@paquier.xyz> wrote:
ReadMultiXactCounts() is also incorrectly named with your proposal to
expose oldestMultiXactId in the information returned to the caller,
where the key point is to make sure that the information retrieved is
consistent across a single LWLock acquisition. So perhaps this should
be named GetMultiXactInformation() or something similar?
+1
The top of ReadMultiXactCounts() (or whatever its new name) should
also document the information returned across a single call. It looks
inconsistent to return oldestMultiXactId if the oldestOffsetKnown is
false. What about oldestOffset itself? Should it be returned for
consistency with the counts and oldestMultiXactId?
+1
Some more comments on the patch
+ <literal>multixacts</literal> is the number of multixact IDs assigned,
+ <literal>members</literal> is the number of multixact member entries created,
+ and <literal>oldest_multixact</literal> is the oldest MultiXact ID
still in use.
Now that the name of the function is changed, we need the names to
indicate that they are counts e.g. num_mxids, num_members.
+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
+ </para>
+
+ <para>
+ <literal>SELECT * FROM pg_get_multixact_stats();</literal>
+<programlisting>
+ multixacts | members | oldest_multixact
+------------+-------------+------------------
+ 182371396 | 2826221174 | 754321
+</programlisting>
This file doesn't provide usage examples of other functions. This
function doesn't seem to be an exception.
I think we should mention that the statistics may get stale as soon as
it's fetched, even with REPEATABLE READ isolation level.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if either
+ the storage occupied by multixact members exceeds about 10GB or the number
+ of members created exceeds approximately 2 billion entries, aggressive vacuum
In case each member starts consuming more or less space than it does
today what would be the basis of triggering workaround? Space or
number of members? I think we should mention only that.
scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
+ have the oldest multixact-age. Both of these kinds of aggressive
scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ area can grow up to about 20GB or approximately 4 billion entries before
+ reaching wraparound.
Similar to above.
+ </para>
+
+ <para>
+ The <function>pg_get_multixact_stats()</function> function provides a way
+ to monitor multixact allocation and usage patterns in real time. By exposing
This is the right place to elaborate the usage of this function with an example.
+ counts of multixacts, member entries, and the oldest multixact ID, it helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Monitor progress toward wraparound thresholds that trigger aggressive
+ autovacuum (approximately 2 billion members or 10GB storage)
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Verify whether autovacuum is effectively managing multixact cleanup
+ before reaching critical thresholds
+ </simpara>
+ </listitem>
+ </orderedlist>
+ See <xref linkend="functions-info-snapshot"/> for details.
I think the second point here repeats what's already mentioned
earlier. It will be good to elaborate each point with an example
instead of just narration.
+/*
+ * pg_get_multixact_stats
+ *
+ * SQL-callable function to retrieve MultiXact statistics.
+ *
+ * Returns a composite row containing:
+ * - total number of MultiXact IDs created since startup,
+ * - total number of MultiXact members created,
... since startup or the number of existing members?
+ * - the oldest existing MultiXact ID.
+ *
+ * This is primarily useful for monitoring MultiXact usage and ensuring
+ * appropriate wraparound protection.
The last two lines are not required, I think. One of its usage is
monitoring but users may find other usages.
+
+step commit1: COMMIT;
+step commit2: COMMIT;
+step check:
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+
+multixacts|members|oldest_multixact
+----------+-------+----------------
+ 1| 3| 1
+(1 row)
Vacuum may clean the multixact between commit2 and check, in which
case the result won't be stable.
--
Best Wishes,
Ashutosh Bapat
Hi Michael, Ashutosh,
Thanks a lot for the detailed reviews and feedback. Please find
attached v4 of the patchset.
Summary of changes in v4:
- Split into two patches as suggested:
1. Expose and rename ReadMultiXactCounts() -> GetMultiXactInfo() in
multixact.h with clearer comments.
2. Add pg_get_multixact_stats() as a SQL-callable function in a new
file (multixactfuncs.c), with docs and tests.
- Function now also returns oldestOffset for consistency.
- Field names updated to num_mxids, num_members, oldest_multixact,
oldest_offset.
- Documentation revised to describe thresholds only in terms of member
counts (disk size wording removed).
- Added a minimal example in maintenance.sgml where multixact
wraparound is already described.
- Isolation tests are rewritten so they no longer depend on exact
counts, but only on monotonic properties guaranteed while a multixact
is pinned.
Replies inline below:
On Thu, Aug 7, 2025 at 7:35 PM Michael Paquier <michael@paquier.xyz> wrote:
I really think that we should move the SQL function parts of multixact.c
into their own new file, exposing ReadMultiXactCounts() in multixact.h...
Done. The SQL-callable code now lives in
src/backend/utils/adt/multixactfuncs.c
and the accessor is declared in
src/include/access/multixact.h.
ReadMultiXactCounts() is also incorrectly named with your proposal to
expose oldestMultiXactId in the information returned to the caller...
So perhaps this should be named GetMultiXactInformation() or something
similar?
Renamed to GetMultiXactInfo().
The top of ReadMultiXactCounts() (or whatever its new name) should
also document the information returned across a single call.
Added detailed comments about consistency under a single LWLock and the
meaning of each field.
It looks inconsistent to return oldestMultiXactId if the
oldestOffsetKnown is false. What about oldestOffset itself?
GetMultiXactInfo() now returns oldestOffset as well. If the oldest
offset isn’t currently known, the function returns false and clears
all outputs, so callers don’t see a partially valid struct.
---
On Fri, Aug 8, 2025 at 4:33 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
Would it be better to do that math in the function and output the result?
That’s a cool idea, thanks for pointing it out. For now I have kept the
SQL function focused only on exposing the raw counts (num_mxids,
num_members, oldest IDs). My thought was that keeping the API lean makes
it easier to maintain across versions, while leaving any derived
calculations like approximate storage size to SQL or external tooling.
This way the function remains simple and future-proof, while still
giving users the building blocks to get the view they need.
I’m happy to revisit this if others feel it would be better for the
function to provide an approximate size directly — I wanted to start
with the simplest surface and gather feedback first.
Now that the name of the function is changed, we need the names to
indicate that they are counts e.g. num_mxids, num_members.
Adjusted. The SQL function returns: num_mxids, num_members,
oldest_multixact, oldest_offset.
This file doesn't provide usage examples of other functions. This
function doesn't seem to be an exception.
Earlier I thought it was fine to add an example since
pg_input_error_info() also has one, so in this version I placed the
example in maintenance.sgml, where we already discuss multixact
wraparound. That seemed like the most natural place for it. I agree with
your point about consistency, though, so I kept the style minimal and
aligned with the surrounding text.
I think we should mention that the statistics may get stale as soon as
it's fetched, even with REPEATABLE READ isolation level.
Added a note that values are a live snapshot and can change immediately.
In case each member starts consuming more or less space than it does
today what would be the basis of triggering wraparound? Space or
number of members? I think we should mention only that.
I updated the docs to describe wraparound in terms of member counts only.
The earlier mention of disk size has been dropped, since the thresholds
are defined by counts.
This is the right place to elaborate the usage of this function with an
example.
Expanded with a short example, while keeping it consistent with nearby
entries.
... since startup or the number of existing members?
Clarified that the values reflect what’s *currently in use* (i.e.,
derived from next/oldest) and that NULLs are returned if the multixact
subsystem has not been initialized yet.
The last two lines are not required, I think. One of its usage is
monitoring but users may find other usages.
Dropped those lines.
Vacuum may clean the multixact between commit2 and check, in which
case the result won't be stable.
Right, the earlier version of the test assumed stable counts, which
could fail if autovacuum or background cleanup removed entries in
between steps. In v4 the isolation test no longer relies on exact
numbers. Instead it asserts only the monotonic properties that are
guaranteed while a multixact is pinned, and avoids assumptions once
locks are released. That makes the test robust against concurrent vacuum
activity.
---
Thanks again for the thoughtful reviews and guidance. Please let me know
if you see further adjustments needed.
Best regards,
Naga
Attachments:
v4-0001-Rename-ReadMultiXactCounts-to-GetMultiXactInfo-an.patchapplication/octet-stream; name=v4-0001-Rename-ReadMultiXactCounts-to-GetMultiXactInfo-an.patchDownload
From 1a85c94f93985d39a6090a35d155f6ee6c788c14 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Sat, 16 Aug 2025 17:51:52 +0000
Subject: [PATCH v4] Rename ReadMultiXactCounts() to GetMultiXactInfo() and
make it public
Following review feedback from Michael Paquier, this patch exposes
GetMultiXactInfo(), a public accessor that returns snapshot of
MultiXact state (counts and horizons) in one call, replacing
ReadMultiXactCounts().
Provide a single snapshot of MultiXact state and return:
- multixacts
- members
- oldestMultiXactId
- oldestOffset
Return false when the oldest offset is not known; in that case set all
outputs to 0/invalid for consistency. Declare the function in
multixact.h and switch MultiXactMemberFreezeThreshold() to the new API.
This accessor underpins pg_get_multixact_stats() and is available to
extensions that wish to monitor MultiXact usage.
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
src/backend/access/transam/multixact.c | 45 ++++++++++++++++++--------
src/include/access/multixact.h | 1 +
2 files changed, 32 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 3cb09c3d598..eeeec81abc9 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2859,31 +2859,46 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
}
/*
- * Determine how many multixacts, and how many multixact members, currently
- * exist. Return false if unable to determine.
+ * GetMultiXactInfo
+ *
+ * Returns information about current MultiXact state in a single atomic read:
+ * - multixacts: Number of MultiXacts (nextMultiXactId - oldestMultiXactId)
+ * - members: Number of member entries (nextOffset - oldestOffset)
+ * - oldestMultiXactId: Oldest MultiXact ID still in use
+ * - oldestOffset: Oldest offset still in use
+ *
+ * Returns false if the oldest offset is not known, in which case all output
+ * parameters are set to 0/invalid values for consistency.
*/
-static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
+bool
+GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *members,
+ MultiXactId *oldestMultiXactId, MultiXactOffset *oldestOffset)
{
- MultiXactOffset nextOffset;
- MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
- MultiXactId nextMultiXactId;
- bool oldestOffsetKnown;
+ MultiXactOffset nextOffset;
+ MultiXactId nextMultiXactId;
+ bool oldestOffsetKnown;
+ /* Take one consistent snapshot of the state */
LWLockAcquire(MultiXactGenLock, LW_SHARED);
nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
+ *oldestMultiXactId = MultiXactState->oldestMultiXactId;
nextMultiXactId = MultiXactState->nextMXact;
- oldestOffset = MultiXactState->oldestOffset;
+ *oldestOffset = MultiXactState->oldestOffset;
oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
LWLockRelease(MultiXactGenLock);
if (!oldestOffsetKnown)
+ {
+ /* Set all outputs to 0/invalid for consistency */
+ *members = 0;
+ *multixacts = 0;
+ *oldestMultiXactId = InvalidMultiXactId;
+ *oldestOffset = 0;
return false;
+ }
- *members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
+ *members = nextOffset - *oldestOffset;
+ *multixacts = nextMultiXactId - *oldestMultiXactId;
return true;
}
@@ -2922,9 +2937,11 @@ MultiXactMemberFreezeThreshold(void)
uint32 victim_multixacts;
double fraction;
int result;
+ MultiXactId oldestMultiXactId;
+ MultiXactOffset oldestOffset;
/* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
+ if (!GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
return 0;
/* If member space utilization is low, no special action is required. */
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index b876e98f46e..e0878461c2c 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -158,5 +158,6 @@ extern void multixact_desc(StringInfo buf, XLogReaderState *record);
extern const char *multixact_identify(uint8 info);
extern char *mxid_to_string(MultiXactId multi, int nmembers,
MultiXactMember *members);
+extern bool GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *members, MultiXactId *oldestMultiXactId, MultiXactOffset *oldestOffset);
#endif /* MULTIXACT_H */
--
2.47.3
v4-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchapplication/octet-stream; name=v4-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchDownload
From 46c0c3d632a069a70923dd8d378b6c5d92b7eb1b Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Sat, 16 Aug 2025 17:52:06 +0000
Subject: [PATCH v4] Add pg_get_multixact_stats() function for monitoring
MultiXact usage
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Expose multixact state via a new SQL-callable function
pg_get_multixact_stats(), returning:
- num_mxids : number of MultiXact IDs in use
- num_members : number of member entries in use
- oldest_multixact : oldest MultiXact ID still needed
- oldest_offset : oldest member offset still in use
The function returns NULLs if the MultiXact subsystem is not yet
initialized.
An isolation test (multixact_stats) asserts only invariants that are
stable while a newly created multixact is pinned: (1) adding a second
locker on the same tuple increases members by ≥1; (2) num_mxids and
num_members do not decrease across snapshots; and (3) oldest_* never
decrease. The test prints a deterministic key/value table
("assertion | ok") and makes no assertions after locks are released,
so it remains robust even if background VACUUM/FREEZE runs.
Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage
Build and catalog:
- new module: src/backend/utils/adt/multixactfuncs.c
- pg_proc.dat entry
- meson.build integration
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
---
doc/src/sgml/func/func-info.sgml | 28 ++++
doc/src/sgml/maintenance.sgml | 54 +++++++-
src/backend/utils/adt/Makefile | 1 +
src/backend/utils/adt/meson.build | 1 +
src/backend/utils/adt/multixactfuncs.c | 62 +++++++++
src/include/catalog/pg_proc.dat | 15 ++
.../isolation/expected/multixact_stats.out | 94 +++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/multixact_stats.spec | 128 ++++++++++++++++++
9 files changed, 379 insertions(+), 5 deletions(-)
create mode 100644 src/backend/utils/adt/multixactfuncs.c
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index c393832d94c..ea063f6a81d 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,34 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_multixact_stats</primary>
+ </indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>num_mxids</parameter> <type>integer</type>,
+ <parameter>num_members</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>xid</type>,
+ <parameter>oldest_offset</parameter> <type>bigint</type> )
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,
+ <literal>num_members</literal> is the number of multixact member entries created,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.
+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
+ </para>
+ <para>
+ This is a live snapshot of shared counters; the numbers can change between calls,
+ even within the same transaction.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..e3a63c5b864 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,12 +813,56 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of members created exceeds approximately 2 billion entries, aggressive vacuum
scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled. The members can grow
+ up to approximately 4 billion entries before reaching wraparound.
+ </para>
+
+ <para>
+ The <function>pg_get_multixact_stats()</function> function, described in
+ <xref linkend="functions-pg-snapshot"/>, provides a way to monitor
+ multixact allocation and usage patterns in real time. For example:
+ <programlisting>
+postgres=# SELECT * FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | oldest_multixact | oldest_offset
+-----------+-------------+------------------+---------------
+ 99883849 | 773468747 | 39974368 | 351952978
+(1 row)
+ </programlisting>
+ This output shows a system with significant multixact activity: about ~100 million
+ multixact IDs and ~773 million member entries have been created since the oldest
+ surviving multixact (ID 39974368). By leveraging this information, the function helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations. For example, a spike in num_mxids might indicate
+ multiple sessions running UPDATE statements with foreign key checks,
+ concurrent SELECT FOR SHARE operations, or frequent use of savepoints
+ causing lock contention.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Track multixact cleanup efficiency by monitoring oldest_multixact.
+ If this value remains unchanged while num_members grows, it could indicate
+ that long-running transactions are preventing cleanup, or autovacuum is
+ not keeping up with the workload.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
</para>
<para>
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index ffeacf2b819..cc68ac545a5 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -68,6 +68,7 @@ OBJS = \
misc.o \
multirangetypes.o \
multirangetypes_selfuncs.o \
+ multixactfuncs.o \
name.o \
network.o \
network_gist.o \
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index ed9bbd7b926..dac372c3bea 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -55,6 +55,7 @@ backend_sources += files(
'misc.c',
'multirangetypes.c',
'multirangetypes_selfuncs.c',
+ 'multixactfuncs.c',
'name.c',
'network.c',
'network_gist.c',
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
new file mode 100644
index 00000000000..faf02bd1626
--- /dev/null
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -0,0 +1,62 @@
+/*-------------------------------------------------------------------------
+ * multixactfuncs.c
+ * Functions for reporting on multixact state.
+ *
+ * This module provides SQL-callable functions that expose internal multixact
+ * state information for monitoring usage and detecting potential wraparound
+ * conditions that may require vacuum maintenance.
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/utils/adt/multixactfuncs.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/multixact.h"
+#include "funcapi.h"
+#include "utils/builtins.h"
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current MultiXact usage:
+ * - num_mxids: Number of MultiXact IDs in use
+ * - num_members: Total number of member entries
+ * - oldest_multixact: Oldest MultiXact ID still needed
+ * - oldest_offset: Oldest offset still in use
+ *
+ * Returns a row of NULLs if the MultiXact system is not yet initialized.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[4];
+ bool nulls[4] = {true, true, true, true};
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ MultiXactOffset oldestOffset;
+ HeapTuple tuple;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("return type must be a row type")));
+
+ if (GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
+ {
+ values[0] = Int32GetDatum(multixacts);
+ values[1] = Int64GetDatum(members);
+ values[2] = UInt32GetDatum(oldestMultiXactId);
+ values[3] = Int64GetDatum(oldestOffset);
+ nulls[0] = nulls[1] = nulls[2] = nulls[3] = false;
+ }
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+
+ PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da1ace..837bba938e6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12576,4 +12576,19 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Get MultiXact state
+{
+ oid => '9001',
+ descr => 'get current multixact member and multixact ID counts and oldest values',
+ proname => 'pg_get_multixact_stats',
+ prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int4,int8,xid,int8}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{num_mxids,num_members,oldest_multixact,oldest_offset}',
+ provolatile => 'v',
+ proparallel => 's',
+ prosrc => 'pg_get_multixact_stats'
+},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..2893c4d9f36
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,94 @@
+Parsed test spec with 3 sessions
+
+starting permutation: d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit
+step d_begin: BEGIN; SET client_min_messages = warning;
+step snap0:
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap1:
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap2:
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step check_while_pinned:
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+
+assertion |ok
+------------------------+--
+is_init_mxids |t
+is_init_members |t
+is_init_oldest_mxid |t
+is_init_oldest_off |t
+is_oldest_mxid_nondec_01|t
+is_oldest_mxid_nondec_12|t
+is_oldest_off_nondec_01 |t
+is_oldest_off_nondec_12 |t
+is_members_increased_ge1|t
+is_mxids_nondec_01 |t
+is_mxids_nondec_12 |t
+is_members_nondec_01 |t
+is_members_nondec_12 |t
+(13 rows)
+
+step d_commit: COMMIT;
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4411d3c86dd..7da500bf6cf 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -117,3 +117,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: multixact_stats
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..cbf4b57294e
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,128 @@
+# High-signal invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While it is pinned
+# by two open transactions, we assert only invariants that background VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second locker arrived,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation may shrink deltas).
+# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
+# transaction to narrow the window for unrelated truncation between snapshots.
+#
+# Terminology (global counters):
+# num_mxids, num_members : “in-use” deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns, which protects
+# the truncation horizon (VACUUM can’t advance past our pinned multi).
+
+setup
+{
+ CREATE TABLE mxq(id int PRIMARY KEY, v int);
+ INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+ DROP TABLE mxq;
+}
+
+# Two lockers on the SAME tuple -> one MultiXact with >= 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin { BEGIN; }
+step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin { BEGIN; }
+step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Driver session: keep a transaction open while we take snapshots and check.
+session "driver"
+step d_begin { BEGIN; SET client_min_messages = warning; }
+
+# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet.
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s1 has locked the row (still in driver xact).
+step snap1 {
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s2 joins on the SAME tuple -> multixact with >= 2 members (still in driver xact).
+step snap2 {
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
+# is_init_mxids : num_mxids is non-NULL
+# is_init_members : num_members is non-NULL
+# is_init_oldest_mxid : oldest_multixact is non-NULL
+# is_init_oldest_off : oldest_offset is non-NULL
+# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1)
+# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2)
+# is_oldest_off_nondec_01 : oldest_offset did not decrease (snap0→snap1)
+# is_oldest_off_nondec_12 : oldest_offset did not decrease (snap1→snap2)
+# is_members_increased_ge1 : members increased by at least 1 when s2 joined
+# is_mxids_nondec_01 : num_mxids did not decrease (snap0→snap1)
+# is_mxids_nondec_12 : num_mxids did not decrease (snap1→snap2)
+# is_members_nondec_01 : num_members did not decrease (snap0→snap1)
+# is_members_nondec_12 : num_members did not decrease (snap1→snap2)
+step check_while_pinned {
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+}
+
+step d_commit { COMMIT; }
+
+permutation d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit
+
--
2.47.3
On Sun, Aug 17, 2025 at 01:27:29AM -0500, Naga Appani wrote:
On Thu, Aug 7, 2025 at 7:35 PM Michael Paquier <michael@paquier.xyz> wrote:
I really think that we should move the SQL function parts of multixact.c
into their own new file, exposing ReadMultiXactCounts() in multixact.h...Done. The SQL-callable code now lives in
src/backend/utils/adt/multixactfuncs.c
and the accessor is declared in
src/include/access/multixact.h.
My point was a bit different: multixactfuncs.c should be created first
because we already have one SQL function in multixact.c that can be
moved inside it, with the declarations it requires added to
multixact.h. I've extracted what you did, moved the existing
pg_get_multixact_members() inside the new file, and applied the
result.
ReadMultiXactCounts() is also incorrectly named with your proposal to
expose oldestMultiXactId in the information returned to the caller...
So perhaps this should be named GetMultiXactInformation() or something
similar?Renamed to GetMultiXactInfo().
+ * Returns information about current MultiXact state in a single atomic read:
This comment is incorrect. This is not an atomic read, grabbing a
consistent state of the data across one single lock acquisition.
Except for this comment, this looks pretty much OK. Ashutosh, any
comments?
I have not looked at the rest.
--
Michael
On Sun, Aug 17, 2025 at 11:57 AM Naga Appani <nagnrik@gmail.com> wrote:
On Fri, Aug 8, 2025 at 4:33 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
In case each member starts consuming more or less space than it does
today what would be the basis of triggering wraparound? Space or
number of members? I think we should mention only that.I updated the docs to describe wraparound in terms of member counts only.
The earlier mention of disk size has been dropped, since the thresholds
are defined by counts.
The current document says
"Also, if the storage occupied by multixacts members exceeds about
10GB, aggressive vacuum scans will occur more often for all tables,
starting with those that have the oldest multixact-age." - do you mean
that it's wrong. Instead of checking 10GB threashold, is the code
checking the equivalent member count? If so, I think we need a
separate patch to correct the documentation first. Can you please
point me to the code? Documentation should reflect the code.
That’s a cool idea, thanks for pointing it out. For now I have kept the
SQL function focused only on exposing the raw counts (num_mxids,
num_members, oldest IDs). My thought was that keeping the API lean makes
it easier to maintain across versions, while leaving any derived
calculations like approximate storage size to SQL or external tooling.
This way the function remains simple and future-proof, while still
giving users the building blocks to get the view they need.I’m happy to revisit this if others feel it would be better for the
function to provide an approximate size directly — I wanted to start
with the simplest surface and gather feedback first.
The constant multiplier which converts a count into the disk size is
in the server code. Duplicating it outside the server code, even in
documentation, would require maintenance. GetMultiXactInfo() may not
do the arithmetic but pg_get_multixact_stats() is lean enough to add a
couple computations.
If size is being used as a threshold, reporting count is useless
because user wouldn't know the relation easily. If count is used as a
threshold, reporting count makes sense.
--
Best Wishes,
Ashutosh Bapat
Hi Michael, Ashutosh,
Thanks a lot for taking the time to review this patch and share your thoughts.
Here’s a short summary of what has changed in v5:
- Added the new pg_get_multixact_stats() function in multixactfuncs.c.
- Fixed the misleading “atomic read” comment in the accessor.
- Clarified documentation: thresholds are described in terms of
counts, since that’s what the code uses.
- Added a members_bytes column in pg_get_multixact_stats() to give
users a rough size estimate (num_members * 5), while making it clear
this is layout-dependent.
Please see my in-line replies below.
---
On Mon, Aug 18, 2025 at 1:49 AM Michael Paquier <michael@paquier.xyz> wrote:
My point was a bit different: multixactfuncs.c should be created first
because we already have one SQL function in multixact.c that can be
moved inside it, with the declarations it requires added to
multixact.h. I've extracted what you did, moved the existing
pg_get_multixact_members() inside the new file, and applied the
result.
Really appreciate your clarification and for making that change. I
misunderstood your earlier point.
+ * Returns information about current MultiXact state in a single atomic read:
This comment is incorrect. This is not an atomic read, grabbing a
consistent state of the data across one single lock acquisition.
Fixed and adjusted wording.
---
On Mon, Aug 18, 2025 at 6:56 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
The current document says
"Also, if the storage occupied by multixacts members exceeds about
10GB, aggressive vacuum scans will occur more often for all tables,
starting with those that have the oldest multixact-age." - do you mean
that it's wrong. Instead of checking 10GB threashold, is the code
checking the equivalent member count? If so, I think we need a
separate patch to correct the documentation first. Can you please
point me to the code? Documentation should reflect the code.
The decision is made in MultiXactMemberFreezeThreshold() [0]https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2916, and it
is entirely count-based:
if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
return autovacuum_multixact_freeze_max_age;
fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
(MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);
MaxMultiXactOffset is defined in multixact.h [1]https://github.com/postgres/postgres/blob/master/src/include/access/multixact.h#L31:
#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)
Thresholds are defined in multixact.c [2]https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L216-L218
#define MULTIXACT_MEMBER_SAFE_THRESHOLD (MaxMultiXactOffset / 2)
#define MULTIXACT_MEMBER_DANGER_THRESHOLD \
(MaxMultiXactOffset - MaxMultiXactOffset / 4)
These translate to:
- MaxMultiXactOffset: ~4.29 billion (2^32 - 1)
- MULTIXACT_MEMBER_SAFE_THRESHOLD: ~2.14 billion (2^31 - 1)
- MULTIXACT_MEMBER_DANGER_THRESHOLD: ~3.22 billion (3/4 * 2^32)
So the code path is count-driven.
Regarding docs:
For earlier versions (18 and before), the storage-size approximation
remains relevant because users don’t have direct access to member
count information. Since we don’t plan to backpatch (I assume so) this
new function, the documentation for older branches should continue to
rely on the storage-based approximation.
Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
docs can describe the thresholds in terms of counts directly. For
older branches, the storage approximation still provides users with a
practical way to reason about wraparound risk.
The constant multiplier which converts a count into the disk size is
in the server code. Duplicating it outside the server code, even in
documentation, would require maintenance. GetMultiXactInfo() may not
do the arithmetic but pg_get_multixact_stats() is lean enough to add a
couple computations.
Thanks for suggesting this — it makes sense, especially for users
upgrading from earlier versions to 19 and higher. I’ve added a
members_bytes column to pg_get_multixact_stats(), computed as
num_members * 5. This respects the existing server-side logic while
also giving those users a familiar reference point, helping them
connect the older size-based guidance with the new count-based view.
---
References:
[0]: https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2916
[1]: https://github.com/postgres/postgres/blob/master/src/include/access/multixact.h#L31
[2]: https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L216-L218
Patch v5 is attached. Thanks again for the thoughtful reviews — I really
appreciate the guidance and look forward to further feedback.
Best regards,
Naga
Attachments:
v5-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchapplication/octet-stream; name=v5-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchDownload
From 883781d4da9c133aa1c2408379276bb4f52bf3a8 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Mon, 18 Aug 2025 20:51:25 +0000
Subject: [PATCH v5] Add pg_get_multixact_stats() function for monitoring
MultiXact usage
Expose multixact state via a new SQL-callable function pg_get_multixact_stats(),
returning:
- num_mxids : number of MultiXact IDs in use
- num_members : number of member entries in use
- members_bytes : bytes used by num_members in pg_multixact/members directory
- oldest_multixact : oldest MultiXact ID still needed
- oldest_offset : oldest member offset still in use
This patch:
1. Renames ReadMultiXactCounts() to GetMultiXactInfo() and makes it public
- Provides a single accessor for MultiXact state
- Returns counts and horizons in one call
2. Adds pg_get_multixact_stats() function
- SQL-callable interface to GetMultiXactInfo()
- Returns NULLs if MultiXact system not initialized
- Includes isolation tests for monitoring invariants
Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage
Build and catalog:
- Add function to existing multixactfuncs.c
- pg_proc.dat entry
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
doc/src/sgml/func/func-info.sgml | 31 +++++
doc/src/sgml/maintenance.sgml | 54 +++++++-
src/backend/access/transam/multixact.c | 43 ++++--
src/backend/utils/adt/multixactfuncs.c | 52 +++++++
src/include/access/multixact.h | 1 +
src/include/catalog/pg_proc.dat | 15 +++
.../isolation/expected/multixact_stats.out | 94 +++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/multixact_stats.spec | 127 ++++++++++++++++++
9 files changed, 399 insertions(+), 19 deletions(-)
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index c393832d94c..9dedc3715d7 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,37 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_multixact_stats</primary>
+ </indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>num_mxids</parameter> <type>integer</type>,
+ <parameter>num_members</parameter> <type>bigint</type>,
+ <parameter>members_bytes</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>xid</type>,
+ <parameter>oldest_offset</parameter> <type>bigint</type> )
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,
+ <literal>num_members</literal> is the number of multixact member entries created,
+ <literal>members_bytes</literal> is the storage occupied by <literal>num_members</literal>
+ in <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.
+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
+ </para>
+ <para>
+ This is a live snapshot of shared counters; the numbers can change between calls,
+ even within the same transaction.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..badd3392c4f 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,12 +813,56 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of members created exceeds approximately 2 billion entries, aggressive vacuum
scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled. The members can grow
+ up to approximately 4 billion entries before reaching wraparound.
+ </para>
+
+ <para>
+ The <function>pg_get_multixact_stats()</function> function, described in
+ <xref linkend="functions-pg-snapshot"/>, provides a way to monitor
+ multixact allocation and usage patterns in real time. For example:
+ <programlisting>
+postgres=# SELECT num_mxids,num_members,oldest_multixact,oldest_offset FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | oldest_multixact | oldest_offset
+-----------+-------------+------------------+---------------
+ 99883849 | 773468747 | 39974368 | 351952978
+(1 row)
+ </programlisting>
+ This output shows a system with significant multixact activity: about ~100 million
+ multixact IDs and ~773 million member entries have been created since the oldest
+ surviving multixact (ID 39974368). By leveraging this information, the function helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations. For example, a spike in num_mxids might indicate
+ multiple sessions running UPDATE statements with foreign key checks,
+ concurrent SELECT FOR SHARE operations, or frequent use of savepoints
+ causing lock contention.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Track multixact cleanup efficiency by monitoring oldest_multixact.
+ If this value remains unchanged while num_members grows, it could indicate
+ that long-running transactions are preventing cleanup, or autovacuum is
+ not keeping up with the workload.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
</para>
<para>
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 886740d2d55..5fb7c12fdce 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2857,31 +2857,44 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
}
/*
- * Determine how many multixacts, and how many multixact members, currently
- * exist. Return false if unable to determine.
+ * GetMultiXactInfo
+ *
+ * Returns information about current MultiXact state:
+ * - multixacts: Number of MultiXacts (nextMultiXactId - oldestMultiXactId)
+ * - members: Number of member entries (nextOffset - oldestOffset)
+ * - oldestMultiXactId: Oldest MultiXact ID still in use
+ * - oldestOffset: Oldest offset still in use
+ *
+ * Returns false if the oldest offset is not known, in which case all output
+ * parameters are set to 0/invalid values for consistency.
*/
-static bool
-ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members)
+bool
+GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *members,
+ MultiXactId *oldestMultiXactId, MultiXactOffset *oldestOffset)
{
- MultiXactOffset nextOffset;
- MultiXactOffset oldestOffset;
- MultiXactId oldestMultiXactId;
- MultiXactId nextMultiXactId;
- bool oldestOffsetKnown;
+ MultiXactOffset nextOffset;
+ MultiXactId nextMultiXactId;
+ bool oldestOffsetKnown;
LWLockAcquire(MultiXactGenLock, LW_SHARED);
nextOffset = MultiXactState->nextOffset;
- oldestMultiXactId = MultiXactState->oldestMultiXactId;
+ *oldestMultiXactId = MultiXactState->oldestMultiXactId;
nextMultiXactId = MultiXactState->nextMXact;
- oldestOffset = MultiXactState->oldestOffset;
+ *oldestOffset = MultiXactState->oldestOffset;
oldestOffsetKnown = MultiXactState->oldestOffsetKnown;
LWLockRelease(MultiXactGenLock);
if (!oldestOffsetKnown)
+ {
+ *members = 0;
+ *multixacts = 0;
+ *oldestMultiXactId = InvalidMultiXactId;
+ *oldestOffset = 0;
return false;
+ }
- *members = nextOffset - oldestOffset;
- *multixacts = nextMultiXactId - oldestMultiXactId;
+ *members = nextOffset - *oldestOffset;
+ *multixacts = nextMultiXactId - *oldestMultiXactId;
return true;
}
@@ -2920,9 +2933,11 @@ MultiXactMemberFreezeThreshold(void)
uint32 victim_multixacts;
double fraction;
int result;
+ MultiXactId oldestMultiXactId;
+ MultiXactOffset oldestOffset;
/* If we can't determine member space utilization, assume the worst. */
- if (!ReadMultiXactCounts(&multixacts, &members))
+ if (!GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
return 0;
/* If member space utilization is low, no special action is required. */
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
index e74ea938348..ba9e8313ab4 100644
--- a/src/backend/utils/adt/multixactfuncs.c
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -85,3 +85,55 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
SRF_RETURN_DONE(funccxt);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current MultiXact usage:
+ * - num_mxids: Number of MultiXact IDs in use
+ * - num_members: Total number of member entries
+ * - oldest_multixact: Oldest MultiXact ID still needed
+ * - oldest_offset: Oldest offset still in use
+ *
+ * Returns a row of NULLs if the MultiXact system is not yet initialized.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[5];
+ bool nulls[5] = {true, true, true, true, true};
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ MultiXactOffset oldestOffset;
+ int64 membersBytes;
+ HeapTuple tuple;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("return type must be a row type")));
+
+ if (GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
+ {
+ /*
+ * Calculate approximate storage space:
+ * - Members are stored in groups of 4
+ * - Each group takes 20 bytes (5 bytes per member)
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;
+
+ values[0] = Int32GetDatum(multixacts);
+ values[1] = Int64GetDatum(members);
+ values[2] = Int64GetDatum(membersBytes);
+ values[3] = UInt32GetDatum(oldestMultiXactId);
+ values[4] = Int64GetDatum(oldestOffset);
+ nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false;
+ }
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+
+ PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
+}
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 6607b645a18..19de74950cb 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -159,5 +159,6 @@ extern const char *multixact_identify(uint8 info);
extern char *mxid_to_string(MultiXactId multi, int nmembers,
MultiXactMember *members);
extern char *mxstatus_to_string(MultiXactStatus status);
+extern bool GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *members, MultiXactId *oldestMultiXactId, MultiXactOffset *oldestOffset);
#endif /* MULTIXACT_H */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da1ace..985364e0cd6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12576,4 +12576,19 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Get MultiXact state
+{
+ oid => '9001',
+ descr => 'get current multixact member and multixact ID counts and oldest values',
+ proname => 'pg_get_multixact_stats',
+ prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int4,int8,int8,xid,int8}',
+ proargmodes => '{o,o,o,o,o}',
+ proargnames => '{num_mxids,num_members,members_bytes,oldest_multixact,oldest_offset}',
+ provolatile => 'v',
+ proparallel => 's',
+ prosrc => 'pg_get_multixact_stats'
+},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..2893c4d9f36
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,94 @@
+Parsed test spec with 3 sessions
+
+starting permutation: d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit
+step d_begin: BEGIN; SET client_min_messages = warning;
+step snap0:
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap1:
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap2:
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step check_while_pinned:
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+
+assertion |ok
+------------------------+--
+is_init_mxids |t
+is_init_members |t
+is_init_oldest_mxid |t
+is_init_oldest_off |t
+is_oldest_mxid_nondec_01|t
+is_oldest_mxid_nondec_12|t
+is_oldest_off_nondec_01 |t
+is_oldest_off_nondec_12 |t
+is_members_increased_ge1|t
+is_mxids_nondec_01 |t
+is_mxids_nondec_12 |t
+is_members_nondec_01 |t
+is_members_nondec_12 |t
+(13 rows)
+
+step d_commit: COMMIT;
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 9f1e997d81b..4d94fc94e77 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -118,3 +118,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: multixact_stats
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..9098b6f5c5d
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,127 @@
+# High-signal invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While it is pinned
+# by two open transactions, we assert only invariants that background VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second locker arrived,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation may shrink deltas).
+# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
+# transaction to narrow the window for unrelated truncation between snapshots.
+#
+# Terminology (global counters):
+# num_mxids, num_members : “in-use” deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns, which protects
+# the truncation horizon (VACUUM can’t advance past our pinned multi).
+
+setup
+{
+ CREATE TABLE mxq(id int PRIMARY KEY, v int);
+ INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+ DROP TABLE mxq;
+}
+
+# Two lockers on the SAME tuple -> one MultiXact with >= 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin { BEGIN; }
+step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin { BEGIN; }
+step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Driver session: keep a transaction open while we take snapshots and check.
+session "driver"
+step d_begin { BEGIN; SET client_min_messages = warning; }
+
+# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet.
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s1 has locked the row (still in driver xact).
+step snap1 {
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s2 joins on the SAME tuple -> multixact with >= 2 members (still in driver xact).
+step snap2 {
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
+# is_init_mxids : num_mxids is non-NULL
+# is_init_members : num_members is non-NULL
+# is_init_oldest_mxid : oldest_multixact is non-NULL
+# is_init_oldest_off : oldest_offset is non-NULL
+# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1)
+# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2)
+# is_oldest_off_nondec_01 : oldest_offset did not decrease (snap0→snap1)
+# is_oldest_off_nondec_12 : oldest_offset did not decrease (snap1→snap2)
+# is_members_increased_ge1 : members increased by at least 1 when s2 joined
+# is_mxids_nondec_01 : num_mxids did not decrease (snap0→snap1)
+# is_mxids_nondec_12 : num_mxids did not decrease (snap1→snap2)
+# is_members_nondec_01 : num_members did not decrease (snap0→snap1)
+# is_members_nondec_12 : num_members did not decrease (snap1→snap2)
+step check_while_pinned {
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+}
+
+step d_commit { COMMIT; }
+
+permutation d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit
--
2.47.3
On Mon, Aug 18, 2025 at 08:32:39PM -0500, Naga Appani wrote:
Thanks a lot for taking the time to review this patch and share your thoughts.
Here’s a short summary of what has changed in v5:
- Added the new pg_get_multixact_stats() function in multixactfuncs.c.
- Fixed the misleading “atomic read” comment in the accessor.
- Clarified documentation: thresholds are described in terms of
counts, since that’s what the code uses.
- Added a members_bytes column in pg_get_multixact_stats() to give
users a rough size estimate (num_members * 5), while making it clear
this is layout-dependent.Please see my in-line replies below.
FWIW, I think that you should be a bit more careful before sending
updated patch sets. You have missed an extra point I have raised
upthread about the refactoring pieces: the switch from
ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
its own.
So I have extracted this part from your latest patch, and applied it
independently of the SQL function business. Now we are in an
advantageous position on HEAD: even if we do not conclude about the
SQL function to show the mxact numbers and offsets, we have the
function that gives an access to the data you are looking for. In
short, it is now possible to provide an equivalent of the feature you
want outside of core. Not saying that the patch cannot be useful, but
such refactoring pieces open more possibilities, and offer a cleaner
commit history with less churn in the main patches.
--
Michael
On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz> wrote:
FWIW, I think that you should be a bit more careful before sending
updated patch sets. You have missed an extra point I have raised
upthread about the refactoring pieces: the switch from
ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
its own.So I have extracted this part from your latest patch, and applied it
independently of the SQL function business. Now we are in an
advantageous position on HEAD: even if we do not conclude about the
SQL function to show the mxact numbers and offsets, we have the
function that gives an access to the data you are looking for. In
short, it is now possible to provide an equivalent of the feature you
want outside of core. Not saying that the patch cannot be useful, but
such refactoring pieces open more possibilities, and offer a cleaner
commit history with less churn in the main patches.
--
Thanks for the review and separating the refactoring into its own commit.
Point taken on being more careful when sending updated patch sets.
I’ll make sure to keep
refactoring and SQL layer changes clearly separated going forward.
Attached is v6, rebased on top of HEAD. This version is limited to the
SQL function only.
Changes since v5:
- Removed the refactoring, as GetMultiXactInfo() is already committed.
- Documentation revised to describe thresholds in terms of raw counts.
Hopefully this makes the proposal easier to evaluate on its own merits.
Attachments:
v6-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchapplication/octet-stream; name=v6-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchDownload
From 67dc7b387f950364234e95fe6f7099a85a445834 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Mon, 18 Aug 2025 20:51:25 +0000
Subject: [PATCH v6] Add pg_get_multixact_stats() function for monitoring
MultiXact usage
Expose multixact state via a new SQL-callable function pg_get_multixact_stats(),
returning:
- num_mxids : number of MultiXact IDs in use
- num_members : number of member entries in use
- members_bytes : bytes used by num_members in pg_multixact/members directory
- oldest_multixact : oldest MultiXact ID still needed
- oldest_offset : oldest member offset still in use
This patch adds pg_get_multixact_stats() function
- SQL-callable interface to GetMultiXactInfo()
- Returns NULLs if MultiXact system not initialized
- Includes isolation tests for monitoring invariants
Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage
Build and catalog:
- Add function to existing multixactfuncs.c
- pg_proc.dat entry
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
doc/src/sgml/func/func-info.sgml | 31 +++++
doc/src/sgml/maintenance.sgml | 54 +++++++-
src/backend/utils/adt/multixactfuncs.c | 52 +++++++
src/include/catalog/pg_proc.dat | 15 +++
.../isolation/expected/multixact_stats.out | 94 +++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/multixact_stats.spec | 127 ++++++++++++++++++
7 files changed, 369 insertions(+), 5 deletions(-)
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index c393832d94c..9dedc3715d7 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,37 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_multixact_stats</primary>
+ </indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>num_mxids</parameter> <type>integer</type>,
+ <parameter>num_members</parameter> <type>bigint</type>,
+ <parameter>members_bytes</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>xid</type>,
+ <parameter>oldest_offset</parameter> <type>bigint</type> )
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,
+ <literal>num_members</literal> is the number of multixact member entries created,
+ <literal>members_bytes</literal> is the storage occupied by <literal>num_members</literal>
+ in <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.
+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
+ </para>
+ <para>
+ This is a live snapshot of shared counters; the numbers can change between calls,
+ even within the same transaction.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..6f0e8d7c10a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,12 +813,56 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of members created exceeds approximately 2^31 entries, aggressive vacuum
scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled. The members can grow
+ up to approximately 2^32 entries before reaching wraparound.
+ </para>
+
+ <para>
+ The <function>pg_get_multixact_stats()</function> function, described in
+ <xref linkend="functions-pg-snapshot"/>, provides a way to monitor
+ multixact allocation and usage patterns in real time. For example:
+ <programlisting>
+postgres=# SELECT num_mxids,num_members,oldest_multixact,oldest_offset FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | oldest_multixact | oldest_offset
+-----------+-------------+------------------+---------------
+ 99883849 | 773468747 | 39974368 | 351952978
+(1 row)
+ </programlisting>
+ This output shows a system with significant multixact activity: about ~100 million
+ multixact IDs and ~773 million member entries have been created since the oldest
+ surviving multixact (ID 39974368). By leveraging this information, the function helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations. For example, a spike in num_mxids might indicate
+ multiple sessions running UPDATE statements with foreign key checks,
+ concurrent SELECT FOR SHARE operations, or frequent use of savepoints
+ causing lock contention.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Track multixact cleanup efficiency by monitoring oldest_multixact.
+ If this value remains unchanged while num_members grows, it could indicate
+ that long-running transactions are preventing cleanup, or autovacuum is
+ not keeping up with the workload.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
</para>
<para>
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
index e74ea938348..ba9e8313ab4 100644
--- a/src/backend/utils/adt/multixactfuncs.c
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -85,3 +85,55 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
SRF_RETURN_DONE(funccxt);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current MultiXact usage:
+ * - num_mxids: Number of MultiXact IDs in use
+ * - num_members: Total number of member entries
+ * - oldest_multixact: Oldest MultiXact ID still needed
+ * - oldest_offset: Oldest offset still in use
+ *
+ * Returns a row of NULLs if the MultiXact system is not yet initialized.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[5];
+ bool nulls[5] = {true, true, true, true, true};
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ MultiXactOffset oldestOffset;
+ int64 membersBytes;
+ HeapTuple tuple;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("return type must be a row type")));
+
+ if (GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
+ {
+ /*
+ * Calculate approximate storage space:
+ * - Members are stored in groups of 4
+ * - Each group takes 20 bytes (5 bytes per member)
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;
+
+ values[0] = Int32GetDatum(multixacts);
+ values[1] = Int64GetDatum(members);
+ values[2] = Int64GetDatum(membersBytes);
+ values[3] = UInt32GetDatum(oldestMultiXactId);
+ values[4] = Int64GetDatum(oldestOffset);
+ nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false;
+ }
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+
+ PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da1ace..985364e0cd6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12576,4 +12576,19 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Get MultiXact state
+{
+ oid => '9001',
+ descr => 'get current multixact member and multixact ID counts and oldest values',
+ proname => 'pg_get_multixact_stats',
+ prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int4,int8,int8,xid,int8}',
+ proargmodes => '{o,o,o,o,o}',
+ proargnames => '{num_mxids,num_members,members_bytes,oldest_multixact,oldest_offset}',
+ provolatile => 'v',
+ proparallel => 's',
+ prosrc => 'pg_get_multixact_stats'
+},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..2893c4d9f36
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,94 @@
+Parsed test spec with 3 sessions
+
+starting permutation: d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit
+step d_begin: BEGIN; SET client_min_messages = warning;
+step snap0:
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap1:
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap2:
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step check_while_pinned:
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+
+assertion |ok
+------------------------+--
+is_init_mxids |t
+is_init_members |t
+is_init_oldest_mxid |t
+is_init_oldest_off |t
+is_oldest_mxid_nondec_01|t
+is_oldest_mxid_nondec_12|t
+is_oldest_off_nondec_01 |t
+is_oldest_off_nondec_12 |t
+is_members_increased_ge1|t
+is_mxids_nondec_01 |t
+is_mxids_nondec_12 |t
+is_members_nondec_01 |t
+is_members_nondec_12 |t
+(13 rows)
+
+step d_commit: COMMIT;
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 9f1e997d81b..4d94fc94e77 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -118,3 +118,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: multixact_stats
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..9098b6f5c5d
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,127 @@
+# High-signal invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While it is pinned
+# by two open transactions, we assert only invariants that background VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second locker arrived,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation may shrink deltas).
+# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
+# transaction to narrow the window for unrelated truncation between snapshots.
+#
+# Terminology (global counters):
+# num_mxids, num_members : “in-use” deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns, which protects
+# the truncation horizon (VACUUM can’t advance past our pinned multi).
+
+setup
+{
+ CREATE TABLE mxq(id int PRIMARY KEY, v int);
+ INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+ DROP TABLE mxq;
+}
+
+# Two lockers on the SAME tuple -> one MultiXact with >= 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin { BEGIN; }
+step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin { BEGIN; }
+step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Driver session: keep a transaction open while we take snapshots and check.
+session "driver"
+step d_begin { BEGIN; SET client_min_messages = warning; }
+
+# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet.
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s1 has locked the row (still in driver xact).
+step snap1 {
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s2 joins on the SAME tuple -> multixact with >= 2 members (still in driver xact).
+step snap2 {
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
+# is_init_mxids : num_mxids is non-NULL
+# is_init_members : num_members is non-NULL
+# is_init_oldest_mxid : oldest_multixact is non-NULL
+# is_init_oldest_off : oldest_offset is non-NULL
+# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1)
+# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2)
+# is_oldest_off_nondec_01 : oldest_offset did not decrease (snap0→snap1)
+# is_oldest_off_nondec_12 : oldest_offset did not decrease (snap1→snap2)
+# is_members_increased_ge1 : members increased by at least 1 when s2 joined
+# is_mxids_nondec_01 : num_mxids did not decrease (snap0→snap1)
+# is_mxids_nondec_12 : num_mxids did not decrease (snap1→snap2)
+# is_members_nondec_01 : num_members did not decrease (snap0→snap1)
+# is_members_nondec_12 : num_members did not decrease (snap1→snap2)
+step check_while_pinned {
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+}
+
+step d_commit { COMMIT; }
+
+permutation d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit
--
2.47.3
On 2025-08-20 13:27, Naga Appani wrote:
Thanks for working on this!
On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz>
wrote:FWIW, I think that you should be a bit more careful before sending
updated patch sets. You have missed an extra point I have raised
upthread about the refactoring pieces: the switch from
ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
its own.So I have extracted this part from your latest patch, and applied it
independently of the SQL function business. Now we are in an
advantageous position on HEAD: even if we do not conclude about the
SQL function to show the mxact numbers and offsets, we have the
function that gives an access to the data you are looking for. In
short, it is now possible to provide an equivalent of the feature you
want outside of core. Not saying that the patch cannot be useful, but
such refactoring pieces open more possibilities, and offer a cleaner
commit history with less churn in the main patches.
--Thanks for the review and separating the refactoring into its own
commit.
Point taken on being more careful when sending updated patch sets.
I’ll make sure to keep
refactoring and SQL layer changes clearly separated going forward.Attached is v6, rebased on top of HEAD. This version is limited to the
SQL function only.
diff --git a/doc/src/sgml/maintenance.sgml
b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..6f0e8d7c10a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,12 +813,56 @@ HINT: Execute a database-wide VACUUM in that
database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB,
aggressive vacuum
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
number
+ of members created exceeds approximately 2^31 entries, aggressive
vacuum
scans will occur more often for all tables, starting with those
that
Looking at commit ff20ccae9fdb, it seems that the documentation was
intentionally written in terms of gigabytes rather than the number:
The threshold is two billion members, which was interpreted as 2GB
in the documentation. Fix to reflect that each member takes up five
bytes, which translates to about 10GB. This is not exact, because of
page boundaries. While at it, mention the maximum size 20GB.
Anyway, I also think, as Ashutosh suggested, that if we want to fix this
documentation, it would be better to do so in a separate patch.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
On 2025-08-22 09:28, torikoshia wrote:
On 2025-08-20 13:27, Naga Appani wrote:
Thanks for working on this!
On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz>
wrote:FWIW, I think that you should be a bit more careful before sending
updated patch sets. You have missed an extra point I have raised
upthread about the refactoring pieces: the switch from
ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
its own.So I have extracted this part from your latest patch, and applied it
independently of the SQL function business. Now we are in an
advantageous position on HEAD: even if we do not conclude about the
SQL function to show the mxact numbers and offsets, we have the
function that gives an access to the data you are looking for. In
short, it is now possible to provide an equivalent of the feature you
want outside of core. Not saying that the patch cannot be useful,
but
such refactoring pieces open more possibilities, and offer a cleaner
commit history with less churn in the main patches.
--Thanks for the review and separating the refactoring into its own
commit.
Point taken on being more careful when sending updated patch sets.
I’ll make sure to keep
refactoring and SQL layer changes clearly separated going forward.Attached is v6, rebased on top of HEAD. This version is limited to the
SQL function only.diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index e7a9f58c015..6f0e8d7c10a 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -813,12 +813,56 @@ HINT: Execute a database-wide VACUUM in that database. <para> As a safety device, an aggressive vacuum scan will occur for any table whose multixact-age is greater than <xref - linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the - storage occupied by multixacts members exceeds about 10GB, aggressive vacuum + linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number + of members created exceeds approximately 2^31 entries, aggressive vacuum scans will occur more often for all tables, starting with those thatLooking at commit ff20ccae9fdb, it seems that the documentation was
intentionally written in terms of gigabytes rather than the number:The threshold is two billion members, which was interpreted as 2GB
in the documentation. Fix to reflect that each member takes up five
bytes, which translates to about 10GB. This is not exact, because of
page boundaries. While at it, mention the maximum size 20GB.Anyway, I also think, as Ashutosh suggested, that if we want to fix
this documentation, it would be better to do so in a separate patch.
Ah, I've found why you choose to add this doc modification in this patch
in the thread, sorry for skipping over the part:
| For earlier versions (18 and before), the storage-size approximation
| remains relevant because users don’t have direct access to member
| count information. Since we don’t plan to backpatch (I assume so) this
| new function, the documentation for older branches should continue to
| rely on the storage-based approximation.
| Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
| docs can describe the thresholds in terms of counts directly.
Personally, I think it might be fine to keep the gigabyte-based
description, and perhaps we could show both the number of members and
gigabytes, since it'd be also helpful to have a sense of the scale.
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
| Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
| docs can describe the thresholds in terms of counts directly.Personally, I think it might be fine to keep the gigabyte-based
description, and perhaps we could show both the number of members and
gigabytes, since it'd be also helpful to have a sense of the scale.
Those who have grown their own utilities to monitor the on-disk usage
will not be able to use the count based thresholds and might take some
time for them to starting using pg_get_multixact_stats(). It makes
sense to mention both the count and the corresponding disk usage
threshold. Something like "Also, if the number of multixact members
exceeds approximately 2^31 entries (occupying roughly more than 10GB
in storage) ... ". Users can choose which threshold they want to use.
Adding disk storage threshold in parenthesis indicates that the count
is more accurate and more useful.
Here's detailed review of the patch
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,
+ <literal>num_members</literal> is the number of multixact member
entries created,
+ <literal>members_bytes</literal> is the storage occupied by
<literal>num_members</literal>
I thought mentioning bytes, a unit, in column name members_bytes would
not be appropriate in case we start reporting it in a different unit
like kB in future. But we already have
pg_stat_replication_slots::spill_bytes with similar naming. So may be
it's okay. But I would prefer members_size or members_storage or some
such units-free name.
+ in <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still
in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.
I am not sure whether oldest_offset is worth exposing. It is an
implementation detail. Upthread, Michael suggested to expose oldest
offset from GetMultiXactInfo(), but I don't see him explicitly saying
that we should expose it through this function as well. Michael what
do you think?
+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
I still think that this is not needed. There is no reason to restrict
how users want to use this function. We usually don't do that unless
there is a hazard associated with it.
+ <para>
+ This is a live snapshot of shared counters; the numbers can change
between calls,
+ even within the same transaction.
+ </para></entry>
I have not seen the phrase "live snapshot" being used in the
documentation before. How about "The function reports the statistics
at the time of invoking the function. They may vary between calls even
within the same transaction."?
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of members created exceeds approximately 2^31 entries, aggressive vacuum
a member means the transaction participating in a multixact. What you
intend to say is "if the number of multixacts member entries created
...", right?
+ <para>
+ The <function>pg_get_multixact_stats()</function> function, described in
unnecessary pair of commas.
+ This output shows a system with significant multixact activity:
about ~100 million
+ multixact IDs and ~773 million member entries have been created
since the oldest
+ surviving multixact (ID 39974368). By leveraging this information,
the function helps:
+ <orderedlist>
+ <listitem>
... snip ...
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
I am unsure whether we should be mentioning use cases in such detail.
Users may find other ways to use those counts. I think the following
paragraph should be placed here.
+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
But others may have different opinions.
Maybe you could further write in your example that an aggressive
autovacuum will be triggered in another 10 seconds (or so) if the
number of member entries continues to double every 5 seconds. Or some
practical "usage example" like that.
+ * Returns statistics about current MultiXact usage:
+ * - num_mxids: Number of MultiXact IDs in use
+ * - num_members: Total number of member entries
+ * - oldest_multixact: Oldest MultiXact ID still needed
+ * - oldest_offset: Oldest offset still in use
We don't need to mention each column here, it's evident from the
function body and also from the user facing documentation. Just the
first line is ok.
+ *
+ * Returns a row of NULLs if the MultiXact system is not yet initialized.
tuple or record instead of row.
In the earlier patch you were calling PG_RETURN_NULL(), which I
thought was better. It would get converted into a record of NULLs if
someone is to do SELECT * FROM pg_get_multixact_stats().
I don't think "the MultiXact system is not yet initialized" is the
right description of that condition. GetMultiXactInfo() prologue says
"
Returns false if unable to determine, the oldest offset being
unknown." MultiXactStatData has following comment for oldest offset.
/*
* Oldest multixact offset that is potentially referenced by a multixact
* referenced by a relation. We don't always know this value, so there's
* a flag here to indicate whether or not we currently do.
*/
And also
/* Have we completed multixact startup? */
bool finishedStartup;
I think we need to define this condition more accurately.
And include it in the documentation as well.
+ * Calculate approximate storage space:
+ * - Members are stored in groups of 4
+ * - Each group takes 20 bytes (5 bytes per member)
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;
Do we have some constant macros or sizeof(some structure) defined for
5 and 4? That way this computation will be self maintaining and self
documenting.
+ nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false;
memset(nulls, false, sizeof(nulls)); is better and used everywhere.
In fact, instead of initializing it all to true first and then setting
all to false here, we should memset here and set it to true in else
block.
+++ b/src/test/isolation/specs/multixact_stats.spec
I have not an seen an isolation test being used for testing a stats
function. But I find it useful. Let's see what others think.
@@ -0,0 +1,127 @@
+# High-signal invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While
it is pinned
+# by two open transactions, we assert only invariants that background
VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second locker arrived,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation
may shrink deltas).
+# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
+# transaction to narrow the window for unrelated truncation between snapshots.
What's a driver transaction?
+#
+# Terminology (global counters):
+# num_mxids, num_members : “in-use” deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns,
which protects
+# the truncation horizon (VACUUM can’t advance past our pinned multi).
Probably this comment is not needed. But from the sequence of steps
executed, the data is collected when multixact is pinned (what does
that mean?) but the assertions are executed at the end when all the
transactions are committed. Am I correct?
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
You could use a single table with a primary key column to distinguish
snaps which can be used for joining the rows. Why use a temporary
table? Just setup and tear down the snap table as well?
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
... snip ...
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
This is getting too complex to follow. It produces pretty output but
the query is complex. Instead just let keys as the columns in the
query. Maybe you could print expanded output if that's possible in an
isolation test.
--
Best Wishes,
Ashutosh Bapat
Hi Atsushi and Ashutosh,
Thank you for reviewing the patch. Attached is v7, incorporating the
feedback. Please see my responses in-line below.
On Fri, Aug 22, 2025 at 6:45 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
| Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
| docs can describe the thresholds in terms of counts directly.Personally, I think it might be fine to keep the gigabyte-based
description, and perhaps we could show both the number of members and
gigabytes, since it'd be also helpful to have a sense of the scale.Those who have grown their own utilities to monitor the on-disk usage
will not be able to use the count based thresholds and might take some
time for them to starting using pg_get_multixact_stats(). It makes
sense to mention both the count and the corresponding disk usage
threshold. Something like "Also, if the number of multixact members
exceeds approximately 2^31 entries (occupying roughly more than 10GB
in storage) ... ". Users can choose which threshold they want to use.
Adding disk storage threshold in parenthesis indicates that the count
is more accurate and more useful.
Updated docs to include both counts and approximate storage.
I thought mentioning bytes, a unit, in column name members_bytes would
not be appropriate in case we start reporting it in a different unit
like kB in future. But we already have
pg_stat_replication_slots::spill_bytes with similar naming. So may be
it's okay. But I would prefer members_size or members_storage or some
such units-free name.
Good point! Adjusted to a units-free name: members_size.
+ in <literal>pg_multixact/members</literal> directory, + <literal>oldest_multixact</literal> is the oldest multixact ID still in use, and + <literal>oldest_offset</literal> is the oldest member offset still in use.I am not sure whether oldest_offset is worth exposing. It is an
implementation detail. Upthread, Michael suggested to expose oldest
offset from GetMultiXactInfo(), but I don't see him explicitly saying
that we should expose it through this function as well. Michael what
do you think?
IMHO, exposing oldest_offset gives a full picture of multixact state.
It complements oldest_multixact,
and including it won’t hurt. That said, if consensus is against it,
I’m happy to drop it.
+ These values can be used to monitor multixact consumption and anticipate + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/> + for further details on multixact wraparound.I still think that this is not needed. There is no reason to restrict
how users want to use this function. We usually don't do that unless
there is a hazard associated with it.
Removed.
+ <para> + This is a live snapshot of shared counters; the numbers can change between calls, + even within the same transaction. + </para></entry>I have not seen the phrase "live snapshot" being used in the
documentation before. How about "The function reports the statistics
at the time of invoking the function. They may vary between calls even
within the same transaction."?
Updated wording.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number + of members created exceeds approximately 2^31 entries, aggressive vacuuma member means the transaction participating in a multixact. What you
intend to say is "if the number of multixacts member entries created
...", right?
Correct, updated.
+ <para> + The <function>pg_get_multixact_stats()</function> function, described inunnecessary pair of commas.
Fixed.
+ This output shows a system with significant multixact activity: about ~100 million + multixact IDs and ~773 million member entries have been created since the oldest + surviving multixact (ID 39974368). By leveraging this information, the function helps: + <orderedlist> + <listitem> ... snip ... + Detect potential performance impacts before they become critical. + For instance, high multixact usage from frequent row-level locking or + foreign key operations can lead to increased I/O and CPU overhead during + vacuum operations. Monitoring these stats helps tune autovacuum frequency + and transaction patterns. + </simpara> + </listitem> + </orderedlist>I am unsure whether we should be mentioning use cases in such detail.
Users may find other ways to use those counts. I think the following
paragraph should be placed here.+ These values can be used to monitor multixact consumption and anticipate + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/> + for further details on multixact wraparound.But others may have different opinions.
Maybe you could further write in your example that an aggressive
autovacuum will be triggered in another 10 seconds (or so) if the
number of member entries continues to double every 5 seconds. Or some
practical "usage example" like that.
Ack. I believe keeping the example with a short list is helpful for
users to navigate
and interpret the stats. If preferred, I can trim it to a brief
paragraph with just the query in
the next rev.
+ * Returns statistics about current MultiXact usage: + * - num_mxids: Number of MultiXact IDs in use + * - num_members: Total number of member entries + * - oldest_multixact: Oldest MultiXact ID still needed + * - oldest_offset: Oldest offset still in useWe don't need to mention each column here, it's evident from the
function body and also from the user facing documentation. Just the
first line is ok.
Updated - kept only the high-level description.
+ * + * Returns a row of NULLs if the MultiXact system is not yet initialized.tuple or record instead of row.
In the earlier patch you were calling PG_RETURN_NULL(), which I
thought was better. It would get converted into a record of NULLs if
someone is to do SELECT * FROM pg_get_multixact_stats().I don't think "the MultiXact system is not yet initialized" is the
right description of that condition. GetMultiXactInfo() prologue says
"
Returns false if unable to determine, the oldest offset being
unknown." MultiXactStatData has following comment for oldest offset.
/*
* Oldest multixact offset that is potentially referenced by a multixact
* referenced by a relation. We don't always know this value, so there's
* a flag here to indicate whether or not we currently do.
*/
Switched to PG_RETURN_NULL() and rephrased both code comment and docs.
+ * Calculate approximate storage space: + * - Members are stored in groups of 4 + * - Each group takes 20 bytes (5 bytes per member) + * Note: This ignores small page overhead (12 bytes per 8KB) + */ + membersBytes = (int64) members * 5;Do we have some constant macros or sizeof(some structure) defined for
5 and 4? That way this computation will be self maintaining and self
documenting.
Those macros are already defined in multixact.c - for example,
MULTIXACT_MEMBERS_PER_MEMBERGROUP
and MULTIXACT_MEMBERGROUP_SIZE encode the 4-per-group and 20-byte
layout. They are local today,
and I’m not sure why they were never exposed. Rather than moving them
into a header and creating wider changes,
v7 retains the explicit 5-bytes/member estimate with an explanatory
comment to stay consistent with existing guidance.
If we feel these macros should be promoted to a header, I think that
would be best handled as a small, separate patch,
and I’d be happy to help with that.
+ nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false;
memset(nulls, false, sizeof(nulls)); is better and used everywhere.
In fact, instead of initializing it all to true first and then setting
all to false here, we should memset here and set it to true in else
block.
Updated. v7 uses memset(false) and only sets true where needed.
+++ b/src/test/isolation/specs/multixact_stats.specI have not an seen an isolation test being used for testing a stats
function. But I find it useful. Let's see what others think.@@ -0,0 +1,127 @@ +# High-signal invariants for pg_get_multixact_stats() +# We create exactly one fresh MultiXact on a brand-new table. While it is pinned +# by two open transactions, we assert only invariants that background VACUUM/FREEZE +# cannot violate: +# • members increased by ≥ 1 when the second locker arrived, +# • num_mxids / num_members did not decrease vs earlier snapshots, +# • oldest_* never decreases. +# We make NO assertions after releasing locks (freezing/truncation may shrink deltas). +# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver +# transaction to narrow the window for unrelated truncation between snapshots.What's a driver transaction?
A driver transaction is simply the controlling session that stays open
while snapshots are taken.
+# +# Terminology (global counters): +# num_mxids, num_members : “in-use” deltas derived from global horizons +# oldest_multixact, offset : oldest horizons; they move forward, never backward +# +# All assertions execute while our multixact is pinned by open txns, which protects +# the truncation horizon (VACUUM can’t advance past our pinned multi).Probably this comment is not needed. But from the sequence of steps
executed, the data is collected when multixact is pinned (what does
that mean?) but the assertions are executed at the end when all the
transactions are committed. Am I correct?
You are correct — the assertions are executed at the end, after the commits.
The key point is that all snapshots are taken while the multixact is
pinned by open transactions,
so the invariants hold despite the final check happening later.
+step snap0 { + CREATE TEMP TABLE snap0 AS + SELECT num_mxids, num_members, oldest_multixact, oldest_offset + FROM pg_get_multixact_stats(); +}You could use a single table with a primary key column to distinguish
snaps which can be used for joining the rows. Why use a temporary
table? Just setup and tear down the snap table as well?
I kept separate temp tables to keep each snapshot isolated and easy to
read in the spec.
A single table with a PK would work too, but I felt temp tables made
the sequence clearer.
+ +# Pretty, deterministic key/value output of boolean checks. +# Keys: ... snip ... + (s1.num_mxids >= COALESCE(s0.num_mxids, 0)), + (s2.num_mxids >= COALESCE(s1.num_mxids, 0)), + (s1.num_members >= COALESCE(s0.num_members, 0)), + (s2.num_members >= COALESCE(s1.num_members, 0)) + ]This is getting too complex to follow. It produces pretty output but
the query is complex. Instead just let keys as the columns in the
query. Maybe you could print expanded output if that's possible in an
isolation test.
I used the labeled key/value array to mimic \x-style readability while
keeping the output
deterministic for isolation’s text diffs. It clearly names each
invariant and avoids
formatter-dependent width/spacing.
Thanks again for the thoughtful reviews. I really appreciate the
guidance and will be glad to adjust
further if needed.
Best regards,
Naga
Attachments:
v7-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchapplication/octet-stream; name=v7-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchDownload
From 79dc6ffd50ad0a2dc84eeb2da3353d823b0ced91 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Wed, 3 Sep 2025 04:09:56 +0000
Subject: [PATCH v7] Add pg_get_multixact_stats() function for monitoring
MultiXact usage
Expose multixact state via a new SQL-callable function pg_get_multixact_stats(),
returning:
- num_mxids : number of MultiXact IDs in use
- num_members : number of member entries in use
- members_size : bytes used by num_members in pg_multixact/members directory
- oldest_multixact : oldest MultiXact ID still needed
- oldest_offset : oldest member offset still in use
This patch adds pg_get_multixact_stats() function
- SQL-callable interface to GetMultiXactInfo()
- Returns NULLs if MultiXact system not initialized
- Includes isolation tests for monitoring invariants
Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage
Build and catalog:
- Add function to existing multixactfuncs.c
- pg_proc.dat entry
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
doc/src/sgml/func/func-info.sgml | 34 +++++
doc/src/sgml/maintenance.sgml | 57 +++++++-
src/backend/utils/adt/multixactfuncs.c | 47 +++++++
src/include/catalog/pg_proc.dat | 15 +++
.../isolation/expected/multixact_stats.out | 94 +++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/multixact_stats.spec | 127 ++++++++++++++++++
7 files changed, 369 insertions(+), 6 deletions(-)
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index c393832d94c..fcc5e4bde38 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,40 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_multixact_stats</primary>
+ </indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>num_mxids</parameter> <type>integer</type>,
+ <parameter>num_members</parameter> <type>bigint</type>,
+ <parameter>members_size</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>xid</type>,
+ <parameter>oldest_offset</parameter> <type>bigint</type> )
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,
+ <literal>num_members</literal> is the number of multixact member entries created,
+ <literal>members_size</literal> is the storage occupied by <literal>num_members</literal>
+ in <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.
+ See <xref linkend="vacuum-for-multixact-wraparound"/> for further details on multixact wraparound.
+ </para>
+ <para>
+ The function reports statistics at the time it is invoked. Values may vary between calls,
+ even within a single transaction.
+ </para>
+ <para>
+ The function returns <literal>NULL</literal> when multixact statistics are unavailable.
+ For example, during startup before multixact initialization completes or when
+ the oldest member offset cannot be determined.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..8b265e2348f 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,14 +813,59 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
- scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of multixact member entries created exceeds approximately 2^31 entries
+ (occupying roughly 10GB in the <literal>pg_multixact/members</literal> directory),
+ aggressive vacuum scans will occur more often for all tables, starting with those that
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled. The members can grow
+ up to approximately 2^32 entries before reaching wraparound.
</para>
+ <para>
+ The <function>pg_get_multixact_stats()</function> function described in
+ <xref linkend="functions-pg-snapshot"/> provides a way to monitor
+ multixact allocation and usage patterns in real time. For example:
+ <programlisting>
+postgres=# SELECT *,pg_size_pretty(members_size) members_size_pretty FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | members_size | oldest_multixact | oldest_offset | members_size_pretty
+-----------+-------------+--------------+------------------+---------------+---------------------
+ 311740299 | 2785241176 | 13926205880 | 2 | 3 | 13 GB
+(1 row)
+ </programlisting>
+ This output shows a system with significant multixact activity: about ~312 million
+ multixact IDs and ~2.8 billion member entries consuming 13 GB of storage space.
+ By leveraging this information, the function helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations. For example, a spike in <literal>num_mxids</literal> might indicate
+ multiple sessions running <literal>UPDATE</literal> statements with foreign key checks,
+ concurrent <literal>SELECT FOR SHARE</literal> operations, or frequent use of savepoints
+ causing lock contention.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Track multixact cleanup efficiency by monitoring oldest_multixact.
+ If this value remains unchanged while <literal>num_members</literal> grows, it could indicate
+ that long-running transactions are preventing cleanup, or autovacuum is
+ not keeping up with the workload.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
+ </para>
+
<para>
Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
system will begin to emit warning messages when the database's oldest MXIDs reach forty
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
index e74ea938348..c393f11a38c 100644
--- a/src/backend/utils/adt/multixactfuncs.c
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -85,3 +85,50 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
SRF_RETURN_DONE(funccxt);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current MultiXact usage.
+ *
+ * Returns NULL if the oldest referenced offset is unknown, which happens during
+ * system startup or when no MultiXact references exist in any relation.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[5];
+ bool nulls[5];
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ MultiXactOffset oldestOffset;
+ int64 membersBytes;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("return type must be a row type")));
+
+ if (GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
+ {
+ /*
+ * Calculate storage space for members. Members are stored in groups of 4,
+ * with each group taking 20 bytes, resulting in 5 bytes per member.
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;
+
+ values[0] = Int32GetDatum(multixacts);
+ values[1] = Int64GetDatum(members);
+ values[2] = Int64GetDatum(membersBytes);
+ values[3] = UInt32GetDatum(oldestMultiXactId);
+ values[4] = Int64GetDatum(oldestOffset);
+ memset(nulls, false, sizeof(nulls));
+
+ return HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
+ }
+
+ PG_RETURN_NULL();
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da1ace..f5f224b8a14 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12576,4 +12576,19 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Get MultiXact state
+{
+ oid => '9001',
+ descr => 'get current multixact member and multixact ID counts and oldest values',
+ proname => 'pg_get_multixact_stats',
+ prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int4,int8,int8,xid,int8}',
+ proargmodes => '{o,o,o,o,o}',
+ proargnames => '{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}',
+ provolatile => 'v',
+ proparallel => 's',
+ prosrc => 'pg_get_multixact_stats'
+},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..2893c4d9f36
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,94 @@
+Parsed test spec with 3 sessions
+
+starting permutation: d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit
+step d_begin: BEGIN; SET client_min_messages = warning;
+step snap0:
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap1:
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap2:
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step check_while_pinned:
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+
+assertion |ok
+------------------------+--
+is_init_mxids |t
+is_init_members |t
+is_init_oldest_mxid |t
+is_init_oldest_off |t
+is_oldest_mxid_nondec_01|t
+is_oldest_mxid_nondec_12|t
+is_oldest_off_nondec_01 |t
+is_oldest_off_nondec_12 |t
+is_members_increased_ge1|t
+is_mxids_nondec_01 |t
+is_mxids_nondec_12 |t
+is_members_nondec_01 |t
+is_members_nondec_12 |t
+(13 rows)
+
+step d_commit: COMMIT;
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 9f1e997d81b..4d94fc94e77 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -118,3 +118,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: multixact_stats
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..9098b6f5c5d
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,127 @@
+# High-signal invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While it is pinned
+# by two open transactions, we assert only invariants that background VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second locker arrived,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation may shrink deltas).
+# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
+# transaction to narrow the window for unrelated truncation between snapshots.
+#
+# Terminology (global counters):
+# num_mxids, num_members : “in-use” deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns, which protects
+# the truncation horizon (VACUUM can’t advance past our pinned multi).
+
+setup
+{
+ CREATE TABLE mxq(id int PRIMARY KEY, v int);
+ INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+ DROP TABLE mxq;
+}
+
+# Two lockers on the SAME tuple -> one MultiXact with >= 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin { BEGIN; }
+step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin { BEGIN; }
+step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Driver session: keep a transaction open while we take snapshots and check.
+session "driver"
+step d_begin { BEGIN; SET client_min_messages = warning; }
+
+# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet.
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s1 has locked the row (still in driver xact).
+step snap1 {
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s2 joins on the SAME tuple -> multixact with >= 2 members (still in driver xact).
+step snap2 {
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
+# is_init_mxids : num_mxids is non-NULL
+# is_init_members : num_members is non-NULL
+# is_init_oldest_mxid : oldest_multixact is non-NULL
+# is_init_oldest_off : oldest_offset is non-NULL
+# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1)
+# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2)
+# is_oldest_off_nondec_01 : oldest_offset did not decrease (snap0→snap1)
+# is_oldest_off_nondec_12 : oldest_offset did not decrease (snap1→snap2)
+# is_members_increased_ge1 : members increased by at least 1 when s2 joined
+# is_mxids_nondec_01 : num_mxids did not decrease (snap0→snap1)
+# is_mxids_nondec_12 : num_mxids did not decrease (snap1→snap2)
+# is_members_nondec_01 : num_members did not decrease (snap0→snap1)
+# is_members_nondec_12 : num_members did not decrease (snap1→snap2)
+step check_while_pinned {
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+}
+
+step d_commit { COMMIT; }
+
+permutation d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit
--
2.47.3
On Thu, Sep 4, 2025 at 2:41 AM Naga Appani <nagnrik@gmail.com> wrote:
On Fri, Aug 22, 2025 at 6:45 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
Updated docs to include both counts and approximate storage.
This one is remaining.
+ up to approximately 2^32 entries before reaching wraparound.
... 2^32 entries (occupying roughly 20GB in the
<literal>pg_multixact/members</literal> directory) before reaching
wraparound. ...
+ See <xref linkend="vacuum-for-multixact-wraparound"/> for further
details on multixact wraparound.
I don't think we need this reference here. Reference back from that
section is enough.
+ * Returns NULL if the oldest referenced offset is unknown, which
happens during
+ * system startup or when no MultiXact references exist in any relation.
If no MultiXact references exist, and GetMultiXactInfo() returns
false, MultiXactMemberFreezeThreshold() will assume the worst, which I
take as meaning that it will trigger aggressive autovacuum. No
MultiXact references existing is a common case which shouldn't be
assumed as the worst case. The comment I quoted means "the oldest
value of the offset referenced by any multi-xact referenced by a
relation *may not be always known". You seem to have interpreted "may
not be known" as "does not exist" That's not right. I would write this
as "Returns NULL if the oldest referenced offset is unknown which
happens during system startup".
Similarly I would rephrase the following docs as
+ <para>
+ The function returns <literal>NULL</literal> when multixact
statistics are unavailable.
+ For example, during startup before multixact initialization completes or when
+ the oldest member offset cannot be determined.
"The function returns <literal>NULL</literal> when multixact
statistics when the oldest multixact offset corresponding to a
multixact referenced by a relation is not known after starting the
system."
@@ -0,0 +1,127 @@ +# High-signal invariants for pg_get_multixact_stats()
What does "High-signal" mean here? Is that term defined somewhere?
Using terms that most of the contributors are familiar with improves
readability. If a new term is required, it needs to be defined first.
But I doubt something here requires defining a new term.
What's a driver transaction?
A driver transaction is simply the controlling session that stays open
while snapshots are taken.
I still don't understand the purpose of this transaction.
pg_get_multixact_stats() isn't transactional so the driver transaction
isn't holding any "snapshot" of the stats. It's also not creating any
multixact and hence does not contribute to testing the output of
pg_get_multixact_stats. Whatever this session is doing, can be done
outside a transaction too. Which step in this session requires an
outer transaction?
Some more comments
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,
Is this the number of multixact IDs assigned till now (since whatever
time) or the number of multixact IDs currently in the system?
+ <literal>num_members</literal> is the number of multixact member
entries created,
Similarly this.
+ multixact allocation and usage patterns in real time. For example:
suggestion: ... real time, for example: ... Otherwise the sentence
started by "For example" is not a complete sentence.
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,
Is this the number of multixact IDs assigned till now (since whatever
time) or the number of multixact IDs currently in the system?
+ <literal>num_members</literal> is the number of multixact member
entries created,
Similarly this.
+ multixact allocation and usage patterns in real time. For example:
suggestion: ... real time, for example: ... Otherwise the sentence
started by "For example" is not a complete sentence.
+
+ values[0] = Int32GetDatum(multixacts);
This should be UInt32GetDatum() multixacts is uint32.
+ values[1] = Int64GetDatum(members);
Similarly this since MultiXactOffset is uint32.
+ values[4] = Int64GetDatum(oldestOffset);
Similarly this since MultiXactOffset is uint32.
+# Get MultiXact state
+{
+ oid => '9001',
+ descr => 'get current multixact member and multixact ID counts and
oldest values',
suggestion: get current multixact usage statistics.
+ proname => 'pg_get_multixact_stats',
+ prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int4,int8,int8,xid,int8}',
+ proargmodes => '{o,o,o,o,o}',
+ proargnames =>
'{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}',
+ provolatile => 'v',
+ proparallel => 's',
+ prosrc => 'pg_get_multixact_stats'
+},
I like the way you have formatted the new entry, but other entries in
this file are not formatted this way. It would be good to format it
like other entries but if other reviewers prefer this way, we can go
with this too.
--
Best Wishes,
Ashutosh Bapat
Hi Ashutosh,
Thank you for continuing to review the patch. Attached is v8,
incorporating the feedback. Please see my responses inline below.
On Fri, Sep 5, 2025 at 6:27 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
This one is remaining.
+ up to approximately 2^32 entries before reaching wraparound.... 2^32 entries (occupying roughly 20GB in the
<literal>pg_multixact/members</literal> directory) before reaching
wraparound. ...
Done.
+ See <xref linkend="vacuum-for-multixact-wraparound"/> for further
details on multixact wraparound.I don't think we need this reference here. Reference back from that
section is enough.
I kept the cross-reference for now since other multixact function docs
(such as pg_get_multixact_members()) already use this style, and it helps
readers who land directly on the function reference page. Please let me
know if you would prefer that I remove it.
+ * Returns NULL if the oldest referenced offset is unknown, which happens during + * system startup or when no MultiXact references exist in any relation.If no MultiXact references exist, and GetMultiXactInfo() returns
false, MultiXactMemberFreezeThreshold() will assume the worst, which I
take as meaning that it will trigger aggressive autovacuum. No
MultiXact references existing is a common case which shouldn't be
assumed as the worst case. The comment I quoted means "the oldest
value of the offset referenced by any multi-xact referenced by a
relation *may not be always known". You seem to have interpreted "may
not be known" as "does not exist" That's not right. I would write this
as "Returns NULL if the oldest referenced offset is unknown which
happens during system startup".Similarly I would rephrase the following docs as + <para> + The function returns <literal>NULL</literal> when multixact statistics are unavailable. + For example, during startup before multixact initialization completes or when + the oldest member offset cannot be determined."The function returns <literal>NULL</literal> when multixact
statistics when the oldest multixact offset corresponding to a
multixact referenced by a relation is not known after starting the
system."
Updated.
@@ -0,0 +1,127 @@ +# High-signal invariants for pg_get_multixact_stats()What does "High-signal" mean here? Is that term defined somewhere?
Using terms that most of the contributors are familiar with improves
readability. If a new term is required, it needs to be defined first.
But I doubt something here requires defining a new term.
Dropped that wording and simplified the isolation test.
What's a driver transaction?
A driver transaction is simply the controlling session that stays open
while snapshots are taken.I still don't understand the purpose of this transaction.
pg_get_multixact_stats() isn't transactional so the driver transaction
isn't holding any "snapshot" of the stats. It's also not creating any
multixact and hence does not contribute to testing the output of
pg_get_multixact_stats. Whatever this session is doing, can be done
outside a transaction too. Which step in this session requires an
outer transaction?
Removed this mention; the test now only checks monotonicity without extra
transaction scaffolding.
Some more comments + Returns statistics about current multixact usage: + <literal>num_mxids</literal> is the number of multixact IDs assigned,Is this the number of multixact IDs assigned till now (since whatever
time) or the number of multixact IDs currently in the system?+ <literal>num_members</literal> is the number of multixact member
entries created,
Updated.
+ Returns statistics about current multixact usage: + <literal>num_mxids</literal> is the number of multixact IDs assigned,Is this the number of multixact IDs assigned till now (since whatever
time) or the number of multixact IDs currently in the system?+ <literal>num_members</literal> is the number of multixact member
entries created,
Updated.
+ multixact allocation and usage patterns in real time. For example:
suggestion: ... real time, for example: ... Otherwise the sentence
started by "For example" is not a complete sentence.
Updated.
+ values[0] = Int32GetDatum(multixacts);
This should be UInt32GetDatum() multixacts is uint32.
+ values[1] = Int64GetDatum(members);
Similarly this since MultiXactOffset is uint32.
+ values[4] = Int64GetDatum(oldestOffset);
Similarly this since MultiXactOffset is uint32.
Thanks for pointing this out. I had originally followed the existing
types but drifted, fixed now.
+# Get MultiXact state +{ + oid => '9001', + descr => 'get current multixact member and multixact ID counts and oldest values',suggestion: get current multixact usage statistics.
Updated
+ proname => 'pg_get_multixact_stats', + prorettype => 'record', + proargtypes => '', + proallargtypes => '{int4,int8,int8,xid,int8}', + proargmodes => '{o,o,o,o,o}', + proargnames => '{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}', + provolatile => 'v', + proparallel => 's', + prosrc => 'pg_get_multixact_stats' +},I like the way you have formatted the new entry, but other entries in
this file are not formatted this way. It would be good to format it
like other entries but if other reviewers prefer this way, we can go
with this too.
I reformatted the pg_proc.dat entry to match the surrounding style.
Best regards,
Naga
Attachments:
v8-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchapplication/octet-stream; name=v8-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchDownload
From 4752fdd586f78da4ca68879f5a8dafdbfbb36445 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Thu, 11 Sep 2025 22:18:51 +0000
Subject: [PATCH v8] Add pg_get_multixact_stats() function for monitoring
MultiXact usage
Expose multixact state via a new SQL-callable function pg_get_multixact_stats(),
returning:
- num_mxids : number of MultiXact IDs in use
- num_members : number of member entries in use
- members_size : bytes used by num_members in pg_multixact/members directory
- oldest_multixact : oldest MultiXact ID still needed
- oldest_offset : oldest member offset still in use
This patch adds pg_get_multixact_stats() function
- SQL-callable interface to GetMultiXactInfo()
- Returns NULLs if MultiXact system not initialized
- Includes isolation tests for monitoring invariants
Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage
Build and catalog:
- Add function to existing multixactfuncs.c
- pg_proc.dat entry
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
doc/src/sgml/func/func-info.sgml | 35 ++++++
doc/src/sgml/maintenance.sgml | 58 ++++++++-
src/backend/utils/adt/multixactfuncs.c | 47 +++++++
src/include/catalog/pg_proc.dat | 10 ++
.../isolation/expected/multixact_stats.out | 92 ++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/multixact_stats.spec | 119 ++++++++++++++++++
7 files changed, 356 insertions(+), 6 deletions(-)
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index c393832d94c..2d21d0d3af5 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,41 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_multixact_stats</primary>
+ </indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>num_mxids</parameter> <type>integer</type>,
+ <parameter>num_members</parameter> <type>bigint</type>,
+ <parameter>members_size</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>xid</type>,
+ <parameter>oldest_offset</parameter> <type>bigint</type> )
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the total number of multixact IDs assigned since startup,
+ <literal>num_members</literal> is the total number of multixact member entries created since startup,
+ <literal>members_size</literal> is the storage occupied by <literal>num_members</literal>
+ in the <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.
+ See <xref linkend="vacuum-for-multixact-wraparound"/> for further details on multixact wraparound.
+ </para>
+ <para>
+ The function reports statistics at the time it is invoked. Values may vary between calls,
+ even within a single transaction.
+ </para>
+ <para>
+ Returns <literal>NULL</literal> when multixact statistics are unavailable,
+ such as during startup before multixact initialization completes.
+ Specifically, this occurs when the oldest multixact offset
+ corresponding to a multixact referenced by a relation is not known.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..58be621182b 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,14 +813,60 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
- scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of multixact member entries created exceeds approximately 2^31 entries
+ (occupying roughly 10GB in the <literal>pg_multixact/members</literal> directory),
+ aggressive vacuum scans will occur more often for all tables, starting with those that
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled. The members can grow
+ up to approximately 2^32 entries(occupying roughly 20GB in the
+ <literal>pg_multixact/members</literal> directory) before reaching wraparound.
</para>
+ <para>
+ The <function>pg_get_multixact_stats()</function> function described in
+ <xref linkend="functions-pg-snapshot"/> provides a way to monitor
+ multixact allocation and usage patterns in real time, for example:
+ <programlisting>
+postgres=# SELECT *,pg_size_pretty(members_size) members_size_pretty FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | members_size | oldest_multixact | oldest_offset | members_size_pretty
+-----------+-------------+--------------+------------------+---------------+---------------------
+ 311740299 | 2785241176 | 13926205880 | 2 | 3 | 13 GB
+(1 row)
+ </programlisting>
+ This output shows a system with significant multixact activity: about ~312 million
+ multixact IDs and ~2.8 billion member entries consuming 13 GB of storage space.
+ By leveraging this information, the function helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations. For example, a spike in <literal>num_mxids</literal> might indicate
+ multiple sessions running <literal>UPDATE</literal> statements with foreign key checks,
+ concurrent <literal>SELECT FOR SHARE</literal> operations, or frequent use of savepoints
+ causing lock contention.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Track multixact cleanup efficiency by monitoring oldest_multixact.
+ If this value remains unchanged while <literal>num_members</literal> grows, it could indicate
+ that long-running transactions are preventing cleanup, or autovacuum is
+ not keeping up with the workload.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
+ </para>
+
<para>
Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
system will begin to emit warning messages when the database's oldest MXIDs reach forty
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
index e74ea938348..3117acb19fa 100644
--- a/src/backend/utils/adt/multixactfuncs.c
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -85,3 +85,50 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
SRF_RETURN_DONE(funccxt);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current multixact usage.
+ *
+ * Returns NULL if the oldest referenced offset is unknown, which happens during
+ * system startup.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[5];
+ bool nulls[5];
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ MultiXactOffset oldestOffset;
+ int64 membersBytes;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("return type must be a row type")));
+
+ if (GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
+ {
+ /*
+ * Calculate storage space for members. Members are stored in groups of 4,
+ * with each group taking 20 bytes, resulting in 5 bytes per member.
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;
+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = UInt32GetDatum(members);
+ values[2] = Int64GetDatum(membersBytes);
+ values[3] = UInt32GetDatum(oldestMultiXactId);
+ values[4] = UInt32GetDatum(oldestOffset);
+ memset(nulls, false, sizeof(nulls));
+
+ return HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
+ }
+
+ PG_RETURN_NULL();
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 03e82d28c87..e3bdb187da0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12588,4 +12588,14 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Get multixact usage
+{ oid => '9001', descr => 'get current multixact usage statistics',
+ proname => 'pg_get_multixact_stats',
+ provolatile => 'v', proparallel => 's', prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int4,int8,int8,xid,int8}',
+ proargmodes => '{o,o,o,o,o}',
+ proargnames => '{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}',
+ prosrc => 'pg_get_multixact_stats'},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..69845f058e4
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,92 @@
+Parsed test spec with 2 sessions
+
+starting permutation: snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
+step snap0:
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap1:
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap2:
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step check_while_pinned:
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+
+assertion |ok
+------------------------+--
+is_init_mxids |t
+is_init_members |t
+is_init_oldest_mxid |t
+is_init_oldest_off |t
+is_oldest_mxid_nondec_01|t
+is_oldest_mxid_nondec_12|t
+is_oldest_off_nondec_01 |t
+is_oldest_off_nondec_12 |t
+is_members_increased_ge1|t
+is_mxids_nondec_01 |t
+is_mxids_nondec_12 |t
+is_members_nondec_01 |t
+is_members_nondec_12 |t
+(13 rows)
+
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 5afae33d370..bab8a8eaf31 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -120,3 +120,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: multixact_stats
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..af6b091248a
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,119 @@
+# Test invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While it is pinned
+# by two open transactions, we assert only invariants that background VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second locker arrived,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation may shrink deltas).
+#
+# Terminology (global counters):
+# num_mxids, num_members : "in-use" deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns, which protects
+# the truncation horizon (VACUUM can't advance past our pinned multi).
+
+setup
+{
+ CREATE TABLE mxq(id int PRIMARY KEY, v int);
+ INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+ DROP TABLE mxq;
+}
+
+# Two lockers on the SAME tuple -> one MultiXact with >= 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin { BEGIN; }
+step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin { BEGIN; }
+step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet.
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s1 has locked the row.
+step snap1 {
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s2 joins on the SAME tuple -> multixact with >= 2 members.
+step snap2 {
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
+# is_init_mxids : num_mxids is non-NULL
+# is_init_members : num_members is non-NULL
+# is_init_oldest_mxid : oldest_multixact is non-NULL
+# is_init_oldest_off : oldest_offset is non-NULL
+# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1)
+# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2)
+# is_oldest_off_nondec_01 : oldest_offset did not decrease (snap0→snap1)
+# is_oldest_off_nondec_12 : oldest_offset did not decrease (snap1→snap2)
+# is_members_increased_ge1 : members increased by at least 1 when s2 joined
+# is_mxids_nondec_01 : num_mxids did not decrease (snap0→snap1)
+# is_mxids_nondec_12 : num_mxids did not decrease (snap1→snap2)
+# is_members_nondec_01 : num_members did not decrease (snap0→snap1)
+# is_members_nondec_12 : num_members did not decrease (snap1→snap2)
+step check_while_pinned {
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+}
+
+permutation snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
--
2.47.3
On Fri, Sep 12, 2025 at 9:03 AM Naga Appani <nagnrik@gmail.com> wrote:
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the total number of multixact IDs
assigned since startup,
+ <literal>num_members</literal> is the total number of multixact
member entries created since startup,
GetMultiXactInfo() returns following
*members = nextOffset - *oldestOffset;
*multixacts = nextMultiXactId - *oldestMultiXactId;
They seem to be the numbers that exist in the system at the time of
the call and not since the startup. Am I missing something?
+ up to approximately 2^32 entries(occupying roughly 20GB in the
space between s and (
+ proallargtypes => '{int4,int8,int8,xid,int8}',
I think the first parameter should also be int8 since uint32 won't fit
into int4.
+ See <xref linkend="vacuum-for-multixact-wraparound"/> for further
details on multixact wraparound.I don't think we need this reference here. Reference back from that
section is enough.I kept the cross-reference for now since other multixact function docs
(such as pg_get_multixact_members()) already use this style, and it helps
readers who land directly on the function reference page. Please let me
know if you would prefer that I remove it.
None of the write up there talks about multixact wraparound so the
reference seems arbitrary to me. I would remove it.
+ * Returns NULL if the oldest referenced offset is unknown, which happens during + * system startup or when no MultiXact references exist in any relation.If no MultiXact references exist, and GetMultiXactInfo() returns
false, MultiXactMemberFreezeThreshold() will assume the worst, which I
take as meaning that it will trigger aggressive autovacuum. No
MultiXact references existing is a common case which shouldn't be
assumed as the worst case. The comment I quoted means "the oldest
value of the offset referenced by any multi-xact referenced by a
relation *may not be always known". You seem to have interpreted "may
not be known" as "does not exist" That's not right. I would write this
as "Returns NULL if the oldest referenced offset is unknown which
happens during system startup".Similarly I would rephrase the following docs as + <para> + The function returns <literal>NULL</literal> when multixact statistics are unavailable. + For example, during startup before multixact initialization completes or when + the oldest member offset cannot be determined."The function returns <literal>NULL</literal> when multixact
statistics when the oldest multixact offset corresponding to a
multixact referenced by a relation is not known after starting the
system."Updated.
Thanks for updating the documentation. But the comment in prologue of
pg_get_multixact_stats is not completely correct as mentioned in my
previous reply. I would just say "Returns NULL if the oldest
referenced offset is unknown". Anybody who wants to know when can that
happen, may search relevant code by looking at GetMultiXactInfo().
I still find the comment at the start of the isolation test a bit
verbose. But I think it's best to leave that to a committer's
judgement. However, the word "locker" is unusual. You want to say the
session that locks a row (or something similar). We may leave exact
words to a committer's judgement.
I still find think that the specific usage scenarios described in the
documentation are not required. And the computation in
pg_get_multixact_stats() should use macros instead of bare numbers.
But we can leave that for a committer to decide. Once you address
above comments, we may mark the CF entry as RFC.
--
Best Wishes,
Ashutosh Bapat
Hi Ashutosh,
Thank you for continuing to review the patch. Attached is v9,
incorporating the feedback. Please see my responses inline below.
On Fri, Sep 12, 2025 at 5:34 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
+ Returns statistics about current multixact usage: + <literal>num_mxids</literal> is the total number of multixact IDs assigned since startup, + <literal>num_members</literal> is the total number of multixact member entries created since startup,GetMultiXactInfo() returns following
*members = nextOffset - *oldestOffset;
*multixacts = nextMultiXactId - *oldestMultiXactId;
They seem to be the numbers that exist in the system at the time of
the call and not since the startup. Am I missing something?
You are right, these counts reflect the numbers currently present in
the system, not cumulative totals since startup. I have reworded the
docs to say “currently present”.
+ up to approximately 2^32 entries(occupying roughly 20GB in the
space between s and (
Fixed.
+ proallargtypes => '{int4,int8,int8,xid,int8}',
I think the first parameter should also be int8 since uint32 won't fit
into int4.
Updated.
+ See <xref linkend="vacuum-for-multixact-wraparound"/> for further
details on multixact wraparound.I don't think we need this reference here. Reference back from that
section is enough.I kept the cross-reference for now since other multixact function docs
(such as pg_get_multixact_members()) already use this style, and it helps
readers who land directly on the function reference page. Please let me
know if you would prefer that I remove it.None of the write up there talks about multixact wraparound so the
reference seems arbitrary to me. I would remove it.
Removed.
Thanks for updating the documentation. But the comment in prologue of
pg_get_multixact_stats is not completely correct as mentioned in my
previous reply. I would just say "Returns NULL if the oldest
referenced offset is unknown". Anybody who wants to know when can that
happen, may search relevant code by looking at GetMultiXactInfo().
Simplified the prologue comment as suggested.
I still find the comment at the start of the isolation test a bit
verbose. But I think it's best to leave that to a committer's
judgement. However, the word "locker" is unusual. You want to say the
session that locks a row (or something similar). We may leave exact
words to a committer's judgement.
Reworded to remove "locker" and simplified.
I still find think that the specific usage scenarios described in the
documentation are not required. And the computation in
pg_get_multixact_stats() should use macros instead of bare numbers.
But we can leave that for a committer to decide. Once you address
above comments, we may mark the CF entry as RFC.
Sounds good!
With these updates in v9, I believe the patch is now in good shape to
be marked RFC. I’ll go ahead and update the CommitFest entry.
Thank you again for your thorough reviews and thoughtful guidance on
this patch — it has been very helpful.
Best regards,
Naga
Attachments:
v9-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchapplication/octet-stream; name=v9-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchDownload
From b6267ba1e64dadf79d2582ddc9ceca67c4c12857 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Mon, 15 Sep 2025 04:15:22 +0000
Subject: [PATCH v9] Add pg_get_multixact_stats() function for monitoring
MultiXact usage
Expose multixact state via a new SQL-callable function pg_get_multixact_stats(),
returning:
- num_mxids : number of MultiXact IDs in use
- num_members : number of member entries in use
- members_size : bytes used by num_members in pg_multixact/members directory
- oldest_multixact : oldest MultiXact ID still needed
- oldest_offset : oldest member offset still in use
This patch adds pg_get_multixact_stats() function
- SQL-callable interface to GetMultiXactInfo()
- Returns NULLs if MultiXact system not initialized
- Includes isolation tests for monitoring invariants
Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage
Build and catalog:
- Add function to existing multixactfuncs.c
- pg_proc.dat entry
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
doc/src/sgml/func/func-info.sgml | 35 ++++++
doc/src/sgml/maintenance.sgml | 58 ++++++++-
src/backend/utils/adt/multixactfuncs.c | 46 +++++++
src/include/catalog/pg_proc.dat | 10 ++
.../isolation/expected/multixact_stats.out | 92 ++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/multixact_stats.spec | 119 ++++++++++++++++++
7 files changed, 355 insertions(+), 6 deletions(-)
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index c393832d94c..623b590e338 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,41 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_multixact_stats</primary>
+ </indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>num_mxids</parameter> <type>integer</type>,
+ <parameter>num_members</parameter> <type>bigint</type>,
+ <parameter>members_size</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>xid</type>,
+ <parameter>oldest_offset</parameter> <type>bigint</type> )
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the total number of multixact IDs currently present in the system,
+ <literal>num_members</literal> is the total number of multixact member entries currently
+ present in the system,
+ <literal>members_size</literal> is the storage occupied by <literal>num_members</literal>
+ in the <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.
+ </para>
+ <para>
+ The function reports statistics at the time it is invoked. Values may vary between calls,
+ even within a single transaction.
+ </para>
+ <para>
+ Returns <literal>NULL</literal> when multixact statistics are unavailable,
+ such as during startup before multixact initialization completes.
+ Specifically, this occurs when the oldest multixact offset
+ corresponding to a multixact referenced by a relation is not known.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..cf56307090d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,14 +813,60 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
- scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of multixact member entries created exceeds approximately 2^31 entries
+ (occupying roughly 10GB in the <literal>pg_multixact/members</literal> directory),
+ aggressive vacuum scans will occur more often for all tables, starting with those that
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled. The members can grow
+ up to approximately 2^32 entries (occupying roughly 20GB in the
+ <literal>pg_multixact/members</literal> directory) before reaching wraparound.
</para>
+ <para>
+ The <function>pg_get_multixact_stats()</function> function described in
+ <xref linkend="functions-pg-snapshot"/> provides a way to monitor
+ multixact allocation and usage patterns in real time, for example:
+ <programlisting>
+postgres=# SELECT *,pg_size_pretty(members_size) members_size_pretty FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | members_size | oldest_multixact | oldest_offset | members_size_pretty
+-----------+-------------+--------------+------------------+---------------+---------------------
+ 311740299 | 2785241176 | 13926205880 | 2 | 3 | 13 GB
+(1 row)
+ </programlisting>
+ This output shows a system with significant multixact activity: about ~312 million
+ multixact IDs and ~2.8 billion member entries consuming 13 GB of storage space.
+ By leveraging this information, the function helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations. For example, a spike in <literal>num_mxids</literal> might indicate
+ multiple sessions running <literal>UPDATE</literal> statements with foreign key checks,
+ concurrent <literal>SELECT FOR SHARE</literal> operations, or frequent use of savepoints
+ causing lock contention.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Track multixact cleanup efficiency by monitoring oldest_multixact.
+ If this value remains unchanged while <literal>num_members</literal> grows, it could indicate
+ that long-running transactions are preventing cleanup, or autovacuum is
+ not keeping up with the workload.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
+ </para>
+
<para>
Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
system will begin to emit warning messages when the database's oldest MXIDs reach forty
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
index e74ea938348..9bf7a52a242 100644
--- a/src/backend/utils/adt/multixactfuncs.c
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -85,3 +85,49 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
SRF_RETURN_DONE(funccxt);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current multixact usage.
+ *
+ * Returns NULL if the oldest referenced offset is unknown.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[5];
+ bool nulls[5];
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ MultiXactOffset oldestOffset;
+ int64 membersBytes;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("return type must be a row type")));
+
+ if (GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
+ {
+ /*
+ * Calculate storage space for members. Members are stored in groups of 4,
+ * with each group taking 20 bytes, resulting in 5 bytes per member.
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;
+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = UInt32GetDatum(members);
+ values[2] = Int64GetDatum(membersBytes);
+ values[3] = UInt32GetDatum(oldestMultiXactId);
+ values[4] = UInt32GetDatum(oldestOffset);
+ memset(nulls, false, sizeof(nulls));
+
+ return HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
+ }
+
+ PG_RETURN_NULL();
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 03e82d28c87..3e9376c875b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12588,4 +12588,14 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Get multixact usage
+{ oid => '9001', descr => 'get current multixact usage statistics',
+ proname => 'pg_get_multixact_stats',
+ provolatile => 'v', proparallel => 's', prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int8,int8,int8,xid,int8}',
+ proargmodes => '{o,o,o,o,o}',
+ proargnames => '{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}',
+ prosrc => 'pg_get_multixact_stats'},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..69845f058e4
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,92 @@
+Parsed test spec with 2 sessions
+
+starting permutation: snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
+step snap0:
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap1:
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap2:
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step check_while_pinned:
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+
+assertion |ok
+------------------------+--
+is_init_mxids |t
+is_init_members |t
+is_init_oldest_mxid |t
+is_init_oldest_off |t
+is_oldest_mxid_nondec_01|t
+is_oldest_mxid_nondec_12|t
+is_oldest_off_nondec_01 |t
+is_oldest_off_nondec_12 |t
+is_members_increased_ge1|t
+is_mxids_nondec_01 |t
+is_mxids_nondec_12 |t
+is_members_nondec_01 |t
+is_members_nondec_12 |t
+(13 rows)
+
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 5afae33d370..bab8a8eaf31 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -120,3 +120,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: multixact_stats
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..7ef05f1bff0
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,119 @@
+# Test invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While it is pinned
+# by two open transactions, we assert only invariants that background VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second session locked the row,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation may shrink deltas).
+#
+# Terminology (global counters):
+# num_mxids, num_members : "in-use" deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns, which protects
+# the truncation horizon (VACUUM can't advance past our pinned multi).
+
+setup
+{
+ CREATE TABLE mxq(id int PRIMARY KEY, v int);
+ INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+ DROP TABLE mxq;
+}
+
+# Two sessions that lock on the same tuple -> one MultiXact with >= 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin { BEGIN; }
+step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin { BEGIN; }
+step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet.
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s1 has locked the row.
+step snap1 {
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s2 joins on the SAME tuple -> multixact with >= 2 members.
+step snap2 {
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
+# is_init_mxids : num_mxids is non-NULL
+# is_init_members : num_members is non-NULL
+# is_init_oldest_mxid : oldest_multixact is non-NULL
+# is_init_oldest_off : oldest_offset is non-NULL
+# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1)
+# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2)
+# is_oldest_off_nondec_01 : oldest_offset did not decrease (snap0→snap1)
+# is_oldest_off_nondec_12 : oldest_offset did not decrease (snap1→snap2)
+# is_members_increased_ge1 : members increased by at least 1 when s2 joined
+# is_mxids_nondec_01 : num_mxids did not decrease (snap0→snap1)
+# is_mxids_nondec_12 : num_mxids did not decrease (snap1→snap2)
+# is_members_nondec_01 : num_members did not decrease (snap0→snap1)
+# is_members_nondec_12 : num_members did not decrease (snap1→snap2)
+step check_while_pinned {
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+}
+
+permutation snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
--
2.47.3
On 2025-09-15 14:47, Naga Appani wrote:
With these updates in v9, I believe the patch is now in good shape to
be marked RFC. I’ll go ahead and update the CommitFest entry.
As shown in the commitfest app, v9 patch fails to build:
multixactfuncs.c:129:28: error: call to undeclared function
'heap_form_tuple'; ISO C99 and later do not support implicit function
declarations [-Wimplicit-function-declaration]
129 | return
HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
| ^
multixactfuncs.c:129:28: note: did you mean 'brin_form_tuple'?
../../../../src/include/access/brin_tuple.h:96:19: note:
'brin_form_tuple' declared here
96 | extern BrinTuple *brin_form_tuple(BrinDesc *brdesc, BlockNumber
blkno,
| ^
multixactfuncs.c:129:28: error: incompatible integer to pointer
conversion passing 'int' to parameter of type 'const HeapTupleData *'
(aka 'const struct HeapTupleData *') [-Wint-conversion]
129 | return
HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
|
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../../src/include/funcapi.h:230:40: note: passing argument to
parameter 'tuple' here
230 | HeapTupleGetDatum(const HeapTupleData *tuple)
| ^
2 errors generated.
Could you please update the patch to fix this?
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
Hi,
On Fri, Oct 17, 2025 at 8:28 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2025-09-15 14:47, Naga Appani wrote:
With these updates in v9, I believe the patch is now in good shape to
be marked RFC. I’ll go ahead and update the CommitFest entry.As shown in the commitfest app, v9 patch fails to build:
multixactfuncs.c:129:28: error: call to undeclared function
'heap_form_tuple'; ISO C99 and later do not support implicit function
declarations [-Wimplicit-function-declaration]
129 | return
HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
| ^
multixactfuncs.c:129:28: note: did you mean 'brin_form_tuple'?
../../../../src/include/access/brin_tuple.h:96:19: note:
'brin_form_tuple' declared here
96 | extern BrinTuple *brin_form_tuple(BrinDesc *brdesc, BlockNumber
blkno,
| ^
multixactfuncs.c:129:28: error: incompatible integer to pointer
conversion passing 'int' to parameter of type 'const HeapTupleData *'
(aka 'const struct HeapTupleData *') [-Wint-conversion]
129 | return
HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
|
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../../src/include/funcapi.h:230:40: note: passing argument to
parameter 'tuple' here
230 | HeapTupleGetDatum(const HeapTupleData *tuple)
| ^
2 errors generated.Could you please update the patch to fix this?
Here’s the updated v10 patch, now including access/htup_details.h in
src/backend/utils/adt/multixactfuncs.c. I’m also interested in this
patch and plan to review it.
Best,
Xuneng
Attachments:
v10-0001-Add-pg_get_multixact_stats-function-for-monitori.patchapplication/octet-stream; name=v10-0001-Add-pg_get_multixact_stats-function-for-monitori.patchDownload
From 208ffde2dfebfdeed59890bbb11782bbd7daf8c2 Mon Sep 17 00:00:00 2001
From: alterego655 <824662526@qq.com>
Date: Fri, 17 Oct 2025 10:02:00 +0800
Subject: [PATCH v10] Add pg_get_multixact_stats() function for monitoring
MultiXact usage
Expose multixact state via a new SQL-callable function pg_get_multixact_stats(),
returning:
- num_mxids : number of MultiXact IDs in use
- num_members : number of member entries in use
- members_size : bytes used by num_members in pg_multixact/members directory
- oldest_multixact : oldest MultiXact ID still needed
- oldest_offset : oldest member offset still in use
This patch adds pg_get_multixact_stats() function
- SQL-callable interface to GetMultiXactInfo()
- Returns NULLs if MultiXact system not initialized
- Includes isolation tests for monitoring invariants
Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage
Build and catalog:
- Add function to existing multixactfuncs.c
- pg_proc.dat entry
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
doc/src/sgml/func/func-info.sgml | 35 ++++++
doc/src/sgml/maintenance.sgml | 58 ++++++++-
src/backend/utils/adt/multixactfuncs.c | 47 +++++++
src/include/catalog/pg_proc.dat | 10 ++
.../isolation/expected/multixact_stats.out | 92 ++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/multixact_stats.spec | 119 ++++++++++++++++++
7 files changed, 356 insertions(+), 6 deletions(-)
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index c393832d94c..623b590e338 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,41 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_multixact_stats</primary>
+ </indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>num_mxids</parameter> <type>integer</type>,
+ <parameter>num_members</parameter> <type>bigint</type>,
+ <parameter>members_size</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>xid</type>,
+ <parameter>oldest_offset</parameter> <type>bigint</type> )
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the total number of multixact IDs currently present in the system,
+ <literal>num_members</literal> is the total number of multixact member entries currently
+ present in the system,
+ <literal>members_size</literal> is the storage occupied by <literal>num_members</literal>
+ in the <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.
+ </para>
+ <para>
+ The function reports statistics at the time it is invoked. Values may vary between calls,
+ even within a single transaction.
+ </para>
+ <para>
+ Returns <literal>NULL</literal> when multixact statistics are unavailable,
+ such as during startup before multixact initialization completes.
+ Specifically, this occurs when the oldest multixact offset
+ corresponding to a multixact referenced by a relation is not known.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..cf56307090d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,14 +813,60 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
- scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of multixact member entries created exceeds approximately 2^31 entries
+ (occupying roughly 10GB in the <literal>pg_multixact/members</literal> directory),
+ aggressive vacuum scans will occur more often for all tables, starting with those that
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled. The members can grow
+ up to approximately 2^32 entries (occupying roughly 20GB in the
+ <literal>pg_multixact/members</literal> directory) before reaching wraparound.
</para>
+ <para>
+ The <function>pg_get_multixact_stats()</function> function described in
+ <xref linkend="functions-pg-snapshot"/> provides a way to monitor
+ multixact allocation and usage patterns in real time, for example:
+ <programlisting>
+postgres=# SELECT *,pg_size_pretty(members_size) members_size_pretty FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | members_size | oldest_multixact | oldest_offset | members_size_pretty
+-----------+-------------+--------------+------------------+---------------+---------------------
+ 311740299 | 2785241176 | 13926205880 | 2 | 3 | 13 GB
+(1 row)
+ </programlisting>
+ This output shows a system with significant multixact activity: about ~312 million
+ multixact IDs and ~2.8 billion member entries consuming 13 GB of storage space.
+ By leveraging this information, the function helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations. For example, a spike in <literal>num_mxids</literal> might indicate
+ multiple sessions running <literal>UPDATE</literal> statements with foreign key checks,
+ concurrent <literal>SELECT FOR SHARE</literal> operations, or frequent use of savepoints
+ causing lock contention.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Track multixact cleanup efficiency by monitoring oldest_multixact.
+ If this value remains unchanged while <literal>num_members</literal> grows, it could indicate
+ that long-running transactions are preventing cleanup, or autovacuum is
+ not keeping up with the workload.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
+ </para>
+
<para>
Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
system will begin to emit warning messages when the database's oldest MXIDs reach forty
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
index e74ea938348..782045aa2c8 100644
--- a/src/backend/utils/adt/multixactfuncs.c
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -14,6 +14,7 @@
#include "postgres.h"
+#include "access/htup_details.h"
#include "access/multixact.h"
#include "funcapi.h"
#include "utils/builtins.h"
@@ -85,3 +86,49 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
SRF_RETURN_DONE(funccxt);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current multixact usage.
+ *
+ * Returns NULL if the oldest referenced offset is unknown.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[5];
+ bool nulls[5];
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ MultiXactOffset oldestOffset;
+ int64 membersBytes;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("return type must be a row type")));
+
+ if (GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
+ {
+ /*
+ * Calculate storage space for members. Members are stored in groups of 4,
+ * with each group taking 20 bytes, resulting in 5 bytes per member.
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;
+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = UInt32GetDatum(members);
+ values[2] = Int64GetDatum(membersBytes);
+ values[3] = UInt32GetDatum(oldestMultiXactId);
+ values[4] = UInt32GetDatum(oldestOffset);
+ memset(nulls, false, sizeof(nulls));
+
+ return HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
+ }
+
+ PG_RETURN_NULL();
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b51d2b17379..fd569f9c760 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12601,4 +12601,14 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Get multixact usage
+{ oid => '9001', descr => 'get current multixact usage statistics',
+ proname => 'pg_get_multixact_stats',
+ provolatile => 'v', proparallel => 's', prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int8,int8,int8,xid,int8}',
+ proargmodes => '{o,o,o,o,o}',
+ proargnames => '{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}',
+ prosrc => 'pg_get_multixact_stats'},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..69845f058e4
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,92 @@
+Parsed test spec with 2 sessions
+
+starting permutation: snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
+step snap0:
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap1:
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap2:
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+
+step check_while_pinned:
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+
+assertion |ok
+------------------------+--
+is_init_mxids |t
+is_init_members |t
+is_init_oldest_mxid |t
+is_init_oldest_off |t
+is_oldest_mxid_nondec_01|t
+is_oldest_mxid_nondec_12|t
+is_oldest_off_nondec_01 |t
+is_oldest_off_nondec_12 |t
+is_members_increased_ge1|t
+is_mxids_nondec_01 |t
+is_mxids_nondec_12 |t
+is_members_nondec_01 |t
+is_members_nondec_12 |t
+(13 rows)
+
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 5afae33d370..bab8a8eaf31 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -120,3 +120,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: multixact_stats
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..7ef05f1bff0
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,119 @@
+# Test invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While it is pinned
+# by two open transactions, we assert only invariants that background VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second session locked the row,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation may shrink deltas).
+#
+# Terminology (global counters):
+# num_mxids, num_members : "in-use" deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns, which protects
+# the truncation horizon (VACUUM can't advance past our pinned multi).
+
+setup
+{
+ CREATE TABLE mxq(id int PRIMARY KEY, v int);
+ INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+ DROP TABLE mxq;
+}
+
+# Two sessions that lock on the same tuple -> one MultiXact with >= 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin { BEGIN; }
+step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin { BEGIN; }
+step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet.
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s1 has locked the row.
+step snap1 {
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# After s2 joins on the SAME tuple -> multixact with >= 2 members.
+step snap2 {
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
+# is_init_mxids : num_mxids is non-NULL
+# is_init_members : num_members is non-NULL
+# is_init_oldest_mxid : oldest_multixact is non-NULL
+# is_init_oldest_off : oldest_offset is non-NULL
+# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1)
+# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2)
+# is_oldest_off_nondec_01 : oldest_offset did not decrease (snap0→snap1)
+# is_oldest_off_nondec_12 : oldest_offset did not decrease (snap1→snap2)
+# is_members_increased_ge1 : members increased by at least 1 when s2 joined
+# is_mxids_nondec_01 : num_mxids did not decrease (snap0→snap1)
+# is_mxids_nondec_12 : num_mxids did not decrease (snap1→snap2)
+# is_members_nondec_01 : num_members did not decrease (snap0→snap1)
+# is_members_nondec_12 : num_members did not decrease (snap1→snap2)
+step check_while_pinned {
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+ (s2.oldest_offset IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+ (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)),
+ (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+}
+
+permutation snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
--
2.51.0
Thanks for working on this. I'm wondering if this is expected / could
help with monitoring for "space exhaustion" issues, which we currently
can't do easily, as it's not exposed anywhere.
This is in multixact.c at line ~1177, where we do this:
if (MultiXactState->oldestOffsetKnown &&
MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
nextOffset, nmembers))
{
ereport(ERROR, ...
}
But I'm not sure the current patch exposes enough information to
calculate how much space remains - calculating that we requires
offsetStopLimit and nextOffset.
The stopLimit could be calculated from oldest_offset, which the patch
returns. It's not quite trivial. It depends on BLCKSZ through
MULTIXACT_MEMBERS_PER_PAGE, and various other internal constants. It's
tempting to hardcode those into monitoring scripts, which then gets
broken in subtle ways with custom builds or if we change something
(which for multixacts we can).
And I don't think the patch exposes nextOffset, right? So AFAICS we
can't actually calculate the remaining space.
Could it either return nextOffset, or maybe actually calculate and
return the remaining space? And perhaps the "total" space, so that it's
possible to calculate what fraction of the space we already consumed.
I'm actually not entirely convinced we should be exposing the raw
internal information this patch aims to expose. Because a lot of that
feels like an internal implementation detail, and it's going to be hard
to interpret ....
Knowing num_mxids / num_members or members_size is nice, but how would
I judge how far the system is from hitting some threshold or hard limit?
Is there some maximum number of mxids/members that we could return? Or
something like that?
Similarly for oldest_multixact / oldest_offset. How useful is that
without knowing the "next" value for each of those?
Or am I missing something obvious?
regards
--
Tomas Vondra
Thank you for the feedback, Tomas! I agree with the goal you outlined,
providing a
user-friendly “how much space is left” signal would make monitoring far more
actionable.
On Sat, Oct 18, 2025 at 6:18 AM Tomas Vondra <tomas@vondra.me> wrote:
Knowing num_mxids / num_members or members_size is nice, but how would
I judge how far the system is from hitting some threshold or hard limit?
Is there some maximum number of mxids/members that we could return? Or
something like that?
Based on this, I experimented with calculating a num_remaining_members value to
estimate how close the system is to MultiXact member-space exhaustion. I tested
two approaches and validated their behavior through repeated exhaustion cycles.
The results are below.
At the same time, both you and Ashutosh pointed out that oldest_offset exposes
internal implementation details and is not particularly useful on its own, so I
removed oldest_offset in v11.
WHAT I TRIED in regards to space remaining
==========================================
Approach 1: (offsetStopLimit - nextOffset)
------------------------------------------
I exposed offsetStopLimit from GetMultiXactInfo() and computed:
remainingMembers = offsetStopLimit - nextOffset;
Behavior at exhaustion:
postgres=# SELECT num_mxids,num_members,remaining_members
FROM pg_get_multixact_stats();
num_mxids | num_members | remaining_members
-----------+-------------+-------------------
115409471 | 4294914940 | 1
(1 row)
After wraparound cleanup:
postgres=# SELECT num_mxids,num_members,remaining_members
FROM pg_get_multixact_stats();
num_mxids | num_members | remaining_members
-----------+-------------+-------------------
0 | 0 | 0
(1 row)
The value stayed at 0 until roughly ~100k new members were allocated. My reading
is that nextOffset wraps to a small value, while offsetStopLimit remains large
(derived from the oldestOffset at the moment of truncation). Without using the
backend’s wrap-aware comparison logic (MultiXactOffsetWouldWrap()), plain
subtraction crosses the wrap boundary and becomes misleading.
Approach 2: (MaxMultiXactOffset - members)
------------------------------------------
I also tested:
remainingMembers = MaxMultiXactOffset - members;
Across three exhaustion cycles:
1st attempt:
postgres=# SELECT num_mxids,num_members,remaining_members
FROM pg_get_multixact_stats();
num_mxids | num_members | remaining_members
-----------+-------------+-------------------
125098473 | 4294914940 | 52355
(1 row)
2nd attempt:
postgres=# SELECT num_mxids,num_members,remaining_members
FROM pg_get_multixact_stats();
num_mxids | num_members | remaining_members
-----------+-------------+-------------------
116285530 | 4294905729 | 61566
(1 row)
3rd attempt:
postgres=# SELECT num_mxids,num_members,remaining_members
FROM pg_get_multixact_stats();
num_mxids | num_members | remaining_members
-----------+-------------+-------------------
111973488 | 4294862592 | 104703
(1 row)
The system correctly rejected inserts in each cycle, but the computed
“remaining”
value increased between cycles. This seems to match the dynamic nature of
offsetStopLimit, which appears to be recomputed after truncation:
- based on the new oldestOffset
- aligned back to the start of its segment
- with one safety segment subtracted
Because the stop boundary shifts depending on segment boundaries, the plain
(Max − members) formula reflects alignment effects rather than actual remaining
capacity.
Understanding
============
Based on reading the relevant parts of multixact.c and observing the runtime
behavior, both approaches seem to run into limitations when trying to derive a
“remaining members” value outside the backend. I may be missing details, but the
behavior I observed suggests that a reliable computation might require
duplicating
several internal mechanisms, including:
- wrap-aware offset comparison
- SLRU page and segment alignment rules
- SetOffsetVacuumLimit’s segment recalculation
Without accounting for those, the derived numbers behaved inconsistently across
tests, sometimes staying at 0 until a large jump, and in other cases increasing
between exhaustion cycles. This seems broadly consistent with your concern that
simple arithmetic on these counters does not match how the backend determines
wraparound risk.
To be clear, this interpretation is based only on what I could infer from the
code and testing, and I may not be capturing the entire picture. But from what I
observed, a user-visible “remaining members” metric does not seem
straightforward
without exposing or replicating backend logic.
My thoughts
==========
Given all this, the cleanest approach appears to be not exposing a “remaining
members” counter directly.
PostgreSQL has historically avoided exposing remaining-capacity counters for
wraparound-limited resources such as:
- transaction IDs
- MultiXact IDs
- OIDs
Instead, PostgreSQL exposes current usage and relies on documented
thresholds for
monitoring. Following that established pattern avoids tying a SQL-visible
interface to backend internals that may evolve over time.
Self-monitoring based on documented limits
==========================================
Monitoring then follows the same pattern PostgreSQL already uses for XIDs and
other wraparound-limited values:
- track num_members growth over time
- warn when it exceeds roughly 2^31
- treat values approaching 2^32 as exhaustion-risk territory
- observe the growth rate to estimate when intervention may be needed
This keeps the interface simple, stable, and aligned with existing PostgreSQL
behavior.
Why oldest_offset was removed
=============================
Both you and Ashutosh pointed out that oldest_offset reflects internal SLRU
geometry and is not actionable without reproducing backend logic. Combined with
the behavior seen in the experiments above, it made sense not to expose this
field in the user-visible API. It is removed in v11.
Final shape of the function (v11)
=================================
The function now returns:
- num_mxids
- num_members
- members_size
- oldest_multixact
These fields are stable, directly interpretable, and do not depend on SLRU
internals or wrap-aware arithmetic.
On Thu, Oct 16, 2025 at 9:10 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
Here’s the updated v10 patch, now including access/htup_details.h in
src/backend/utils/adt/multixactfuncs.c.
Thank you!
On Thu, Oct 16, 2025 at 7:28 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
Could you please update the patch to fix this?
Thank you for raising it and bringing it to attention!
Attached is the v11.
Best regards,
Naga
Attachments:
v11-0001-Add-pg_get_multixact_stats-function-for-monitori.patchapplication/octet-stream; name=v11-0001-Add-pg_get_multixact_stats-function-for-monitori.patchDownload
From 0b0c60f7abc7bb2913fb7e8b4a6286723a0caf74 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Thu, 23 Oct 2025 22:11:03 +0000
Subject: [PATCH v11] Add pg_get_multixact_stats() function for monitoring
MultiXact usage
Expose multixact state via a new SQL-callable function pg_get_multixact_stats(),
returning:
- num_mxids : number of MultiXact IDs in use
- num_members : number of member entries in use
- members_size : bytes used by num_members in pg_multixact/members directory
- oldest_multixact : oldest MultiXact ID still needed
This patch adds pg_get_multixact_stats() function
- SQL-callable interface to GetMultiXactInfo()
- Returns NULLs if MultiXact system not initialized
- Includes isolation tests for monitoring invariants
Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage
Build and catalog:
- Add function to existing multixactfuncs.c
- pg_proc.dat entry
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
doc/src/sgml/func/func-info.sgml | 33 +++++
doc/src/sgml/maintenance.sgml | 58 ++++++++-
src/backend/utils/adt/multixactfuncs.c | 46 +++++++
src/include/catalog/pg_proc.dat | 10 ++
.../isolation/expected/multixact_stats.out | 89 ++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/multixact_stats.spec | 113 ++++++++++++++++++
7 files changed, 344 insertions(+), 6 deletions(-)
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index c393832d94c..0d756186197 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,39 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_multixact_stats</primary>
+ </indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>num_mxids</parameter> <type>integer</type>,
+ <parameter>num_members</parameter> <type>bigint</type>,
+ <parameter>members_size</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>xid</type> )
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the total number of multixact IDs currently present in the system,
+ <literal>num_members</literal> is the total number of multixact member entries currently
+ present in the system,
+ <literal>members_size</literal> is the storage occupied by <literal>num_members</literal>
+ in the <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still in use.
+ </para>
+ <para>
+ The function reports statistics at the time it is invoked. Values may vary between calls,
+ even within a single transaction.
+ </para>
+ <para>
+ Returns <literal>NULL</literal> when multixact statistics are unavailable,
+ such as during startup before multixact initialization completes.
+ Specifically, this occurs when the oldest multixact offset
+ corresponding to a multixact referenced by a relation is not known.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index dc59c88319e..4d7e172450a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,14 +813,60 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
- scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of multixact member entries created exceeds approximately 2^31 entries
+ (occupying roughly 10GB in the <literal>pg_multixact/members</literal> directory),
+ aggressive vacuum scans will occur more often for all tables, starting with those that
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled. The members can grow
+ up to approximately 2^32 entries (occupying roughly 20GB in the
+ <literal>pg_multixact/members</literal> directory) before reaching wraparound.
</para>
+ <para>
+ The <function>pg_get_multixact_stats()</function> function described in
+ <xref linkend="functions-pg-snapshot"/> provides a way to monitor
+ multixact allocation and usage patterns in real time, for example:
+ <programlisting>
+postgres=# SELECT *,pg_size_pretty(members_size) members_size_pretty FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | members_size | oldest_multixact | members_size_pretty
+-----------+-------------+--------------+------------------+---------------------
+ 311740299 | 2785241176 | 13926205880 | 2 | 13 GB
+(1 row)
+ </programlisting>
+ This output shows a system with significant multixact activity: about ~312 million
+ multixact IDs and ~2.8 billion member entries consuming 13 GB of storage space.
+ By leveraging this information, the function helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations. For example, a spike in <literal>num_mxids</literal> might indicate
+ multiple sessions running <literal>UPDATE</literal> statements with foreign key checks,
+ concurrent <literal>SELECT FOR SHARE</literal> operations, or frequent use of savepoints
+ causing lock contention.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Track multixact cleanup efficiency by monitoring oldest_multixact.
+ If this value remains unchanged while <literal>num_members</literal> grows, it could indicate
+ that long-running transactions are preventing cleanup, or autovacuum is
+ not keeping up with the workload.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
+ </para>
+
<para>
Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
system will begin to emit warning messages when the database's oldest MXIDs reach forty
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
index e74ea938348..286676d3829 100644
--- a/src/backend/utils/adt/multixactfuncs.c
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/multixact.h"
+#include "access/htup_details.h"
#include "funcapi.h"
#include "utils/builtins.h"
@@ -85,3 +86,48 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
SRF_RETURN_DONE(funccxt);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current multixact usage.
+ *
+ * Returns NULL if the oldest referenced offset is unknown.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[4];
+ bool nulls[4];
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ MultiXactOffset oldestOffset;
+ int64 membersBytes;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("return type must be a row type")));
+
+ if (GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
+ {
+ /*
+ * Calculate storage space for members. Members are stored in groups of 4,
+ * with each group taking 20 bytes, resulting in 5 bytes per member.
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;
+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = UInt32GetDatum(members);
+ values[2] = Int64GetDatum(membersBytes);
+ values[3] = UInt32GetDatum(oldestMultiXactId);
+ memset(nulls, false, sizeof(nulls));
+
+ return HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
+ }
+
+ PG_RETURN_NULL();
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9121a382f76..928aa3cff72 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12604,4 +12604,14 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Get multixact usage
+{ oid => '9001', descr => 'get current multixact usage statistics',
+ proname => 'pg_get_multixact_stats',
+ provolatile => 'v', proparallel => 's', prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int8,int8,int8,xid}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{num_mxids,num_members,members_size,oldest_multixact}',
+ prosrc => 'pg_get_multixact_stats'},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..27a6510c4ad
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,89 @@
+Parsed test spec with 2 sessions
+
+starting permutation: snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
+step snap0:
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap1:
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap2:
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step check_while_pinned:
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+
+assertion |ok
+------------------------+--
+is_init_mxids |t
+is_init_members |t
+is_init_oldest_mxid |t
+is_init_oldest_off |t
+is_oldest_mxid_nondec_01|t
+is_oldest_mxid_nondec_12|t
+is_oldest_off_nondec_01 |t
+is_oldest_off_nondec_12 |t
+is_members_increased_ge1|t
+is_mxids_nondec_01 |t
+is_mxids_nondec_12 |
+is_members_nondec_01 |
+is_members_nondec_12 |
+(13 rows)
+
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 5afae33d370..bab8a8eaf31 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -120,3 +120,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: multixact_stats
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..7b034654504
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,113 @@
+# Test invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While it is pinned
+# by two open transactions, we assert only invariants that background VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second session locked the row,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation may shrink deltas).
+#
+# Terminology (global counters):
+# num_mxids, num_members : "in-use" deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns, which protects
+# the truncation horizon (VACUUM can't advance past our pinned multi).
+
+setup
+{
+ CREATE TABLE mxq(id int PRIMARY KEY, v int);
+ INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+ DROP TABLE mxq;
+}
+
+# Two sessions that lock on the same tuple -> one MultiXact with >= 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin { BEGIN; }
+step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin { BEGIN; }
+step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet.
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# After s1 has locked the row.
+step snap1 {
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# After s2 joins on the SAME tuple -> multixact with >= 2 members.
+step snap2 {
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
+# is_init_mxids : num_mxids is non-NULL
+# is_init_members : num_members is non-NULL
+# is_init_oldest_mxid : oldest_multixact is non-NULL
+# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1)
+# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2)
+# is_members_increased_ge1 : members increased by at least 1 when s2 joined
+# is_mxids_nondec_01 : num_mxids did not decrease (snap0→snap1)
+# is_mxids_nondec_12 : num_mxids did not decrease (snap1→snap2)
+# is_members_nondec_01 : num_members did not decrease (snap0→snap1)
+# is_members_nondec_12 : num_members did not decrease (snap1→snap2)
+step check_while_pinned {
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+}
+
+permutation snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
--
2.47.3
On Wed, Nov 5, 2025 at 6:43 AM Naga Appani <nagnrik@gmail.com> wrote:
Understanding
============
Based on reading the relevant parts of multixact.c and observing the runtime
behavior, both approaches seem to run into limitations when trying to derive a
“remaining members” value outside the backend. I may be missing details, but the
behavior I observed suggests that a reliable computation might require
duplicating
several internal mechanisms, including:
- wrap-aware offset comparison
- SLRU page and segment alignment rules
- SetOffsetVacuumLimit’s segment recalculationWithout accounting for those, the derived numbers behaved inconsistently across
tests, sometimes staying at 0 until a large jump, and in other cases increasing
between exhaustion cycles. This seems broadly consistent with your concern that
simple arithmetic on these counters does not match how the backend determines
wraparound risk.To be clear, this interpretation is based only on what I could infer from the
code and testing, and I may not be capturing the entire picture. But from what I
observed, a user-visible “remaining members” metric does not seem
straightforward
without exposing or replicating backend logic.
Right now MultiXactOffsetWouldWrap() assesses if the given distance is
higher than the permitted distance between start and boundary. I think
we could instead change it to report the permitted distance based on
start and boundary; use it to report remaining space (after
multiplying it with bytes per member) and also use it to assess
whether the required distance is within that boundary or whether we
need a warning. But ...
On Sat, Oct 18, 2025 at 4:48 PM Tomas Vondra <tomas@vondra.me> wrote:
Thanks for working on this. I'm wondering if this is expected / could
help with monitoring for "space exhaustion" issues, which we currently
can't do easily, as it's not exposed anywhere.This is in multixact.c at line ~1177, where we do this:
if (MultiXactState->oldestOffsetKnown &&
MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
nextOffset, nmembers))
{
ereport(ERROR, ...
}But I'm not sure the current patch exposes enough information to
calculate how much space remains - calculating that we requires
offsetStopLimit and nextOffset.
The function exposes the number of existing members and the amount of
space they consume (members_size). The documentation mentions space
related thresholds 10GB and 20GB. Isn't comparing members_size to
these thresholds enough to take appropriate action? If so, we could
report the difference between these respective thresholds and
members_size as a metric of space remaining before a given threshold
is triggered.
--
Best Wishes,
Ashutosh Bapat
Hi Ashutosh,
Thanks for the review!
I agree - comparing the exposed members_size against the documented
thresholds is sufficient for monitoring purposes.
This aligns with the approach taken in v11: exposing the current usage in
a way consistent with other PostgreSQL counters (e.g., XIDs, OIDs), without
introducing user-visible remaining-capacity calculations whose behavior is
inconsistent and difficult to interpret externally. In the same spirit, I
removed oldest_offset: as we discussed, it is internal and does not
provide an actionable signal to users.
If this addresses the concerns raised so far, I would appreciate
consideration in moving v11 forward for commit.
On Mon, Nov 10, 2025 at 12:13 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Wed, Nov 5, 2025 at 6:43 AM Naga Appani <nagnrik@gmail.com> wrote:
Understanding
============
Based on reading the relevant parts of multixact.c and observing the runtime
behavior, both approaches seem to run into limitations when trying to derive a
“remaining members” value outside the backend. I may be missing details, but the
behavior I observed suggests that a reliable computation might require
duplicating
several internal mechanisms, including:
- wrap-aware offset comparison
- SLRU page and segment alignment rules
- SetOffsetVacuumLimit’s segment recalculationWithout accounting for those, the derived numbers behaved inconsistently across
tests, sometimes staying at 0 until a large jump, and in other cases increasing
between exhaustion cycles. This seems broadly consistent with your concern that
simple arithmetic on these counters does not match how the backend determines
wraparound risk.To be clear, this interpretation is based only on what I could infer from the
code and testing, and I may not be capturing the entire picture. But from what I
observed, a user-visible “remaining members” metric does not seem
straightforward
without exposing or replicating backend logic.Right now MultiXactOffsetWouldWrap() assesses if the given distance is
higher than the permitted distance between start and boundary. I think
we could instead change it to report the permitted distance based on
start and boundary; use it to report remaining space (after
multiplying it with bytes per member) and also use it to assess
whether the required distance is within that boundary or whether we
need a warning. But ...
On Sat, Oct 18, 2025 at 4:48 PM Tomas Vondra <tomas@vondra.me> wrote:Thanks for working on this. I'm wondering if this is expected / could
help with monitoring for "space exhaustion" issues, which we currently
can't do easily, as it's not exposed anywhere.This is in multixact.c at line ~1177, where we do this:
if (MultiXactState->oldestOffsetKnown &&
MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
nextOffset, nmembers))
{
ereport(ERROR, ...
}But I'm not sure the current patch exposes enough information to
calculate how much space remains - calculating that we requires
offsetStopLimit and nextOffset.The function exposes the number of existing members and the amount of
space they consume (members_size). The documentation mentions space
related thresholds 10GB and 20GB. Isn't comparing members_size to
these thresholds enough to take appropriate action? If so, we could
report the difference between these respective thresholds and
members_size as a metric of space remaining before a given threshold
is triggered.
Best regards,
Naga
On Sat, Dec 6, 2025 at 11:23 PM Naga Appani <nagnrik@gmail.com> wrote:
Hi Ashutosh,
Thanks for the review!
I agree - comparing the exposed members_size against the documented
thresholds is sufficient for monitoring purposes.This aligns with the approach taken in v11: exposing the current usage in
a way consistent with other PostgreSQL counters (e.g., XIDs, OIDs), without
introducing user-visible remaining-capacity calculations whose behavior is
inconsistent and difficult to interpret externally. In the same spirit, I
removed oldest_offset: as we discussed, it is internal and does not
provide an actionable signal to users.If this addresses the concerns raised so far, I would appreciate
consideration in moving v11 forward for commit.
The patch at [1] changes the function used to fetch mxid related
information. With that we will get rid of awkwardness around
non-availability of the statistics. It's better to wait for those
changes to get committed before moving this forward.
--
Best Wishes,
Ashutosh Bapat
Thank you, Ashutosh!
On Sun, Dec 7, 2025 at 10:40 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
The patch at [1] changes the function used to fetch mxid related
information. With that we will get rid of awkwardness around
non-availability of the statistics. It's better to wait for those
changes to get committed before moving this forward.
Following the upstream change from Heikki's patch [0]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bd8d9c9bdfa0c2168bb37edca6fa88168cacbbaa, I've updated
the patch (v12) to align with the new behavior.
Code changes:
- GetMultiXactInfo() now returns void, so the conditional checks and NULL
handling have been removed.
- MultiXactOffset is now 64-bit; updated the code to use Int64GetDatum()
for member counts.
- Switched to using MULTIXACT_MEMBERGROUP_SIZE and
MULTIXACT_MEMBERS_PER_MEMBERGROUP from multixact_internal.h instead of
hardcoded calculations.
Documentation changes:
- Removed the NULL-return discussion from func-info.sgml, as the
statistics are now always available.
- Updated maintenance.sgml to clarify that exceeding the historical
2^32 member limit no longer causes wraparound, but instead triggers
more aggressive vacuum activity for disk space management.
I validated the behavior before and after cleanup.
The function correctly reports current usage (beyond the old limits) and
resets once multixacts are removed:
postgres=# SELECT num_mxids, num_members, pg_size_pretty(members_size)
AS members_size, oldest_multixact FROM pg_get_multixact_stats();
-[ RECORD 1 ]------------------
num_mxids | 267969541
num_members | 9469693355
members_size | 44 GB
oldest_multixact | 2
postgres=# SELECT pg_terminate_backend(27222);
pg_terminate_backend
----------------------
t
postgres=# SELECT num_mxids, num_members, pg_size_pretty(members_size)
AS members_size, oldest_multixact FROM pg_get_multixact_stats();
-[ RECORD 1 ]------------------
num_mxids | 0
num_members | 0
members_size | 0 bytes
oldest_multixact | 267969543
The updated patch is attached.
Regards,
Naga
Attachments:
v12-0001-Add-pg_get_multixact_stats-function-for-monitori.patchapplication/octet-stream; name=v12-0001-Add-pg_get_multixact_stats-function-for-monitori.patchDownload
From 54b1dabf3c8da43d700a1087307177a2f17e62ca Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Fri, 12 Dec 2025 22:44:46 +0000
Subject: [PATCH v12] Add pg_get_multixact_stats() function for monitoring
MultiXact usage
Expose multixact state via a new SQL-callable function pg_get_multixact_stats(),
returning:
- num_mxids : number of MultiXact IDs in use
- num_members : number of member entries in use
- members_size : bytes used by num_members in pg_multixact/members directory
- oldest_multixact : oldest MultiXact ID still needed
This patch adds pg_get_multixact_stats() function
- SQL-callable interface to GetMultiXactInfo()
- Returns NULLs if MultiXact system not initialized
- Includes isolation tests for monitoring invariants
Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage
Build and catalog:
- Add function to existing multixactfuncs.c
- pg_proc.dat entry
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
doc/src/sgml/func/func-info.sgml | 27 +++++
doc/src/sgml/maintenance.sgml | 58 ++++++++-
src/backend/utils/adt/multixactfuncs.c | 45 +++++++
src/include/catalog/pg_proc.dat | 10 ++
.../isolation/expected/multixact_stats.out | 89 ++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/multixact_stats.spec | 113 ++++++++++++++++++
7 files changed, 337 insertions(+), 6 deletions(-)
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index d4508114a48..051c3b28985 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,33 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_multixact_stats</primary>
+ </indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>num_mxids</parameter> <type>integer</type>,
+ <parameter>num_members</parameter> <type>bigint</type>,
+ <parameter>members_size</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>xid</type> )
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the total number of multixact IDs currently present in the system,
+ <literal>num_members</literal> is the total number of multixact member entries currently
+ present in the system,
+ <literal>members_size</literal> is the storage occupied by <literal>num_members</literal>
+ in the <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still in use.
+ </para>
+ <para>
+ The function reports statistics at the time it is invoked. Values may vary between calls,
+ even within a single transaction.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 08e6489afb8..8695b92e93e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,14 +813,60 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
- scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of multixact member entries created exceeds approximately 2^31 entries
+ (occupying roughly 10GB in the <literal>pg_multixact/members</literal> directory),
+ aggressive vacuum scans will occur more often for all tables, starting with those that
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled. At approximately 2^32 entries
+ (occupying roughly 20GB in the <literal>pg_multixact/members</literal> directory), even
+ more aggressive vacuum scans are triggered to reclaim member storage space.
</para>
+ <para>
+ The <function>pg_get_multixact_stats()</function> function described in
+ <xref linkend="functions-pg-snapshot"/> provides a way to monitor
+ multixact allocation and usage patterns in real time, for example:
+ <programlisting>
+postgres=# SELECT *,pg_size_pretty(members_size) members_size_pretty FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | members_size | oldest_multixact | members_size_pretty
+-----------+-------------+--------------+------------------+---------------------
+ 311740299 | 2785241176 | 13926205880 | 2 | 13 GB
+(1 row)
+ </programlisting>
+ This output shows a system with significant multixact activity: about ~312 million
+ multixact IDs and ~2.8 billion member entries consuming 13 GB of storage space.
+ By leveraging this information, the function helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations. For example, a spike in <literal>num_mxids</literal> might indicate
+ multiple sessions running <literal>UPDATE</literal> statements with foreign key checks,
+ concurrent <literal>SELECT FOR SHARE</literal> operations, or frequent use of savepoints
+ causing lock contention.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Track multixact cleanup efficiency by monitoring oldest_multixact.
+ If this value remains unchanged while <literal>num_members</literal> grows, it could indicate
+ that long-running transactions are preventing cleanup, or autovacuum is
+ not keeping up with the workload.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
+ </para>
+
<para>
Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
system will begin to emit warning messages when the database's oldest MXIDs reach forty
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
index a428e140bc4..c0597cf5425 100644
--- a/src/backend/utils/adt/multixactfuncs.c
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -15,6 +15,8 @@
#include "postgres.h"
#include "access/multixact.h"
+#include "access/multixact_internal.h"
+#include "access/htup_details.h"
#include "funcapi.h"
#include "utils/builtins.h"
@@ -85,3 +87,46 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
SRF_RETURN_DONE(funccxt);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current multixact usage.
+ *
+ * Returns NULL if the oldest referenced offset is unknown.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[4];
+ bool nulls[4];
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ MultiXactOffset oldestOffset;
+ int64 membersBytes;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("return type must be a row type")));
+
+ GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset);
+
+ /*
+ * Calculate storage space for members. Members are stored in groups,
+ * with each group containing MULTIXACT_MEMBERS_PER_MEMBERGROUP members
+ * and taking MULTIXACT_MEMBERGROUP_SIZE bytes.
+ */
+ membersBytes = (int64) (members / MULTIXACT_MEMBERS_PER_MEMBERGROUP) *
+ MULTIXACT_MEMBERGROUP_SIZE;
+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = Int64GetDatum(members);
+ values[2] = Int64GetDatum(membersBytes);
+ values[3] = UInt32GetDatum(oldestMultiXactId);
+ memset(nulls, false, sizeof(nulls));
+
+ return HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fd9448ec7b9..6caea6c8281 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12612,4 +12612,14 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Get multixact usage
+{ oid => '9001', descr => 'get current multixact usage statistics',
+ proname => 'pg_get_multixact_stats',
+ provolatile => 'v', proparallel => 's', prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int8,int8,int8,xid}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{num_mxids,num_members,members_size,oldest_multixact}',
+ prosrc => 'pg_get_multixact_stats'},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..27a6510c4ad
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,89 @@
+Parsed test spec with 2 sessions
+
+starting permutation: snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
+step snap0:
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap1:
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap2:
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step check_while_pinned:
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+
+assertion |ok
+------------------------+--
+is_init_mxids |t
+is_init_members |t
+is_init_oldest_mxid |t
+is_init_oldest_off |t
+is_oldest_mxid_nondec_01|t
+is_oldest_mxid_nondec_12|t
+is_oldest_off_nondec_01 |t
+is_oldest_off_nondec_12 |t
+is_members_increased_ge1|t
+is_mxids_nondec_01 |t
+is_mxids_nondec_12 |
+is_members_nondec_01 |
+is_members_nondec_12 |
+(13 rows)
+
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 112f05a3677..67f0078d8ba 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -119,3 +119,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: multixact_stats
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..7b034654504
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,113 @@
+# Test invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While it is pinned
+# by two open transactions, we assert only invariants that background VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second session locked the row,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation may shrink deltas).
+#
+# Terminology (global counters):
+# num_mxids, num_members : "in-use" deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns, which protects
+# the truncation horizon (VACUUM can't advance past our pinned multi).
+
+setup
+{
+ CREATE TABLE mxq(id int PRIMARY KEY, v int);
+ INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+ DROP TABLE mxq;
+}
+
+# Two sessions that lock on the same tuple -> one MultiXact with >= 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin { BEGIN; }
+step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin { BEGIN; }
+step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet.
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# After s1 has locked the row.
+step snap1 {
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# After s2 joins on the SAME tuple -> multixact with >= 2 members.
+step snap2 {
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
+# is_init_mxids : num_mxids is non-NULL
+# is_init_members : num_members is non-NULL
+# is_init_oldest_mxid : oldest_multixact is non-NULL
+# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1)
+# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2)
+# is_members_increased_ge1 : members increased by at least 1 when s2 joined
+# is_mxids_nondec_01 : num_mxids did not decrease (snap0→snap1)
+# is_mxids_nondec_12 : num_mxids did not decrease (snap1→snap2)
+# is_members_nondec_01 : num_members did not decrease (snap0→snap1)
+# is_members_nondec_12 : num_members did not decrease (snap1→snap2)
+step check_while_pinned {
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+}
+
+permutation snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
--
2.47.3
On Sat, Dec 13, 2025 at 01:34:47PM -0600, Naga Appani wrote:
Documentation changes:
- Removed the NULL-return discussion from func-info.sgml, as the
statistics are now always available.
- Updated maintenance.sgml to clarify that exceeding the historical
2^32 member limit no longer causes wraparound, but instead triggers
more aggressive vacuum activity for disk space management.I validated the behavior before and after cleanup.
The function correctly reports current usage (beyond the old limits) and
resets once multixacts are removed:
+ /*
+ * Calculate storage space for members. Members are stored in groups,
+ * with each group containing MULTIXACT_MEMBERS_PER_MEMBERGROUP members
+ * and taking MULTIXACT_MEMBERGROUP_SIZE bytes.
+ */
+ membersBytes = (int64) (members / MULTIXACT_MEMBERS_PER_MEMBERGROUP) *
+ MULTIXACT_MEMBERGROUP_SIZE;
This is the key point of the patch internal logic. And there is one
thing that I am wondering here. The amount of space taken by a number
of members depends on the other compiled constants from
multixact_internal.h. Hence, rather than calculate the amount of
space taken by a set of members in some code hidden in the SQL
function, could it be better to put that directly as a macro or an
inline function in multixact_internal.h?
--
Michael
On Wed, Dec 17, 2025 at 9:49 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Dec 13, 2025 at 01:34:47PM -0600, Naga Appani wrote:
Documentation changes:
- Removed the NULL-return discussion from func-info.sgml, as the
statistics are now always available.
- Updated maintenance.sgml to clarify that exceeding the historical
2^32 member limit no longer causes wraparound, but instead triggers
more aggressive vacuum activity for disk space management.I validated the behavior before and after cleanup.
The function correctly reports current usage (beyond the old limits) and
resets once multixacts are removed:+ /* + * Calculate storage space for members. Members are stored in groups, + * with each group containing MULTIXACT_MEMBERS_PER_MEMBERGROUP members + * and taking MULTIXACT_MEMBERGROUP_SIZE bytes. + */ + membersBytes = (int64) (members / MULTIXACT_MEMBERS_PER_MEMBERGROUP) * + MULTIXACT_MEMBERGROUP_SIZE;This is the key point of the patch internal logic. And there is one
thing that I am wondering here. The amount of space taken by a number
of members depends on the other compiled constants from
multixact_internal.h. Hence, rather than calculate the amount of
space taken by a set of members in some code hidden in the SQL
function, could it be better to put that directly as a macro or an
inline function in multixact_internal.h?
+1.
--
Best Wishes,
Ashutosh Bapat
Thanks for the suggestion and review, Michael and Ashutosh!
On Tue, Dec 16, 2025 at 11:17 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Wed, Dec 17, 2025 at 9:49 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Dec 13, 2025 at 01:34:47PM -0600, Naga Appani wrote:
I validated the behavior before and after cleanup.
The function correctly reports current usage (beyond the old limits) and
resets once multixacts are removed:+ /* + * Calculate storage space for members. Members are stored in groups, + * with each group containing MULTIXACT_MEMBERS_PER_MEMBERGROUP members + * and taking MULTIXACT_MEMBERGROUP_SIZE bytes. + */ + membersBytes = (int64) (members / MULTIXACT_MEMBERS_PER_MEMBERGROUP) * + MULTIXACT_MEMBERGROUP_SIZE;This is the key point of the patch internal logic. And there is one
thing that I am wondering here. The amount of space taken by a number
of members depends on the other compiled constants from
multixact_internal.h. Hence, rather than calculate the amount of
space taken by a set of members in some code hidden in the SQL
function, could it be better to put that directly as a macro or an
inline function in multixact_internal.h?+1.
--
Best Wishes,
Ashutosh Bapat
I’ve updated the patch as suggested.
The member storage size calculation has been refactored into a static
inline function, MultiXactMemberStorageSize(), in
src/include/access/multixact_internal.h.
Please find v13 attached.
Regards,
Naga
Attachments:
v13-0001-Add-pg_get_multixact_stats-function-for-monitori.patchapplication/octet-stream; name=v13-0001-Add-pg_get_multixact_stats-function-for-monitori.patchDownload
From 5e198efdb916548c0852d429e62b89ef918d7869 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Wed, 24 Dec 2025 21:06:16 +0000
Subject: [PATCH v13] Add pg_get_multixact_stats() function for monitoring
MultiXact usage
Expose multixact state via a new SQL-callable function pg_get_multixact_stats(),
returning:
- num_mxids : number of MultiXact IDs in use
- num_members : number of member entries in use
- members_size : bytes used by num_members in pg_multixact/members directory
- oldest_multixact : oldest MultiXact ID still needed
This patch adds pg_get_multixact_stats() function
- SQL-callable interface to GetMultiXactInfo()
- Includes isolation tests for monitoring invariants
Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage
Build and catalog:
- Add function to existing multixactfuncs.c
- pg_proc.dat entry
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
doc/src/sgml/func/func-info.sgml | 27 +++++
doc/src/sgml/maintenance.sgml | 58 ++++++++-
src/backend/utils/adt/multixactfuncs.c | 39 ++++++
src/include/access/multixact_internal.h | 8 ++
src/include/catalog/pg_proc.dat | 10 ++
.../isolation/expected/multixact_stats.out | 89 ++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/multixact_stats.spec | 113 ++++++++++++++++++
8 files changed, 339 insertions(+), 6 deletions(-)
create mode 100644 src/test/isolation/expected/multixact_stats.out
create mode 100644 src/test/isolation/specs/multixact_stats.spec
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index d4508114a48..051c3b28985 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,33 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_multixact_stats</primary>
+ </indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>num_mxids</parameter> <type>integer</type>,
+ <parameter>num_members</parameter> <type>bigint</type>,
+ <parameter>members_size</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>xid</type> )
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the total number of multixact IDs currently present in the system,
+ <literal>num_members</literal> is the total number of multixact member entries currently
+ present in the system,
+ <literal>members_size</literal> is the storage occupied by <literal>num_members</literal>
+ in the <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still in use.
+ </para>
+ <para>
+ The function reports statistics at the time it is invoked. Values may vary between calls,
+ even within a single transaction.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 08e6489afb8..8695b92e93e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,14 +813,60 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
- scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of multixact member entries created exceeds approximately 2^31 entries
+ (occupying roughly 10GB in the <literal>pg_multixact/members</literal> directory),
+ aggressive vacuum scans will occur more often for all tables, starting with those that
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled. At approximately 2^32 entries
+ (occupying roughly 20GB in the <literal>pg_multixact/members</literal> directory), even
+ more aggressive vacuum scans are triggered to reclaim member storage space.
</para>
+ <para>
+ The <function>pg_get_multixact_stats()</function> function described in
+ <xref linkend="functions-pg-snapshot"/> provides a way to monitor
+ multixact allocation and usage patterns in real time, for example:
+ <programlisting>
+postgres=# SELECT *,pg_size_pretty(members_size) members_size_pretty FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | members_size | oldest_multixact | members_size_pretty
+-----------+-------------+--------------+------------------+---------------------
+ 311740299 | 2785241176 | 13926205880 | 2 | 13 GB
+(1 row)
+ </programlisting>
+ This output shows a system with significant multixact activity: about ~312 million
+ multixact IDs and ~2.8 billion member entries consuming 13 GB of storage space.
+ By leveraging this information, the function helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations. For example, a spike in <literal>num_mxids</literal> might indicate
+ multiple sessions running <literal>UPDATE</literal> statements with foreign key checks,
+ concurrent <literal>SELECT FOR SHARE</literal> operations, or frequent use of savepoints
+ causing lock contention.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Track multixact cleanup efficiency by monitoring oldest_multixact.
+ If this value remains unchanged while <literal>num_members</literal> grows, it could indicate
+ that long-running transactions are preventing cleanup, or autovacuum is
+ not keeping up with the workload.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
+ </para>
+
<para>
Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
system will begin to emit warning messages when the database's oldest MXIDs reach forty
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
index a428e140bc4..e8929f4d83f 100644
--- a/src/backend/utils/adt/multixactfuncs.c
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -15,6 +15,8 @@
#include "postgres.h"
#include "access/multixact.h"
+#include "access/multixact_internal.h"
+#include "access/htup_details.h"
#include "funcapi.h"
#include "utils/builtins.h"
@@ -85,3 +87,40 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
SRF_RETURN_DONE(funccxt);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current multixact usage.
+ *
+ * Returns NULL if the oldest referenced offset is unknown.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[4];
+ bool nulls[4];
+ MultiXactOffset members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ MultiXactOffset oldestOffset;
+ int64 membersBytes;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("return type must be a row type")));
+
+ GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset);
+
+ membersBytes = MultiXactMemberStorageSize(members);
+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = Int64GetDatum(members);
+ values[2] = Int64GetDatum(membersBytes);
+ values[3] = UInt32GetDatum(oldestMultiXactId);
+ memset(nulls, false, sizeof(nulls));
+
+ return HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
+}
diff --git a/src/include/access/multixact_internal.h b/src/include/access/multixact_internal.h
index f2d6539e8a6..03bcf35ef79 100644
--- a/src/include/access/multixact_internal.h
+++ b/src/include/access/multixact_internal.h
@@ -121,4 +121,12 @@ MXOffsetToMemberOffset(MultiXactOffset offset)
member_in_group * sizeof(TransactionId);
}
+/* Calculate storage space in bytes for a given number of members */
+static inline int64
+MultiXactMemberStorageSize(MultiXactOffset members)
+{
+ return (int64) (members / MULTIXACT_MEMBERS_PER_MEMBERGROUP) *
+ MULTIXACT_MEMBERGROUP_SIZE;
+}
+
#endif /* MULTIXACT_INTERNAL_H */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fd9448ec7b9..6caea6c8281 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12612,4 +12612,14 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Get multixact usage
+{ oid => '9001', descr => 'get current multixact usage statistics',
+ proname => 'pg_get_multixact_stats',
+ provolatile => 'v', proparallel => 's', prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int8,int8,int8,xid}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{num_mxids,num_members,members_size,oldest_multixact}',
+ prosrc => 'pg_get_multixact_stats'},
+
]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..27a6510c4ad
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,89 @@
+Parsed test spec with 2 sessions
+
+starting permutation: snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
+step snap0:
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap1:
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap2:
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step check_while_pinned:
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+
+assertion |ok
+------------------------+--
+is_init_mxids |t
+is_init_members |t
+is_init_oldest_mxid |t
+is_init_oldest_off |t
+is_oldest_mxid_nondec_01|t
+is_oldest_mxid_nondec_12|t
+is_oldest_off_nondec_01 |t
+is_oldest_off_nondec_12 |t
+is_members_increased_ge1|t
+is_mxids_nondec_01 |t
+is_mxids_nondec_12 |
+is_members_nondec_01 |
+is_members_nondec_12 |
+(13 rows)
+
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index f2e067b1fbc..5ba44d67a10 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -121,3 +121,4 @@ test: serializable-parallel-2
test: serializable-parallel-3
test: matview-write-skew
test: lock-nowait
+test: multixact_stats
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..7b034654504
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,113 @@
+# Test invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While it is pinned
+# by two open transactions, we assert only invariants that background VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second session locked the row,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation may shrink deltas).
+#
+# Terminology (global counters):
+# num_mxids, num_members : "in-use" deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns, which protects
+# the truncation horizon (VACUUM can't advance past our pinned multi).
+
+setup
+{
+ CREATE TABLE mxq(id int PRIMARY KEY, v int);
+ INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+ DROP TABLE mxq;
+}
+
+# Two sessions that lock on the same tuple -> one MultiXact with >= 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin { BEGIN; }
+step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin { BEGIN; }
+step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet.
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# After s1 has locked the row.
+step snap1 {
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# After s2 joins on the SAME tuple -> multixact with >= 2 members.
+step snap2 {
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
+# is_init_mxids : num_mxids is non-NULL
+# is_init_members : num_members is non-NULL
+# is_init_oldest_mxid : oldest_multixact is non-NULL
+# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1)
+# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2)
+# is_members_increased_ge1 : members increased by at least 1 when s2 joined
+# is_mxids_nondec_01 : num_mxids did not decrease (snap0→snap1)
+# is_mxids_nondec_12 : num_mxids did not decrease (snap1→snap2)
+# is_members_nondec_01 : num_members did not decrease (snap0→snap1)
+# is_members_nondec_12 : num_members did not decrease (snap1→snap2)
+step check_while_pinned {
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+}
+
+permutation snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
--
2.47.3
On Wed, Dec 24, 2025 at 06:09:14PM -0600, Naga Appani wrote:
I’ve updated the patch as suggested.
The member storage size calculation has been refactored into a static
inline function, MultiXactMemberStorageSize(), in
src/include/access/multixact_internal.h.Please find v13 attached.
Seems basically sensible here for the structure, including the hints
and recommendations for the GUCs.
+ of multixact member entries created exceeds approximately 2^31 entries
[...]
+ This output shows a system with significant multixact activity: about ~312 million
+ multixact IDs and ~2.8 billion member entries consuming 13 GB of storage space.
The documentation could be improved more. The power '^' and tilde
symbols are not used for references. If any, I'd encourage using
wordings like "2 billion" entries for all these paragraphs across the
board. For the tilde part, you would mean "at least" or "at most"
rather than the boundaries implied with the tilde (aka we should not
expect the reader the mental effort to translate and understand what
these symbols mean, especially for non-native English speaker).
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
Saying that, this paragraph does not seem that useful to me,
especially the last sentence which is evasive and can apply to
anything related to monitoring.
The second hint is more useful, but perhaps we should mention which
GUC(s) should be touched to make num_members go lower? As a whole,
the orderedlist does not seem strongly necessary to me: the third
item is evasive, the first and second items describe problematic
patterns and what could cause them. As a whole, for the docs
part, the new additions in the existing paragraph of maintenance.sgml
are OK for me. The first part of the new paragraph added also
provides some direct information about how useful this new function is
to evaluate the amount of disk space used. I'd like to think that we
should just complete it the two facts about num_mxids and num_members
you are listing, with two sentences appended at the end of the new
paragraph rather than a list of items.
If we don't completely agree about the "hint" part, we could split the
patch in two for now: let's add the function first, then discuss more
about what kind of tweaks and patterns we want to document as a set of
follow-up changes. It does not change the fact that the function is
useful for disk-space monitoring purposes. The patterns and hints are
a second different matter.
--
Michael
On Thu, Dec 25, 2025 at 09:45:33AM +0900, Michael Paquier wrote:
Seems basically sensible here for the structure, including the hints
and recommendations for the GUCs.
+/* Calculate storage space in bytes for a given number of members */
+static inline int64
+MultiXactMemberStorageSize(MultiXactOffset members)
+{
+ return (int64) (members / MULTIXACT_MEMBERS_PER_MEMBERGROUP) *
+ MULTIXACT_MEMBERGROUP_SIZE;
+}
By the way, this bit also feels a bit confusing, and this comes down
to the fact that "members" is not an offset, isn't it? This relates
to MultiXactMemberFreezeThreshold(), that considers the number of
members as an offset, but it is a number of members, a difference
between two offsets.
I am wondering if it would not be cleaner and less confusing to do
things slightly differently (sorry I did not pay much attention to
that previously):
- Change GetMultiXactInfo() to return two offsets, nextOffset and
oldestOffset.
- Use uint64 for members and recalculate the difference in
MultiXactMemberFreezeThreshold() and the function code. Heikki has
just switched multixact offsets to be 64 bits, yippee.
- Redefine MultiXactMemberStorageSize() so as it does not take a
number of members in input, but as the amount of space taken between
two offsets. At least that would be more consistent with all the
other inline functions of multixact.h that rely on MultiXactOffset
inputs. Using a int64 is still OK I guess, there may be a case to
detect "negative" numbers and give a change to the users of the new
inline function to notice that they did a computation wrong, rather
than hiding a signedness problem.
--
Michael
On Thu, Dec 25, 2025 at 10:30:37AM +0900, Michael Paquier wrote:
I am wondering if it would not be cleaner and less confusing to do
things slightly differently (sorry I did not pay much attention to
that previously):
- Change GetMultiXactInfo() to return two offsets, nextOffset and
oldestOffset.
- Use uint64 for members and recalculate the difference in
MultiXactMemberFreezeThreshold() and the function code. Heikki has
just switched multixact offsets to be 64 bits, yippee.
- Redefine MultiXactMemberStorageSize() so as it does not take a
number of members in input, but as the amount of space taken between
two offsets. At least that would be more consistent with all the
other inline functions of multixact.h that rely on MultiXactOffset
inputs. Using a int64 is still OK I guess, there may be a case to
detect "negative" numbers and give a change to the users of the new
inline function to notice that they did a computation wrong, rather
than hiding a signedness problem.
So, here is what I have in mind, split into independent pieces:
- Remove the existing type confusion with GetMultiXactInfo(), due to
how things have always been done in MultiXactMemberFreezeThreshold().
- Add macro MultiXactOffsetStorageSize(), to calculate the amount of
space used between two offsets.
- The main patch, with adjustments in comments, the test (no
non-ASCII characters in that, please). One thing that was really
surprising is that you did not consider ROLE_PG_READ_ALL_STATS. We
expect all the stats information to be hidden if a role is not granted
access to them, and this function should be no exception especially as
it relates to disk space usage like database or tablespace size
functions.
Anyway, attached are all these updated pieces. The doc edits are what
I have mentioned upthread, close to what you have suggested to me
offline.
Comments?
--
Michael
Attachments:
v14-0001-Rework-GetMultiXactInfo.patchtext/x-diff; charset=us-asciiDownload
From dbe13b3d61c03f5d8d6773021e87c626ab04b3b0 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 29 Dec 2025 11:39:56 +0900
Subject: [PATCH v14 1/3] Rework GetMultiXactInfo()
This routine returned a number of offsets as a MultiXactOffset, but it
is not actually an offset, just a number to define their range. This
was confusing.
This type confusion comes from the original implementation of
MultiXactMemberFreezeThreshold().
---
src/include/access/multixact.h | 2 +-
src/backend/access/transam/multixact.c | 16 ++++++++--------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 6433fe163641..d22abbb72512 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -109,7 +109,7 @@ extern bool MultiXactIdIsRunning(MultiXactId multi, bool isLockOnly);
extern void MultiXactIdSetOldestMember(void);
extern int GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
bool from_pgupgrade, bool isLockOnly);
-extern void GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *members,
+extern void GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *nextOffset,
MultiXactId *oldestMultiXactId,
MultiXactOffset *oldestOffset);
extern bool MultiXactIdPrecedes(MultiXactId multi1, MultiXactId multi2);
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 34956a5a6634..0d6f594e2a06 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2461,25 +2461,23 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
*
* Returns information about the current MultiXact state, as of:
* multixacts: Number of MultiXacts (nextMultiXactId - oldestMultiXactId)
- * members: Number of member entries (nextOffset - oldestOffset)
+ * nextOffset: Next-to-be-assigned offset
* oldestMultiXactId: Oldest MultiXact ID still in use
* oldestOffset: Oldest offset still in use
*/
void
-GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *members,
+GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *nextOffset,
MultiXactId *oldestMultiXactId, MultiXactOffset *oldestOffset)
{
- MultiXactOffset nextOffset;
MultiXactId nextMultiXactId;
LWLockAcquire(MultiXactGenLock, LW_SHARED);
- nextOffset = MultiXactState->nextOffset;
+ *nextOffset = MultiXactState->nextOffset;
*oldestMultiXactId = MultiXactState->oldestMultiXactId;
nextMultiXactId = MultiXactState->nextMXact;
*oldestOffset = MultiXactState->oldestOffset;
LWLockRelease(MultiXactGenLock);
- *members = nextOffset - *oldestOffset;
*multixacts = nextMultiXactId - *oldestMultiXactId;
}
@@ -2514,16 +2512,18 @@ GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *members,
int
MultiXactMemberFreezeThreshold(void)
{
- MultiXactOffset members;
uint32 multixacts;
uint32 victim_multixacts;
double fraction;
int result;
MultiXactId oldestMultiXactId;
MultiXactOffset oldestOffset;
+ MultiXactOffset nextOffset;
+ uint64 members;
- /* Read the current offsets and members usage. */
- GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset);
+ /* Read the current offsets and multixact usage. */
+ GetMultiXactInfo(&multixacts, &nextOffset, &oldestMultiXactId, &oldestOffset);
+ members = nextOffset - oldestOffset;
/* If member space utilization is low, no special action is required. */
if (members <= MULTIXACT_MEMBER_LOW_THRESHOLD)
--
2.51.0
v14-0002-Add-MultiXactOffsetStorageSize.patchtext/x-diff; charset=us-asciiDownload
From 4ef3b82b91eaee132cfed886f6a1a1455bb084a3 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 29 Dec 2025 11:41:35 +0900
Subject: [PATCH v14 2/3] Add MultiXactOffsetStorageSize()
This calculates the amount of space taken by two multixact offsets,
useful on its own to know the amount of space multixacts may use. This
will be used by an upcoming patch.
---
src/include/access/multixact_internal.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/src/include/access/multixact_internal.h b/src/include/access/multixact_internal.h
index f2d6539e8a67..65dc2be148db 100644
--- a/src/include/access/multixact_internal.h
+++ b/src/include/access/multixact_internal.h
@@ -121,4 +121,14 @@ MXOffsetToMemberOffset(MultiXactOffset offset)
member_in_group * sizeof(TransactionId);
}
+/* Storage space consumed by a range of offsets, in bytes */
+static inline int64
+MultiXactOffsetStorageSize(MultiXactOffset new_offset,
+ MultiXactOffset old_offset)
+{
+ Assert(new_offset >= old_offset);
+ return (int64) ((new_offset - old_offset) / MULTIXACT_MEMBERS_PER_MEMBERGROUP) *
+ MULTIXACT_MEMBERGROUP_SIZE;
+}
+
#endif /* MULTIXACT_INTERNAL_H */
--
2.51.0
v14-0003-Add-pg_get_multixact_stats-function-for-monitori.patchtext/x-diff; charset=utf-8Download
From a34c851f3421d5023f896f28549e77fd6d760100 Mon Sep 17 00:00:00 2001
From: Naga Appani <nagnrik@gmail.com>
Date: Wed, 24 Dec 2025 21:06:16 +0000
Subject: [PATCH v14 3/3] Add pg_get_multixact_stats() function for monitoring
MultiXact usage
Expose multixact state via a new SQL-callable function pg_get_multixact_stats(),
returning:
- num_mxids : number of MultiXact IDs in use
- num_members : number of member entries in use
- members_size : bytes used by num_members in pg_multixact/members directory
- oldest_multixact : oldest MultiXact ID still needed
This patch adds pg_get_multixact_stats() function
- SQL-callable interface to GetMultiXactInfo()
- Includes isolation tests for monitoring invariants
Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage
Build and catalog:
- Add function to existing multixactfuncs.c
- pg_proc.dat entry
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
src/include/catalog/pg_proc.dat | 10 ++
src/backend/utils/adt/multixactfuncs.c | 53 +++++++++
.../isolation/expected/multixact-stats.out | 89 ++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/multixact-stats.spec | 111 ++++++++++++++++++
src/test/regress/expected/misc_functions.out | 29 +++++
src/test/regress/sql/misc_functions.sql | 15 +++
doc/src/sgml/func/func-info.sgml | 33 ++++++
doc/src/sgml/maintenance.sgml | 39 +++++-
9 files changed, 375 insertions(+), 5 deletions(-)
create mode 100644 src/test/isolation/expected/multixact-stats.out
create mode 100644 src/test/isolation/specs/multixact-stats.spec
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fd9448ec7b98..6caea6c8281e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12612,4 +12612,14 @@
proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
prosrc => 'pg_get_aios' },
+# Get multixact usage
+{ oid => '9001', descr => 'get current multixact usage statistics',
+ proname => 'pg_get_multixact_stats',
+ provolatile => 'v', proparallel => 's', prorettype => 'record',
+ proargtypes => '',
+ proallargtypes => '{int8,int8,int8,xid}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{num_mxids,num_members,members_size,oldest_multixact}',
+ prosrc => 'pg_get_multixact_stats'},
+
]
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
index a428e140bc4b..b39db200a391 100644
--- a/src/backend/utils/adt/multixactfuncs.c
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -15,7 +15,12 @@
#include "postgres.h"
#include "access/multixact.h"
+#include "access/multixact_internal.h"
+#include "access/htup_details.h"
+#include "catalog/pg_authid_d.h"
#include "funcapi.h"
+#include "miscadmin.h"
+#include "utils/acl.h"
#include "utils/builtins.h"
/*
@@ -85,3 +90,51 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
SRF_RETURN_DONE(funccxt);
}
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current multixact usage.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+ TupleDesc tupdesc;
+ Datum values[4];
+ bool nulls[4];
+ uint64 members;
+ MultiXactId oldestMultiXactId;
+ uint32 multixacts;
+ MultiXactOffset oldestOffset;
+ MultiXactOffset nextOffset;
+ int64 membersBytes;
+
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("return type must be a row type")));
+
+ GetMultiXactInfo(&multixacts, &nextOffset, &oldestMultiXactId, &oldestOffset);
+ members = nextOffset - oldestOffset;
+
+ membersBytes = MultiXactOffsetStorageSize(nextOffset, oldestOffset);
+
+ if (!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+ {
+ /*
+ * Only superusers and roles with privileges of pg_read_all_stats can
+ * see details.
+ */
+ memset(nulls, true, sizeof(bool) * tupdesc->natts);
+ }
+ else
+ {
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = Int64GetDatum(members);
+ values[2] = Int64GetDatum(membersBytes);
+ values[3] = UInt32GetDatum(oldestMultiXactId);
+ memset(nulls, false, sizeof(nulls));
+ }
+
+ return HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
+}
diff --git a/src/test/isolation/expected/multixact-stats.out b/src/test/isolation/expected/multixact-stats.out
new file mode 100644
index 000000000000..27a6510c4ad5
--- /dev/null
+++ b/src/test/isolation/expected/multixact-stats.out
@@ -0,0 +1,89 @@
+Parsed test spec with 2 sessions
+
+starting permutation: snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
+step snap0:
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap1:
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+ 1
+(1 row)
+
+step snap2:
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+
+step check_while_pinned:
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+
+assertion |ok
+------------------------+--
+is_init_mxids |t
+is_init_members |t
+is_init_oldest_mxid |t
+is_init_oldest_off |t
+is_oldest_mxid_nondec_01|t
+is_oldest_mxid_nondec_12|t
+is_oldest_off_nondec_01 |t
+is_oldest_off_nondec_12 |t
+is_members_increased_ge1|t
+is_mxids_nondec_01 |t
+is_mxids_nondec_12 |
+is_members_nondec_01 |
+is_members_nondec_12 |
+(13 rows)
+
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index f2e067b1fbc5..01ff1c6586fe 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -63,6 +63,7 @@ test: delete-abort-savept-2
test: aborted-keyrevoke
test: multixact-no-deadlock
test: multixact-no-forget
+test: multixact-stats
test: lock-committed-update
test: lock-committed-keyupdate
test: update-locked-tuple
diff --git a/src/test/isolation/specs/multixact-stats.spec b/src/test/isolation/specs/multixact-stats.spec
new file mode 100644
index 000000000000..6c1dd94958d1
--- /dev/null
+++ b/src/test/isolation/specs/multixact-stats.spec
@@ -0,0 +1,111 @@
+# Test for pg_get_multixact_stats()
+#
+# We create exactly one fresh MultiXact on a brand-new table. While it is
+# pinned by two open transactions, we check patterns of this function that
+# VACUUM/FREEZE cannot violate:
+# 1) "members" increased by ≥ 1 when the second session locked the row,
+# 2) (num_mxids / num_members) did not decrease compared to earlier snapshots
+# 3) "oldest_*" fields never decreases.
+#
+# This test does not do checks patterns after releasing locks, as freezing
+# and/or truncation may shrink the multixact ranges calculated.
+
+setup
+{
+ CREATE TABLE mxq(id int PRIMARY KEY, v int);
+ INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+ DROP TABLE mxq;
+}
+
+# Two sessions that lock the same tuple, leading to one MultiXact with
+# at least 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin { BEGIN; }
+step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin { BEGIN; }
+step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Save multixact state *BEFORE* any locking; some of these may be NULLs if
+# multixacts have not initialized yet.
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# Save multixact state after s1 has locked the row.
+step snap1 {
+ CREATE TEMP TABLE snap1 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# Save multixact state after s2 joins to lock the same row, leading to
+# a multixact with at least 2 members.
+step snap2 {
+ CREATE TEMP TABLE snap2 AS
+ SELECT num_mxids, num_members, oldest_multixact
+ FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value outputs based of boolean checks:
+# is_init_mxids : num_mxids is non-NULL
+# is_init_members : num_members is non-NULL
+# is_init_oldest_mxid : oldest_multixact is non-NULL
+# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0->snap1)
+# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1->snap2)
+# is_members_increased_ge1 : members increased by at least 1 when s2 joined
+# is_mxids_nondec_01 : num_mxids did not decrease (snap0->snap1)
+# is_mxids_nondec_12 : num_mxids did not decrease (snap1->snap2)
+# is_members_nondec_01 : num_members did not decrease (snap0->snap1)
+# is_members_nondec_12 : num_members did not decrease (snap1->snap2)
+step check_while_pinned {
+ SELECT r.assertion, r.ok
+ FROM snap0 s0
+ JOIN snap1 s1 ON TRUE
+ JOIN snap2 s2 ON TRUE,
+ LATERAL unnest(
+ ARRAY[
+ 'is_init_mxids',
+ 'is_init_members',
+ 'is_init_oldest_mxid',
+ 'is_init_oldest_off',
+ 'is_oldest_mxid_nondec_01',
+ 'is_oldest_mxid_nondec_12',
+ 'is_oldest_off_nondec_01',
+ 'is_oldest_off_nondec_12',
+ 'is_members_increased_ge1',
+ 'is_mxids_nondec_01',
+ 'is_mxids_nondec_12',
+ 'is_members_nondec_01',
+ 'is_members_nondec_12'
+ ],
+ ARRAY[
+ (s2.num_mxids IS NOT NULL),
+ (s2.num_members IS NOT NULL),
+ (s2.oldest_multixact IS NOT NULL),
+
+ (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+ (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+
+ (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
+ ) AS r(assertion, ok);
+}
+
+permutation snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned s1_commit s2_commit
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index d7d965d884a1..6c03b1a79d75 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -999,3 +999,32 @@ SELECT test_relpath();
SELECT pg_replication_origin_create('regress_' || repeat('a', 505));
ERROR: replication origin name is too long
DETAIL: Replication origin names must be no longer than 512 bytes.
+-- pg_get_multixact_stats tests
+CREATE ROLE regress_multixact_funcs;
+-- Access granted for superusers.
+SELECT oldest_multixact IS NULL AS null_result FROM pg_get_multixact_stats();
+ null_result
+-------------
+ f
+(1 row)
+
+-- Access revoked.
+SET ROLE regress_multixact_funcs;
+SELECT oldest_multixact IS NULL AS null_result FROM pg_get_multixact_stats();
+ null_result
+-------------
+ t
+(1 row)
+
+RESET ROLE;
+-- Access granted for users with pg_monitor rights.
+GRANT pg_monitor TO regress_multixact_funcs;
+SET ROLE regress_multixact_funcs;
+SELECT oldest_multixact IS NULL AS null_result FROM pg_get_multixact_stats();
+ null_result
+-------------
+ f
+(1 row)
+
+RESET ROLE;
+DROP ROLE regress_multixact_funcs;
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 0fc20fbb6b40..35b7983996c4 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -459,3 +459,18 @@ SELECT test_relpath();
-- pg_replication_origin.roname limit
SELECT pg_replication_origin_create('regress_' || repeat('a', 505));
+
+-- pg_get_multixact_stats tests
+CREATE ROLE regress_multixact_funcs;
+-- Access granted for superusers.
+SELECT oldest_multixact IS NULL AS null_result FROM pg_get_multixact_stats();
+-- Access revoked.
+SET ROLE regress_multixact_funcs;
+SELECT oldest_multixact IS NULL AS null_result FROM pg_get_multixact_stats();
+RESET ROLE;
+-- Access granted for users with pg_monitor rights.
+GRANT pg_monitor TO regress_multixact_funcs;
+SET ROLE regress_multixact_funcs;
+SELECT oldest_multixact IS NULL AS null_result FROM pg_get_multixact_stats();
+RESET ROLE;
+DROP ROLE regress_multixact_funcs;
diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index d4508114a48e..175f18315cd4 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,39 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres}
modify key columns.
</para></entry>
</row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_get_multixact_stats</primary>
+ </indexterm>
+ <function>pg_get_multixact_stats</function> ()
+ <returnvalue>record</returnvalue>
+ ( <parameter>num_mxids</parameter> <type>integer</type>,
+ <parameter>num_members</parameter> <type>bigint</type>,
+ <parameter>members_size</parameter> <type>bigint</type>,
+ <parameter>oldest_multixact</parameter> <type>xid</type> )
+ </para>
+ <para>
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the total number of multixact IDs
+ currently present in the system, <literal>num_members</literal> is
+ the total number of multixact member entries currently present in
+ the system, <literal>members_size</literal> is the storage occupied
+ by <literal>num_members</literal> in the
+ <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still
+ in use.
+ </para>
+ <para>
+ The function reports statistics at the time it is invoked. Values may
+ vary between calls, even within a single transaction.
+ </para>
+ <para>
+ To use this function, you must have privileges of the
+ <literal>pg_read_all_stats</literal> role.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 08e6489afb8e..7c958b062731 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,12 +813,41 @@ HINT: Execute a database-wide VACUUM in that database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
- linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the
- storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of multixact member entries created exceeds approximately 2 billion
+ entries (occupying roughly 10GB in the
+ <literal>pg_multixact/members</literal> directory), aggressive vacuum
scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
- scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ have the oldest multixact-age. Both of these kinds of aggressive
+ scans will occur even if autovacuum is nominally disabled. At approximately
+ 4 billion entries (occupying roughly 20GB in the
+ <literal>pg_multixact/members</literal> directory), even more aggressive
+ vacuum scans are triggered to reclaim member storage space.
+ </para>
+
+ <para>
+ The <function>pg_get_multixact_stats()</function> function described in
+ <xref linkend="functions-pg-snapshot"/> provides a way to monitor
+ multixact allocation and usage patterns in real time, for example:
+ <programlisting>
+=# SELECT *, pg_size_pretty(members_size) members_size_pretty
+ FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | members_size | oldest_multixact | members_size_pretty
+-----------+-------------+--------------+------------------+---------------------
+ 311740299 | 2785241176 | 13926205880 | 2 | 13 GB
+(1 row)
+ </programlisting>
+ This output shows a system with significant multixact activity: about
+ 312 million multixact IDs and about 2.8 billion member entries consuming
+ 13 GB of storage space.
+ A spike in <literal>num_mxids</literal> might indicate multiple sessions
+ running <literal>UPDATE</literal> statements with foreign key checks,
+ concurrent <literal>SELECT FOR SHARE</literal> operations, or frequent
+ use of savepoints causing lock contention.
+ If <literal>oldest_multixact</literal> value remains unchanged while
+ <literal>num_members</literal> grows, it could indicate that long-running
+ transactions are preventing cleanup, or autovacuum is
+ not keeping up with the workload.
</para>
<para>
--
2.51.0
On Sun, Dec 28, 2025 at 9:51 PM Michael Paquier <michael@paquier.xyz> wrote:
So, here is what I have in mind, split into independent pieces:
- Remove the existing type confusion with GetMultiXactInfo(), due to
how things have always been done in MultiXactMemberFreezeThreshold().
- Add macro MultiXactOffsetStorageSize(), to calculate the amount of
space used between two offsets.
- The main patch, with adjustments in comments, the test (no
non-ASCII characters in that, please). One thing that was really
surprising is that you did not consider ROLE_PG_READ_ALL_STATS. We
expect all the stats information to be hidden if a role is not granted
access to them, and this function should be no exception especially as
it relates to disk space usage like database or tablespace size
functions.Anyway, attached are all these updated pieces. The doc edits are what
I have mentioned upthread, close to what you have suggested to me
offline.Comments?
--
Michael
Thank you for patches, Michael! I've tested and everything works well:
- All patches apply cleanly
- Isolation test (multixact-stats) passes
- Function correctly reports stats under heavy load
Tested with significant multixact activity:
++++++++++++++++++++++++++++++++++++++++++++++
postgres=# \x
Expanded display is on.
postgres=# SELECT
to_char(num_mxids::bigint, 'FM999,999,999,999') AS num_mxids,
to_char(num_members::bigint, 'FM999,999,999,999') AS num_members,
to_char(members_size::bigint, 'FM999,999,999,999') AS members_size_bytes,
pg_size_pretty(members_size) AS members_size_pretty,
to_char(oldest_multixact::text::bigint, 'FM999,999,999,999') AS
oldest_multixact
FROM pg_get_multixact_stats();
-[ RECORD 1 ]-------+------------------
num_mxids | 235,095,556
num_members | 14,435,701,862
members_size_bytes | 72,178,509,300
members_size_pretty | 67 GB
oldest_multixact | 2
++++++++++++++++++++++++++++++++++++++++++++++
After cleanup, the function properly resets:
++++++++++++++++++++++++++++++++++++++++++++++
-[ RECORD 1 ]-------+-------------
num_mxids | 0
num_members | 0
members_size_bytes | 0
members_size_pretty | 0 bytes
oldest_multixact | 235,095,558
++++++++++++++++++++++++++++++++++++++++++++++
The oldest_multixact correctly advances to reflect the cleanup.
Thanks for adding the pg_read_all_stats privilege check!
I think this is ready for RFC.
Best regards,
Naga
On Mon, Dec 29, 2025 at 08:57:11PM -0600, Naga Appani wrote:
The oldest_multixact correctly advances to reflect the cleanup.
Thanks for adding the pg_read_all_stats privilege check!
I think this is ready for RFC.
Thanks for looking. I have done an extra round of brush-up, then
applied the set. The buildfarm looks OK with it.
--
Michael