[Proposal] Expose internal MultiXact member count function for efficient monitoring

Started by Naga Appaniabout 1 year ago42 messageshackers
Jump to latest
#1Naga Appani
nagnrik@gmail.com

Hi,

I would like to propose exposing an internal PostgreSQL function called
ReadMultiXactCounts() to allow for efficient monitoring of MultiXact member
usage. This function provides an accurate, real-time view of MultiXact
activity by directly retrieving the actual member count, rather than
relying on storage-based calculations.

*Current Challenges: *The existing approach we are currently using to
estimate MultiXact member usage has several drawbacks:

- *Filesystem scanning overhead: *These functions recursively scan the
pg_multixact directory, iterating over potentially thousands or millions
of files, and retrieving file sizes using stat() calls, which introduces
significant I/O overhead.
- *Potential performance bottleneck:* On systems with high transaction
throughput generating large numbers of MultiXact members, the
filesystem-based approach scales poorly due to the latency of stat() calls,
especially on network-based filesystems like RDS/Aurora.
- *Not a real-time or memory-efficient solution:* The current approach
does not provide a direct, in-memory view of MultiXact activity.

*Proposed Solution*The internal ReadMultiXactCounts() function, implemented
in multixact.c, directly calculates the number of MultiXact members by
reading live state from shared memory. This approach avoids the performance
issues of the current filesystem-based estimation methods.

By exposing ReadMultiXactCounts() for external use, we can provide
PostgreSQL users with an efficient way to monitor MultiXact member usage.
This could be particularly useful for integrating with tools like Amazon
RDS Performance Insights and Amazon CloudWatch to provide enhanced database
insights and proactive managed monitoring for users.

The performance comparison between the current and proposed approaches
shows a significant improvement, with the proposed solution taking only a
fraction of a millisecond to retrieve the MultiXact member count, compared
to tens or hundreds of milliseconds for the current filesystem-based
approach.

Following is the comparison of performance between calculating storage of
MultiXact members directory and retrieving the count of members.

Implementation Used size MultiXact members (approx) Time taken (ms) Time
factor (vs Baseline)
EC2 community (RDS version of pg_ls_multixactdir) 8642 MB 1.8 billion 96.879
1.00
Linux du command 8642 MB 1.8 billion 96 NA
Proposal (ReadMultiXactCounts) N/A 1.99 billion 0.167 580 times faster

I believe exposing ReadMultiXactCounts() would be a valuable addition to
the PostgreSQL ecosystem, providing users with a more reliable and
efficient way to monitor MultiXact usage. Appreciate your feedback or
discussion on this proposal.

Please let me know if this approach is acceptable, so I’ll go ahead and
submit a patch.

Thank you!

Best regards,
Naga Appani
Postgres Database Engineer
Amazon Web Services

#2Naga Appani
nagnrik@gmail.com
In reply to: Naga Appani (#1)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:

Hi,

I would like to propose exposing an internal PostgreSQL function called
ReadMultiXactCounts() to allow for efficient monitoring of MultiXact
member usage. This function provides an accurate, real-time view of
MultiXact activity by directly retrieving the actual member count, rather
than relying on storage-based calculations.

*Current Challenges: *The existing approach we are currently using to
estimate MultiXact member usage has several drawbacks:

- *Filesystem scanning overhead: *These functions recursively scan the
pg_multixact directory, iterating over potentially thousands or
millions of files, and retrieving file sizes using stat() calls, which
introduces significant I/O overhead.
- *Potential performance bottleneck:* On systems with high transaction
throughput generating large numbers of MultiXact members, the
filesystem-based approach scales poorly due to the latency of stat() calls,
especially on network-based filesystems like RDS/Aurora.
- *Not a real-time or memory-efficient solution:* The current approach
does not provide a direct, in-memory view of MultiXact activity.

*Proposed Solution*The internal ReadMultiXactCounts() function,
implemented in multixact.c, directly calculates the number of MultiXact
members by reading live state from shared memory. This approach avoids the
performance issues of the current filesystem-based estimation methods.
................
...............

My apologies for re-posting. This is my first time writing to the hackers
list, and I accidentally used HTML formatting. Below is the original
request in plain text:

**************************************************************************************************************************************************************
I would like to propose exposing an internal PostgreSQL function called
ReadMultiXactCounts()[1]https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2925-L2948 to allow for efficient monitoring of MultiXact
member usage. This function provides an accurate, real-time view of
MultiXact activity by directly retrieving the actual member count, rather
than relying on storage-based calculations.

================
Current Challenges
================
The existing approach we are currently using to estimate MultiXact member
usage has several drawbacks:
- Filesystem scanning overhead: These functions recursively scan the
pg_multixact directory, iterating over potentially thousands or millions of
files, and retrieving file sizes using stat() calls, which introduces
significant I/O overhead.
- Potential performance bottleneck: On systems with high transaction
throughput generating large numbers of MultiXact members, the
filesystem-based approach scales poorly due to the latency of stat() calls,
especially on network-based filesystems like RDS/Aurora.
- Not a real-time or memory-efficient solution: The current approach does
not provide a direct, in-memory view of MultiXact activity.

=================
Proposal
=================
The internal ReadMultiXactCounts() function, implemented in multixact.c,
directly calculates the number of MultiXact members by reading live state
from shared memory. This approach avoids the performance issues of the
current filesystem-based estimation methods.

By exposing ReadMultiXactCounts() for external use, we can provide
PostgreSQL users with an efficient way to monitor MultiXact member usage.
This could be particularly useful for integrating with tools like Amazon
RDS Performance Insights and Amazon CloudWatch to provide enhanced database
insights and proactive managed monitoring for users.

=========================
Performance comparison
=========================
The performance comparison between the current and proposed approaches
shows a significant improvement, with the proposed solution taking only a
fraction of a millisecond to retrieve the MultiXact member count, compared
to tens or hundreds of milliseconds for the current filesystem-based
approach. And as more members are generated, the gap widens.

Following is the comparison of performance between calculating storage of
MultiXact members directory and retrieving the count of members.

Implementation | Used size |
MultiXact members
----------------------------------------------------+-------------+------------------
EC2 community (RDS version of pg_ls_multixactdir) | 8642 MB | 1.8
billion
Linux du command | 8642 MB | 1.8
billion
Proposal (ReadMultiXactCounts) | 8642 MB | 1.8
billion

============================================================================================
Sample runs
============================================================================================
Using "du -h"
--------------------
postgres=# \! time du -h /rdsdbdata/db/17.4/data/pg_multixact/members
13G /rdsdbdata/db/17.4/data/pg_multixact/members

real 0m0.285s <============================= time taken
user 0m0.050s <============================= time taken
sys 0m0.140s

Using RDS's pg_ls_multixactdir ():
------------------------------------------------------------
postgres=# SELECT
pg_size_pretty(coalesce(sum(size), 0)) AS members_size
FROM
pg_ls_multixactdir ()
WHERE
name LIKE 'pg_multixact/members%';
members_size
--------------
13 GB
(1 row)

Time: 226.533 ms <============================= time taken

Using proposed function:
----------------------------------------
postgres=# SELECT to_char(pg_get_multixact_members_count(),
'999,999,999,999') AS members_count;
members_count
------------------
2,745,823,171
(1 row)

Time: 0.142 ms <============================= time taken
============================================================================================

I believe exposing ReadMultiXactCounts() would be a valuable addition to
the PostgreSQL ecosystem, providing users with a more reliable and
efficient way to monitor MultiXact usage. Appreciate your feedback or
discussion on this proposal.

Please let me know if this approach is acceptable, so I’ll go ahead and
submit a patch.

Thank you!

References:
[1]: https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2925-L2948
https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2925-L2948

Show quoted text

Thank you!

Best regards,
Naga Appani
Postgres Database Engineer
Amazon Web Services

#3Kirill Reshke
reshkekirill@gmail.com
In reply to: Naga Appani (#2)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:

On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:

Hi,

Hi

=================
Proposal
=================
The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.

This proposal looks sane. It is indeed helpful to keep an eye out for
multixact usage in systems that are heavily loaded.

By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.

Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.

Let's give it a try!

--
Best regards,
Kirill Reshke

#4Naga Appani
nagnrik@gmail.com
In reply to: Kirill Reshke (#3)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Tue, Mar 11, 2025 at 4:48 AM Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:

On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:

Hi,

Hi

=================
Proposal
=================
The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.

This proposal looks sane. It is indeed helpful to keep an eye out for
multixact usage in systems that are heavily loaded.

By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.

Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.

Let's give it a try!

Hi,

As a follow-up, I’m submitting a patch that introduces a SQL-callable
function to retrieve MultiXact usage metrics. Although the motivation
has been discussed earlier in this thread, I’m including a brief recap
below to provide context for the patch itself.

While wraparound due to MultiXacts (MXID) is less frequent than XID
wraparound, it can still lead to aggressive/wraparound vacuum behavior
or downtime in certain workloads — especially those involving foreign
keys, shared row locks, or long-lived transactions. Currently, users
have no SQL-level visibility into MultiXact member consumption, which
makes it hard to proactively respond before issues arise. The only
workaround today involves scanning the pg_multixact/members directory
on disk, current workaround uses stat() calls over potentially
millions of small segment files, adds I/O overhead, and is unsuitable
for periodic monitoring or integration into observability platforms.

Unlike the approach originally proposed or discussed in this thread,
this patch does not expose the internal ReadMultiXactCounts() function
directly. Instead, it wraps it internally (without changing its
visibility) to make the data available via a new SQL function.

This patch adds:

pg_get_multixact_count()

It returns a composite of:
- multixacts: number of MultiXact IDs that currently exist
- members: number of MultiXact member entries currently exist

Implementation
--------------
- Defined in multixact.c
- Calls ReadMultiXactCounts()
- Returns a composite record (multixacts, members)
- Includes documentation

Use cases
---------
This function enables users to:
- Monitor member usage to anticipate aggressive vacuum and avoid wraparound risk
- Track long-lived workloads that accumulate MultiXacts
- Power lightweight monitoring/diagnostics tools without scanning the filesystem
- Log and analyze MultiXact growth over time

Sample output
-------------
multixacts | members
------------+------------
182371396 | 2826221174
(1 row)

Performance comparison
----------------------
While performance is not the primary motivation for this patch, it
becomes important in monitoring scenarios where frequent polling is
expected. The proposed function executes in sub-millisecond time and
avoids any filesystem I/O, making it well-suited for lightweight,
periodic monitoring.

Implementation | Used size | MultiXact members
| Time (ms) | Relative cost
-------------------------------------+-----------+-------------------+-----------+----------------
Community (pg_ls_multixactdir) | 8642 MB | 1.8 billion |
96.879 | 1.00 (baseline)
Linux (du command) | 8642 MB | 1.8 billion |
96 | 1.00
Proposal (ReadMultiXactCounts-based) | N/A | 1.99 billion |
0.167 | ~580x faster

Documentation
-------------
- A new section is added to func.sgml to group multixact-related functions
- A reference to this new function is included in the "Multixacts and
Wraparound" subsection of maintenance.sgml

To keep related functions grouped together, we can consider moving
mxid_age() into the new section as well unless there are objections to
relocating it from the current section.

This patch aims to fill a long-standing observability gap.

Patch attached.

Best regards,
Naga Appani
Postgres Database Engineer
Amazon Web Services

Show quoted text

--
Best regards,
Kirill Reshke

Attachments:

v1-0001-Add-pg_get_multixact_count-function-and-related-d.patchapplication/octet-stream; name=v1-0001-Add-pg_get_multixact_count-function-and-related-d.patchDownload+111-3
#5Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Naga Appani (#4)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Tue, Jun 10, 2025 at 7:50 PM Naga Appani <nagnrik@gmail.com> wrote:

On Tue, Mar 11, 2025 at 4:48 AM Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 11 Mar 2025 at 14:37, Naga Appani <nagnrik@gmail.com> wrote:

On Mon, Mar 10, 2025 at 10:43 AM Naga Appani <nagnrik@gmail.com> wrote:

Hi,

Hi

=================
Proposal
=================
The internal ReadMultiXactCounts() function, implemented in multixact.c, directly calculates the number of MultiXact members by reading live state from shared memory. This approach avoids the performance issues of the current filesystem-based estimation methods.

This proposal looks sane. It is indeed helpful to keep an eye out for
multixact usage in systems that are heavily loaded.

By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL users with an efficient way to monitor MultiXact member usage. This could be particularly useful for integrating with tools like Amazon RDS Performance Insights and Amazon CloudWatch to provide enhanced database insights and proactive managed monitoring for users.

Please let me know if this approach is acceptable, so I’ll go ahead and submit a patch.

Let's give it a try!

Hi,

As a follow-up, I’m submitting a patch that introduces a SQL-callable
function to retrieve MultiXact usage metrics. Although the motivation
has been discussed earlier in this thread, I’m including a brief recap
below to provide context for the patch itself.

While wraparound due to MultiXacts (MXID) is less frequent than XID
wraparound, it can still lead to aggressive/wraparound vacuum behavior
or downtime in certain workloads — especially those involving foreign
keys, shared row locks, or long-lived transactions. Currently, users
have no SQL-level visibility into MultiXact member consumption, which
makes it hard to proactively respond before issues arise.

I see mxid_age() will just give mxid consumption but not members
consumption. So just that function is not enough.

Sample output
-------------
multixacts | members
------------+------------
182371396 | 2826221174
(1 row)

Performance comparison
----------------------
While performance is not the primary motivation for this patch, it
becomes important in monitoring scenarios where frequent polling is
expected. The proposed function executes in sub-millisecond time and
avoids any filesystem I/O, making it well-suited for lightweight,
periodic monitoring.

Implementation | Used size | MultiXact members
| Time (ms) | Relative cost
-------------------------------------+-----------+-------------------+-----------+----------------
Community (pg_ls_multixactdir) | 8642 MB | 1.8 billion |
96.879 | 1.00 (baseline)
Linux (du command) | 8642 MB | 1.8 billion |
96 | 1.00
Proposal (ReadMultiXactCounts-based) | N/A | 1.99 billion |
0.167 | ~580x faster

Documentation
-------------
- A new section is added to func.sgml to group multixact-related functions
- A reference to this new function is included in the "Multixacts and
Wraparound" subsection of maintenance.sgml

To keep related functions grouped together, we can consider moving
mxid_age() into the new section as well unless there are objections to
relocating it from the current section.

In [1]/messages/by-id/aF8b_fp_9Va58vB9@nathan, we decided to document pg_get_multixact_member() in section
"Transaction ID and Snapshot Information Functions". I think the
discussion in the email thread applies to this function as well.

+  <sect2 id="functions-info-multixact-information">
+   <title>MultiXact Information Functions</title>
+
+       <entry role="func_table_entry">
+        <para role="func_signature">
+         <indexterm><primary>pg_get_multixact_count</primary></indexterm>
+         <function>pg_get_multixact_count</function> ()
+         <returnvalue>record</returnvalue>
+        </para>
+        <para>
+         Returns a record with the fields
<structfield>multixacts</structfield> and
<structfield>members</structfield>:
+         <itemizedlist>
+          <listitem>
+           <para><structfield>multixacts</structfield>: Number of
MultiXacts assigned.
+            PostgreSQL initiates aggressive autovacuum when this
value grows beyond the threshold
+            defined by
<varname>autovacuum_multixact_freeze_max_age</varname>, which is based
on
+            the age of <literal>datminmxid</literal>. For more details, see
+            <ulink
url="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND">
+            Routine Vacuuming: Multixact Wraparound</ulink>.</para>
+          </listitem>
+          <listitem>
+           <para><structfield>members</structfield>: Number of
MultiXact member entries created.
+            These are stored in files under the
<filename>pg_multixact/members</filename> subdirectory.
+            Wraparound occurs after approximately 4.29 billion
entries (~20 GiB). PostgreSQL initiates
+            aggressive autovacuum when the number of members created
exceeds approximately 2.145 billion
+            or when storage consumption in
<filename>pg_multixact/members</filename> approaches 10 GiB.</para>
+          </listitem>
+         </itemizedlist>
+        </para>
+       </entry>

The description here doesn't follow the format of the other functions
in this section. We usually explain the inputs and outputs of the
function but not how to use the outputs. In this case, you might want
to just refer to Multixact Wraparound section under Routine Vacuuming
chapter rather than describing the autovacuum behaviour. You can do
that by inserting <xref linkend="vacuum-for-multixact-wraparound"/>
instead of a full URL. These links are appropriately resolved when
creating HTML to version specific links. The URL you have used will
always point to "Current" version.

+    <para>
+     The <function><link
linkend="functions-multixact-information">pg_get_multixact_count</link></function>
+     function provides a way to check how many multixacts and member
entries have been allocated. This can
+     be useful for identifying unusual multixact activity, monitoring
progress toward wraparound, anticipating
+     system-wide aggressive autovacuum as usage approaches critical
thresholds, or verifying whether autovacuum
+     is keeping up with demand.
+    </para>
+

This is the right place to go in details of how the function can be
used; not the function documentation itself. I am yet to make up
whether we need the whole description. I think the first line is
enough and goes well with the rest of the section.

+
+ if (!ReadMultiXactCounts(&multixacts, &members))
+ ereport(ERROR,
+ (errmsg("could not read multixact counts")));

Throwing an error causes the surrounding transaction to abort, so it
should be avoided in a monitoring/reporting function if possible. In
this case for example, we could throw a warning instead or report NULL
values.

If ReadMultiXactCounts() returns false,
MultiXactMemberFreezeThreshold() returns 0, which will cause the
autovacuum to be more aggressive. I think it will be good to highlight
that in the function description since that's one of the objectives of
this function: to know when the autovacuum is going to be more
aggressive.

+
+ values[0] = UInt32GetDatum(multixacts);
+ values[1] = UInt32GetDatum(members);
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
+}

In PG14+, the transaction wraparound is triggered if the size of the
directory exceeds 10GB. This function does not help monitoring that
condition. So a user will need to use du or pg_ls_multixactdir()
anyway, which defeats the purpose of this function being more
efficient than those methods. Am I correct? Can we also report the
size of the directory in this function?

The patch needs tests.

[1]: /messages/by-id/aF8b_fp_9Va58vB9@nathan

--
Best Wishes,
Ashutosh Bapat

#6Michael Paquier
michael@paquier.xyz
In reply to: Ashutosh Bapat (#5)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Fri, Jul 25, 2025 at 04:27:37PM +0530, Ashutosh Bapat wrote:

In [1], we decided to document pg_get_multixact_member() in section
"Transaction ID and Snapshot Information Functions". I think the
discussion in the email thread applies to this function as well.

Yep, let's be consistent.

Throwing an error causes the surrounding transaction to abort, so it
should be avoided in a monitoring/reporting function if possible. In
this case for example, we could throw a warning instead or report NULL
values.

Most likely returning NULL is the best thing we can do, as a safe
fallback.

The patch needs tests.

Indeed.

May I also suggest a split of the multixact SQL functions into a
separate file, a src/backend/utils/adt/multixactfuncs.c? The existing
pg_get_multixact_members() relies on GetMultiXactIdMembers(),
available in multixact.h. The new function pg_get_multixact_count()
relies on ReadMultiXactCounts(), which would mean adding it in
multixact.h. Even if we finish without an agreement about the SQL
function and the end, publishing ReadMultiXactCounts() would give an
access to the internals to external code.

+PG_FUNCTION_INFO_V1(pg_get_multixact_count);

There should be no need for that, pg_proc.dat handling the
declaration AFAIK.

FWIW, these functions are always kind of hard to use for the end-user
without proper documentation. You may want to add an example of how
one can use it for monitoring in the docs.
--
Michael

#7Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#6)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Mon, Jul 28, 2025 at 9:52 AM Michael Paquier <michael@paquier.xyz> wrote:

May I also suggest a split of the multixact SQL functions into a
separate file, a src/backend/utils/adt/multixactfuncs.c? The existing
pg_get_multixact_members() relies on GetMultiXactIdMembers(),
available in multixact.h. The new function pg_get_multixact_count()
relies on ReadMultiXactCounts(), which would mean adding it in
multixact.h. Even if we finish without an agreement about the SQL
function and the end, publishing ReadMultiXactCounts() would give an
access to the internals to external code.

+PG_FUNCTION_INFO_V1(pg_get_multixact_count);

There should be no need for that, pg_proc.dat handling the
declaration AFAIK.

FWIW, these functions are always kind of hard to use for the end-user
without proper documentation. You may want to add an example of how
one can use it for monitoring in the docs.

+1.

Let's say if the user knows that the counts are so high that a
wraparound is imminent, but vacuuming isn't solving the problem, they
would like to know which transactions are holding it back.
pg_get_multixact_members() can be used to get the members of the
oldest multixact if it's reported and then the user can deal with
those transactions. However, the oldest multixact is not reported
anywhere, AFAIK. It's also part of MultiXactState, so can be extracted
via ReadMultiXactCounts(). We could report it through
pg_get_multixact_counts - after renaming it and ReadMultiXactCounts to
pg_get_multixact_stats() and ReadMultiXactStats() respectively. Or we
could write another function to do so. But it comes handy using query
like below
#select oldestmultixact,
pg_get_multixact_members(oldestmultixact::text::xid) from
pg_get_multixact_count();
oldestmultixact | pg_get_multixact_members
------------------+--------------------------
1 | (757,sh)
1 | (768,sh)
(2 rows)

Here's a quick patch implementing the same. Please feel free to
incorporate and refine it in your patch if you like it.

--
Best Wishes,
Ashutosh Bapat

Attachments:

oldest_multixact.patch.txttext/plain; charset=US-ASCII; name=oldest_multixact.patch.txtDownload+14-12
#8Naga Appani
nagnrik@gmail.com
In reply to: Ashutosh Bapat (#7)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

Hi Ashutosh, Michael,

Thanks for the detailed reviews. I have incorporated the feedback;
please find attached v2 and my responses inline below.

On Fri, Jul 25, 2025 at 5:57 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

In [1], we decided to document pg_get_multixact_member() in section
"Transaction ID and Snapshot Information Functions". I think the
discussion in the email thread applies to this function as well.

Done -- the function is now documented under “Transaction ID and
Snapshot Information Functions” for consistency.

The description here doesn't follow the format of the other functions
in this section.

Updated the description in func.sgml to match the style of other
functions. Extended usage guidance is now in maintenance.sgml.

Throwing an error causes the surrounding transaction to abort, so it
should be avoided in a monitoring/reporting function if possible.

The function now returns NULL instead of throwing an error when counts
can’t be read.

If ReadMultiXactCounts() returns false, MultiXactMemberFreezeThreshold() returns 0...

Noted -- the docs now mention that the function can be used to
anticipate more aggressive autovacuum behavior in such cases.

In PG14+, the transaction wraparound is triggered if the size of the
directory exceeds 10GB. This function does not help monitoring that
condition. So a user will need to use du or pg_ls_multixactdir()
anyway, which defeats the purpose of this function being more
efficient than those methods. Am I correct? Can we also report the
size of the directory in this function?

Correct, that is the intent of the function. The members count
returned by this function already provides the necessary information
to determine the directory size, since each member entry has a fixed
size. The constants and formulas in [0]https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156 and discussed in [1]/messages/by-id/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg@mail.gmail.com show that
each group stores four bytes of flags plus four TransactionIds (20
bytes total), yielding 409 groups per 8 KB page and a fixed
members‑to‑bytes ratio. This means ~2 billion members corresponds to
~10 GB (aggressive autovacuum threshold) and ~4 billion members
corresponds to ~20 GB (wraparound).

Since the function already provides the member count, including the
physical size in its output would duplicate information and add no
extra benefit.

The patch needs tests.

Added an isolation test to cover initial state, MultiXact creation,
counts, and oldest MultiXact reporting.

On Mon, Jul 28, 2025 at 1:00 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Let's say if the user knows that the counts are so high that a
wraparound is imminent, but vacuuming isn't solving the problem...
Here's a quick patch implementing the same. Please feel free to
incorporate and refine it in your patch if you like it.

Thank you for sharing the patch. I have incorporated it into this
version with minor adjustments, and it fits well with the overall
design of the function.

On Mon, Jul 28, 2025 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote:

Yep, let's be consistent.

Done -- placed in “Transaction ID and Snapshot Information Functions”
for consistency.

Most likely returning NULL is the best thing we can do, as a safe fallback.

Implemented -- the function now returns NULL if counts can’t be read.

The patch needs tests.

Isolation tests have been added as described above.

May I also suggest a split of the multixact SQL functions into a
separate file, a src/backend/utils/adt/multixactfuncs.c?

I agree that would be cleaner, but I’d prefer to keep the
implementation in multixact.c for now to maintain focus on this patch
and revisit the refactoring later.

+PG_FUNCTION_INFO_V1(pg_get_multixact_count);

Removed -- now handled entirely by pg_proc.dat.

...You may want to add an example of how one can use it for monitoring in the docs.

I’ve added a usage example with sample output in the docs. If you had
a different kind of demo in mind (e.g., creating multixacts manually
and showing the output), please let me know.

References:
[0]: https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156
[1]: /messages/by-id/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg@mail.gmail.com

Best regards,
Naga Appani

Attachments:

v2-0001-Add-pg_get_multixact_stats-SQL-function-for-monit.patchapplication/octet-stream; name=v2-0001-Add-pg_get_multixact_stats-SQL-function-for-monit.patchDownload+233-30
#9Naga Appani
nagnrik@gmail.com
In reply to: Naga Appani (#8)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Mon, Aug 4, 2025 at 1:16 AM Naga Appani <nagnrik@gmail.com> wrote:

Hi Ashutosh, Michael,

Thanks for the detailed reviews. I have incorporated the feedback;
please find attached v2 and my responses inline below.

On Fri, Jul 25, 2025 at 5:57 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

In [1], we decided to document pg_get_multixact_member() in section
"Transaction ID and Snapshot Information Functions". I think the
discussion in the email thread applies to this function as well.

Done -- the function is now documented under “Transaction ID and
Snapshot Information Functions” for consistency.

The description here doesn't follow the format of the other functions
in this section.

Updated the description in func.sgml to match the style of other
functions. Extended usage guidance is now in maintenance.sgml.

Throwing an error causes the surrounding transaction to abort, so it
should be avoided in a monitoring/reporting function if possible.

The function now returns NULL instead of throwing an error when counts
can’t be read.

If ReadMultiXactCounts() returns false, MultiXactMemberFreezeThreshold() returns 0...

Noted -- the docs now mention that the function can be used to
anticipate more aggressive autovacuum behavior in such cases.

In PG14+, the transaction wraparound is triggered if the size of the
directory exceeds 10GB. This function does not help monitoring that
condition. So a user will need to use du or pg_ls_multixactdir()
anyway, which defeats the purpose of this function being more
efficient than those methods. Am I correct? Can we also report the
size of the directory in this function?

Correct, that is the intent of the function. The members count
returned by this function already provides the necessary information
to determine the directory size, since each member entry has a fixed
size. The constants and formulas in [0] and discussed in [1] show that
each group stores four bytes of flags plus four TransactionIds (20
bytes total), yielding 409 groups per 8 KB page and a fixed
members‑to‑bytes ratio. This means ~2 billion members corresponds to
~10 GB (aggressive autovacuum threshold) and ~4 billion members
corresponds to ~20 GB (wraparound).

Since the function already provides the member count, including the
physical size in its output would duplicate information and add no
extra benefit.

The patch needs tests.

Added an isolation test to cover initial state, MultiXact creation,
counts, and oldest MultiXact reporting.

On Mon, Jul 28, 2025 at 1:00 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Let's say if the user knows that the counts are so high that a
wraparound is imminent, but vacuuming isn't solving the problem...
Here's a quick patch implementing the same. Please feel free to
incorporate and refine it in your patch if you like it.

Thank you for sharing the patch. I have incorporated it into this
version with minor adjustments, and it fits well with the overall
design of the function.

On Mon, Jul 28, 2025 at 4:22 AM Michael Paquier <michael@paquier.xyz> wrote:

Yep, let's be consistent.

Done -- placed in “Transaction ID and Snapshot Information Functions”
for consistency.

Most likely returning NULL is the best thing we can do, as a safe fallback.

Implemented -- the function now returns NULL if counts can’t be read.

The patch needs tests.

Isolation tests have been added as described above.

May I also suggest a split of the multixact SQL functions into a
separate file, a src/backend/utils/adt/multixactfuncs.c?

I agree that would be cleaner, but I’d prefer to keep the
implementation in multixact.c for now to maintain focus on this patch
and revisit the refactoring later.

+PG_FUNCTION_INFO_V1(pg_get_multixact_count);

Removed -- now handled entirely by pg_proc.dat.

...You may want to add an example of how one can use it for monitoring in the docs.

I’ve added a usage example with sample output in the docs. If you had
a different kind of demo in mind (e.g., creating multixacts manually
and showing the output), please let me know.

References:
[0] https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L130-L156
[1] /messages/by-id/CACbFw60UOk6fCC02KsyT3OfU9Dnuq5roYxdw2aFisiN_p1L0bg@mail.gmail.com

Best regards,
Naga Appani

Following up on my v2 from yesterday — the recent commit [0]https://github.com/postgres/postgres/commit/4e23c9ef65accde7eb3e56aa28d50ae5cf79b64b changed
the directory layout, which broke the patch (v2). This v3 updates the
code to work with the new structure and also fixes some formatting
issues I noticed while revisiting the changes.

The rest of the patch remains the same as v2, which incorporated
feedback from Ashutosh and Michael (see my previous email for
details).

Please find v3 attached.

References:
[0]: https://github.com/postgres/postgres/commit/4e23c9ef65accde7eb3e56aa28d50ae5cf79b64b

Best regards,
Naga Appani

Attachments:

v3-0001-Add-pg_get_multixact_stats-SQL-function-for-monit.patchapplication/octet-stream; name=v3-0001-Add-pg_get_multixact_stats-SQL-function-for-monit.patchDownload+219-16
#10Michael Paquier
michael@paquier.xyz
In reply to: Naga Appani (#9)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Mon, Aug 04, 2025 at 04:51:30PM -0500, Naga Appani wrote:

The rest of the patch remains the same as v2, which incorporated
feedback from Ashutosh and Michael (see my previous email for
details).

Please find v3 attached.

I am reading again what you have here, and I really think that we
should move the SQL function parts of multixact.c into their own new
file, exposing ReadMultiXactCounts() in multixact.h, because I also
suspect that this can become really useful for extensions that aim at
doing things similar to your proposal in terms of data monitoring for
autovacuum wraparound. This means two refactoring patches:
- One to expose the new routine in multixact.h.
- One to move the existing SQL code to its new file.

ReadMultiXactCounts() is also incorrectly named with your proposal to
expose oldestMultiXactId in the information returned to the caller,
where the key point is to make sure that the information retrieved is
consistent across a single LWLock acquisition. So perhaps this should
be named GetMultiXactInformation() or something similar?

The top of ReadMultiXactCounts() (or whatever its new name) should
also document the information returned across a single call. It looks
inconsistent to return oldestMultiXactId if the oldestOffsetKnown is
false. What about oldestOffset itself? Should it be returned for
consistency with the counts and oldestMultiXactId?
--
Michael

#11Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Naga Appani (#8)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Mon, Aug 4, 2025 at 11:46 AM Naga Appani <nagnrik@gmail.com> wrote:

In PG14+, the transaction wraparound is triggered if the size of the
directory exceeds 10GB. This function does not help monitoring that
condition. So a user will need to use du or pg_ls_multixactdir()
anyway, which defeats the purpose of this function being more
efficient than those methods. Am I correct? Can we also report the
size of the directory in this function?

Correct, that is the intent of the function. The members count
returned by this function already provides the necessary information
to determine the directory size, since each member entry has a fixed
size. The constants and formulas in [0] and discussed in [1] show that
each group stores four bytes of flags plus four TransactionIds (20
bytes total), yielding 409 groups per 8 KB page and a fixed
members‑to‑bytes ratio. This means ~2 billion members corresponds to
~10 GB (aggressive autovacuum threshold) and ~4 billion members
corresponds to ~20 GB (wraparound).

Would it be better to do that math in the function and output the
result? Users may not be able to read and understand the PostgreSQL
code or pgsql-hackers threads Or the constants may change across
versions. It will be more convenient for users if they get the output
from the function itself.

On Fri, Aug 8, 2025 at 6:05 AM Michael Paquier <michael@paquier.xyz> wrote:

ReadMultiXactCounts() is also incorrectly named with your proposal to
expose oldestMultiXactId in the information returned to the caller,
where the key point is to make sure that the information retrieved is
consistent across a single LWLock acquisition. So perhaps this should
be named GetMultiXactInformation() or something similar?

+1

The top of ReadMultiXactCounts() (or whatever its new name) should
also document the information returned across a single call. It looks
inconsistent to return oldestMultiXactId if the oldestOffsetKnown is
false. What about oldestOffset itself? Should it be returned for
consistency with the counts and oldestMultiXactId?

+1

Some more comments on the patch
+ <literal>multixacts</literal> is the number of multixact IDs assigned,
+ <literal>members</literal> is the number of multixact member entries created,
+ and <literal>oldest_multixact</literal> is the oldest MultiXact ID
still in use.

Now that the name of the function is changed, we need the names to
indicate that they are counts e.g. num_mxids, num_members.

+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
+ </para>
+
+ <para>
+ <literal>SELECT * FROM pg_get_multixact_stats();</literal>
+<programlisting>
+ multixacts | members | oldest_multixact
+------------+-------------+------------------
+ 182371396 | 2826221174 | 754321
+</programlisting>

This file doesn't provide usage examples of other functions. This
function doesn't seem to be an exception.

I think we should mention that the statistics may get stale as soon as
it's fetched, even with REPEATABLE READ isolation level.

+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if either
+ the storage occupied by multixact members exceeds about 10GB or the number
+ of members created exceeds approximately 2 billion entries, aggressive vacuum

In case each member starts consuming more or less space than it does
today what would be the basis of triggering workaround? Space or
number of members? I think we should mention only that.

scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
+ have the oldest multixact-age. Both of these kinds of aggressive
scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ area can grow up to about 20GB or approximately 4 billion entries before
+ reaching wraparound.

Similar to above.

+ </para>
+
+ <para>
+ The <function>pg_get_multixact_stats()</function> function provides a way
+ to monitor multixact allocation and usage patterns in real time. By exposing

This is the right place to elaborate the usage of this function with an example.

+ counts of multixacts, member entries, and the oldest multixact ID, it helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Monitor progress toward wraparound thresholds that trigger aggressive
+ autovacuum (approximately 2 billion members or 10GB storage)
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Verify whether autovacuum is effectively managing multixact cleanup
+ before reaching critical thresholds
+ </simpara>
+ </listitem>
+ </orderedlist>
+ See <xref linkend="functions-info-snapshot"/> for details.

I think the second point here repeats what's already mentioned
earlier. It will be good to elaborate each point with an example
instead of just narration.

+/*
+ * pg_get_multixact_stats
+ *
+ * SQL-callable function to retrieve MultiXact statistics.
+ *
+ * Returns a composite row containing:
+ * - total number of MultiXact IDs created since startup,
+ * - total number of MultiXact members created,

... since startup or the number of existing members?

+ * - the oldest existing MultiXact ID.
+ *
+ * This is primarily useful for monitoring MultiXact usage and ensuring
+ * appropriate wraparound protection.

The last two lines are not required, I think. One of its usage is
monitoring but users may find other usages.

+
+step commit1: COMMIT;
+step commit2: COMMIT;
+step check:
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+
+multixacts|members|oldest_multixact
+----------+-------+----------------
+ 1| 3| 1
+(1 row)

Vacuum may clean the multixact between commit2 and check, in which
case the result won't be stable.

--
Best Wishes,
Ashutosh Bapat

#12Naga Appani
nagnrik@gmail.com
In reply to: Ashutosh Bapat (#11)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

Hi Michael, Ashutosh,

Thanks a lot for the detailed reviews and feedback. Please find
attached v4 of the patchset.

Summary of changes in v4:
- Split into two patches as suggested:
1. Expose and rename ReadMultiXactCounts() -> GetMultiXactInfo() in
multixact.h with clearer comments.
2. Add pg_get_multixact_stats() as a SQL-callable function in a new
file (multixactfuncs.c), with docs and tests.
- Function now also returns oldestOffset for consistency.
- Field names updated to num_mxids, num_members, oldest_multixact,
oldest_offset.
- Documentation revised to describe thresholds only in terms of member
counts (disk size wording removed).
- Added a minimal example in maintenance.sgml where multixact
wraparound is already described.
- Isolation tests are rewritten so they no longer depend on exact
counts, but only on monotonic properties guaranteed while a multixact
is pinned.

Replies inline below:

On Thu, Aug 7, 2025 at 7:35 PM Michael Paquier <michael@paquier.xyz> wrote:

I really think that we should move the SQL function parts of multixact.c
into their own new file, exposing ReadMultiXactCounts() in multixact.h...

Done. The SQL-callable code now lives in
src/backend/utils/adt/multixactfuncs.c
and the accessor is declared in
src/include/access/multixact.h.

ReadMultiXactCounts() is also incorrectly named with your proposal to
expose oldestMultiXactId in the information returned to the caller...
So perhaps this should be named GetMultiXactInformation() or something
similar?

Renamed to GetMultiXactInfo().

The top of ReadMultiXactCounts() (or whatever its new name) should
also document the information returned across a single call.

Added detailed comments about consistency under a single LWLock and the
meaning of each field.

It looks inconsistent to return oldestMultiXactId if the
oldestOffsetKnown is false. What about oldestOffset itself?

GetMultiXactInfo() now returns oldestOffset as well. If the oldest
offset isn’t currently known, the function returns false and clears
all outputs, so callers don’t see a partially valid struct.

---

On Fri, Aug 8, 2025 at 4:33 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Would it be better to do that math in the function and output the result?

That’s a cool idea, thanks for pointing it out. For now I have kept the
SQL function focused only on exposing the raw counts (num_mxids,
num_members, oldest IDs). My thought was that keeping the API lean makes
it easier to maintain across versions, while leaving any derived
calculations like approximate storage size to SQL or external tooling.
This way the function remains simple and future-proof, while still
giving users the building blocks to get the view they need.

I’m happy to revisit this if others feel it would be better for the
function to provide an approximate size directly — I wanted to start
with the simplest surface and gather feedback first.

Now that the name of the function is changed, we need the names to
indicate that they are counts e.g. num_mxids, num_members.

Adjusted. The SQL function returns: num_mxids, num_members,
oldest_multixact, oldest_offset.

This file doesn't provide usage examples of other functions. This
function doesn't seem to be an exception.

Earlier I thought it was fine to add an example since
pg_input_error_info() also has one, so in this version I placed the
example in maintenance.sgml, where we already discuss multixact
wraparound. That seemed like the most natural place for it. I agree with
your point about consistency, though, so I kept the style minimal and
aligned with the surrounding text.

I think we should mention that the statistics may get stale as soon as
it's fetched, even with REPEATABLE READ isolation level.

Added a note that values are a live snapshot and can change immediately.

In case each member starts consuming more or less space than it does
today what would be the basis of triggering wraparound? Space or
number of members? I think we should mention only that.

I updated the docs to describe wraparound in terms of member counts only.
The earlier mention of disk size has been dropped, since the thresholds
are defined by counts.

This is the right place to elaborate the usage of this function with an
example.

Expanded with a short example, while keeping it consistent with nearby
entries.

... since startup or the number of existing members?

Clarified that the values reflect what’s *currently in use* (i.e.,
derived from next/oldest) and that NULLs are returned if the multixact
subsystem has not been initialized yet.

The last two lines are not required, I think. One of its usage is
monitoring but users may find other usages.

Dropped those lines.

Vacuum may clean the multixact between commit2 and check, in which
case the result won't be stable.

Right, the earlier version of the test assumed stable counts, which
could fail if autovacuum or background cleanup removed entries in
between steps. In v4 the isolation test no longer relies on exact
numbers. Instead it asserts only the monotonic properties that are
guaranteed while a multixact is pinned, and avoids assumptions once
locks are released. That makes the test robust against concurrent vacuum
activity.

---

Thanks again for the thoughtful reviews and guidance. Please let me know
if you see further adjustments needed.

Best regards,
Naga

Attachments:

v4-0001-Rename-ReadMultiXactCounts-to-GetMultiXactInfo-an.patchapplication/octet-stream; name=v4-0001-Rename-ReadMultiXactCounts-to-GetMultiXactInfo-an.patchDownload+32-19
v4-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchapplication/octet-stream; name=v4-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchDownload+379-15
#13Michael Paquier
michael@paquier.xyz
In reply to: Naga Appani (#12)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Sun, Aug 17, 2025 at 01:27:29AM -0500, Naga Appani wrote:

On Thu, Aug 7, 2025 at 7:35 PM Michael Paquier <michael@paquier.xyz> wrote:

I really think that we should move the SQL function parts of multixact.c
into their own new file, exposing ReadMultiXactCounts() in multixact.h...

Done. The SQL-callable code now lives in
src/backend/utils/adt/multixactfuncs.c
and the accessor is declared in
src/include/access/multixact.h.

My point was a bit different: multixactfuncs.c should be created first
because we already have one SQL function in multixact.c that can be
moved inside it, with the declarations it requires added to
multixact.h. I've extracted what you did, moved the existing
pg_get_multixact_members() inside the new file, and applied the
result.

ReadMultiXactCounts() is also incorrectly named with your proposal to
expose oldestMultiXactId in the information returned to the caller...
So perhaps this should be named GetMultiXactInformation() or something
similar?

Renamed to GetMultiXactInfo().

+ * Returns information about current MultiXact state in a single atomic read:

This comment is incorrect. This is not an atomic read, grabbing a
consistent state of the data across one single lock acquisition.

Except for this comment, this looks pretty much OK. Ashutosh, any
comments?

I have not looked at the rest.
--
Michael

#14Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Naga Appani (#12)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Sun, Aug 17, 2025 at 11:57 AM Naga Appani <nagnrik@gmail.com> wrote:

On Fri, Aug 8, 2025 at 4:33 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

In case each member starts consuming more or less space than it does
today what would be the basis of triggering wraparound? Space or
number of members? I think we should mention only that.

I updated the docs to describe wraparound in terms of member counts only.
The earlier mention of disk size has been dropped, since the thresholds
are defined by counts.

The current document says
"Also, if the storage occupied by multixacts members exceeds about
10GB, aggressive vacuum scans will occur more often for all tables,
starting with those that have the oldest multixact-age." - do you mean
that it's wrong. Instead of checking 10GB threashold, is the code
checking the equivalent member count? If so, I think we need a
separate patch to correct the documentation first. Can you please
point me to the code? Documentation should reflect the code.

That’s a cool idea, thanks for pointing it out. For now I have kept the
SQL function focused only on exposing the raw counts (num_mxids,
num_members, oldest IDs). My thought was that keeping the API lean makes
it easier to maintain across versions, while leaving any derived
calculations like approximate storage size to SQL or external tooling.
This way the function remains simple and future-proof, while still
giving users the building blocks to get the view they need.

I’m happy to revisit this if others feel it would be better for the
function to provide an approximate size directly — I wanted to start
with the simplest surface and gather feedback first.

The constant multiplier which converts a count into the disk size is
in the server code. Duplicating it outside the server code, even in
documentation, would require maintenance. GetMultiXactInfo() may not
do the arithmetic but pg_get_multixact_stats() is lean enough to add a
couple computations.

If size is being used as a threshold, reporting count is useless
because user wouldn't know the relation easily. If count is used as a
threshold, reporting count makes sense.

--
Best Wishes,
Ashutosh Bapat

#15Naga Appani
nagnrik@gmail.com
In reply to: Ashutosh Bapat (#14)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

Hi Michael, Ashutosh,

Thanks a lot for taking the time to review this patch and share your thoughts.

Here’s a short summary of what has changed in v5:
- Added the new pg_get_multixact_stats() function in multixactfuncs.c.
- Fixed the misleading “atomic read” comment in the accessor.
- Clarified documentation: thresholds are described in terms of
counts, since that’s what the code uses.
- Added a members_bytes column in pg_get_multixact_stats() to give
users a rough size estimate (num_members * 5), while making it clear
this is layout-dependent.

Please see my in-line replies below.

---

On Mon, Aug 18, 2025 at 1:49 AM Michael Paquier <michael@paquier.xyz> wrote:

My point was a bit different: multixactfuncs.c should be created first
because we already have one SQL function in multixact.c that can be
moved inside it, with the declarations it requires added to
multixact.h. I've extracted what you did, moved the existing
pg_get_multixact_members() inside the new file, and applied the
result.

Really appreciate your clarification and for making that change. I
misunderstood your earlier point.

+ * Returns information about current MultiXact state in a single atomic read:

This comment is incorrect. This is not an atomic read, grabbing a
consistent state of the data across one single lock acquisition.

Fixed and adjusted wording.

---

On Mon, Aug 18, 2025 at 6:56 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

The current document says
"Also, if the storage occupied by multixacts members exceeds about
10GB, aggressive vacuum scans will occur more often for all tables,
starting with those that have the oldest multixact-age." - do you mean
that it's wrong. Instead of checking 10GB threashold, is the code
checking the equivalent member count? If so, I think we need a
separate patch to correct the documentation first. Can you please
point me to the code? Documentation should reflect the code.

The decision is made in MultiXactMemberFreezeThreshold() [0]https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2916, and it
is entirely count-based:

if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
return autovacuum_multixact_freeze_max_age;

fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
(MULTIXACT_MEMBER_DANGER_THRESHOLD - MULTIXACT_MEMBER_SAFE_THRESHOLD);

MaxMultiXactOffset is defined in multixact.h [1]https://github.com/postgres/postgres/blob/master/src/include/access/multixact.h#L31:

#define MaxMultiXactOffset ((MultiXactOffset) 0xFFFFFFFF)

Thresholds are defined in multixact.c [2]https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L216-L218
#define MULTIXACT_MEMBER_SAFE_THRESHOLD (MaxMultiXactOffset / 2)
#define MULTIXACT_MEMBER_DANGER_THRESHOLD \
(MaxMultiXactOffset - MaxMultiXactOffset / 4)

These translate to:
- MaxMultiXactOffset: ~4.29 billion (2^32 - 1)
- MULTIXACT_MEMBER_SAFE_THRESHOLD: ~2.14 billion (2^31 - 1)
- MULTIXACT_MEMBER_DANGER_THRESHOLD: ~3.22 billion (3/4 * 2^32)

So the code path is count-driven.

Regarding docs:
For earlier versions (18 and before), the storage-size approximation
remains relevant because users don’t have direct access to member
count information. Since we don’t plan to backpatch (I assume so) this
new function, the documentation for older branches should continue to
rely on the storage-based approximation.

Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
docs can describe the thresholds in terms of counts directly. For
older branches, the storage approximation still provides users with a
practical way to reason about wraparound risk.

The constant multiplier which converts a count into the disk size is
in the server code. Duplicating it outside the server code, even in
documentation, would require maintenance. GetMultiXactInfo() may not
do the arithmetic but pg_get_multixact_stats() is lean enough to add a
couple computations.

Thanks for suggesting this — it makes sense, especially for users
upgrading from earlier versions to 19 and higher. I’ve added a
members_bytes column to pg_get_multixact_stats(), computed as
num_members * 5. This respects the existing server-side logic while
also giving those users a familiar reference point, helping them
connect the older size-based guidance with the new count-based view.

---

References:
[0]: https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L2916
[1]: https://github.com/postgres/postgres/blob/master/src/include/access/multixact.h#L31
[2]: https://github.com/postgres/postgres/blob/master/src/backend/access/transam/multixact.c#L216-L218

Patch v5 is attached. Thanks again for the thoughtful reviews — I really
appreciate the guidance and look forward to further feedback.

Best regards,
Naga

Attachments:

v5-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchapplication/octet-stream; name=v5-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchDownload+399-29
#16Michael Paquier
michael@paquier.xyz
In reply to: Naga Appani (#15)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Mon, Aug 18, 2025 at 08:32:39PM -0500, Naga Appani wrote:

Thanks a lot for taking the time to review this patch and share your thoughts.

Here’s a short summary of what has changed in v5:
- Added the new pg_get_multixact_stats() function in multixactfuncs.c.
- Fixed the misleading “atomic read” comment in the accessor.
- Clarified documentation: thresholds are described in terms of
counts, since that’s what the code uses.
- Added a members_bytes column in pg_get_multixact_stats() to give
users a rough size estimate (num_members * 5), while making it clear
this is layout-dependent.

Please see my in-line replies below.

FWIW, I think that you should be a bit more careful before sending
updated patch sets. You have missed an extra point I have raised
upthread about the refactoring pieces: the switch from
ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
its own.

So I have extracted this part from your latest patch, and applied it
independently of the SQL function business. Now we are in an
advantageous position on HEAD: even if we do not conclude about the
SQL function to show the mxact numbers and offsets, we have the
function that gives an access to the data you are looking for. In
short, it is now possible to provide an equivalent of the feature you
want outside of core. Not saying that the patch cannot be useful, but
such refactoring pieces open more possibilities, and offer a cleaner
commit history with less churn in the main patches.
--
Michael

#17Naga Appani
nagnrik@gmail.com
In reply to: Michael Paquier (#16)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz> wrote:

FWIW, I think that you should be a bit more careful before sending
updated patch sets. You have missed an extra point I have raised
upthread about the refactoring pieces: the switch from
ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
its own.

So I have extracted this part from your latest patch, and applied it
independently of the SQL function business. Now we are in an
advantageous position on HEAD: even if we do not conclude about the
SQL function to show the mxact numbers and offsets, we have the
function that gives an access to the data you are looking for. In
short, it is now possible to provide an equivalent of the feature you
want outside of core. Not saying that the patch cannot be useful, but
such refactoring pieces open more possibilities, and offer a cleaner
commit history with less churn in the main patches.
--

Thanks for the review and separating the refactoring into its own commit.
Point taken on being more careful when sending updated patch sets.
I’ll make sure to keep
refactoring and SQL layer changes clearly separated going forward.

Attached is v6, rebased on top of HEAD. This version is limited to the
SQL function only.

Changes since v5:
- Removed the refactoring, as GetMultiXactInfo() is already committed.
- Documentation revised to describe thresholds in terms of raw counts.

Hopefully this makes the proposal easier to evaluate on its own merits.

Attachments:

v6-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchapplication/octet-stream; name=v6-0001-Add-pg_get_multixact_stats-function-for-monitorin.patchDownload+369-15
#18torikoshia
torikoshia@oss.nttdata.com
In reply to: Naga Appani (#17)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On 2025-08-20 13:27, Naga Appani wrote:

Thanks for working on this!

On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz>
wrote:

FWIW, I think that you should be a bit more careful before sending
updated patch sets. You have missed an extra point I have raised
upthread about the refactoring pieces: the switch from
ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
its own.

So I have extracted this part from your latest patch, and applied it
independently of the SQL function business. Now we are in an
advantageous position on HEAD: even if we do not conclude about the
SQL function to show the mxact numbers and offsets, we have the
function that gives an access to the data you are looking for. In
short, it is now possible to provide an equivalent of the feature you
want outside of core. Not saying that the patch cannot be useful, but
such refactoring pieces open more possibilities, and offer a cleaner
commit history with less churn in the main patches.
--

Thanks for the review and separating the refactoring into its own
commit.
Point taken on being more careful when sending updated patch sets.
I’ll make sure to keep
refactoring and SQL layer changes clearly separated going forward.

Attached is v6, rebased on top of HEAD. This version is limited to the
SQL function only.

diff --git a/doc/src/sgml/maintenance.sgml 
b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..6f0e8d7c10a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,12 +813,56 @@ HINT:  Execute a database-wide VACUUM in that 
database.
      <para>
       As a safety device, an aggressive vacuum scan will
       occur for any table whose multixact-age is greater than <xref
-     linkend="guc-autovacuum-multixact-freeze-max-age"/>.  Also, if the
-     storage occupied by multixacts members exceeds about 10GB, 
aggressive vacuum
+     linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the 
number
+     of members created exceeds approximately 2^31 entries, aggressive 
vacuum
       scans will occur more often for all tables, starting with those 
that

Looking at commit ff20ccae9fdb, it seems that the documentation was
intentionally written in terms of gigabytes rather than the number:

The threshold is two billion members, which was interpreted as 2GB
in the documentation. Fix to reflect that each member takes up five
bytes, which translates to about 10GB. This is not exact, because of
page boundaries. While at it, mention the maximum size 20GB.

Anyway, I also think, as Ashutosh suggested, that if we want to fix this
documentation, it would be better to do so in a separate patch.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

#19torikoshia
torikoshia@oss.nttdata.com
In reply to: torikoshia (#18)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On 2025-08-22 09:28, torikoshia wrote:

On 2025-08-20 13:27, Naga Appani wrote:

Thanks for working on this!

On Tue, Aug 19, 2025 at 1:32 AM Michael Paquier <michael@paquier.xyz>
wrote:

FWIW, I think that you should be a bit more careful before sending
updated patch sets. You have missed an extra point I have raised
upthread about the refactoring pieces: the switch from
ReadMultiXactCounts() to GetMultiXactInfo() can be done in a patch of
its own.

So I have extracted this part from your latest patch, and applied it
independently of the SQL function business. Now we are in an
advantageous position on HEAD: even if we do not conclude about the
SQL function to show the mxact numbers and offsets, we have the
function that gives an access to the data you are looking for. In
short, it is now possible to provide an equivalent of the feature you
want outside of core. Not saying that the patch cannot be useful,
but
such refactoring pieces open more possibilities, and offer a cleaner
commit history with less churn in the main patches.
--

Thanks for the review and separating the refactoring into its own
commit.
Point taken on being more careful when sending updated patch sets.
I’ll make sure to keep
refactoring and SQL layer changes clearly separated going forward.

Attached is v6, rebased on top of HEAD. This version is limited to the
SQL function only.

diff --git a/doc/src/sgml/maintenance.sgml 
b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..6f0e8d7c10a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,12 +813,56 @@ HINT:  Execute a database-wide VACUUM in that 
database.
<para>
As a safety device, an aggressive vacuum scan will
occur for any table whose multixact-age is greater than <xref
-     linkend="guc-autovacuum-multixact-freeze-max-age"/>.  Also, if 
the
-     storage occupied by multixacts members exceeds about 10GB,
aggressive vacuum
+     linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the 
number
+     of members created exceeds approximately 2^31 entries, aggressive 
vacuum
scans will occur more often for all tables, starting with those 
that

Looking at commit ff20ccae9fdb, it seems that the documentation was
intentionally written in terms of gigabytes rather than the number:

The threshold is two billion members, which was interpreted as 2GB
in the documentation. Fix to reflect that each member takes up five
bytes, which translates to about 10GB. This is not exact, because of
page boundaries. While at it, mention the maximum size 20GB.

Anyway, I also think, as Ashutosh suggested, that if we want to fix
this documentation, it would be better to do so in a separate patch.

Ah, I've found why you choose to add this doc modification in this patch
in the thread, sorry for skipping over the part:
| For earlier versions (18 and before), the storage-size approximation
| remains relevant because users don’t have direct access to member
| count information. Since we don’t plan to backpatch (I assume so) this
| new function, the documentation for older branches should continue to
| rely on the storage-based approximation.

| Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
| docs can describe the thresholds in terms of counts directly.

Personally, I think it might be fine to keep the gigabyte-based
description, and perhaps we could show both the number of members and
gigabytes, since it'd be also helpful to have a sense of the scale.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

#20Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: torikoshia (#19)
Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia@oss.nttdata.com> wrote:

| Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
| docs can describe the thresholds in terms of counts directly.

Personally, I think it might be fine to keep the gigabyte-based
description, and perhaps we could show both the number of members and
gigabytes, since it'd be also helpful to have a sense of the scale.

Those who have grown their own utilities to monitor the on-disk usage
will not be able to use the count based thresholds and might take some
time for them to starting using pg_get_multixact_stats(). It makes
sense to mention both the count and the corresponding disk usage
threshold. Something like "Also, if the number of multixact members
exceeds approximately 2^31 entries (occupying roughly more than 10GB
in storage) ... ". Users can choose which threshold they want to use.
Adding disk storage threshold in parenthesis indicates that the count
is more accurate and more useful.

Here's detailed review of the patch
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,
+ <literal>num_members</literal> is the number of multixact member
entries created,
+ <literal>members_bytes</literal> is the storage occupied by
<literal>num_members</literal>

I thought mentioning bytes, a unit, in column name members_bytes would
not be appropriate in case we start reporting it in a different unit
like kB in future. But we already have
pg_stat_replication_slots::spill_bytes with similar naming. So may be
it's okay. But I would prefer members_size or members_storage or some
such units-free name.

+ in <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still
in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.

I am not sure whether oldest_offset is worth exposing. It is an
implementation detail. Upthread, Michael suggested to expose oldest
offset from GetMultiXactInfo(), but I don't see him explicitly saying
that we should expose it through this function as well. Michael what
do you think?

+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.

I still think that this is not needed. There is no reason to restrict
how users want to use this function. We usually don't do that unless
there is a hazard associated with it.

+ <para>
+ This is a live snapshot of shared counters; the numbers can change
between calls,
+ even within the same transaction.
+ </para></entry>

I have not seen the phrase "live snapshot" being used in the
documentation before. How about "The function reports the statistics
at the time of invoking the function. They may vary between calls even
within the same transaction."?

+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of members created exceeds approximately 2^31 entries, aggressive vacuum

a member means the transaction participating in a multixact. What you
intend to say is "if the number of multixacts member entries created
...", right?

+ <para>
+ The <function>pg_get_multixact_stats()</function> function, described in

unnecessary pair of commas.

+ This output shows a system with significant multixact activity:
about ~100 million
+ multixact IDs and ~773 million member entries have been created
since the oldest
+ surviving multixact (ID 39974368). By leveraging this information,
the function helps:
+ <orderedlist>
+ <listitem>
... snip ...
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>

I am unsure whether we should be mentioning use cases in such detail.
Users may find other ways to use those counts. I think the following
paragraph should be placed here.

+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.

But others may have different opinions.

Maybe you could further write in your example that an aggressive
autovacuum will be triggered in another 10 seconds (or so) if the
number of member entries continues to double every 5 seconds. Or some
practical "usage example" like that.

+ * Returns statistics about current MultiXact usage:
+ * - num_mxids: Number of MultiXact IDs in use
+ * - num_members: Total number of member entries
+ * - oldest_multixact: Oldest MultiXact ID still needed
+ * - oldest_offset: Oldest offset still in use

We don't need to mention each column here, it's evident from the
function body and also from the user facing documentation. Just the
first line is ok.

+ *
+ * Returns a row of NULLs if the MultiXact system is not yet initialized.

tuple or record instead of row.

In the earlier patch you were calling PG_RETURN_NULL(), which I
thought was better. It would get converted into a record of NULLs if
someone is to do SELECT * FROM pg_get_multixact_stats().

I don't think "the MultiXact system is not yet initialized" is the
right description of that condition. GetMultiXactInfo() prologue says
"
Returns false if unable to determine, the oldest offset being
unknown." MultiXactStatData has following comment for oldest offset.
/*
* Oldest multixact offset that is potentially referenced by a multixact
* referenced by a relation. We don't always know this value, so there's
* a flag here to indicate whether or not we currently do.
*/

And also
/* Have we completed multixact startup? */
bool finishedStartup;

I think we need to define this condition more accurately.

And include it in the documentation as well.

+ * Calculate approximate storage space:
+ * - Members are stored in groups of 4
+ * - Each group takes 20 bytes (5 bytes per member)
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;

Do we have some constant macros or sizeof(some structure) defined for
5 and 4? That way this computation will be self maintaining and self
documenting.

+ nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false;

memset(nulls, false, sizeof(nulls)); is better and used everywhere.

In fact, instead of initializing it all to true first and then setting
all to false here, we should memset here and set it to true in else
block.

+++ b/src/test/isolation/specs/multixact_stats.spec

I have not an seen an isolation test being used for testing a stats
function. But I find it useful. Let's see what others think.

@@ -0,0 +1,127 @@
+# High-signal invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While
it is pinned
+# by two open transactions, we assert only invariants that background
VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second locker arrived,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation
may shrink deltas).
+# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
+# transaction to narrow the window for unrelated truncation between snapshots.

What's a driver transaction?

+#
+# Terminology (global counters):
+# num_mxids, num_members : “in-use” deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns,
which protects
+# the truncation horizon (VACUUM can’t advance past our pinned multi).

Probably this comment is not needed. But from the sequence of steps
executed, the data is collected when multixact is pinned (what does
that mean?) but the assertions are executed at the end when all the
transactions are committed. Am I correct?

+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}

You could use a single table with a primary key column to distinguish
snaps which can be used for joining the rows. Why use a temporary
table? Just setup and tear down the snap table as well?

+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
... snip ...
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]

This is getting too complex to follow. It produces pretty output but
the query is complex. Instead just let keys as the columns in the
query. Maybe you could print expanded output if that's possible in an
isolation test.

--
Best Wishes,
Ashutosh Bapat

#21Naga Appani
nagnrik@gmail.com
In reply to: Ashutosh Bapat (#20)
#22Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Naga Appani (#21)
#23Naga Appani
nagnrik@gmail.com
In reply to: Ashutosh Bapat (#22)
#24Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Naga Appani (#23)
#25Naga Appani
nagnrik@gmail.com
In reply to: Ashutosh Bapat (#24)
#26torikoshia
torikoshia@oss.nttdata.com
In reply to: Naga Appani (#25)
#27Xuneng Zhou
xunengzhou@gmail.com
In reply to: torikoshia (#26)
#28Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Xuneng Zhou (#27)
#29Naga Appani
nagnrik@gmail.com
In reply to: Tomas Vondra (#28)
#30Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Naga Appani (#29)
#31Naga Appani
nagnrik@gmail.com
In reply to: Ashutosh Bapat (#30)
#32Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Naga Appani (#31)
#33Naga Appani
nagnrik@gmail.com
In reply to: Ashutosh Bapat (#32)
#34Michael Paquier
michael@paquier.xyz
In reply to: Naga Appani (#33)
#35Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#34)
#36Naga Appani
nagnrik@gmail.com
In reply to: Ashutosh Bapat (#35)
#37Michael Paquier
michael@paquier.xyz
In reply to: Naga Appani (#36)
#38Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#37)
#39Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#38)
#40Naga Appani
nagnrik@gmail.com
In reply to: Michael Paquier (#39)
#41Michael Paquier
michael@paquier.xyz
In reply to: Naga Appani (#40)
#42Shinoda, Noriyoshi (PN Japan FSIP)
noriyoshi.shinoda@hpe.com
In reply to: Michael Paquier (#41)