mxid_score can become Infinity in pg_stat_autovacuum_scores

Started by Masahiko Sawada · 11 days ago · 10 messages · hackers
#1Masahiko Sawada
sawada.mshk@gmail.com

Hi all,

While testing the autovacuum score, I noticed that scores->mxid could
be infinity by the following calculation in
relation_needs_vacanalyze():

scores->mxid = (double) mxid_age / multixact_freeze_max_age;

The variable multixact_freeze_max_age originates from
effective_multixact_freeze_max_age, which is determined by
MultiXactMemberFreezeThreshold(). As noted in the comments for
MultiXactMemberFreezeThreshold(), it can return 0 under certain
conditions:

    /* fraction could be > 1.0, but lowest possible freeze age is zero */
    if (fraction >= 1.0)
        return 0;

Since mxid_age is cast to a double before the division, this does not
trigger a division-by-zero error or cause a server crash. However,
scores->mxid results in 'inf', which displays as "Infinity" in the
mxid_score column of the pg_stat_autovacuum_scores view. While this
might not be intentional, it seems better to prevent mxid_score from
becoming infinity by doing something like this:

- scores->mxid = (double) mxid_age / multixact_freeze_max_age;
+ scores->mxid = (double) mxid_age / Max(multixact_freeze_max_age, 1);
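
To make the behavior above concrete, here is a minimal standalone sketch, not the actual autovacuum code. The helper names are hypothetical, Max() is a local stand-in for the PostgreSQL macro, and the result assumes IEEE 754 floating-point arithmetic (where a nonzero value divided by zero is infinity, and 0 divided by 0 would be NaN).

```c
#include <math.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's Max() macro. */
#define Max(x, y) ((x) > (y) ? (x) : (y))

/*
 * Unguarded form (hypothetical helper): with IEEE 754 doubles, a nonzero
 * age divided by a zero freeze max age yields +infinity, not a crash.
 */
static double
mxid_score_unguarded(uint32_t mxid_age, int multixact_freeze_max_age)
{
    return (double) mxid_age / multixact_freeze_max_age;
}

/*
 * Guarded form (hypothetical helper) mirroring the proposed change: a
 * zero divisor is treated as 1, so the score becomes mxid_age itself.
 */
static double
mxid_score_guarded(uint32_t mxid_age, int multixact_freeze_max_age)
{
    return (double) mxid_age / Max(multixact_freeze_max_age, 1);
}
```

With a divisor of 0, the guarded helper returns the age unchanged, so the score stays finite and still sorts above any ordinary ratio.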

Any thoughts on this?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#2Sami Imseih
samimseih@gmail.com
In reply to: Masahiko Sawada (#1)
Re: mxid_score can become Infinity in pg_stat_autovacuum_scores
- scores->mxid = (double) mxid_age / multixact_freeze_max_age;
+ scores->mxid = (double) mxid_age / Max(multixact_freeze_max_age, 1);

Any thoughts on this?

Hi,

That is a good finding. I think what you are suggesting makes sense.

If multixact_freeze_max_age is 0 (we have more than
MULTIXACT_MEMBER_HIGH_THRESHOLD members, 4 billion)
we then prioritize based on mxid_age, which will be high at that
point for most cases and put that table high on the priority list.

I do think we need to mention in the docs also about this caveat
in scoring, so users of pg_stat_autovacuum_scores are not surprised.
As member space usage grows between 2 billion and 4 billion, the
score ramps up gradually, but once members reach 4 billion the effective freeze
max age drops to 0 and the score jumps to mxid_age itself,
which could be in the hundreds of millions.
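
The ramp-then-jump described above can be sketched roughly as follows. This is a simplified illustration, not a copy of MultiXactMemberFreezeThreshold(): the function name, the constant names, and the way the inputs are passed in are all assumptions, and the 2 billion / 4 billion values are simply the figures quoted in this thread.

```c
#include <stdint.h>

/* Assumed thresholds, using the figures mentioned in this thread. */
#define SKETCH_MEMBER_LOW_THRESHOLD   UINT64_C(2000000000)
#define SKETCH_MEMBER_HIGH_THRESHOLD  UINT64_C(4000000000)

/*
 * Hypothetical sketch: compute an effective multixact freeze max age
 * from member-space usage. Returns 0 once usage reaches the high
 * threshold, which is the case that made the unguarded score infinite.
 */
static int
sketch_member_freeze_threshold(uint64_t members, uint32_t multixacts,
                               int freeze_max_age)
{
    double      fraction;
    uint32_t    victims;
    uint32_t    remaining;

    /* Low usage: the configured freeze max age applies unchanged. */
    if (members <= SKETCH_MEMBER_LOW_THRESHOLD)
        return freeze_max_age;

    /* How far usage sits between the low and high thresholds. */
    fraction = (double) (members - SKETCH_MEMBER_LOW_THRESHOLD) /
        (double) (SKETCH_MEMBER_HIGH_THRESHOLD - SKETCH_MEMBER_LOW_THRESHOLD);

    /* fraction could be > 1.0, but lowest possible freeze age is zero */
    if (fraction >= 1.0)
        return 0;

    /* Scale down the number of multixacts allowed to remain. */
    victims = (uint32_t) (multixacts * fraction);
    remaining = multixacts - victims;

    /* Never return more than the configured freeze max age. */
    if (remaining > (uint32_t) freeze_max_age)
        return freeze_max_age;
    return (int) remaining;
}
```

Under these assumptions, 3 billion members with 100 million multixacts gives a threshold of 50 million, while 4 billion members gives 0, at which point the guarded score equals mxid_age itself.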

See attached.

--
Sami Imseih
Amazon Web Services (AWS)

Attachments:

v1-0001-Correct-the-MultiXact-autovacuum-priority-score-w.patch (application/octet-stream, +13 -7)

#3Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Sami Imseih (#2)
Re: mxid_score can become Infinity in pg_stat_autovacuum_scores

Hi,

On Fri, Jun 12, 2026 at 11:20 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

While testing the autovacuum score, I noticed that scores->mxid could
be infinity by the following calculation in
relation_needs_vacanalyze():

scores->mxid = (double) mxid_age / multixact_freeze_max_age;

Nice catch!

On Fri, Jun 12, 2026 at 1:38 PM Sami Imseih <samimseih@gmail.com> wrote:

If multixact_freeze_max_age is 0 (we have more than
MULTIXACT_MEMBER_HIGH_THRESHOLD members, 4 billion)
we then prioritize based on mxid_age, which will be high at that
point for most cases and put that table high on the priority list.

Commit bd8d9c9bdf eliminated MultiXactOffset wraparound and the 2^32
limit on the total number of multixact members (i.e., the number of
txn-ids that are part of all multixacts at any given moment). However,
to limit disk space usage, it retained the aggressive multixact
freezing logic (with a note to make it configurable in future). This
means that when the total multixact members exceed 4 billion, we can
hit a condition where the computed fraction is >= 1.0 and the returned
freeze threshold is 0, telling the caller that freezing is urgent on
this table.

When this happens, we want the table to be vacuumed regardless of
other scores. However, with just setting scores->mxid = mxid_age (as
in the attached patch), unless I'm missing something, there seems to
be a risk that the table won't get to the top of the priority list
because scores->max gets recalculated even after mxid score is
accounted with max of (xid, mxid). Could you help me understand how
this case is handled?

I do think we need to mention in the docs also about this caveat
in scoring, so users of pg_stat_autovacuum_scores are not surprised.
As member space usage grows between 2 billion and 4 billion, the
score ramps up gradually, but once members reach 4 billion the effective freeze
max age drops to 0 and the score jumps to mxid_age itself,
which could be in the hundreds of millions.

I didn't find commit bd8d9c9bdf adding any documentation. Maybe it's
worth adding some notes on what it means for the customers having
multixact-heavy workloads - especially it eliminates anti-wraparound
freezing because of running out of members space.

See attached.

Thanks for the patch. Some comments:

1/
+       <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>.  However,
+       when multixact member space usage is high (see
+       <xref linkend="vacuum-for-multixact-wraparound"/>), the effective
+       freeze max age is reduced below
+       <xref linkend="guc-autovacuum-multixact-freeze-max-age"/> to help
+       reclaim multixact member disk space, which can result in much higher
+       scores than normal.  Furthermore, this component increases greatly
+       once the age surpasses
+       <xref linkend="guc-vacuum-multixact-failsafe-age"/>.  The
+       final value for this component can be adjusted via

Isn't the "effective freeze max age" code-level terminology? IMHO,
adding a separate section for the commit bd8d9c9bdf makes it more
useful.

2/
  /*
  * To calculate the (M)XID age portion of the score, divide the age by its
- * respective *_freeze_max_age parameter.
+ * respective *_freeze_max_age parameter. MultiXactMemberFreezeThreshold()
+ * can return 0, in which case we effectively use mxid_age as the score.
  */
  xid_age = TransactionIdIsNormal(relfrozenxid) ? recentXid - relfrozenxid : 0;
  mxid_age = MultiXactIdIsValid(relminmxid) ? recentMulti - relminmxid : 0;

For better readability, can we enhance this comment by saying exactly
when the freeze threshold gets returned as 0 telling the caller that
freezing is urgent on this table?

3/ I checked whether we have tests for the case where fraction is >= 1.0,
i.e. multixact members are > 4 billion. The closest I found is
002_multixact_wraparound.pl, but I don't think it covers this case. It's
worth testing this case and the fix locally. FWIW, this code doesn't have
coverage -
https://coverage.postgresql.org/src/backend/access/transam/multixact.c.gcov.html.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

#4Sami Imseih
samimseih@gmail.com
In reply to: Bharath Rupireddy (#3)
Re: mxid_score can become Infinity in pg_stat_autovacuum_scores

Thanks for the comments!

When this happens, we want the table to be vacuumed regardless of
other scores.

I am not sure this is correct. For example, I don't think this scenario should
be prioritized higher than a table that is in failsafe because it's nearing
wraparound. What do you think?

However, with just setting scores->mxid = mxid_age (as
in the attached patch), unless I'm missing something, there seems to
be a risk that the table won't get to the top of the priority list
because scores->max gets recalculated even after mxid score is
accounted with max of (xid, mxid). Could you help me understand how
this case is handled?

It will be prioritized based on the mxid_age, which should naturally be
high at that point, so the priority of this table will be based on its age.

I do think we need to mention in the docs also about this caveat
in scoring, so users of pg_stat_autovacuum_scores are not surprised.
As member space usage grows between 2 billion and 4 billion, the
score ramps up gradually, but once members reach 4 billion the effective freeze
max age drops to 0 and the score jumps to mxid_age itself,
which could be in the hundreds of millions.

I didn't find commit bd8d9c9bdf adding any documentation. Maybe it's
worth adding some notes on what it means for the customers having
multixact-heavy workloads - especially it eliminates anti-wraparound
freezing because of running out of members space.

Perhaps more docs on the improvement should be added, but that
seems orthogonal to the issue being discussed here.

1/
+ <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>. However,
+ when multixact member space usage is high (see
+ <xref linkend="vacuum-for-multixact-wraparound"/>), the effective
+ freeze max age is reduced below
+ <xref linkend="guc-autovacuum-multixact-freeze-max-age"/> to help
+ reclaim multixact member disk space, which can result in much higher
+ scores than normal. Furthermore, this component increases greatly
+ once the age surpasses
+ <xref linkend="guc-vacuum-multixact-failsafe-age"/>. The
+ final value for this component can be adjusted via

Isn't the "effective freeze max age" code-level terminology?

Yes, it is described as "effective" in code, but I also think it makes sense
user-facing. It does get the point across, doesn't it?

2/
/*
* To calculate the (M)XID age portion of the score, divide the age by its
- * respective *_freeze_max_age parameter.
+ * respective *_freeze_max_age parameter. MultiXactMemberFreezeThreshold()
+ * can return 0, in which case we effectively use mxid_age as the score.
*/
xid_age = TransactionIdIsNormal(relfrozenxid) ? recentXid - relfrozenxid : 0;
mxid_age = MultiXactIdIsValid(relminmxid) ? recentMulti - relminmxid : 0;

For better readability, can we enhance this comment by saying exactly
when the freeze threshold gets returned as 0 telling the caller that
freezing is urgent on this table?

That is already described in MultiXactMemberFreezeThreshold(), right?

3/ I checked whether we have tests for the case where fraction is >= 1.0,
i.e. multixact members are > 4 billion. The closest I found is
002_multixact_wraparound.pl, but I don't think it covers this case. It's
worth testing this case and the fix locally. FWIW, this code doesn't have
coverage -
https://coverage.postgresql.org/src/backend/access/transam/multixact.c.gcov.html.

This looks like a separate discussion as well, but I am not against testing
for this.

--
Sami Imseih
Amazon Web Services (AWS)

#5Nathan Bossart
nathandbossart@gmail.com
In reply to: Sami Imseih (#4)
Re: mxid_score can become Infinity in pg_stat_autovacuum_scores

On Mon, Jun 15, 2026 at 03:26:59PM -0500, Sami Imseih wrote:

I do think we need to mention in the docs also about this caveat
in scoring, so users of pg_stat_autovacuum_scores are not surprised.
As member space usage grows between 2 billion and 4 billion, the
score ramps up gradually, but once members reach 4 billion the effective freeze
max age drops to 0 and the score jumps to mxid_age itself,
which could be in the hundreds of millions.

I'm -0.2 for documenting this case. I understand that users might be
confused about the results in such extreme situations, but I worry more
about users being confused by the excruciating detail of the documentation.
The existing docs are already quite complex, but I did spend a lot of time
trying to find the right balance of detail and accessibility when
committing. There are certainly other rare corner cases in which the
existing docs aren't telling the full story, and I don't think adding more
prose is really going to help.

--
nathan

#6Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#5)
Re: mxid_score can become Infinity in pg_stat_autovacuum_scores

I do think we need to mention in the docs also about this caveat
in scoring, so users of pg_stat_autovacuum_scores are not surprised.
As member space usage grows between 2 billion and 4 billion, the
score ramps up gradually, but once members reach 4 billion the effective freeze
max age drops to 0 and the score jumps to mxid_age itself,
which could be in the hundreds of millions.

I'm -0.2 for documenting this case. I understand that users might be
confused about the results in such extreme situations, but I worry more
about users being confused by the excruciating detail of the documentation.
The existing docs are already quite complex, but I did spend a lot of time
trying to find the right balance of detail and accessibility when
committing.

I think this particular scenario is very clear to explain just like how
we explain the failsafe scenario. Also, the suggested docs in the view
link to the already existing detailed explanation of this behavior.

More generally, I think anytime there is a drastic change in a score, like
jumping from a gradually ramping value around 1.x to suddenly hundreds of
millions, that's something worth calling out in the docs. Users monitoring
pg_stat_autovacuum_scores will notice that jump and want to understand why
it happened.

--
Sami Imseih
Amazon Web Services (AWS)

#7Nathan Bossart
nathandbossart@gmail.com
In reply to: Sami Imseih (#6)
Re: mxid_score can become Infinity in pg_stat_autovacuum_scores

On Tue, Jun 16, 2026 at 11:21:57AM -0500, Sami Imseih wrote:

I think this particular scenario is very clear to explain just like how
we explain the failsafe scenario. Also, the suggested docs in the view
link to the already existing detailed explanation of this behavior.

More generally, I think anytime there is a drastic change in a score,
like jumping from a gradually ramping value around 1.x to suddenly
hundreds of millions, that's something worth calling out in the docs.
Users monitoring pg_stat_autovacuum_scores will notice that jump and want
to understand why it happened.

Okay. I fiddled with the patch a bit and came up with the attached. WDYT?

--
nathan

Attachments:

v2-0001-fix-division-by-zero-when-calculating-autovacuum-.patch (text/plain; charset=us-ascii, +12 -6)

#8Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#7)
Re: mxid_score can become Infinity in pg_stat_autovacuum_scores

Okay. I fiddled with the patch a bit and came up with the attached. WDYT?

This LGTM.

--
Sami

#9Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#7)
Re: mxid_score can become Infinity in pg_stat_autovacuum_scores

Hi,

On Wed, Jun 17, 2026 at 9:18 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Tue, Jun 16, 2026 at 11:21:57AM -0500, Sami Imseih wrote:

I think this particular scenario is very clear to explain just like how
we explain the failsafe scenario. Also, the suggested docs in the view
link to the already existing detailed explanation of this behavior.

More generally, I think anytime there is a drastic change in a score,
like jumping from a gradually ramping value around 1.x to suddenly
hundreds of millions, that's something worth calling out in the docs.
Users monitoring pg_stat_autovacuum_scores will notice that jump and want
to understand why it happened.

Okay. I fiddled with the patch a bit and came up with the attached. WDYT?

+ of multixact member entries created exceeds approximately 2 billion

Although 2 billion is an internal threshold that may change to a
configurable parameter in the future depending on the workload and
disk space (per commit bd8d9c9bdf), I'm fine keeping it as is.

Attached v2 patch LGTM.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

#10Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#9)
Re: mxid_score can become Infinity in pg_stat_autovacuum_scores

Committed.

--
nathan