Correcting freeze conflict horizon calculation

Started by Melanie Plageman9 months ago22 messages

melanieplageman@gmail.com

9 months ago

Hi,

I noticed while writing another patch that I think there might be an
issue/oversight with how the snapshot conflict horizon is calculated
for the prune/freeze WAL record in master/17 and the freeze WAL record
in 16.

In code to determine whether or not to freeze tuples on the page in
phase I of vacuum, we ignore LP_DEAD items temporarily, assuming they
will be set LP_UNUSED and removed by phase III of vacuum.

This means that the local values of all_visible and all_frozen
(whether or not the page is all-visible/all-frozen) in the PruneState
may temporarily be true when there are LP_DEAD items.
Before returning to code that actually updates the visibility map,
all_visible and all_frozen are unset if there are lpdead items.

However, we calculate the snapshot conflict horizon for the
prune/freeze record in master/17 and the freeze record in 16 before
updating all-frozen and all-visible. That means the horizon may be too
aggressive if the page cannot actually be set all-frozen. This isn't a
correctness issue, but it may lead to transactions on the standby
being unnecessarily canceled for pages which have some tuples frozen
but not all due to remaining LP_DEAD items.

I'm not 100% certain of my analysis given that this is a complicated
area of the code. I've attached suggested fixes for master/17 and 16
and am interested in what others think.

- Melanie

Peter Geoghegan

pg@bowt.ie

9 months ago

In reply to: Melanie Plageman (#1)

Re: Correcting freeze conflict horizon calculation

On Thu, May 29, 2025 at 10:57 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

However, we calculate the snapshot conflict horizon for the
prune/freeze record in master/17 and the freeze record in 16 before
updating all-frozen and all-visible. That means the horizon may be too
aggressive if the page cannot actually be set all-frozen. This isn't a
correctness issue, but it may lead to transactions on the standby
being unnecessarily canceled for pages which have some tuples frozen
but not all due to remaining LP_DEAD items.

I don't follow.

Your concern is that the horizon might be overly aggressive/too
conservative. But your patch (for 16) makes us take the
don't-use-snapshotConflictHorizon-twice block *less* frequently (and
the "use OldestXmin conservatively" block *more* frequently):

- if (prunestate->all_visible && prunestate->all_frozen)
+ if (prunestate->all_visible && prunestate->all_frozen && lpdead_items == 0)
{
    /* Using same cutoff when setting VM is now unnecessary */
    snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
    prunestate->visibility_cutoff_xid = InvalidTransactionId;
}
else
{
    /* Avoids false conflicts when hot_standby_feedback in use */
    snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
    TransactionIdRetreat(snapshotConflictHorizon);
}

How can taking the "Avoids false conflicts when hot_standby_feedback
in use" path more often result in fewer unnecessary conflicts on
standbys? Isn't it the other way around?

--
Peter Geoghegan

Melanie Plageman

melanieplageman@gmail.com

9 months ago

In reply to: Peter Geoghegan (#2)

Re: Correcting freeze conflict horizon calculation

On Thu, May 29, 2025 at 11:37 AM Peter Geoghegan <pg@bowt.ie> wrote:

Your concern is that the horizon might be overly aggressive/too
conservative. But your patch (for 16) makes us take the
don't-use-snapshotConflictHorizon-twice block *less* frequently (and
the "use OldestXmin conservatively" block *more* frequently):
- if (prunestate->all_visible && prunestate->all_frozen)
+ if (prunestate->all_visible && prunestate->all_frozen && lpdead_items == 0)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
prunestate->visibility_cutoff_xid = InvalidTransactionId;
}
else
{
/* Avoids false conflicts when hot_standby_feedback in use */
snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
TransactionIdRetreat(snapshotConflictHorizon);
}
How can taking the "Avoids false conflicts when hot_standby_feedback
in use" path more often result in fewer unnecessary conflicts on
standbys? Isn't it the other way around?

You're right. I forgot that the visibility_cutoff_xid will be older
than OldestXmin when all the tuples are going to be frozen. I
associate the visibility_cutoff_xid with being younger/newer than
OldestXmin because it often will be when there are newer live tuples
we don't freeze.

And, in the case where the page is not actually all-frozen because of
LP_DEAD items we haven't accounted for yet in the value of all_frozen,
they wouldn't affect the freeze record's snapshot conflict horizon in
16 because they won't be frozen and thus unaffected by the WAL record
and in the case of the prune/freeze WAL record in 17/master, I
calculate the newer of the latest_xid_removed and the snapshot
conflict horizon calculated for freezing, so it would also be fine.

- Melanie

Peter Geoghegan

pg@bowt.ie

9 months ago

In reply to: Melanie Plageman (#3)

Re: Correcting freeze conflict horizon calculation

On Fri, May 30, 2025 at 5:16 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Thu, May 29, 2025 at 11:37 AM Peter Geoghegan <pg@bowt.ie> wrote:

How can taking the "Avoids false conflicts when hot_standby_feedback
in use" path more often result in fewer unnecessary conflicts on
standbys? Isn't it the other way around?

You're right. I forgot that the visibility_cutoff_xid will be older
than OldestXmin when all the tuples are going to be frozen.

I added an assertion in a number of places that
"Assert(!TransactionIdIsValid(visibility_cutoff_xid))" when we go to
set a page all-frozen in the VM -- it's irrelvant, and recovery cannot
possibly need a conflict when it replays the record. This seemed
logical, since an all-frozen page must already be visible to every
possible MVCC snapshot (it might also have some small performance
benefit, though that's not really why I did it).

I associate the visibility_cutoff_xid with being younger/newer than
OldestXmin because it often will be when there are newer live tuples
we don't freeze.

I'm not sure what you mean by this. visibility_cutoff_xid can only be
set using tuples that HTSV says are HEAPTUPLE_LIVE according to
VACUUM's OldestXmin cutoff. Personally I find it natural to think of
visibility_cutoff_xid as meaningless unless we're actually able to
apply it in some way.

--
Peter Geoghegan

Melanie Plageman

melanieplageman@gmail.com

9 months ago

In reply to: Peter Geoghegan (#4)

Re: Correcting freeze conflict horizon calculation

On Fri, May 30, 2025 at 5:34 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, May 30, 2025 at 5:16 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

You're right. I forgot that the visibility_cutoff_xid will be older
than OldestXmin when all the tuples are going to be frozen.

I added an assertion in a number of places that
"Assert(!TransactionIdIsValid(visibility_cutoff_xid))" when we go to
set a page all-frozen in the VM -- it's irrelvant, and recovery cannot
possibly need a conflict when it replays the record. This seemed
logical, since an all-frozen page must already be visible to every
possible MVCC snapshot (it might also have some small performance
benefit, though that's not really why I did it).

My point was not about the visibility_cutoff_xid when setting the page
all-frozen in the VM, it was that when you are selecting the snapshot
conflict horizon for the freeze record, if all of the tuples on the
page will be frozen after doing this freezing, then the
visibility_cutoff_xid you calculated while in lazy_scan_prune() will
necessarily be older than OldestXmin -- you only freeze tuples older
than OldestXmin and the visibility_cutoff_xid is the youngest live
tuple's xmin on the page. But since all tuples are older than
OldestXmin, the youngest live tuple's xmin on the page will be older
than OldestXmin (if all of the tuples on the page will be frozen). I
wasn't talking about invalidating the visibility_cutoff_xid for the VM
record at all.

I associate the visibility_cutoff_xid with being younger/newer than
OldestXmin because it often will be when there are newer live tuples
we don't freeze.

I'm not sure what you mean by this. visibility_cutoff_xid can only be
set using tuples that HTSV says are HEAPTUPLE_LIVE according to
VACUUM's OldestXmin cutoff. Personally I find it natural to think of
visibility_cutoff_xid as meaningless unless we're actually able to
apply it in some way.

Wait, I don't see that.

HeapTupleSatisfiesVacuum() passes OldestXmin which is used to evaluate
if tuples can be seen as dead:

if (TransactionIdPrecedes(dead_after, OldestXmin))
res = HEAPTUPLE_DEAD;

Which doesn't do anything for live tuples.

And HeapTupleSatisfiesVacuumHorizon() returns HEAPTUPLE_LIVE for
tuples whose inserting transaction has committed.
First we check
if (!HeapTupleHeaderXminCommitted(tuple))
then all those cases have returns and then if it was committed we do this:
/*
* Okay, the inserter committed, so it was good at some point. Now what
* about the deleting transaction?
*/
if (tuple->t_infomask & HEAP_XMAX_INVALID)
return HEAPTUPLE_LIVE;

So wouldn't the visibility_cutoff_xid be the newest xmin of a
committed live tuple on the page?
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin,
prunestate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
prunestate->visibility_cutoff_xid = xmin;

I don't see how OldestXmin comes into play with the visibility_cutoff_xid.

So that was why I thought when there are live committed tuples on the
page that could be newer than OldestXmin, that visibility_cutoff_xid
would be younger than OldestXmin.

- Melanie

Peter Geoghegan

pg@bowt.ie

9 months ago

In reply to: Melanie Plageman (#5)

Re: Correcting freeze conflict horizon calculation

On Fri, May 30, 2025 at 5:57 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I don't see how OldestXmin comes into play with the visibility_cutoff_xid.

Code in heap_page_is_all_visible() (and other place, I guess the other
one is in pruneheap.c now) have a separate OldestXmin test:

/*
* The inserter definitely committed. But is it old enough
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
vacrel->cutoffs.OldestXmin))
{
all_visible = false;
*all_frozen = false;
break;
}

Once we "break" here, it doesn't matter what visibility_cutoff_xid has
been set to. It cannot be used for any purpose.

--
Peter Geoghegan

Melanie Plageman

melanieplageman@gmail.com

9 months ago

In reply to: Peter Geoghegan (#6)

Re: Correcting freeze conflict horizon calculation

On Fri, May 30, 2025 at 6:10 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, May 30, 2025 at 5:57 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I don't see how OldestXmin comes into play with the visibility_cutoff_xid.

Code in heap_page_is_all_visible() (and other place, I guess the other
one is in pruneheap.c now) have a separate OldestXmin test:

/*
* The inserter definitely committed. But is it old enough
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
vacrel->cutoffs.OldestXmin))
{
all_visible = false;
*all_frozen = false;
break;
}

Once we "break" here, it doesn't matter what visibility_cutoff_xid has
been set to. It cannot be used for any purpose.

Ah, I see this is done before visibility_cutoff_xid is advanced, so
visibility_cutoff_xid won't end up ever being a value newer than
OldestXmin. So it's not really the newest committed xmin on the page,
it is the newest committed xmin on the page preceding OldestXmin. I
had always been thinking of it as the newest committed xmin on the
page.

- Melanie

Melanie Plageman

melanieplageman@gmail.com

9 months ago

In reply to: Peter Geoghegan (#2)

Re: Correcting freeze conflict horizon calculation

On Thu, May 29, 2025 at 11:37 AM Peter Geoghegan <pg@bowt.ie> wrote:

Your concern is that the horizon might be overly aggressive/too
conservative. But your patch (for 16) makes us take the
don't-use-snapshotConflictHorizon-twice block *less* frequently (and
the "use OldestXmin conservatively" block *more* frequently):
- if (prunestate->all_visible && prunestate->all_frozen)
+ if (prunestate->all_visible && prunestate->all_frozen && lpdead_items == 0)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
prunestate->visibility_cutoff_xid = InvalidTransactionId;
}
else
{
/* Avoids false conflicts when hot_standby_feedback in use */
snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
TransactionIdRetreat(snapshotConflictHorizon);
}
How can taking the "Avoids false conflicts when hot_standby_feedback
in use" path more often result in fewer unnecessary conflicts on
standbys? Isn't it the other way around?

Having discussed this and updated my understanding, now I realize I
don't understand when it would be unsafe to use
prunestate->visibility_cutoff_xid as the snapshot conflict horizon for
the freeze record.

As you've explained, it will always be <= OldestXmin. And, if the
record only freezes tuples (meaning it makes no other changes to the
page) and all of those tuples' xmins were considered when calculating
prunestate->visibility_cutoff_xid, then, even if the page isn't
all-frozen, how could it be incorrect to use the
prunestate->visibility_cutoff_xid as the horizon? Why do we use
OldestXmin when the page wouldn't be all-frozen?

- Melanie

Peter Geoghegan

pg@bowt.ie

9 months ago

In reply to: Melanie Plageman (#8)

Re: Correcting freeze conflict horizon calculation

On Mon, Jun 2, 2025 at 2:50 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

As you've explained, it will always be <= OldestXmin. And, if the
record only freezes tuples (meaning it makes no other changes to the
page) and all of those tuples' xmins were considered when calculating
prunestate->visibility_cutoff_xid, then, even if the page isn't
all-frozen, how could it be incorrect to use the
prunestate->visibility_cutoff_xid as the horizon?

Obviously, it isn't safe to use prunestate->visibility_cutoff_xid
unless it has actually been maintained using all of the tuples on the
page. It is primarily intended for use by the VM record (though it
doesn't have to continue to work that way, of course).

Why do we use
OldestXmin when the page wouldn't be all-frozen?

It has always worked that way (except that it was FreezeLimit, not
OldestXmin, since we could only ever freeze precisely those tuples <
FreezeLimit before Postgres 16). Using
prunestate->visibility_cutoff_xid instead (when that was possible) was
a big improvement.

You're right that in principle we could safely use a conflict horizon
XID before OldestXmin in cases where we still give up and just use
OldestXmin (cases where we just use "OldestXmin - 1", I should say).
I'm not sure how much this matters in practice; I imagine that using
prunestate->visibility_cutoff_xid is very much the common case with
page-level freezing.

--
Peter Geoghegan

#10

Melanie Plageman

melanieplageman@gmail.com

9 months ago

In reply to: Peter Geoghegan (#9)

Re: Correcting freeze conflict horizon calculation

On Mon, Jun 2, 2025 at 3:05 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Jun 2, 2025 at 2:50 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

As you've explained, it will always be <= OldestXmin. And, if the
record only freezes tuples (meaning it makes no other changes to the
page) and all of those tuples' xmins were considered when calculating
prunestate->visibility_cutoff_xid, then, even if the page isn't
all-frozen, how could it be incorrect to use the
prunestate->visibility_cutoff_xid as the horizon?

Obviously, it isn't safe to use prunestate->visibility_cutoff_xid
unless it has actually been maintained using all of the tuples on the
page. It is primarily intended for use by the VM record (though it
doesn't have to continue to work that way, of course).

Right, if the page isn't all-visible, we don't continue to keep
visibility_cutoff_xid up to date. But I didn't understand why we
didn't just keep it up to date regardless of whether the page is
all-visible and use it in the freeze record. It doesn't seem like all
that much extra computation.

Why do we use
OldestXmin when the page wouldn't be all-frozen?

It has always worked that way (except that it was FreezeLimit, not
OldestXmin, since we could only ever freeze precisely those tuples <
FreezeLimit before Postgres 16). Using
prunestate->visibility_cutoff_xid instead (when that was possible) was
a big improvement.

You're right that in principle we could safely use a conflict horizon
XID before OldestXmin in cases where we still give up and just use
OldestXmin (cases where we just use "OldestXmin - 1", I should say).
I'm not sure how much this matters in practice; I imagine that using
prunestate->visibility_cutoff_xid is very much the common case with
page-level freezing.

Yes, I guess what I mean is that, like pruning keeps track of the
newest removed xmax, what would make the most sense to me is for
freezing to keep track of and use the newest frozen xmin as the
conflict horizon.

I understand that if I wanted to actually use the newest frozen xmin
as the conflict horizon when the page is not all-frozen, I'd need a
new counter since visibility_cutoff_xid is calculated across all live
tuples older than OldestXmin -- not just ones we'll freeze. But, for
cases when a few tuples are frozen on the page, it seems like it could
be worth it?

- Melanie

#11

Peter Geoghegan

pg@bowt.ie

9 months ago

In reply to: Melanie Plageman (#10)

Re: Correcting freeze conflict horizon calculation

On Mon, Jun 2, 2025 at 3:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I understand that if I wanted to actually use the newest frozen xmin
as the conflict horizon when the page is not all-frozen, I'd need a
new counter since visibility_cutoff_xid is calculated across all live
tuples older than OldestXmin -- not just ones we'll freeze.

The idea of always calculating precisely the oldest possible conflict
horizon XID that is still safe when performing freezing seems
reasonable to me. I'm certainly not opposed.

But, for cases when a few tuples are frozen on the page, it seems like it could
be worth it?

In general, I don't expect that we're all that likely to freeze some
tuples on the page, without being able to subsequently mark the whole
page as all-frozen in the VM. Obviously it happens more often when
VACUUM FREEZE is used -- though that works best as an argument against
VACUUM FREEZE.

A scheme like the one you're thinking of might be worth the
implementation effort if it ended up being simpler. As you pointed
out, we're already doing almost the same thing for pruning. Now that
pruning and freezing both happen in the same place, it might make
sense to do it just to make things more consistent.

--
Peter Geoghegan

#12

Melanie Plageman

melanieplageman@gmail.com

9 months ago

In reply to: Peter Geoghegan (#9)

Re: Correcting freeze conflict horizon calculation

On Mon, Jun 2, 2025 at 3:05 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Jun 2, 2025 at 2:50 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

As you've explained, it will always be <= OldestXmin. And, if the
record only freezes tuples (meaning it makes no other changes to the
page) and all of those tuples' xmins were considered when calculating
prunestate->visibility_cutoff_xid, then, even if the page isn't
all-frozen, how could it be incorrect to use the
prunestate->visibility_cutoff_xid as the horizon?

Obviously, it isn't safe to use prunestate->visibility_cutoff_xid
unless it has actually been maintained using all of the tuples on the
page. It is primarily intended for use by the VM record (though it
doesn't have to continue to work that way, of course).

Oh, and, more specifically, in my previous email, I was wondering if,
and why, in 16 this diff wouldn't be correct

diff --git a/src/backend/access/heap/vacuumlazy.c
b/src/backend/access/heap/vacuumlazy.c
index 9fa88960ada..f1831bba95c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1833,7 +1833,7 @@ retry:
             * once we're done with it.  Otherwise we generate a conservative
             * cutoff by stepping back from OldestXmin.
             */
-           if (prunestate->all_visible && prunestate->all_frozen)
+           if (prunestate->all_visible)
            {
                /* Using same cutoff when setting VM is now unnecessary */
                snapshotConflictHorizon = prunestate->visibility_cutoff_xid;

- Melanie

#13

Peter Geoghegan

pg@bowt.ie

9 months ago

In reply to: Melanie Plageman (#12)

Re: Correcting freeze conflict horizon calculation

On Mon, Jun 2, 2025 at 3:40 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Oh, and, more specifically, in my previous email, I was wondering if,
and why, in 16 this diff wouldn't be correct

I *think* that it would be correct.

Again, it is certainly possible to make the conflict horizon precisely
the oldest safe value in all cases. How much that actually buys you
seems much less clear.

--
Peter Geoghegan

#14

Melanie Plageman

melanieplageman@gmail.com

9 months ago

In reply to: Peter Geoghegan (#11)

Re: Correcting freeze conflict horizon calculation

On Mon, Jun 2, 2025 at 3:40 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Jun 2, 2025 at 3:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

But, for cases when a few tuples are frozen on the page, it seems like it could
be worth it?

In general, I don't expect that we're all that likely to freeze some
tuples on the page, without being able to subsequently mark the whole
page as all-frozen in the VM. Obviously it happens more often when
VACUUM FREEZE is used -- though that works best as an argument against
VACUUM FREEZE.

A scheme like the one you're thinking of might be worth the
implementation effort if it ended up being simpler. As you pointed
out, we're already doing almost the same thing for pruning. Now that
pruning and freezing both happen in the same place, it might make
sense to do it just to make things more consistent.

Yes, I would only be interested in such a thing if it improved code
clarity too. I imagine that a few less queries canceled on the standby
couldn't be worth additional code complexity in this sensitive area
that, as is evidenced by my starting this thread, is already difficult
to understand.

Perhaps I could keep track of the newest modified xid or some such
thing that is the newer of the newest removed xmax and newest frozen
xmin.

I actually started looking at this code again because I am writing a
patch set to update the VM in the same critical section and WAL record
as where we now prune and freeze. As part of this, I have to figure
out the right snapshot conflict horizon to use for the combined WAL
record.

I think, however, that I can't avoid keeping a separate counter for
the horizon for the VM record. Pruning and freezing horizon is the
newest "modified" (pruned or frozen) tuple xid, whereas the VM record
needs the newest live committed tuple's xmin <= OldestXmin. So,
perhaps maintaining multiple counters is unavoidable.

So, I wasn't actually planning (originally) to write a patch to try
and change the horizon to make it older in more cases when it's
correct. I'm trying to figure out the most straightforward code to
calculate the combined snapshot conflict horizon for a prune/freeze/vm
update record.

Anyway, thanks for taking the time to answer these questions and
discuss. I've found that when working this stuff out on my own, I
sometimes end up going in circles.

- Melanie

#15

Peter Geoghegan

pg@bowt.ie

9 months ago

In reply to: Melanie Plageman (#14)

Re: Correcting freeze conflict horizon calculation

On Mon, Jun 2, 2025 at 4:00 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I think, however, that I can't avoid keeping a separate counter for
the horizon for the VM record. Pruning and freezing horizon is the
newest "modified" (pruned or frozen) tuple xid, whereas the VM record
needs the newest live committed tuple's xmin <= OldestXmin. So,
perhaps maintaining multiple counters is unavoidable.

Unlike pruning of dead tuples, freezing doesn't always happen to the
maximum possible extent that OldestXmin will allow. Presumably you'll
want to keep the ability to delay the decision to freeze or not freeze
until the last possible moment. I think that supporting that
requirement necessitates using multiple conflict horizon counters (you
don't know which one you'll end up using until the last possible
moment).

So, I wasn't actually planning (originally) to write a patch to try
and change the horizon to make it older in more cases when it's
correct. I'm trying to figure out the most straightforward code to
calculate the combined snapshot conflict horizon for a prune/freeze/vm
update record.

I figured that that was what this was really about.

--
Peter Geoghegan

#16

Peter Geoghegan

pg@bowt.ie

9 months ago

In reply to: Melanie Plageman (#14)

Re: Correcting freeze conflict horizon calculation

On Mon, Jun 2, 2025 at 4:00 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Perhaps I could keep track of the newest modified xid or some such
thing that is the newer of the newest removed xmax and newest frozen
xmin.

BTW, I don't think that you have to worry about removing/freezing xmax
when it comes to generating a safe snapshotConflictHorizon.

Nothing makes it unsafe for VACUUM to remove an xmax set by an updater
that is known to have aborted right away -- no matter how that xmax
compares to OldestXmin. Pruning has always been able to remove the
successor version right away, no matter how recently the abort
happened, so why shouldn't we *also* be able to set the xmax in the
original version to InvalidTransactionId? That doesn't need to be
taken into account by snapshotConflictHorizon handling. (Note that
HeapTupleHeaderAdvanceConflictHorizon() is specifically aware that an
aborted tuple's xmin shouldn't need to affect a prune record's
conflict horizon; an aborted update can also skip consideration by
snapshotConflictHorizon maintenance.)

I also think that a locker-only xmax can be removed/frozen/set to
InvalidTransactionId, as long as the tuple lock is no longer held on
the primary -- it can safely be set to InvalidTransactionId, without
it needing to affect snapshotConflictHorizon tracking at all. In a
way, we already do this with Multis.

It's already possible (though not particularly common) for VACUUM to
fully set a Multi xmax to InvalidTransactionId when the Multi is >
OldestMxact -- it can even happen when most of the individual members
are still > OldestXmin. As long as the operation cannot affect tuple
visibility on standbys, no special snapshotConflictHorizon is
required. We do currently end up using OldestXmin-1 as our
snapshotConflictHorizon when this happens, but as I said in the other
email, I don't think that that's really required.

--
Peter Geoghegan

#17

Peter Geoghegan

pg@bowt.ie

9 months ago

In reply to: Peter Geoghegan (#16)

Re: Correcting freeze conflict horizon calculation

On Tue, Jun 3, 2025 at 1:59 PM Peter Geoghegan <pg@bowt.ie> wrote:

We do currently end up using OldestXmin-1 as our
snapshotConflictHorizon when this happens, but as I said in the other
email, I don't think that that's really required.

Actually, that's not really true, either. That is, it is possible
(though probably very rare) for VACUUM to set an xmax Multi >
OldestMxact with individual members that are still > OldestXmin, while
ultimately being able to set the page all-frozen in the VM.
snapshotConflictHorizon would then come from visibility_cutoff_xid --
which might end up being InvalidTransactionId.

--
Peter Geoghegan

#18

Melanie Plageman

melanieplageman@gmail.com

9 months ago

In reply to: Peter Geoghegan (#16)

Re: Correcting freeze conflict horizon calculation

On Tue, Jun 3, 2025 at 2:00 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Jun 2, 2025 at 4:00 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Perhaps I could keep track of the newest modified xid or some such
thing that is the newer of the newest removed xmax and newest frozen
xmin.

BTW, I don't think that you have to worry about removing/freezing xmax
when it comes to generating a safe snapshotConflictHorizon.

Nothing makes it unsafe for VACUUM to remove an xmax set by an updater
that is known to have aborted right away -- no matter how that xmax
compares to OldestXmin. Pruning has always been able to remove the
successor version right away, no matter how recently the abort
happened, so why shouldn't we *also* be able to set the xmax in the
original version to InvalidTransactionId? That doesn't need to be
taken into account by snapshotConflictHorizon handling. (Note that
HeapTupleHeaderAdvanceConflictHorizon() is specifically aware that an
aborted tuple's xmin shouldn't need to affect a prune record's
conflict horizon; an aborted update can also skip consideration by
snapshotConflictHorizon maintenance.)

That makes sense. So, if I were to try to keep track of the newest
frozen tuple for the snapshot conflict horizon, it should only be the
newest frozen xmin (for regular transaction IDs) -- not the newest
frozen xid.

I also think that a locker-only xmax can be removed/frozen/set to
InvalidTransactionId, as long as the tuple lock is no longer held on
the primary -- it can safely be set to InvalidTransactionId, without
it needing to affect snapshotConflictHorizon tracking at all. In a
way, we already do this with Multis.

It's already possible (though not particularly common) for VACUUM to
fully set a Multi xmax to InvalidTransactionId when the Multi is >
OldestMxact -- it can even happen when most of the individual members
are still > OldestXmin. As long as the operation cannot affect tuple
visibility on standbys, no special snapshotConflictHorizon is
required. We do currently end up using OldestXmin-1 as our
snapshotConflictHorizon when this happens, but as I said in the other
email, I don't think that that's really required.

I'll have to review what happens to MultiXacts during vacuum. But,
from my initial reading of this, it sounds like you are pointing out
another special case where, if I were to try and implement a less
pessimistic snapshot conflict horizon after freezing, I would not need
to and, in fact, would want to avoid advancing the conflict horizon
even though I am making a modification to a tuple as part of
"freezing".

(also, I probably still won't try and take this project on, but I
think it is great to have these details and discussion in a thread so
I [or someone else] can go back and use this information to tackle it
in the future.)

- Melanie

#19

Melanie Plageman

melanieplageman@gmail.com

7 days ago

In reply to: Peter Geoghegan (#16)

Re: Correcting freeze conflict horizon calculation

On Tue, Jun 3, 2025 at 2:00 PM Peter Geoghegan <pg@bowt.ie> wrote:

BTW, I don't think that you have to worry about removing/freezing xmax
when it comes to generating a safe snapshotConflictHorizon.

Nothing makes it unsafe for VACUUM to remove an xmax set by an updater
that is known to have aborted right away -- no matter how that xmax
compares to OldestXmin. Pruning has always been able to remove the
successor version right away, no matter how recently the abort
happened, so why shouldn't we *also* be able to set the xmax in the
original version to InvalidTransactionId? That doesn't need to be
taken into account by snapshotConflictHorizon handling. (Note that
HeapTupleHeaderAdvanceConflictHorizon() is specifically aware that an
aborted tuple's xmin shouldn't need to affect a prune record's
conflict horizon; an aborted update can also skip consideration by
snapshotConflictHorizon maintenance.)

I also think that a locker-only xmax can be removed/frozen/set to
InvalidTransactionId, as long as the tuple lock is no longer held on
the primary -- it can safely be set to InvalidTransactionId, without
it needing to affect snapshotConflictHorizon tracking at all. In a
way, we already do this with Multis.

It's already possible (though not particularly common) for VACUUM to
fully set a Multi xmax to InvalidTransactionId when the Multi is >
OldestMxact -- it can even happen when most of the individual members
are still > OldestXmin. As long as the operation cannot affect tuple
visibility on standbys, no special snapshotConflictHorizon is
required. We do currently end up using OldestXmin-1 as our
snapshotConflictHorizon when this happens, but as I said in the other
email, I don't think that that's really required.

I wrote the attached patch to keep track of the newest to-be-frozen
xid. I think the freeze_xmin and replace_xvac cases are right. But I'm
not sure about the freeze_xmax and replace_xmax for multixacts cases.

For the freeze_xmax case for regular transaction IDs, are you saying
that the only way you can have one older than OldestXmin is if the
update transaction aborted? Or are there other ways you can have an
xmax older than OldestXmin?

My patch does not consider any multi member xids when calculating the
newest to-be-frozen xid. Locker-only xmaxes shouldn't affect
visibility on the standby. And I think, partially based on what you
are saying above and partially from reading the code, that update XIDs
older than OldestXmin don't matter because they would be from aborted
transactions and XIDs newer than OldestXmin won't be removed or
frozen. Does this sound right?

- Melanie

#20

Peter Geoghegan

pg@bowt.ie

6 days ago

In reply to: Melanie Plageman (#19)

Re: Correcting freeze conflict horizon calculation

On Mon, Mar 2, 2026 at 12:15 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

It's already possible (though not particularly common) for VACUUM to
fully set a Multi xmax to InvalidTransactionId when the Multi is >
OldestMxact -- it can even happen when most of the individual members
are still > OldestXmin. As long as the operation cannot affect tuple
visibility on standbys, no special snapshotConflictHorizon is
required. We do currently end up using OldestXmin-1 as our
snapshotConflictHorizon when this happens, but as I said in the other
email, I don't think that that's really required.

I wrote the attached patch to keep track of the newest to-be-frozen
xid. I think the freeze_xmin and replace_xvac cases are right. But I'm
not sure about the freeze_xmax and replace_xmax for multixacts cases.

This looks directionally correct to me, at a minimum.

For the freeze_xmax case for regular transaction IDs, are you saying
that the only way you can have one older than OldestXmin is if the
update transaction aborted?

I don't quite follow. Are you talking about xmin or xmax? The xmax in
question must come from the original tuple, not the updated successor
version? (The successor is the tuple that pruning must have already
removed by the time we reach heap_prepare_freeze_tuple, preventing
heap_prepare_freeze_tuple from seeing an aborted xmin.)

The important principle here is that we don't need a recovery conflict
to handle cleanup after an aborted update/delete, regardless of the
details. This is a logical consequence of the fact that an aborted
transaction "never existed in the logical database".

This principle has nothing to do with implementation details such as
freeze plans and tuple headers. Of course, heap_prepare_freeze_tuple
sometimes needs to prepare a freeze plan that modifies a tuple updated
by an aborted transaction, just to set xmax to InvalidTransactionID.
But, again, no recovery conflict is required to do this safely because
this is for an aborted xact (same as with pruning that removed the
successor version).

Or are there other ways you can have an
xmax older than OldestXmin?

Again, are you talking about xmin or xmax? It's normal for
heap_prepare_freeze_tuple to see an xmax older than OldestXmin, last I
checked.

Don't forget about plain XIDs that end up as xmax due to a SELECT FOR
UPDATE. They usually don't result from aborted transactions.

Another important principle (which you've clearly followed here):
cleaning up after a locker doesn't require a recovery conflict because
it cannot affect tuple visibility on standbys.

My patch does not consider any multi member xids when calculating the
newest to-be-frozen xid. Locker-only xmaxes shouldn't affect
visibility on the standby.

But neither should the XIDs of updaters that aborted. I don't think
you should handle those at all.

And I think, partially based on what you
are saying above and partially from reading the code, that update XIDs
older than OldestXmin don't matter because they would be from aborted
transactions and XIDs newer than OldestXmin won't be removed or
frozen. Does this sound right?

I think that as a general rule VACUUM should never generate a
snapshotConflictHorizon that exactly equals OldestXmin (or a later
one). See commit 66fbcb0d.

I notice that you have an assertion
prstate->pagefrz.FreezePageConflictXid <= OldestXmin. I wonder if you
should strengthen the assertion to
prstate->pagefrz.FreezePageConflictXid < OldestXmin? I also wonder if
the existing assertion can fail due to an aborted update leaving an
xmax > OldestXmin.

--
Peter Geoghegan

#21

Melanie Plageman

melanieplageman@gmail.com

6 days ago

In reply to: Peter Geoghegan (#20)

#22

Peter Geoghegan

pg@bowt.ie

about 8 hours ago

In reply to: Melanie Plageman (#21)