All-visible pages with valid prune xid are confusing

Started by Melanie Plageman4 months ago10 messageshackers
Jump to latest
#1Melanie Plageman
melanieplageman@gmail.com

Hi,

We currently do not set pd_prune_xid (the oldest prunable XID) when
replaying XLOG_HEAP2_PRUNE* records. We've never done this, AFAICT.

Since 8.3, this comment has been in the pruning redo function:

* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.

During normal operation, when a page has no prunable tuples, we set
pd_prune_xid to InvalidTransactionId. But during recovery, the old
value is left behind.

When we then set the page all-visible in the VM, the page is marked
all-visible but the prune hint claims there are prunable tuples. On
the standby, this triggers an unnecessary prune cycle of almost all
all-visible pages the next time they are accessed. However, I think
the page being in this confusing state is the bigger problem. It's not
incorrect, but it seems like it could mask actual page corruption
(e.g. when there are dead tuples and we mistakenly set the page
all-visible).

Fixing this would require adding the prune xid to the WAL record.
UPDATE/DELETE WAL records don't have to include the new prune xid
because they set the page prune hint to the xlog record's transaction
ID.

If we don't think the overhead of the extra transaction ID in the WAL
record is worth it, we could set the prune hint to
InvalidTranasctionId during recovery if the page is all-visible. This
would at least avoid that confusing page state.

- Melanie

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Melanie Plageman (#1)
Re: All-visible pages with valid prune xid are confusing

On 02/12/2025 18:20, Melanie Plageman wrote:

Hi,

We currently do not set pd_prune_xid (the oldest prunable XID) when
replaying XLOG_HEAP2_PRUNE* records. We've never done this, AFAICT.

Since 8.3, this comment has been in the pruning redo function:

* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.

During normal operation, when a page has no prunable tuples, we set
pd_prune_xid to InvalidTransactionId. But during recovery, the old
value is left behind.

When we then set the page all-visible in the VM, the page is marked
all-visible but the prune hint claims there are prunable tuples. On
the standby, this triggers an unnecessary prune cycle of almost all
all-visible pages the next time they are accessed. However, I think
the page being in this confusing state is the bigger problem. It's not
incorrect, but it seems like it could mask actual page corruption
(e.g. when there are dead tuples and we mistakenly set the page
all-visible).

Fixing this would require adding the prune xid to the WAL record.
UPDATE/DELETE WAL records don't have to include the new prune xid
because they set the page prune hint to the xlog record's transaction
ID.

If we don't think the overhead of the extra transaction ID in the WAL
record is worth it, we could set the prune hint to
InvalidTranasctionId during recovery if the page is all-visible. This
would at least avoid that confusing page state.

Hmm. If the page has no prunable tuples left, it makes sense to set
pd_prune_xid to InvalidTransactionId to avoid the useless round of
pruning. In other cases, it would make sense to set it to some XID so
that it gets pruned later. But a standby will only start pruning if it's
later promoted to become a primary. At that point, all currently running
transactions will be finished (except for prepared transactions).
Therefore it doesn't seem important what we set the prune XID to. Any
valid XID that's not in the future should do the trick. So how about
adding a boolean flag to the WAL record, to indicate whether there's
anything prunable left on the page or not?

- Heikki

#3Melanie Plageman
melanieplageman@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: All-visible pages with valid prune xid are confusing

On Tue, Dec 2, 2025 at 12:49 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Hmm. If the page has no prunable tuples left, it makes sense to set
pd_prune_xid to InvalidTransactionId to avoid the useless round of
pruning. In other cases, it would make sense to set it to some XID so
that it gets pruned later. But a standby will only start pruning if it's
later promoted to become a primary. At that point, all currently running
transactions will be finished (except for prepared transactions).

What about on-access pruning during SELECT queries on a hot standby?

- Melanie

#4Andres Freund
andres@anarazel.de
In reply to: Melanie Plageman (#3)
Re: All-visible pages with valid prune xid are confusing

Hi,

On December 2, 2025 1:23:57 PM EST, Melanie Plageman <melanieplageman@gmail.com> wrote:

On Tue, Dec 2, 2025 at 12:49 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Hmm. If the page has no prunable tuples left, it makes sense to set
pd_prune_xid to InvalidTransactionId to avoid the useless round of
pruning. In other cases, it would make sense to set it to some XID so
that it gets pruned later. But a standby will only start pruning if it's
later promoted to become a primary. At that point, all currently running
transactions will be finished (except for prepared transactions).

What about on-access pruning during SELECT queries on a hot standby?

There's no on-access-pruning on the hot standby itself, it'd lead to divergence between primary and standby (and you couldn't WAL log it).

Greetings,

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#5Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#4)
Re: All-visible pages with valid prune xid are confusing

On Tue, Dec 2, 2025 at 1:41 PM Andres Freund <andres@anarazel.de> wrote:

On December 2, 2025 1:23:57 PM EST, Melanie Plageman <melanieplageman@gmail.com> wrote:

What about on-access pruning during SELECT queries on a hot standby?

There's no on-access-pruning on the hot standby itself, it'd lead to divergence between primary and standby (and you couldn't WAL log it).

Ah, right.

Therefore it doesn't seem important what we set the prune XID to. Any
valid XID that's not in the future should do the trick. So how about
adding a boolean flag to the WAL record, to indicate whether there's
anything prunable left on the page or not?

We could add a flag to xl_heap_prune flags (which is now a uint16 and
thus has room) to indicate that there are prunable tuples.

In terms of finding some XID to set it to, could we do what updates
and deletes do and use the XLogRecord->xl_xid?

- Melanie

#6Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Melanie Plageman (#5)
Re: All-visible pages with valid prune xid are confusing

On 02/12/2025 21:13, Melanie Plageman wrote:

In terms of finding some XID to set it to, could we do what updates
and deletes do and use the XLogRecord->xl_xid?

A prune record can have XLogRecord->xl_xid == InvalidTransactionId, if
the transaction hasn't been assigned a transaction ID yet. I think
ReadNextTransactionId() - 1 would work. (Using TransactionIdRetreat
rather than plain - 1, of course)

- Heikki

#7Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#6)
Re: All-visible pages with valid prune xid are confusing

Hi,

On 2025-12-02 21:39:42 +0200, Heikki Linnakangas wrote:

On 02/12/2025 21:13, Melanie Plageman wrote:

In terms of finding some XID to set it to, could we do what updates
and deletes do and use the XLogRecord->xl_xid?

A prune record can have XLogRecord->xl_xid == InvalidTransactionId, if the
transaction hasn't been assigned a transaction ID yet. I think
ReadNextTransactionId() - 1 would work. (Using TransactionIdRetreat rather
than plain - 1, of course)

I think it'd be preferrable if we just made the records large enough to
maintain the same pd_prune_xid on the standby as on the primary. The closer
the pages are on primary & standby the better, any allowed divergence makes it
harder to find bugs imo. In comparison to the space the relevant records use,
it doesn't seem like including an accurate prune xid would take a lot of
space?

Greetings,

Andres

#8Melanie Plageman
melanieplageman@gmail.com
In reply to: Heikki Linnakangas (#6)
Re: All-visible pages with valid prune xid are confusing

On Tue, Dec 2, 2025 at 2:39 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

A prune record can have XLogRecord->xl_xid == InvalidTransactionId, if
the transaction hasn't been assigned a transaction ID yet. I think
ReadNextTransactionId() - 1 would work. (Using TransactionIdRetreat
rather than plain - 1, of course)

I was thinking about this some more, and I propose we just clear
pd_prune_xid whenever we set PD_ALL_VISIBLE like in the attached
patch.

AFAICT we weren't clearing it when setting the VM in vacuum phase III
-- which meant it would have a stale value on a normal Postgres
primary (not a promoted standby).

And we weren't initializing pd_prune_xid to InvalidTransactionId in
COPY FREEZE (though perhaps the page will be zero-initialized and thus
pd_prune_xid will read as InvalidTransactionId anyway).

So, this patch covers those cases plus clears pd_prune_xid on redo. As
previously discussed, this saves prune cycles after promotion.

I take your point that it could be worth setting pd_prune_xid to
something on the standby when there are prunable tuples left so that
we trigger a prune cycle after promotion, but since Andres didn't like
that idea and we'll update pd_prune_xid on the next delete/update
anyway, I think we don't need to do it.

Regarding Andres' idea to include pd_prune_xid to the xl_heap_prune
WAL record: I don't feel quite comfortable that pd_prune_xid would
then be half WAL-logged (since it isn't logged when update/delete sets
it). Maybe it's fine, though.

Anyway, I think my attached patch is an improvement with no risk

- Melanie

Attachments:

v1-0001-Save-prune-cycles-by-consistently-clearing-prune-.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Save-prune-cycles-by-consistently-clearing-prune-.patchDownload+10-1
#9Andres Freund
andres@anarazel.de
In reply to: Melanie Plageman (#8)
Re: All-visible pages with valid prune xid are confusing

Hi,

On 2026-02-27 18:56:48 -0500, Melanie Plageman wrote:

On Tue, Dec 2, 2025 at 2:39 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

A prune record can have XLogRecord->xl_xid == InvalidTransactionId, if
the transaction hasn't been assigned a transaction ID yet. I think
ReadNextTransactionId() - 1 would work. (Using TransactionIdRetreat
rather than plain - 1, of course)

I was thinking about this some more, and I propose we just clear
pd_prune_xid whenever we set PD_ALL_VISIBLE like in the attached
patch.

Think that makes sense.

I take your point that it could be worth setting pd_prune_xid to
something on the standby when there are prunable tuples left so that
we trigger a prune cycle after promotion, but since Andres didn't like
that idea and we'll update pd_prune_xid on the next delete/update
anyway, I think we don't need to do it.

I mean I mainly disliked it because I think it'd be better if we "just" ensure
the values are the same (with perhaps an exception for it being unset on the
primary and set on the standby, if the pd_prune_xid is cleared without WAL
logging). If we don't do that, I think setting it to something vaguely useful
is better than not doing anything.

Regarding Andres' idea to include pd_prune_xid to the xl_heap_prune
WAL record: I don't feel quite comfortable that pd_prune_xid would
then be half WAL-logged (since it isn't logged when update/delete sets
it). Maybe it's fine, though.

What do you mean with half WAL logged? Just that PageSetPrunable() will take
the current pd_prune_xid into account, which could differ on the standby?
We'd still not put in an XID that wasn't in a WAL record, right?

I think the case of xl_heap_prune is markedly different from e.g. the delete
case, because the choice of the xid to use for pd_prune_xid is a lot more
complicated in the other cases. Without inspecting all the surviving rows, the
standby just don't have a sane value to put into pd_prune_xid. Whereas for a
delete, update, insert we know that it's just the smaller of the current
pd_prune_xid and the xid of the record.

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index dcbaf49401b..88bcbf55ed7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1932,6 +1932,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);

PageSetAllVisible(page);
+ PageClearPrunable(page);
visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
@@ -2943,6 +2944,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
+ PageClearPrunable(page);
visibilitymap_set_vmbits(blkno,
vmbuffer, vmflags,
vacrel->rel->rd_locator);

What about the PageSetAllVisible() in lazy_scan_prune()? I know that code
will vanish soon, but still...

Greetings,

Andres Freund

#10Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#9)
Re: All-visible pages with valid prune xid are confusing

On Sat, Feb 28, 2026 at 1:55 PM Andres Freund <andres@anarazel.de> wrote:

On 2026-02-27 18:56:48 -0500, Melanie Plageman wrote:

Regarding Andres' idea to include pd_prune_xid to the xl_heap_prune
WAL record: I don't feel quite comfortable that pd_prune_xid would
then be half WAL-logged (since it isn't logged when update/delete sets
it). Maybe it's fine, though.

What do you mean with half WAL logged? Just that PageSetPrunable() will take
the current pd_prune_xid into account, which could differ on the standby?
We'd still not put in an XID that wasn't in a WAL record, right?

Yes, I guess since the xl_xid is from the primary and the standby uses
that to compare to the current pd_prune_xid, they'll only be different
if the standby's pd_prune_xid is different.

I think the case of xl_heap_prune is markedly different from e.g. the delete
case, because the choice of the xid to use for pd_prune_xid is a lot more
complicated in the other cases. Without inspecting all the surviving rows, the
standby just don't have a sane value to put into pd_prune_xid. Whereas for a
delete, update, insert we know that it's just the smaller of the current
pd_prune_xid and the xid of the record.

I think adding pd_prune_xid to xl_heap_prune is a worthwhile addition
on top of this commit since this commit also clears pd_prune_xid in a
few other cases.

What about the PageSetAllVisible() in lazy_scan_prune()? I know that code
will vanish soon, but still...

Good point. I think my LSP plugin got confused switching back and
forth between this branch and the one where I remove that code and
missed the reference. Fixed in the pushed version.

Thanks for the review!

- Melanie