Avoiding unnecessary clog lookups while freezing

Started by Peter Geoghegan about 3 years ago (13 messages)
1 attachment(s)

I took another look at the code coverage situation around freezing
following pushing the page-level freezing patch earlier today. I
spotted an issue that I'd missed up until now: certain sanity checks
in heap_prepare_freeze_tuple() call TransactionIdDidCommit() more
often than really seems necessary.

Theoretically this is an old issue that dates back to commit
699bf7d05c, as opposed to an issue in the page-level freezing patch.
But that's not really true in a practical sense. In practice the
calls to TransactionIdDidCommit() will happen far more frequently
following today's commit 1de58df4fe (since we're using OldestXmin as
the cutoff that gates performing freeze_xmin/freeze_xmax processing --
not FreezeLimit).

No regressions related to clog lookups by VACUUM were apparent from my
performance validation of the page-level freezing work, but I suspect
that the increase in TransactionIdDidCommit() calls will cause
noticeable regressions with the right/wrong workload and/or
configuration. The page-level freezing work is expected to reduce clog
lookups in VACUUM in general, which is bound to have been a
confounding factor.

I see no reason why we can't just make the XID sanity check calls to
TransactionIdDidCommit() (at both the freeze_xmin and the freeze_xmax
callsites) conditional on a cheap tuple hint bit precheck not being
enough. We only need an expensive TransactionIdDidCommit() call when
the hint bits don't already show what the sanity check expects to see:
that the tuple's xmin looks committed (at the freeze_xmin callsite), or
that its xmax looks aborted (at the freeze_xmax callsite).

Attached patch shows what I mean. A quick run of the standard
regression tests (with custom instrumentation) shows that this patch
eliminates 100% of all relevant calls to TransactionIdDidCommit(), for
both the freeze_xmin and the freeze_xmax callsites.

--
Peter Geoghegan

Attachments:

v1-0001-Avoid-unnecessary-clog-lookups-when-freezing.patch (application/octet-stream)
From 683c865d1ef64659baaf139c07364ac13fd509ca Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 28 Dec 2022 14:29:29 -0800
Subject: [PATCH v1] Avoid unnecessary clog lookups when freezing.

---
 src/backend/access/heap/heapam.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34d83dc70..2d5cd9a33 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6524,7 +6524,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									 xid, cutoffs->relfrozenxid)));
 
 		freeze_xmin = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
-		if (freeze_xmin && !TransactionIdDidCommit(xid))
+		if (freeze_xmin && !HeapTupleHeaderXminCommitted(tuple) &&
+			!TransactionIdDidCommit(xid))
 			ereport(ERROR,
 					(errcode(ERRCODE_DATA_CORRUPTED),
 					 errmsg_internal("uncommitted xmin %u from before xid cutoff %u needs to be frozen",
@@ -6674,6 +6675,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 		 * lock).
 		 */
 		if (freeze_xmax && !HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
+			(tuple->t_infomask & HEAP_XMAX_INVALID) == 0 &&
 			TransactionIdDidCommit(xid))
 			ereport(ERROR,
 					(errcode(ERRCODE_DATA_CORRUPTED),
-- 
2.38.1

#2 Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#1)
Re: Avoiding unnecessary clog lookups while freezing

Hi,

On 2022-12-28 15:24:28 -0800, Peter Geoghegan wrote:

I took another look at the code coverage situation around freezing
following pushing the page-level freezing patch earlier today. I
spotted an issue that I'd missed up until now: certain sanity checks
in heap_prepare_freeze_tuple() call TransactionIdDidCommit() more
often than really seems necessary.

Theoretically this is an old issue that dates back to commit
699bf7d05c, as opposed to an issue in the page-level freezing patch.
But that's not really true in a practical sense. In practice the
calls to TransactionIdDidCommit() will happen far more frequently
following today's commit 1de58df4fe (since we're using OldestXmin as
the cutoff that gates performing freeze_xmin/freeze_xmax processing --
not FreezeLimit).

Hm. But we still only do the check when we actually freeze, rather than just
during the pre-check in heap_tuple_should_freeze(). So we'll only incur the
increased overhead when we also do more WAL logging etc. Correct?

I see no reason why we can't just make the XID sanity check calls to
TransactionIdDidCommit() (at both the freeze_xmin and the freeze_xmax
callsites) conditional on a cheap tuple hint bit precheck not being
enough. We only need an expensive TransactionIdDidCommit() call when
the hint bits don't already show what the sanity check expects to see:
that the tuple's xmin looks committed (at the freeze_xmin callsite), or
that its xmax looks aborted (at the freeze_xmax callsite).

Hm. I dimly recall that we had repeated cases where the hint bits were set
wrongly due to some of the multixact related bugs. I think I was trying to be
paranoid about not freezing stuff in those situations, since it can lead to
reviving dead tuples, which obviously is bad.

Attached patch shows what I mean. A quick run of the standard
regression tests (with custom instrumentation) shows that this patch
eliminates 100% of all relevant calls to TransactionIdDidCommit(), for
both the freeze_xmin and the freeze_xmax callsites.

There's practically no tuple-level concurrent activity in a normal regression
test. So I'm a bit doubtful that is an interesting metric. It'd be a tad more
interesting to run tests, including isolationtester and pgbench, against a
running cluster.

Greetings,

Andres Freund

#3 Peter Geoghegan
pg@bowt.ie
In reply to: Andres Freund (#2)
Re: Avoiding unnecessary clog lookups while freezing

On Wed, Dec 28, 2022 at 4:20 PM Andres Freund <andres@anarazel.de> wrote:

Theoretically this is an old issue that dates back to commit
699bf7d05c, as opposed to an issue in the page-level freezing patch.
But that's not really true in a practical sense. In practice the
calls to TransactionIdDidCommit() will happen far more frequently
following today's commit 1de58df4fe (since we're using OldestXmin as
the cutoff that gates performing freeze_xmin/freeze_xmax processing --
not FreezeLimit).

Hm. But we still only do the check when we actually freeze, rather than just
during the pre-check in heap_tuple_should_freeze(). So we'll only incur the
increased overhead when we also do more WAL logging etc. Correct?

Yes, that's how it worked up until today's commit 1de58df4fe.

I don't have strong feelings about back patching a fix, but this does
seem like something that I should fix now, on HEAD.

Hm. I dimly recall that we had repeated cases where the hint bits were set
wrongly due to some of the multixact related bugs. I think I was trying to be
paranoid about not freezing stuff in those situations, since it can lead to
reviving dead tuples, which obviously is bad.

I think that it's a reasonable check, and I'm totally in favor of
keeping it (or something very close, at least).

There's practically no tuple-level concurrent activity in a normal regression
test. So I'm a bit doubtful that is an interesting metric. It'd be a tad more
interesting to run tests, including isolationtester and pgbench, against a
running cluster.

Even on HEAD, with page-level freezing in place, we're only going to
test XIDs that are < OldestXmin and that appear on pages that VACUUM
actually scans in the first place. Just checking tuple-level hint bits
should be effective. But even if it isn't (for whatever reason), it's
similar to cases where our second heap pass has to do clog lookups in
heap_page_is_all_visible() just because hint bits couldn't be set
earlier on, back when lazy_scan_prune() processed the same page during
VACUUM's initial heap pass.

--
Peter Geoghegan

#4 Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#3)
Re: Avoiding unnecessary clog lookups while freezing

On 2022-12-28 16:37:27 -0800, Peter Geoghegan wrote:

On Wed, Dec 28, 2022 at 4:20 PM Andres Freund <andres@anarazel.de> wrote:

Theoretically this is an old issue that dates back to commit
699bf7d05c, as opposed to an issue in the page-level freezing patch.
But that's not really true in a practical sense. In practice the
calls to TransactionIdDidCommit() will happen far more frequently
following today's commit 1de58df4fe (since we're using OldestXmin as
the cutoff that gates performing freeze_xmin/freeze_xmax processing --
not FreezeLimit).

Hm. But we still only do the check when we actually freeze, rather than just
during the pre-check in heap_tuple_should_freeze(). So we'll only incur the
increased overhead when we also do more WAL logging etc. Correct?

Yes, that's how it worked up until today's commit 1de58df4fe.

I don't have strong feelings about back patching a fix, but this does
seem like something that I should fix now, on HEAD.

Hm. I dimly recall that we had repeated cases where the hint bits were set
wrongly due to some of the multixact related bugs. I think I was trying to be
paranoid about not freezing stuff in those situations, since it can lead to
reviving dead tuples, which obviously is bad.

I think that it's a reasonable check, and I'm totally in favor of
keeping it (or something very close, at least).

I don't quite follow - one paragraph up you say we should fix something, and
then here you seem to say we should continue not to rely on the hint bits?

#5 Peter Geoghegan
pg@bowt.ie
In reply to: Andres Freund (#4)
Re: Avoiding unnecessary clog lookups while freezing

On Wed, Dec 28, 2022 at 4:43 PM Andres Freund <andres@anarazel.de> wrote:

Hm. I dimly recall that we had repeated cases where the hint bits were set
wrongly due to some of the multixact related bugs. I think I was trying to be
paranoid about not freezing stuff in those situations, since it can lead to
reviving dead tuples, which obviously is bad.

I think that it's a reasonable check, and I'm totally in favor of
keeping it (or something very close, at least).

I don't quite follow - one paragraph up you say we should fix something, and
then here you seem to say we should continue not to rely on the hint bits?

I didn't mean that we should continue to not rely on the hint bits. Is
that really all that the test is for? I think of it as a general sanity check.

The important call to avoid with page-level freezing is the xmin call to
TransactionIdDidCommit(), not the xmax call. The xmax call only occurs
when VACUUM prepares to freeze a tuple whose xmax is from an updater
(not just a locker) that aborted, whereas the xmin call will now take
place for most unfrozen tuples.

--
Peter Geoghegan

#6 Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#5)
Re: Avoiding unnecessary clog lookups while freezing

Hi,

On 2022-12-28 22:36:53 -0800, Peter Geoghegan wrote:

On Wed, Dec 28, 2022 at 4:43 PM Andres Freund <andres@anarazel.de> wrote:

Hm. I dimly recall that we had repeated cases where the hint bits were set
wrongly due to some of the multixact related bugs. I think I was trying to be
paranoid about not freezing stuff in those situations, since it can lead to
reviving dead tuples, which obviously is bad.

I think that it's a reasonable check, and I'm totally in favor of
keeping it (or something very close, at least).

I don't quite follow - one paragraph up you say we should fix something, and
then here you seem to say we should continue not to rely on the hint bits?

I didn't mean that we should continue to not rely on the hint bits. Is
that really all that the test is for? I think of it as a general sanity check.

I do think we wanted to avoid reviving actually-dead tuples (present due to
the multixact and related bugs). And I'm worried about giving that checking
up; I've seen it hit too many times, both in the real world and during
development.

Somewhat of a tangent: I've previously wondered if we should have a small
hash-table based clog cache. The current one-element cache doesn't suffice in
a lot of scenarios, but it wouldn't take a huge cache to end up filtering most
clog accesses.

Greetings,

Andres Freund

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: Avoiding unnecessary clog lookups while freezing

Andres Freund <andres@anarazel.de> writes:

Somewhat of a tangent: I've previously wondered if we should have a small
hash-table based clog cache. The current one-element cache doesn't suffice in
a lot of scenarios, but it wouldn't take a huge cache to end up filtering most
clog accesses.

I've wondered about that too. The one-element cache was a good hack
in its day, but it looks a bit under-engineered by our current
standards. (Also, maybe it'd be plausible to have a one-element
cache in front of a small hash?)
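
To make that concrete, here is a rough standalone sketch of a one-element
cache sitting in front of a small direct-mapped table (invented names, a
stubbed-out SLRU read, and a toy main() -- purely illustrative, not a
proposed patch; the real thing would presumably live next to
TransactionLogFetch() in transam.c):

/*
 * Rough sketch only: invented names, standalone stub in place of the real
 * SLRU read.  Not PostgreSQL code.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
typedef uint8_t XidStatus;

#define STATUS_IN_PROGRESS	0
#define STATUS_COMMITTED	1
#define STATUS_ABORTED		2

#define XID_CACHE_SLOTS		64	/* small power of two */

static TransactionId cached_xid;	/* one-element cache in front */
static XidStatus cached_status;

static struct
{
	TransactionId xid;
	XidStatus	status;
} xid_cache[XID_CACHE_SLOTS];	/* small direct-mapped table behind it */

static int	slru_reads;			/* counts the "real" clog lookups */

/* stand-in for the expensive SLRU read done by the real clog code */
static XidStatus
slru_fetch_status(TransactionId xid)
{
	slru_reads++;
	return (xid % 2 == 0) ? STATUS_COMMITTED : STATUS_ABORTED;
}

static XidStatus
xid_status(TransactionId xid)
{
	int			slot = xid % XID_CACHE_SLOTS;
	XidStatus	status;

	/* cheapest check first: the last XID we looked up (0 = invalid) */
	if (xid == cached_xid && xid != 0)
		return cached_status;

	/* then the small hash table */
	if (xid_cache[slot].xid == xid)
		status = xid_cache[slot].status;
	else
		status = slru_fetch_status(xid);

	/* only remember permanent states; in-progress answers can change */
	if (status != STATUS_IN_PROGRESS)
	{
		xid_cache[slot].xid = xid;
		xid_cache[slot].status = status;
		cached_xid = xid;
		cached_status = status;
	}

	return status;
}

int
main(void)
{
	/* repeat lookups of two XIDs; only the first of each reaches "clog" */
	(void) xid_status(2);
	(void) xid_status(3);
	(void) xid_status(2);
	(void) xid_status(3);

	printf("SLRU reads: %d for 4 status checks\n", slru_reads);
	return 0;
}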

regards, tom lane

#8 Peter Geoghegan
pg@bowt.ie
In reply to: Andres Freund (#6)
Re: Avoiding unnecessary clog lookups while freezing

On Thu, Dec 29, 2022 at 9:21 AM Andres Freund <andres@anarazel.de> wrote:

I do think we wanted to avoid reviving actually-dead tuples (present due to
the multixact and related bugs). And I'm worried about giving that checking
up; I've seen it hit too many times, both in the real world and during
development.

I could just move the same tests from heap_prepare_freeze_tuple() to
heap_freeze_execute_prepared(), without changing any of the details.
That would mean that the TransactionIdDidCommit() calls would only
take place with tuples that actually get frozen, which is more or less
how it worked before now.

heap_prepare_freeze_tuple() will now often prepare freeze plans that
just get discarded by lazy_scan_prune(). My concern is the impact on
tables/pages that almost always discard prepared freeze plans, and so
require many TransactionIdDidCommit() calls that really aren't
necessary.

Somewhat of a tangent: I've previously wondered if we should have a small
hash-table based clog cache. The current one-element cache doesn't suffice in
a lot of scenarios, but it wouldn't take a huge cache to end up filtering most
clog accesses.

I imagine that the one-element cache works alright in some scenarios,
but then suddenly doesn't work so well, even though not very much has
changed. Behavior like that makes the problems difficult to analyze,
and easy to miss. I'm suspicious of that.

--
Peter Geoghegan

#9 Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#8)
Re: Avoiding unnecessary clog lookups while freezing

Hi,

On 2022-12-29 09:43:39 -0800, Peter Geoghegan wrote:

On Thu, Dec 29, 2022 at 9:21 AM Andres Freund <andres@anarazel.de> wrote:

I do think we wanted to avoid reviving actually-dead tuples (present due to
the multixact and related bugs). And I'm worried about giving that checking
up; I've seen it hit too many times, both in the real world and during
development.

I could just move the same tests from heap_prepare_freeze_tuple() to
heap_freeze_execute_prepared(), without changing any of the details.

That might work, yes.

That would mean that the TransactionIdDidCommit() calls would only
take place with tuples that actually get frozen, which is more or less
how it worked before now.

heap_prepare_freeze_tuple() will now often prepare freeze plans that
just get discarded by lazy_scan_prune(). My concern is the impact on
tables/pages that almost always discard prepared freeze plans, and so
require many TransactionIdDidCommit() calls that really aren't
necessary.

It seems somewhat wrong that we discard all the work that
heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in
a bunch of important cases (e.g. creating a new multixact), but even so,
e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is
just creating the freeze plan free.

I think the better approach might be to make heap_tuple_should_freeze() more
powerful and to only create the freeze plan when actually freezing.

I wonder how often it'd be worthwhile to also do opportunistic freezing during
lazy_vacuum_heap_page(), given that we already will WAL log (and often issue
an FPI).

Somewhat of a tangent: I've previously wondered if we should have a small
hash-table based clog cache. The current one-element cache doesn't suffice in
a lot of scenarios, but it wouldn't take a huge cache to end up filtering most
clog accesses.

I imagine that the one-element cache works alright in some scenarios,
but then suddenly doesn't work so well, even though not very much has
changed. Behavior like that makes the problems difficult to analyze,
and easy to miss. I'm suspicious of that.

I think there are a lot of situations where it flat out doesn't work - even if
you just have an inserting and a deleting transaction, we'll often end up not
hitting the 1-element cache due to looking up two different xids in a roughly
alternating pattern...
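
As a toy illustration of that pattern (standalone, nothing
PostgreSQL-specific; the XIDs and counters are made up), a strict
one-element cache gets zero hits once two XIDs alternate, so every
status check would fall through to clog:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

int
main(void)
{
	TransactionId inserter = 1000;	/* hypothetical inserting xact */
	TransactionId deleter = 1001;	/* hypothetical deleting xact */
	TransactionId cached = 0;		/* the one-element cache */
	int			misses = 0;

	/*
	 * A scan over tuples inserted by one xact and deleted by the other
	 * checks xmin and xmax status in a roughly alternating pattern.
	 */
	for (int i = 0; i < 100; i++)
	{
		TransactionId xid = (i % 2 == 0) ? inserter : deleter;

		if (xid != cached)
		{
			misses++;			/* each miss would be a clog lookup */
			cached = xid;
		}
	}

	/* prints 100: the cached XID is always the one we just replaced */
	printf("one-element cache: %d misses out of 100 lookups\n", misses);
	return 0;
}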

Greetings,

Andres Freund

#10 Peter Geoghegan
pg@bowt.ie
In reply to: Andres Freund (#9)
1 attachment(s)
Re: Avoiding unnecessary clog lookups while freezing

On Thu, Dec 29, 2022 at 12:00 PM Andres Freund <andres@anarazel.de> wrote:

I could just move the same tests from heap_prepare_freeze_tuple() to
heap_freeze_execute_prepared(), without changing any of the details.

That might work, yes.

Attached patch shows how that could work.

It seems somewhat wrong that we discard all the work that
heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in
a bunch of important cases (e.g. creating a new multixact), but even so,
e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is
just creating the freeze plan free.

I'm not sure what you mean by that. I believe that the page-level
freezing changes do not allow FreezeMultiXactId() to call
GetMultiXactIdMembers() any more often than before. Are you concerned
about a regression, or something more general than that?

The only case that we *don't* force xmax freezing in
FreezeMultiXactId() is the FRM_NOOP case. Note in particular that we
will reliably force freezing for any Multi < OldestMxact (not <
MultiXactCutoff).

I wonder how often it'd be worthwhile to also do opportunistic freezing during
lazy_vacuum_heap_page(), given that we already will WAL log (and often issue
an FPI).

Yeah, we don't actually need a cleanup lock for that. It might also
make sense to teach lazy_scan_prune() to anticipate what will happen
later on, in lazy_vacuum_heap_page(), so that it can freeze based on
the same observation about the cost. (It already does a certain amount
of this kind of thing, in fact.)

--
Peter Geoghegan

Attachments:

v2-0001-Check-xmin-xmax-commit-status-when-freezing-execu.patch (application/octet-stream)
From 811416670a11e8cfa7809fa5dc1bc522e3dc7b88 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Thu, 29 Dec 2022 11:54:30 -0800
Subject: [PATCH v2] Check xmin/xmax commit status when freezing executes.

---
 src/include/access/heapam.h      | 10 ++++
 src/backend/access/heap/heapam.c | 83 +++++++++++++++++++++++---------
 2 files changed, 69 insertions(+), 24 deletions(-)

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 09a1993f4..1644985e1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -100,6 +100,14 @@ typedef enum
 	HEAPTUPLE_DELETE_IN_PROGRESS	/* deleting xact is still in progress */
 } HTSV_Result;
 
+/*
+ * heap_prepare_freeze_tuple requests that heap_freeze_execute_prepared check
+ * xmin committed and/or xmax aborted.  These are delayed until freeze plan
+ * execution to avoid unnecessary clog lookups.
+ */
+#define		HEAP_FREEZE_XMIN_CHECK	0x01
+#define		HEAP_FREEZE_XMAX_CHECK	0x02
+
 /* heap_prepare_freeze_tuple state describing how to freeze a tuple */
 typedef struct HeapTupleFreeze
 {
@@ -109,6 +117,8 @@ typedef struct HeapTupleFreeze
 	uint16		t_infomask;
 	uint8		frzflags;
 
+	/* xmin/xmax check flags */
+	uint8		checkflags;
 	/* Page offset number for tuple */
 	OffsetNumber offset;
 } HeapTupleFreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34d83dc70..353733aa3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6502,10 +6502,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 				freeze_xmax = false;
 	TransactionId xid;
 
-	frz->frzflags = 0;
+	frz->xmax = HeapTupleHeaderGetRawXmax(tuple);
 	frz->t_infomask2 = tuple->t_infomask2;
 	frz->t_infomask = tuple->t_infomask;
-	frz->xmax = HeapTupleHeaderGetRawXmax(tuple);
+	frz->frzflags = 0;
+	frz->checkflags = 0;
 
 	/*
 	 * Process xmin, while keeping track of whether it's already frozen, or
@@ -6523,14 +6524,12 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 					 errmsg_internal("found xmin %u from before relfrozenxid %u",
 									 xid, cutoffs->relfrozenxid)));
 
-		freeze_xmin = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
-		if (freeze_xmin && !TransactionIdDidCommit(xid))
-			ereport(ERROR,
-					(errcode(ERRCODE_DATA_CORRUPTED),
-					 errmsg_internal("uncommitted xmin %u from before xid cutoff %u needs to be frozen",
-									 xid, cutoffs->OldestXmin)));
-
 		/* Will set freeze_xmin flags in freeze plan below */
+		freeze_xmin = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+
+		/* Verify that xmin committed if and when freeze plan is executed */
+		if (freeze_xmin)
+			frz->checkflags |= HEAP_FREEZE_XMIN_CHECK;
 	}
 
 	/*
@@ -6553,7 +6552,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 	}
 
 	/* Now process xmax */
-	xid = HeapTupleHeaderGetRawXmax(tuple);
+	xid = frz->xmax;
 	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 	{
 		/* Raw xmax is a MultiXactId */
@@ -6664,21 +6663,16 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 					 errmsg_internal("found xmax %u from before relfrozenxid %u",
 									 xid, cutoffs->relfrozenxid)));
 
-		if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
-			freeze_xmax = true;
+		/* Will set freeze_xmax flags in freeze plan below */
+		freeze_xmax = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
 
 		/*
-		 * If we freeze xmax, make absolutely sure that it's not an XID that
-		 * is important.  (Note, a lock-only xmax can be removed independent
-		 * of committedness, since a committed lock holder has released the
-		 * lock).
+		 * Verify that xmax aborted if and when freeze plan is executed,
+		 * provided it's from an update. (A lock-only xmax can be removed
+		 * independent of this, since the lock is released at xact end.)
 		 */
-		if (freeze_xmax && !HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
-			TransactionIdDidCommit(xid))
-			ereport(ERROR,
-					(errcode(ERRCODE_DATA_CORRUPTED),
-					 errmsg_internal("cannot freeze committed xmax %u",
-									 xid)));
+		if (freeze_xmax && !HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
+			frz->checkflags |= HEAP_FREEZE_XMAX_CHECK;
 	}
 	else if (!TransactionIdIsValid(xid))
 	{
@@ -6806,17 +6800,58 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
 
 	Assert(ntuples > 0);
 
+	/*
+	 * Perform xmin/xmax commit status sanity checks before critical section.
+	 *
+	 * These checks are delayed until execution time to avoid needlessly
+	 * checking commit status before it's really necessary.
+	 */
+	for (int i = 0; i < ntuples; i++)
+	{
+		HeapTupleFreeze *frz = tuples + i;
+		ItemId		itemid = PageGetItemId(page, frz->offset);
+		HeapTupleHeader htup;
+
+		htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+		/* Verify that xmin committed when that's expected */
+		if (frz->checkflags & HEAP_FREEZE_XMIN_CHECK)
+		{
+			TransactionId xmin = HeapTupleHeaderGetRawXmin(htup);
+
+			Assert(!HeapTupleHeaderXminFrozen(htup));
+			if (unlikely(!TransactionIdDidCommit(xmin)))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("uncommitted xmin %u needs to be frozen",
+										 xmin)));
+		}
+		/* Verify that xmax aborted when that's expected */
+		if (frz->checkflags & HEAP_FREEZE_XMAX_CHECK)
+		{
+			TransactionId xmax = HeapTupleHeaderGetRawXmax(htup);
+
+			Assert(TransactionIdIsNormal(xmax));
+			if (unlikely(!TransactionIdDidAbort(xmax)))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("cannot freeze nonaborted xmax %u",
+										 xmax)));
+		}
+	}
+
 	START_CRIT_SECTION();
 
 	MarkBufferDirty(buffer);
 
 	for (int i = 0; i < ntuples; i++)
 	{
+		HeapTupleFreeze *frz = tuples + i;
+		ItemId		itemid = PageGetItemId(page, frz->offset);
 		HeapTupleHeader htup;
-		ItemId		itemid = PageGetItemId(page, tuples[i].offset);
 
 		htup = (HeapTupleHeader) PageGetItem(page, itemid);
-		heap_execute_freeze_tuple(htup, &tuples[i]);
+		heap_execute_freeze_tuple(htup, frz);
 	}
 
 	/* Now WAL-log freezing if necessary */
-- 
2.38.1

#11 Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#10)
Re: Avoiding unnecessary clog lookups while freezing

On Thu, Dec 29, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote:

It seems somewhat wrong that we discard all the work that
heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in
a bunch of important cases (e.g. creating a new multixact), but even so,
e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is
just creating the freeze plan free.

I'm not sure what you mean by that. I believe that the page-level
freezing changes do not allow FreezeMultiXactId() to call
GetMultiXactIdMembers() any more often than before. Are you concerned
about a regression, or something more general than that?

Here's an idea that seems like it could ameliorate the issue:

When we're looping through members from GetMultiXactIdMembers(), and
seeing if we can get away with !need_replace/FRM_NOOP processing, why
not also check if there are any XIDs >= OldestXmin among the members?
If not (if they're all < OldestXmin), then we should prefer to go
further with processing the Multi now -- FRM_NOOP processing isn't
actually cheaper.

We'll already know that a second pass over the multi really isn't
expensive. And that it will actually result in FRM_INVALIDATE_XMAX
processing, which is ideal. Avoiding a second pass is really just
about avoiding FRM_RETURN_IS_MULTI (and avoiding FRM_RETURN_IS_XID,
perhaps, though to a much lesser degree).

--
Peter Geoghegan

#12 Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#11)
1 attachment(s)
Re: Avoiding unnecessary clog lookups while freezing

On Thu, Dec 29, 2022 at 12:50 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Dec 29, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote:

It seems somewhat wrong that we discard all the work that
heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in
a bunch of important cases (e.g. creating a new multixact), but even so,
e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is
just creating the freeze plan free.

Here's an idea that seems like it could ameliorate the issue:

When we're looping through members from GetMultiXactIdMembers(), and
seeing if we can get away with !need_replace/FRM_NOOP processing, why
not also check if there are any XIDs >= OldestXmin among the members?
If not (if they're all < OldestXmin), then we should prefer to go
further with processing the Multi now -- FRM_NOOP processing isn't
actually cheaper.

Attached patch shows what I mean here.

I think that there is a tendency for OldestMxact to be held back by a
disproportionate amount (relative to OldestXmin) in the presence of
long running transactions and concurrent updates from short-ish
transactions. The way that we maintain state in shared memory to
compute OldestMxact (OldestMemberMXactId/OldestVisibleMXactId) seems
to be vulnerable to that kind of thing. I'm thinking of scenarios
where MultiXactIdSetOldestVisible() gets called from a long-running
xact, at the first point that it examines any multi, just to read
something. That effectively makes the long-running xact hold back
OldestMxact, even when it doesn't hold back OldestXmin. A read-only
transaction that runs in READ COMMITTED could do this -- it'll call
OldestVisibleMXactId() and "lock in" the oldest visible Multi that it
needs to continue to see as running, without clearing that until much
later (until AtEOXact_MultiXact() is called at xact commit/abort). And
without doing anything to hold back OldestXmin by the same amount, or
for the same duration.

That's the kind of scenario that the patch might make a difference in
-- because it exploits the fact that OldestXmin can be a lot less
vulnerable than OldestMxact is to being held back by long running
xacts. Does that seem plausible to you?

--
Peter Geoghegan

Attachments:

0001-Prefer-FRM_INVALIDATE_XMAX-over-FRM_NOOP-when-cheape.patch (application/octet-stream)
From a070c3d806a1d05bab24d040ad4a3bdb0a951b60 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Thu, 29 Dec 2022 13:25:24 -0800
Subject: [PATCH] Prefer FRM_INVALIDATE_XMAX over FRM_NOOP when cheaper.

---
 src/backend/access/heap/heapam.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34d83dc70..0bc7d27d7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6146,7 +6146,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	TransactionId newxmax;
 	MultiXactMember *members;
 	int			nmembers;
-	bool		need_replace;
+	bool		need_replace,
+				cheap_to_replace;
 	int			nnewmembers;
 	MultiXactMember *newmembers;
 	bool		has_lockers;
@@ -6262,6 +6263,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 * OldestXmin/OldestMxact, so later values never need to be tracked here.)
 	 */
 	need_replace = false;
+	cheap_to_replace = true;
 	FreezePageRelfrozenXid = pagefrz->FreezePageRelfrozenXid;
 	for (int i = 0; i < nmembers; i++)
 	{
@@ -6275,6 +6277,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 			need_replace = true;
 			break;
 		}
+		if (TransactionIdFollowsOrEquals(xid, cutoffs->OldestXmin))
+			cheap_to_replace = false;
 		if (TransactionIdPrecedes(xid, FreezePageRelfrozenXid))
 			FreezePageRelfrozenXid = xid;
 	}
@@ -6283,7 +6287,12 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	if (!need_replace)
 		need_replace = MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff);
 
-	if (!need_replace)
+	/*
+	 * FRM_NOOP isn't cheaper than a second pass that indicates that xmax gets
+	 * FRM_INVALIDATE_XMAX processing.  Avoid FRM_NOOP whenever we've already
+	 * determined that doing a second pass will be cheaper (and more useful).
+	 */
+	if (!need_replace && !cheap_to_replace)
 	{
 		/*
 		 * vacuumlazy.c might ratchet back NewRelminMxid, NewRelfrozenXid, or
@@ -6424,6 +6433,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		 * interesting, because those are longer-lived.)
 		 */
 		Assert(nnewmembers == 1);
+		Assert(need_replace);
 		*flags |= FRM_RETURN_IS_XID;
 		if (update_committed)
 			*flags |= FRM_MARK_COMMITTED;
@@ -6435,6 +6445,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		 * Create a new multixact with the surviving members of the previous
 		 * one, to set as new Xmax in the tuple
 		 */
+		Assert(need_replace);
 		newxmax = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
 		*flags |= FRM_RETURN_IS_MULTI;
 	}
-- 
2.38.1

#13 Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#10)
Re: Avoiding unnecessary clog lookups while freezing

On Thu, Dec 29, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Dec 29, 2022 at 12:00 PM Andres Freund <andres@anarazel.de> wrote:

I could just move the same tests from heap_prepare_freeze_tuple() to
heap_freeze_execute_prepared(), without changing any of the details.

That might work, yes.

Attached patch shows how that could work.

My plan is to push something like this next week, barring objections.
Note that I've inverted the xmax "!TransactionIdDidCommit()" test --
it is now "TransactionIdDidAbort()" instead. I believe that this makes
the test more likely to catch problems, since we really should expect
relevant xmax XIDs to have aborted, specifically -- since the xmax
XIDs in question are always < OldestXmin. (There is a need to use a
"!TransactionIdDidCommit()" test specifically in nearby code in
FreezeMultiXactId(), because that code has to also deal with
in-progress XIDs that are multi members, but that's not the case
here.)

I'm also going to create a CF entry for the other patch posted to this
thread -- the enhancement to FreezeMultiXactId() that aims to get the
most out of any expensive calls to GetMultiXactIdMembers(). That
approach seems quite promising, and relatively simple. I'm not
particularly worried about wasting a call to GetMultiXactIdMembers()
during VACUUM, though. I'm more concerned about the actual impact of
not doing our best to outright remove Multis during VACUUM. The
application itself can experience big performance cliffs from SLRU
buffer misses, which VACUUM should do its utmost to prevent. That is
an occasional source of big problems [1].

I'm particularly concerned about the possibility that having an
updater XID will totally change the characteristics of how multis are
processed inside FreezeMultiXactId(). That seems like it might be a
really sharp edge. I believe that the page-level freezing patch has
already ameliorated the problem, since it made us much less reliant on
the case where GetMultiXactIdMembers() returns "nmembers <= 0" for a
Multi that happens to be HEAP_XMAX_IS_LOCKED_ONLY(). But we can and
should go further than that.

[1]: https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/
--
Peter Geoghegan