SUBTRANS: Minimizing calls to SubTransSetParent()
On Thu, 4 Aug 2022 at 13:11, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Wed, 3 Aug 2022 at 20:18, Andres Freund <andres@anarazel.de> wrote:
I think we should consider redesigning subtrans more substantially - even with
the changes you propose here, there's still plenty ways to hit really bad
performance. And there's only so much we can do about that without more
fundamental design changes.I completely agree - you will be glad to hear that I've been working
on a redesign of the subtrans module.
...
I will post my patch, when complete, in a different thread.
The attached patch reduces the overhead of SUBTRANS by minimizing the
number of times SubTransSetParent() is called, to below 1% of the
current rate in common cases.
Instead of blindly calling SubTransSetParent() for every subxid, this
proposal only calls SubTransSetParent() when that information will be
required for later use. It does this by analyzing all of the callers
of SubTransGetParent() and uses these pre-conditions to filter out
calls/subxids that will never be required, for various reasons. It
redesigns the way XactLockTableWait() calls
SubTransGetTopmostTransactionId() to allow this.
This short patchset compiles and passes make check-world, with lengthy comments.
This might then make viable a simple rewrite of SUBTRANS using a hash
table, as proposed by Andres. But in any case, it will allow us to
design a densely packed SUBTRANS replacement that does not generate as
much contention and I/O.
NOTE that this patchset does not touch SUBTRANS at all, it just
minimizes the calls in preparation for a later redesign in a later
patch. If this patch/later versions of it is committed in Sept CF,
then we should be in good shape to post a subtrans redesign patch by
major patch deadline at end of year.
Patches 001 and 002 are common elements of a different patch,
"Smoothing the subtrans performance catastrophe", but other than that,
the two patches are otherwise independent of each other.
Where does this come from? I learnt a lot about subxids when coding
Hot Standby, specifically commit 06da3c570f21394003. This patch just
builds upon that earlier understanding.
Comments please.
--
Simon Riggs http://www.EnterpriseDB.com/
Attachments:
003_minimize_calls_to_SubTransSetParent.v7.patchapplication/octet-stream; name=003_minimize_calls_to_SubTransSetParent.v7.patchDownload
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3d9088a704..d690774f33 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -107,7 +107,18 @@ static bool TransactionGroupUpdateXidStatus(TransactionId xid,
static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
TransactionId *subxids, XidStatus status,
XLogRecPtr lsn, int pageno);
+/*
+ * Run locally by a backend to establish whether or not it needs to call
+ * SubTransSetParent for subxid.
+ */
+bool
+TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid)
+{
+ int toppageno = TransactionIdToPage(topxid);
+ int subpageno = TransactionIdToPage(subxid);
+ return (toppageno == subpageno);
+}
/*
* TransactionIdSetTreeStatus
@@ -133,7 +144,7 @@ static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
* only once, and the status will be set to committed directly. Otherwise
* we must
* 1. set sub-committed all subxids that are not on the same page as the
- * main xid
+ * main xid (see TransactionIdsAreOnSameXactPage())
* 2. atomically set committed the main xid and the subxids on the same page
* 3. go over the first bunch again and set them committed
* Note that as far as concurrent checkers are concerned, main transaction
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 50f092d7eb..5577e5b8df 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -693,8 +693,51 @@ AssignTransactionId(TransactionState s)
XactTopFullTransactionId = s->fullTransactionId;
if (isSubXact)
- SubTransSetParent(XidFromFullTransactionId(s->fullTransactionId),
- XidFromFullTransactionId(s->parent->fullTransactionId));
+ {
+ TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId);
+
+ /*
+ * Subtrans entries are only required in specific circumstances:
+ *
+ * 1. When there's no room in PG_PROC, as mentioned above.
+ * During XactLockTableWait() we sometimes need to know the topxid.
+ * If there is room in PG_PROC we can get a subxid's topxid direct
+ * from the procarray if the topxid is still running, using
+ * GetTopmostTransactionIdFromProcArray(). So we only ever need to
+ * call SubTransGetTopMostTransaction() if that xact overflowed;
+ * since that is our current transaction, we know whether or not to
+ * log the xid for future use.
+ * This occurs only when large number of subxids are requested by
+ * app user.
+ *
+ * 2. When IsolationIsSerializable() we sometimes need to access topxid
+ * This occurs only when SERIALIZABLE is requested by app user.
+ *
+ * 3. When TransactionIdSetStatus will use a status of SUB_COMMITTED,
+ * which then requires us to consult subtrans to find parent, which
+ * is needed to avoid race condition. In this case we ask Clog/Xact
+ * module if TransactionIdsAreOnSameXactPage(). Since we start a new
+ * clog page every 32000 xids, this is usually <<1% of subxids.
+ */
+ if (MyProc->subxidStatus.overflowed ||
+ IsolationIsSerializable() ||
+ !TransactionIdsAreOnSameXactPage(GetTopTransactionId(), subxid))
+ {
+ /*
+ * Insert entries into subtrans for this xid, noting that the entry
+ * points directly to the topxid, not the immediate parent. This is
+ * done for two reasons, (1) so it is faster in a long chain of subxids
+ * (2) so that we don't need to set subxids for unregistered parents.
+ * This has the downside that anyone waiting for a lock on aborted
+ * subtransactions would not be released immediately; that may or
+ * may not be an acceptable compromise. If not acceptable, this
+ * simple call needs to be replaced with a loop to register the
+ * parent for the current subxid stack, so we can walk back up it to
+ * the topxid.
+ */
+ SubTransSetParent(subxid, GetTopTransactionId());
+ }
+ }
/*
* If it's a top-level transaction, the predicate locking system needs to
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index a9ad40e935..f7c1061a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -261,6 +261,13 @@ static ProcArrayStruct *procArray;
static PGPROC *allProcs;
+/*
+ * Remember the last call to TransactionIdIsInProgress() to avoid need to call
+ * SubTransGetTopMostTransaction() when the subxid is present in the procarray.
+ */
+static TransactionId LastCallXidIsInProgressSubXid = InvalidTransactionId;
+static TransactionId LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
/*
* Cache to reduce overhead of repeated calls to TransactionIdIsInProgress()
*/
@@ -1440,6 +1447,8 @@ TransactionIdIsInProgress(TransactionId xid)
other_xids = ProcGlobal->xids;
other_subxidstates = ProcGlobal->subxidStates;
+ LastCallXidIsInProgressSubXid = LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
LWLockAcquire(ProcArrayLock, LW_SHARED);
/*
@@ -1508,6 +1517,15 @@ TransactionIdIsInProgress(TransactionId xid)
{
LWLockRelease(ProcArrayLock);
xc_by_child_xid_inc();
+
+ /*
+ * Remember the parent xid, for use during XactLockTableWait().
+ * We do this because it is cheaper than looking up pg_subtrans,
+ * and also allows us to reduce calls to subtrans.
+ */
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = pxid;
+
return true;
}
}
@@ -1591,7 +1609,11 @@ TransactionIdIsInProgress(TransactionId xid)
for (int i = 0; i < nxids; i++)
{
if (TransactionIdEquals(xids[i], topxid))
+ {
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = topxid;
return true;
+ }
}
}
@@ -1599,6 +1621,28 @@ TransactionIdIsInProgress(TransactionId xid)
return false;
}
+/*
+ * Allow the topmost xid to be accessed from the last call to
+ * TransactionIdIsInProgress(). Specifically designed for use in
+ * XactLockTableWait().
+ */
+bool
+GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid)
+{
+ bool found = false;
+
+ Assert(TransactionIdIsNormal(xid));
+
+ if (LastCallXidIsInProgressSubXid == xid)
+ {
+ Assert(TransactionIdIsNormal(*pxid));
+ *pxid = LastCallXidIsInProgressParentXid;
+ found = true;
+ }
+
+ return found;
+}
+
/*
* TransactionIdIsActive -- is xid the top-level XID of an active backend?
*
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 1543da6162..b13a8d20fd 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -694,6 +694,8 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -703,6 +705,13 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
LockRelease(&tag, ShareLock, false);
+ /*
+ * If a transaction has no lock, it might be a top-level transaction,
+ * in which case the procarray will show it as not in progress.
+ *
+ * If a transaction is a subtransaction, then it could have committed
+ * or aborted, yet the top-level transaction may still be in progress.
+ */
if (!TransactionIdIsInProgress(xid))
break;
@@ -724,7 +733,17 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /*
+ * In most cases, we can get the parent xid from our prior call to
+ * TransactionIdIsInProgress(), except in hot standby. If not, we have
+ * to ask subtrans for the parent.
+ */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid) &&
+ TransactionIdIsValid(pxid))
+ xid = pxid;
+ else
+ xid = SubTransGetTopmostTransaction(xid);
}
if (oper != XLTW_None)
@@ -745,6 +764,8 @@ ConditionalXactLockTableWait(TransactionId xid)
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -762,7 +783,13 @@ ConditionalXactLockTableWait(TransactionId xid)
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /* See XactLockTableWait about this case */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid) &&
+ TransactionIdIsValid(pxid))
+ xid = pxid;
+ else
+ xid = SubTransGetTopmostTransaction(xid);
}
return true;
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 775471d2a7..f404db552f 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -301,6 +301,9 @@ extern void AssertTransactionIdInAllowableRange(TransactionId xid);
#define AssertTransactionIdInAllowableRange(xid) ((void)true)
#endif
+/* in transam/clog.c */
+extern bool TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid);
+
/*
* Some frontend programs include this header. For compilers that emit static
* inline functions even when they're unused, that leads to unsatisfied
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 1b2cfac5ad..d7ad1da6a8 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -52,6 +52,8 @@ extern bool ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc);
extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
+extern bool GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid);
+
extern bool TransactionIdIsActive(TransactionId xid);
extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
extern TransactionId GetOldestTransactionIdConsideredRunning(void);
002_new_isolation_tests_for_subxids.v2.patchapplication/octet-stream; name=002_new_isolation_tests_for_subxids.v2.patchDownload
commit cdda7d36c59e20583f69130d2d05f20f6879d12f
Author: Simon Riggs <simon.riggs@enterprisedb.com>
Date: Mon Aug 8 12:15:01 2022 +0100
New iso tests for subxid-overflow
diff --git a/src/test/isolation/expected/subxid-overflow.out b/src/test/isolation/expected/subxid-overflow.out
new file mode 100644
index 0000000000..f6c96f9a34
--- /dev/null
+++ b/src/test/isolation/expected/subxid-overflow.out
@@ -0,0 +1,17 @@
+Parsed test spec with 2 sessions
+
+starting permutation: subxid xmax s2sel s1c
+step subxid: BEGIN; SELECT gen_subxids(100);
+gen_subxids
+-----------
+
+(1 row)
+
+step xmax: BEGIN; INSERT INTO subxids VALUES (1); COMMIT;
+step s2sel: SELECT count(*) FROM subxids WHERE subx = 0;
+count
+-----
+ 0
+(1 row)
+
+step s1c: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 529a4cbd4d..7e10cfe022 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -1,3 +1,4 @@
+test: subxid-overflow
test: read-only-anomaly
test: read-only-anomaly-2
test: read-only-anomaly-3
diff --git a/src/test/isolation/specs/subxid-overflow.spec b/src/test/isolation/specs/subxid-overflow.spec
new file mode 100644
index 0000000000..fda9f449f1
--- /dev/null
+++ b/src/test/isolation/specs/subxid-overflow.spec
@@ -0,0 +1,46 @@
+# Subtransaction overflow
+#
+# This test is designed to cover some code paths which only occur when
+# one transaction has overflowed the subtransaction cache.
+
+setup
+{
+DROP TABLE IF EXISTS subxids;
+CREATE TABLE subxids (subx integer);
+
+CREATE OR REPLACE FUNCTION gen_subxids (n integer)
+ RETURNS VOID
+ LANGUAGE plpgsql
+AS $$
+BEGIN
+ IF n <= 0 THEN
+ INSERT INTO subxids VALUES (0);
+ RETURN;
+ ELSE
+ PERFORM gen_subxids(n - 1);
+ RETURN;
+ END IF;
+EXCEPTION /* generates a subxid */
+ WHEN raise_exception THEN NULL;
+END;
+$$;
+}
+
+teardown
+{
+ DROP TABLE subxids;
+ DROP FUNCTION gen_subxids(integer);
+}
+
+session s1
+step subxid { BEGIN; SELECT gen_subxids(100); }
+step s1c { COMMIT; }
+
+session s2
+# move xmax forwards
+step xmax { BEGIN; INSERT INTO subxids VALUES (1); COMMIT;}
+# will see step subxid as still running
+step s2sel { SELECT count(*) FROM subxids WHERE subx = 0; }
+
+# s2sel will see subxid as still running
+permutation subxid xmax s2sel s1c
001_subx_include_subxids_even_if_overflowed.v2.patchapplication/octet-stream; name=001_subx_include_subxids_even_if_overflowed.v2.patchDownload
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0555b02a8d..a9ad40e935 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2370,27 +2370,34 @@ GetSnapshotData(Snapshot snapshot)
*
* Again, our own XIDs are not included in the snapshot.
*/
- if (!suboverflowed)
+ if (subxidStates[pgxactoff].overflowed)
+ suboverflowed = true;
+
+ /*
+ * Include all subxids, whether or not we overflowed. This is
+ * important because it can reduce the number of accesses to SUBTRANS
+ * when we search snapshots, which is important for scalability,
+ * especially in the presence of both overflowed and long running xacts.
+ * This is true when fraction of subxids held in subxip is a large
+ * fraction of the total subxids in use. In the case where one or more
+ * transactions had more subxids in progress than the total size of
+ * the cache this might not be true, but we consider that case to be
+ * unlikely, even if it is possible.
+ */
{
+ int nsubxids = subxidStates[pgxactoff].count;
- if (subxidStates[pgxactoff].overflowed)
- suboverflowed = true;
- else
+ if (nsubxids > 0)
{
- int nsubxids = subxidStates[pgxactoff].count;
-
- if (nsubxids > 0)
- {
- int pgprocno = pgprocnos[pgxactoff];
- PGPROC *proc = &allProcs[pgprocno];
+ int pgprocno = pgprocnos[pgxactoff];
+ PGPROC *proc = &allProcs[pgprocno];
- pg_read_barrier(); /* pairs with GetNewTransactionId */
+ pg_read_barrier(); /* pairs with GetNewTransactionId */
- memcpy(snapshot->subxip + subcount,
- (void *) proc->subxids.xids,
- nsubxids * sizeof(TransactionId));
- subcount += nsubxids;
- }
+ memcpy(snapshot->subxip + subcount,
+ (void *) proc->subxids.xids,
+ nsubxids * sizeof(TransactionId));
+ subcount += nsubxids;
}
}
}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 5bc2a15160..42130f1fea 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -638,13 +638,9 @@ CopySnapshot(Snapshot snapshot)
newsnap->xip = NULL;
/*
- * Setup subXID array. Don't bother to copy it if it had overflowed,
- * though, because it's not used anywhere in that case. Except if it's a
- * snapshot taken during recovery; all the top-level XIDs are in subxip as
- * well in that case, so we mustn't lose them.
+ * Setup subXID array.
*/
- if (snapshot->subxcnt > 0 &&
- (!snapshot->suboverflowed || snapshot->takenDuringRecovery))
+ if (snapshot->subxcnt > 0)
{
newsnap->subxip = (TransactionId *) ((char *) newsnap + subxipoff);
memcpy(newsnap->subxip, snapshot->subxip,
@@ -1238,8 +1234,10 @@ ExportSnapshot(Snapshot snapshot)
snapshot->subxcnt + nchildren > GetMaxSnapshotSubxidCount())
appendStringInfoString(&buf, "sof:1\n");
else
- {
appendStringInfoString(&buf, "sof:0\n");
+
+ /* then unconditionally, since we always include all subxids */
+ {
appendStringInfo(&buf, "sxcnt:%d\n", snapshot->subxcnt + nchildren);
for (i = 0; i < snapshot->subxcnt; i++)
appendStringInfo(&buf, "sxp:%u\n", snapshot->subxip[i]);
@@ -1490,7 +1488,7 @@ ImportSnapshot(const char *idstr)
snapshot.suboverflowed = parseIntFromText("sof:", &filebuf, path);
- if (!snapshot.suboverflowed)
+ /* then unconditionally, since we always include all subxids */
{
snapshot.subxcnt = xcnt = parseIntFromText("sxcnt:", &filebuf, path);
@@ -1504,11 +1502,6 @@ ImportSnapshot(const char *idstr)
for (i = 0; i < xcnt; i++)
snapshot.subxip[i] = parseXidFromText("sxp:", &filebuf, path);
}
- else
- {
- snapshot.subxcnt = 0;
- snapshot.subxip = NULL;
- }
snapshot.takenDuringRecovery = parseIntFromText("rec:", &filebuf, path);
On Mon, Aug 8, 2022 at 6:41 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Thu, 4 Aug 2022 at 13:11, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Wed, 3 Aug 2022 at 20:18, Andres Freund <andres@anarazel.de> wrote:
I think we should consider redesigning subtrans more substantially - even with
the changes you propose here, there's still plenty ways to hit really bad
performance. And there's only so much we can do about that without more
fundamental design changes.I completely agree - you will be glad to hear that I've been working
on a redesign of the subtrans module....
I will post my patch, when complete, in a different thread.
The attached patch reduces the overhead of SUBTRANS by minimizing the
number of times SubTransSetParent() is called, to below 1% of the
current rate in common cases.Instead of blindly calling SubTransSetParent() for every subxid, this
proposal only calls SubTransSetParent() when that information will be
required for later use. It does this by analyzing all of the callers
of SubTransGetParent() and uses these pre-conditions to filter out
calls/subxids that will never be required, for various reasons. It
redesigns the way XactLockTableWait() calls
SubTransGetTopmostTransactionId() to allow this.This short patchset compiles and passes make check-world, with lengthy comments.
Does this patch set work independently or it has dependency on the
patches on the other thread "Smoothing the subtrans performance
catastrophe"? Because in this patch I see no code where we are
changing anything to control the access of SubTransGetParent() from
SubTransGetTopmostTransactionId()?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, 9 Aug 2022 at 12:39, Dilip Kumar <dilipbalaut@gmail.com> wrote:
This short patchset compiles and passes make check-world, with lengthy comments.
Does this patch set work independently or it has dependency on the
patches on the other thread "Smoothing the subtrans performance
catastrophe"?
Patches 001 and 002 are common elements of a different patch,
"Smoothing the subtrans performance catastrophe", but other than that,
the two patches are otherwise independent of each other.
i.e. there are common elements in both patches
001 puts all subxid data in a snapshot (up to a limit of 64 xids per
topxid), even if one or more xids overflows.
Because in this patch I see no code where we are
changing anything to control the access of SubTransGetParent() from
SubTransGetTopmostTransactionId()?
Those calls are unaffected, i.e. they both still work.
Right now, we register all subxids in subtrans. But not all xids are
subxids, so in fact, subtrans has many "holes" in it, where if you
look up the parent for an xid it will just return
InvalidTransactionId. There is a protection against that causing a
problem because if you call TransactionIdDidCommit/Abort you can get a
WARNING, or if you call SubTransGetTopmostTransaction() you can get an
ERROR, but it is possible if you do a lookup for an inappropriate xid.
i.e. if you call TransactionIdDidCommit() without first calling
TransactionIdIsInProgress() as you are supposed to do.
What this patch does is increase the number of "holes" in subtrans,
reducing the overhead and making the subtrans data structure more
amenable to using a dense structure rather than a sparse structure as
we do now, which then leads to I/O overheads. But in this patch, we
only have holes when we can prove that the subxid's parent will never
be requested.
Specifically, with this patch, running PL/pgSQL with a few
subtransactions in will cause those subxids to be logged in subtrans
about 1% as often as they are now, so greatly reducing the number of
subtrans calls.
Happy to provide more detailed review thoughts, so please keep asking questions.
--
Simon Riggs http://www.EnterpriseDB.com/
On Tue, Aug 9, 2022 at 9:46 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
Those calls are unaffected, i.e. they both still work.
Right now, we register all subxids in subtrans. But not all xids are
subxids, so in fact, subtrans has many "holes" in it, where if you
look up the parent for an xid it will just return
c. There is a protection against that causing a
problem because if you call TransactionIdDidCommit/Abort you can get a
WARNING, or if you call SubTransGetTopmostTransaction() you can get an
ERROR, but it is possible if you do a lookup for an inappropriate xid.
i.e. if you call TransactionIdDidCommit() without first calling
TransactionIdIsInProgress() as you are supposed to do.
IIUC, if SubTransGetParent SubTransGetParent then
SubTransGetTopmostTransaction() loop will break and return the
previousxid. So if we pass any topxid to
SubTransGetTopmostTransaction() it will return back the same xid and
that's fine as next we are going to search in the snapshot->xip array.
But if we are calling this function with the subxid which might be
there in the snapshot->subxip array but if we are first calling
SubTransGetTopmostTransaction() then it will just return the same xid
if the parent is not set for it. And now if we search this in the
snapshot->xip array then we will get the wrong answer?
So I still think some adjustment is required in XidInMVCCSnapdhot()
such that we first search the snapshot->subxip array.
Am I still missing something?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, 10 Aug 2022 at 08:34, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Am I still missing something?
No, you have found a dependency between the patches that I was unaware
of. So there is no bug if you apply both patches.
Thanks for looking.
So I still think some adjustment is required in XidInMVCCSnapdhot()
That is one way to resolve the issue, but not the only one. I can also
change AssignTransactionId() to recursively register parent xids for
all of a subxid's parents.
I will add in a test case and resolve the dependency in my next patch.
Thanks again.
--
Simon Riggs http://www.EnterpriseDB.com/
On Wed, Aug 10, 2022 at 6:31 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
On Wed, 10 Aug 2022 at 08:34, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Am I still missing something?
No, you have found a dependency between the patches that I was unaware
of. So there is no bug if you apply both patches.
Right
So I still think some adjustment is required in XidInMVCCSnapdhot()
That is one way to resolve the issue, but not the only one. I can also
change AssignTransactionId() to recursively register parent xids for
all of a subxid's parents.I will add in a test case and resolve the dependency in my next patch.
Okay, thanks, I will look into the updated patch after you submit that.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, 11 Aug 2022 at 06:32, Dilip Kumar <dilipbalaut@gmail.com> wrote:
So I still think some adjustment is required in XidInMVCCSnapdhot()
That is one way to resolve the issue, but not the only one. I can also
change AssignTransactionId() to recursively register parent xids for
all of a subxid's parents.I will add in a test case and resolve the dependency in my next patch.
Okay, thanks, I will look into the updated patch after you submit that.
PFA two patches, replacing earlier work
001_new_isolation_tests_for_subxids.v3.patch
002_minimize_calls_to_SubTransSetParent.v8.patch
001_new_isolation_tests_for_subxids.v3.patch
Adds new test cases to master without adding any new code, specifically
addressing the two areas of code that are not tested by existing tests.
This gives us a baseline from which we can do test driven development.
I'm hoping this can be reviewed and committed fairly smoothly.
002_minimize_calls_to_SubTransSetParent.v8.patch
Reduces the number of calls to subtrans below 1% for the first 64 subxids,
so overall will substantially reduce subtrans contention on master for the
typical case, as well as smoothing the overflow case.
Some discussion needed on this; there are various options.
This combines the work originally posted here with another patch posted on the
thread "Smoothing the subtrans performance catastrophe".
I will do some performance testing also, but more welcome.
--
Simon Riggs http://www.EnterpriseDB.com/
Attachments:
002_minimize_calls_to_SubTransSetParent.v8.patchapplication/octet-stream; name=002_minimize_calls_to_SubTransSetParent.v8.patchDownload
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3d9088a704..d690774f33 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -107,7 +107,18 @@ static bool TransactionGroupUpdateXidStatus(TransactionId xid,
static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
TransactionId *subxids, XidStatus status,
XLogRecPtr lsn, int pageno);
+/*
+ * Run locally by a backend to establish whether or not it needs to call
+ * SubTransSetParent for subxid.
+ */
+bool
+TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid)
+{
+ int toppageno = TransactionIdToPage(topxid);
+ int subpageno = TransactionIdToPage(subxid);
+ return (toppageno == subpageno);
+}
/*
* TransactionIdSetTreeStatus
@@ -133,7 +144,7 @@ static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
* only once, and the status will be set to committed directly. Otherwise
* we must
* 1. set sub-committed all subxids that are not on the same page as the
- * main xid
+ * main xid (see TransactionIdsAreOnSameXactPage())
* 2. atomically set committed the main xid and the subxids on the same page
* 3. go over the first bunch again and set them committed
* Note that as far as concurrent checkers are concerned, main transaction
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 66d3548155..922621b4fc 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -177,6 +177,61 @@ SubTransGetTopmostTransaction(TransactionId xid)
return previousXid;
}
+/*
+ * SubTransGetTopmostTransactionPrecedes
+ *
+ * Different form of SubTransGetTopmostTransaction() that minimizes the number
+ * of iterations required to get the parent or stop if it is earlier than cutoff.
+ */
+bool
+SubTransGetTopmostTransactionPrecedes(TransactionId *xid, TransactionId cutoff_xid, bool standby)
+{
+ TransactionId parentXid = *xid,
+ previousXid = *xid;
+
+ /* Can't ask about stuff that might not be around anymore */
+ Assert(TransactionIdFollowsOrEquals(cutoff_xid, TransactionXmin));
+ Assert(TransactionIdFollowsOrEquals(*xid, cutoff_xid));
+
+ while (TransactionIdIsValid(parentXid))
+ {
+ previousXid = parentXid;
+
+ /*
+ * Stop as soon as we are earlier than the cutoff. This saves multiple
+ * lookups against subtrans when we have a deeply nested subxid with
+ * a later snapshot with an xmin much higher than TransactionXmin.
+ */
+ if (TransactionIdPrecedes(parentXid, cutoff_xid))
+ {
+ *xid = previousXid;
+ return true;
+ }
+ parentXid = SubTransGetParent(parentXid);
+
+ /*
+ * By convention the parent xid gets allocated first, so should always
+ * precede the child xid. Anything else points to a corrupted data
+ * structure that could lead to an infinite loop, so exit.
+ */
+ if (!TransactionIdPrecedes(parentXid, previousXid))
+ elog(ERROR, "pg_subtrans contains invalid entry: xid %u points to parent xid %u",
+ previousXid, parentXid);
+
+ /*
+ * subxids always point directly at parent on standby, so we can avoid
+ * multiple loops in that case.
+ */
+ if (standby)
+ break;
+ }
+
+ Assert(TransactionIdIsValid(previousXid));
+
+ *xid = previousXid;
+
+ return false;
+}
/*
* Initialization of shared memory for SUBTRANS
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 50f092d7eb..5577e5b8df 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -693,8 +693,51 @@ AssignTransactionId(TransactionState s)
XactTopFullTransactionId = s->fullTransactionId;
if (isSubXact)
- SubTransSetParent(XidFromFullTransactionId(s->fullTransactionId),
- XidFromFullTransactionId(s->parent->fullTransactionId));
+ {
+ TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId);
+
+ /*
+ * Subtrans entries are only required in specific circumstances:
+ *
+ * 1. When there's no room in PG_PROC, as mentioned above.
+ * During XactLockTableWait() we sometimes need to know the topxid.
+ * If there is room in PG_PROC we can get a subxid's topxid direct
+ * from the procarray if the topxid is still running, using
+ * GetTopmostTransactionIdFromProcArray(). So we only ever need to
+ * call SubTransGetTopMostTransaction() if that xact overflowed;
+ * since that is our current transaction, we know whether or not to
+ * log the xid for future use.
+ * This occurs only when large number of subxids are requested by
+ * app user.
+ *
+ * 2. When IsolationIsSerializable() we sometimes need to access topxid
+ * This occurs only when SERIALIZABLE is requested by app user.
+ *
+ * 3. When TransactionIdSetStatus will use a status of SUB_COMMITTED,
+ * which then requires us to consult subtrans to find parent, which
+ * is needed to avoid race condition. In this case we ask Clog/Xact
+ * module if TransactionIdsAreOnSameXactPage(). Since we start a new
+ * clog page every 32000 xids, this is usually <<1% of subxids.
+ */
+ if (MyProc->subxidStatus.overflowed ||
+ IsolationIsSerializable() ||
+ !TransactionIdsAreOnSameXactPage(GetTopTransactionId(), subxid))
+ {
+ /*
+ * Insert entries into subtrans for this xid, noting that the entry
+ * points directly to the topxid, not the immediate parent. This is
+ * done for two reasons, (1) so it is faster in a long chain of subxids
+ * (2) so that we don't need to set subxids for unregistered parents.
+ * This has the downside that anyone waiting for a lock on aborted
+ * subtransactions would not be released immediately; that may or
+ * may not be an acceptable compromise. If not acceptable, this
+ * simple call needs to be replaced with a loop to register the
+ * parent for the current subxid stack, so we can walk back up it to
+ * the topxid.
+ */
+ SubTransSetParent(subxid, GetTopTransactionId());
+ }
+ }
/*
* If it's a top-level transaction, the predicate locking system needs to
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0555b02a8d..0067fa327f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -261,6 +261,13 @@ static ProcArrayStruct *procArray;
static PGPROC *allProcs;
+/*
+ * Remember the last call to TransactionIdIsInProgress() to avoid need to call
+ * SubTransGetTopMostTransaction() when the subxid is present in the procarray.
+ */
+static TransactionId LastCallXidIsInProgressSubXid = InvalidTransactionId;
+static TransactionId LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
/*
* Cache to reduce overhead of repeated calls to TransactionIdIsInProgress()
*/
@@ -1440,6 +1447,8 @@ TransactionIdIsInProgress(TransactionId xid)
other_xids = ProcGlobal->xids;
other_subxidstates = ProcGlobal->subxidStates;
+ LastCallXidIsInProgressSubXid = LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
LWLockAcquire(ProcArrayLock, LW_SHARED);
/*
@@ -1508,6 +1517,15 @@ TransactionIdIsInProgress(TransactionId xid)
{
LWLockRelease(ProcArrayLock);
xc_by_child_xid_inc();
+
+ /*
+ * Remember the parent xid, for use during XactLockTableWait().
+ * We do this because it is cheaper than looking up pg_subtrans,
+ * and also allows us to reduce calls to subtrans.
+ */
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = pxid;
+
return true;
}
}
@@ -1591,7 +1609,11 @@ TransactionIdIsInProgress(TransactionId xid)
for (int i = 0; i < nxids; i++)
{
if (TransactionIdEquals(xids[i], topxid))
+ {
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = topxid;
return true;
+ }
}
}
@@ -1599,6 +1621,28 @@ TransactionIdIsInProgress(TransactionId xid)
return false;
}
+/*
+ * Allow the topmost xid to be accessed from the last call to
+ * TransactionIdIsInProgress(). Specifically designed for use in
+ * XactLockTableWait().
+ */
+bool
+GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid)
+{
+ bool found = false;
+
+ Assert(TransactionIdIsNormal(xid));
+
+ if (LastCallXidIsInProgressSubXid == xid)
+ {
+ Assert(TransactionIdIsNormal(LastCallXidIsInProgressParentXid));
+ *pxid = LastCallXidIsInProgressParentXid;
+ found = true;
+ }
+
+ return found;
+}
+
/*
* TransactionIdIsActive -- is xid the top-level XID of an active backend?
*
@@ -2370,27 +2414,34 @@ GetSnapshotData(Snapshot snapshot)
*
* Again, our own XIDs are not included in the snapshot.
*/
- if (!suboverflowed)
+ if (subxidStates[pgxactoff].overflowed)
+ suboverflowed = true;
+
+ /*
+ * Include all subxids, whether or not we overflowed. This is
+ * important because it can reduce the number of accesses to SUBTRANS
+ * when we search snapshots, which is important for scalability,
+ * especially in the presence of both overflowed and long running xacts.
+ * This is true when fraction of subxids held in subxip is a large
+ * fraction of the total subxids in use. In the case where one or more
+ * transactions had more subxids in progress than the total size of
+ * the cache this might not be true, but we consider that case to be
+ * unlikely, even if it is possible.
+ */
{
+ int nsubxids = subxidStates[pgxactoff].count;
- if (subxidStates[pgxactoff].overflowed)
- suboverflowed = true;
- else
+ if (nsubxids > 0)
{
- int nsubxids = subxidStates[pgxactoff].count;
-
- if (nsubxids > 0)
- {
- int pgprocno = pgprocnos[pgxactoff];
- PGPROC *proc = &allProcs[pgprocno];
+ int pgprocno = pgprocnos[pgxactoff];
+ PGPROC *proc = &allProcs[pgprocno];
- pg_read_barrier(); /* pairs with GetNewTransactionId */
+ pg_read_barrier(); /* pairs with GetNewTransactionId */
- memcpy(snapshot->subxip + subcount,
- (void *) proc->subxids.xids,
- nsubxids * sizeof(TransactionId));
- subcount += nsubxids;
- }
+ memcpy(snapshot->subxip + subcount,
+ (void *) proc->subxids.xids,
+ nsubxids * sizeof(TransactionId));
+ subcount += nsubxids;
}
}
}
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 1543da6162..3cb607c285 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -694,6 +694,8 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -703,6 +705,13 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
LockRelease(&tag, ShareLock, false);
+ /*
+ * If a transaction has no lock, it might be a top-level transaction,
+ * in which case the procarray will show it as not in progress.
+ *
+ * If a transaction is a subtransaction, then it could have committed
+ * or aborted, yet the top-level transaction may still be in progress.
+ */
if (!TransactionIdIsInProgress(xid))
break;
@@ -724,7 +733,27 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /*
+ * In most cases, we can get the parent xid from our prior call to
+ * TransactionIdIsInProgress(), except in hot standby. If not, we have
+ * to ask subtrans for the parent.
+ */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid))
+ {
+ /* TESTED-BY src/test/isolation/specs/subx-overflow.spec test3 */
+ Assert(TransactionIdIsValid(pxid));
+ xid = pxid;
+ }
+ else
+ {
+ /*
+ * We can get here if RecoveryInProgress() during the last call to
+ * TransactionIdIsInProgress(), but we don't Assert that here since
+ * that would create a race window against the end of recovery.
+ */
+ xid = SubTransGetTopmostTransaction(xid);
+ }
}
if (oper != XLTW_None)
@@ -745,6 +774,8 @@ ConditionalXactLockTableWait(TransactionId xid)
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -762,7 +793,15 @@ ConditionalXactLockTableWait(TransactionId xid)
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /* See XactLockTableWait about this case */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid))
+ {
+ Assert(TransactionIdIsValid(pxid));
+ xid = pxid;
+ }
+ else
+ xid = SubTransGetTopmostTransaction(xid);
}
return true;
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 9b504c9745..ae9dc0dd92 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -639,13 +639,9 @@ CopySnapshot(Snapshot snapshot)
newsnap->xip = NULL;
/*
- * Setup subXID array. Don't bother to copy it if it had overflowed,
- * though, because it's not used anywhere in that case. Except if it's a
- * snapshot taken during recovery; all the top-level XIDs are in subxip as
- * well in that case, so we mustn't lose them.
+ * Setup subXID array.
*/
- if (snapshot->subxcnt > 0 &&
- (!snapshot->suboverflowed || snapshot->takenDuringRecovery))
+ if (snapshot->subxcnt > 0)
{
newsnap->subxip = (TransactionId *) ((char *) newsnap + subxipoff);
memcpy(newsnap->subxip, snapshot->subxip,
@@ -1239,8 +1235,10 @@ ExportSnapshot(Snapshot snapshot)
snapshot->subxcnt + nchildren > GetMaxSnapshotSubxidCount())
appendStringInfoString(&buf, "sof:1\n");
else
- {
appendStringInfoString(&buf, "sof:0\n");
+
+ /* then unconditionally, since we always include all subxids */
+ {
appendStringInfo(&buf, "sxcnt:%d\n", snapshot->subxcnt + nchildren);
for (i = 0; i < snapshot->subxcnt; i++)
appendStringInfo(&buf, "sxp:%u\n", snapshot->subxip[i]);
@@ -1491,7 +1489,7 @@ ImportSnapshot(const char *idstr)
snapshot.suboverflowed = parseIntFromText("sof:", &filebuf, path);
- if (!snapshot.suboverflowed)
+ /* then unconditionally, since we always include all subxids */
{
snapshot.subxcnt = xcnt = parseIntFromText("sxcnt:", &filebuf, path);
@@ -1505,11 +1503,6 @@ ImportSnapshot(const char *idstr)
for (i = 0; i < xcnt; i++)
snapshot.subxip[i] = parseXidFromText("sxp:", &filebuf, path);
}
- else
- {
- snapshot.subxcnt = 0;
- snapshot.subxip = NULL;
- }
snapshot.takenDuringRecovery = parseIntFromText("rec:", &filebuf, path);
@@ -2285,6 +2278,8 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
bool
XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
{
+ bool have_parent = false;
+
/*
* Make a quick range check to eliminate most XIDs without looking at the
* xip arrays. Note that this is OK even if we convert a subxact XID to
@@ -2300,80 +2295,66 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
return true;
+ /*
+ * This patch reorders the operations in XidMVCCSnapshot, so as to reduce
+ * calls to SubTransGetParent to the absolute minimum needed.
+ * The previous code was neat, but not efficient for the overflow case.
+ */
+retry_search:
+
/*
* Snapshot information is stored slightly differently in snapshots taken
- * during recovery.
+ * during recovery. xip is empty on standbys.
*/
if (!snapshot->takenDuringRecovery)
{
- /*
- * If the snapshot contains full subxact data, the fastest way to
- * check things is just to compare the given XID against both subxact
- * XIDs and top-level XIDs. If the snapshot overflowed, we have to
- * use pg_subtrans to convert a subxact XID to its parent XID, but
- * then we need only look at top-level XIDs not subxacts.
- */
- if (!snapshot->suboverflowed)
- {
- /* we have full data, so search subxip */
- if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
- return true;
-
- /* not there, fall through to search xip[] */
- }
- else
- {
- /*
- * Snapshot overflowed, so convert xid to top-level. This is safe
- * because we eliminated too-old XIDs above.
- */
- xid = SubTransGetTopmostTransaction(xid);
-
- /*
- * If xid was indeed a subxact, we might now have an xid < xmin,
- * so recheck to avoid an array scan. No point in rechecking
- * xmax.
- */
- if (TransactionIdPrecedes(xid, snapshot->xmin))
- return false;
- }
-
if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
return true;
}
- else
+
+ /*
+ * If we have the parent xid, then the xid is not in snapshot
+ */
+ if (have_parent)
+ return false;
+
+ /*
+ * Now search subxip, which contains first 64 subxids of each topxid.
+ */
+ if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
+ return true;
+
+ if (!have_parent && snapshot->suboverflowed)
{
+ /* TESTED-BY src/test/isolation/specs/subx-overflow.spec test1 and test2 */
+
/*
- * In recovery we store all xids in the subxact array because it is by
- * far the bigger array, and we mostly don't know which xids are
- * top-level and which are subxacts. The xip array is empty.
+ * If we haven't found xid yet, it might be because it is a subxid
+ * that is not present because we overflowed, but it might also be
+ * because the xid is not in the snapshot.
*
- * We start by searching subtrans, if we overflowed.
+ * It is important we do this step last because it is expensive,
+ * and if everybody does this then SubTransSLRU glows white hot.
+ *
+ * Use SubTransGetTopmostTransactionPrecedes(), which has been
+ * specially provided to help here. This does two things for us:
+ *
+ * 1. On standby, get the parent directly, since in a standby SUBTRANS
+ * always stores the direct parent only. Doing this avoids
+ * one lookup of subtrans, since SubTransGetTopmostTransaction()
+ * always does at least 2 SUBTRANS lookups, the last lookup is
+ * how the loop knows it has found the parent in normal running.
+ *
+ * 2. Stops the iteration to find the parent as soon as we find an
+ * xid earlier than snapshot->xmin, so we do the minimum lookups.
*/
- if (snapshot->suboverflowed)
- {
- /*
- * Snapshot overflowed, so convert xid to top-level. This is safe
- * because we eliminated too-old XIDs above.
- */
- xid = SubTransGetTopmostTransaction(xid);
+ if (SubTransGetTopmostTransactionPrecedes(&xid,
+ snapshot->xmin,
+ snapshot->takenDuringRecovery))
+ return false;
- /*
- * If xid was indeed a subxact, we might now have an xid < xmin,
- * so recheck to avoid an array scan. No point in rechecking
- * xmax.
- */
- if (TransactionIdPrecedes(xid, snapshot->xmin))
- return false;
- }
-
- /*
- * We now have either a top-level xid higher than xmin or an
- * indeterminate xid. We don't know whether it's top level or subxact
- * but it doesn't matter. If it's present, the xid is visible.
- */
- if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
- return true;
+ have_parent = true;
+ goto retry_search; /* search arrays again, now we have parent */
}
return false;
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index f94e116640..fc4103b769 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -17,6 +17,7 @@
extern void SubTransSetParent(TransactionId xid, TransactionId parent);
extern TransactionId SubTransGetParent(TransactionId xid);
extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
+extern bool SubTransGetTopmostTransactionPrecedes(TransactionId *xid, TransactionId cutoff_xid, bool standby);
extern Size SUBTRANSShmemSize(void);
extern void SUBTRANSShmemInit(void);
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 775471d2a7..f404db552f 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -301,6 +301,9 @@ extern void AssertTransactionIdInAllowableRange(TransactionId xid);
#define AssertTransactionIdInAllowableRange(xid) ((void)true)
#endif
+/* in transam/clog.c */
+extern bool TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid);
+
/*
* Some frontend programs include this header. For compilers that emit static
* inline functions even when they're unused, that leads to unsatisfied
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 1b2cfac5ad..d7ad1da6a8 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -52,6 +52,8 @@ extern bool ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc);
extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
+extern bool GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid);
+
extern bool TransactionIdIsActive(TransactionId xid);
extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
extern TransactionId GetOldestTransactionIdConsideredRunning(void);
001_new_isolation_tests_for_subxids.v3.patchapplication/octet-stream; name=001_new_isolation_tests_for_subxids.v3.patchDownload
diff --git a/src/test/isolation/expected/subxid-overflow.out b/src/test/isolation/expected/subxid-overflow.out
new file mode 100644
index 0000000000..9b0396f451
--- /dev/null
+++ b/src/test/isolation/expected/subxid-overflow.out
@@ -0,0 +1,82 @@
+Parsed test spec with 3 sessions
+
+starting permutation: ins subxov xmax s2sel s1c
+step ins: TRUNCATE subxids; INSERT INTO subxids VALUES (0, 0);
+step subxov: BEGIN; SELECT gen_subxids(100);
+gen_subxids
+-----------
+
+(1 row)
+
+step xmax: BEGIN; INSERT INTO subxids VALUES (99, 0); COMMIT;
+step s2sel: SELECT val FROM subxids WHERE subx = 0;
+val
+---
+ 0
+(1 row)
+
+step s1c: COMMIT;
+
+starting permutation: ins subxov sub3 xmax s2brr s2s3 s3c s2s3 s2c s1c
+step ins: TRUNCATE subxids; INSERT INTO subxids VALUES (0, 0);
+step subxov: BEGIN; SELECT gen_subxids(100);
+gen_subxids
+-----------
+
+(1 row)
+
+step sub3: BEGIN; SAVEPOINT s; INSERT INTO subxids VALUES (1, 0);
+step xmax: BEGIN; INSERT INTO subxids VALUES (99, 0); COMMIT;
+step s2brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2s3: SELECT val FROM subxids WHERE subx = 1;
+val
+---
+(0 rows)
+
+step s3c: COMMIT;
+step s2s3: SELECT val FROM subxids WHERE subx = 1;
+val
+---
+(0 rows)
+
+step s2c: COMMIT;
+step s1c: COMMIT;
+
+starting permutation: ins subxov sub3 xmax s2brc s2s3 s3c s2s3 s2c s1c
+step ins: TRUNCATE subxids; INSERT INTO subxids VALUES (0, 0);
+step subxov: BEGIN; SELECT gen_subxids(100);
+gen_subxids
+-----------
+
+(1 row)
+
+step sub3: BEGIN; SAVEPOINT s; INSERT INTO subxids VALUES (1, 0);
+step xmax: BEGIN; INSERT INTO subxids VALUES (99, 0); COMMIT;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2s3: SELECT val FROM subxids WHERE subx = 1;
+val
+---
+(0 rows)
+
+step s3c: COMMIT;
+step s2s3: SELECT val FROM subxids WHERE subx = 1;
+val
+---
+ 0
+(1 row)
+
+step s2c: COMMIT;
+step s1c: COMMIT;
+
+starting permutation: ins subxov xmax s2upd s1c
+step ins: TRUNCATE subxids; INSERT INTO subxids VALUES (0, 0);
+step subxov: BEGIN; SELECT gen_subxids(100);
+gen_subxids
+-----------
+
+(1 row)
+
+step xmax: BEGIN; INSERT INTO subxids VALUES (99, 0); COMMIT;
+step s2upd: UPDATE subxids SET val = 1 WHERE subx = 0; <waiting ...>
+step s1c: COMMIT;
+step s2upd: <... completed>
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 529a4cbd4d..7e10cfe022 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -1,3 +1,4 @@
+test: subxid-overflow
test: read-only-anomaly
test: read-only-anomaly-2
test: read-only-anomaly-3
diff --git a/src/test/isolation/specs/subxid-overflow.spec b/src/test/isolation/specs/subxid-overflow.spec
new file mode 100644
index 0000000000..9a69db45e8
--- /dev/null
+++ b/src/test/isolation/specs/subxid-overflow.spec
@@ -0,0 +1,79 @@
+# Subtransaction overflow
+#
+# This test is designed to cover some code paths which only occur when
+# one transaction has overflowed the subtransaction cache.
+
+setup
+{
+DROP TABLE IF EXISTS subxids;
+CREATE TABLE subxids (subx integer, val integer);
+
+CREATE OR REPLACE FUNCTION gen_subxids (n integer)
+ RETURNS VOID
+ LANGUAGE plpgsql
+AS $$
+BEGIN
+ IF n <= 0 THEN
+ UPDATE subxids SET val = 1 WHERE subx = 0;
+ RETURN;
+ ELSE
+ PERFORM gen_subxids(n - 1);
+ RETURN;
+ END IF;
+EXCEPTION /* generates a subxid */
+ WHEN raise_exception THEN NULL;
+END;
+$$;
+}
+
+teardown
+{
+ DROP TABLE subxids;
+ DROP FUNCTION gen_subxids(integer);
+}
+
+session s1
+# setup step for each test
+step ins { TRUNCATE subxids; INSERT INTO subxids VALUES (0, 0); }
+# long running transaction with overflowed subxids
+step subxov { BEGIN; SELECT gen_subxids(100); }
+# commit should always come last to make this long running
+step s1c { COMMIT; }
+
+session s2
+# move xmax forwards
+step xmax { BEGIN; INSERT INTO subxids VALUES (99, 0); COMMIT;}
+
+# step for test1
+step s2sel { SELECT val FROM subxids WHERE subx = 0; }
+
+# steps for test2
+step s2brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s2brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+# look for data written by sub3
+step s2s3 { SELECT val FROM subxids WHERE subx = 1; }
+step s2c { COMMIT; }
+
+# step for test3
+step s2upd { UPDATE subxids SET val = 1 WHERE subx = 0; }
+
+session s3
+# transaction with subxids that can commit before s1c
+step sub3 { BEGIN; SAVEPOINT s; INSERT INTO subxids VALUES (1, 0); }
+step s3c { COMMIT; }
+
+# test1
+# s2sel will see subxid as still running
+# designed to test XidInMVCCSnapshot() when overflows, xid is found
+permutation ins subxov xmax s2sel s1c
+
+# test2
+# designed to test XidInMVCCSnapshot() when overflows, xid is not found
+# both SELECTs invisible
+permutation ins subxov sub3 xmax s2brr s2s3 s3c s2s3 s2c s1c
+# 2nd SELECT visible after commit
+permutation ins subxov sub3 xmax s2brc s2s3 s3c s2s3 s2c s1c
+
+# test3
+# designed to test XactLockTableWait() for overflows
+permutation ins subxov xmax s2upd s1c
On Tue, Aug 30, 2022 at 10:16 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
PFA two patches, replacing earlier work
001_new_isolation_tests_for_subxids.v3.patch
002_minimize_calls_to_SubTransSetParent.v8.patch001_new_isolation_tests_for_subxids.v3.patch
Adds new test cases to master without adding any new code, specifically
addressing the two areas of code that are not tested by existing tests.
This gives us a baseline from which we can do test driven development.
I'm hoping this can be reviewed and committed fairly smoothly.002_minimize_calls_to_SubTransSetParent.v8.patch
Reduces the number of calls to subtrans below 1% for the first 64 subxids,
so overall will substantially reduce subtrans contention on master for the
typical case, as well as smoothing the overflow case.
Some discussion needed on this; there are various options.
This combines the work originally posted here with another patch posted on the
thread "Smoothing the subtrans performance catastrophe".I will do some performance testing also, but more welcome.
Thanks for the updated patch, I have some questions/comments.
1.
+ * This has the downside that anyone waiting for a lock on aborted
+ * subtransactions would not be released immediately; that may or
+ * may not be an acceptable compromise. If not acceptable, this
+ * simple call needs to be replaced with a loop to register the
+ * parent for the current subxid stack, so we can walk
back up it to
+ * the topxid.
+ */
+ SubTransSetParent(subxid, GetTopTransactionId());
I do not understand in which situation we will see this downside. I
mean if we see the logic of XactLockTableWait() then in the current
situation also if the subtransaction is committed we directly wait on
the top transaction by calling SubTransGetTopmostTransaction(xid);
So if the lock-taking subtransaction is committed then we will wait
directly for the top-level transaction and after that, it doesn't
matter if we abort any of the parent subtransactions, because it will
wait for the topmost transaction to complete. And if the lock-taking
subtransaction is aborted then it will anyway stop waiting because
TransactionIdIsInProgress() should return false.
2.
/*
* Notice that we update pg_subtrans with the top-level xid, rather than
* the parent xid. This is a difference between normal processing and
* recovery, yet is still correct in all cases. The reason is that
* subtransaction commit is not marked in clog until commit processing, so
* all aborted subtransactions have already been clearly marked in clog.
* As a result we are able to refer directly to the top-level
* transaction's state rather than skipping through all the intermediate
* states in the subtransaction tree. This should be the first time we
* have attempted to SubTransSetParent().
*/
for (i = 0; i < nsubxids; i++)
SubTransSetParent(subxids[i], topxid);
I think this comment needs some modification because in this patch now
in normal processing also we are setting the topxid as a parent right?
3.
+ while (TransactionIdIsValid(parentXid))
+ {
+ previousXid = parentXid;
+
+ /*
+ * Stop as soon as we are earlier than the cutoff. This saves multiple
+ * lookups against subtrans when we have a deeply nested subxid with
+ * a later snapshot with an xmin much higher than TransactionXmin.
+ */
+ if (TransactionIdPrecedes(parentXid, cutoff_xid))
+ {
+ *xid = previousXid;
+ return true;
+ }
+ parentXid = SubTransGetParent(parentXid);
Do we need this while loop if we are directly setting topxid as a
parent, so with that, we do not need multiple iterations to go to the
top xid?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, 6 Sept 2022 at 12:37, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Aug 30, 2022 at 10:16 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:PFA two patches, replacing earlier work
001_new_isolation_tests_for_subxids.v3.patch
002_minimize_calls_to_SubTransSetParent.v8.patch001_new_isolation_tests_for_subxids.v3.patch
Adds new test cases to master without adding any new code, specifically
addressing the two areas of code that are not tested by existing tests.
This gives us a baseline from which we can do test driven development.
I'm hoping this can be reviewed and committed fairly smoothly.002_minimize_calls_to_SubTransSetParent.v8.patch
Reduces the number of calls to subtrans below 1% for the first 64 subxids,
so overall will substantially reduce subtrans contention on master for the
typical case, as well as smoothing the overflow case.
Some discussion needed on this; there are various options.
This combines the work originally posted here with another patch posted on the
thread "Smoothing the subtrans performance catastrophe".I will do some performance testing also, but more welcome.
Thanks for the updated patch, I have some questions/comments.
Thanks for the review.
1. + * This has the downside that anyone waiting for a lock on aborted + * subtransactions would not be released immediately; that may or + * may not be an acceptable compromise. If not acceptable, this + * simple call needs to be replaced with a loop to register the + * parent for the current subxid stack, so we can walk back up it to + * the topxid. + */ + SubTransSetParent(subxid, GetTopTransactionId());I do not understand in which situation we will see this downside. I
mean if we see the logic of XactLockTableWait() then in the current
situation also if the subtransaction is committed we directly wait on
the top transaction by calling SubTransGetTopmostTransaction(xid);So if the lock-taking subtransaction is committed then we will wait
directly for the top-level transaction and after that, it doesn't
matter if we abort any of the parent subtransactions, because it will
wait for the topmost transaction to complete. And if the lock-taking
subtransaction is aborted then it will anyway stop waiting because
TransactionIdIsInProgress() should return false.
Yes, correct.
2.
/*
* Notice that we update pg_subtrans with the top-level xid, rather than
* the parent xid. This is a difference between normal processing and
* recovery, yet is still correct in all cases. The reason is that
* subtransaction commit is not marked in clog until commit processing, so
* all aborted subtransactions have already been clearly marked in clog.
* As a result we are able to refer directly to the top-level
* transaction's state rather than skipping through all the intermediate
* states in the subtransaction tree. This should be the first time we
* have attempted to SubTransSetParent().
*/
for (i = 0; i < nsubxids; i++)
SubTransSetParent(subxids[i], topxid);I think this comment needs some modification because in this patch now
in normal processing also we are setting the topxid as a parent right?
Correct
3. + while (TransactionIdIsValid(parentXid)) + { + previousXid = parentXid; + + /* + * Stop as soon as we are earlier than the cutoff. This saves multiple + * lookups against subtrans when we have a deeply nested subxid with + * a later snapshot with an xmin much higher than TransactionXmin. + */ + if (TransactionIdPrecedes(parentXid, cutoff_xid)) + { + *xid = previousXid; + return true; + } + parentXid = SubTransGetParent(parentXid);Do we need this while loop if we are directly setting topxid as a
parent, so with that, we do not need multiple iterations to go to the
top xid?
Correct. I think we can dispense with
SubTransGetTopmostTransactionPrecedes() entirely.
I was initially trying to leave options open but that is confusing and
as a result, some parts are misleading after I merged the two patches.
I will update the patch, thanks for your scrutiny.
--
Simon Riggs http://www.EnterpriseDB.com/
On Tue, 6 Sept 2022 at 13:14, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
I will update the patch, thanks for your scrutiny.
I attach a diff showing what has changed between v8 and v9, and will
reattach a full set of new patches in the next post, so patchtester
doesn't squeal.
--
Simon Riggs http://www.EnterpriseDB.com/
Attachments:
subx.v8--v9.diffapplication/octet-stream; name=subx.v8--v9.diffDownload
commit a3705442ab0c8f66f600add6181505dac6c1ebf4
Author: Simon Riggs <simon.riggs@enterprisedb.com>
Date: Tue Sep 13 11:32:32 2022 +0100
Streamline patches and comments
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 922621b4fc..140ad78530 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -137,14 +137,8 @@ SubTransGetParent(TransactionId xid)
/*
* SubTransGetTopmostTransaction
*
- * Returns the topmost transaction of the given transaction id.
- *
- * Because we cannot look back further than TransactionXmin, it is possible
- * that this function will lie and return an intermediate subtransaction ID
- * instead of the true topmost parent ID. This is OK, because in practice
- * we only care about detecting whether the topmost parent is still running
- * or is part of a current snapshot's list of still-running transactions.
- * Therefore, any XID before TransactionXmin is as good as any other.
+ * Returns the topmost transaction of the given transaction id, if one has
+ * been recorded in pg_subtrans.
*/
TransactionId
SubTransGetTopmostTransaction(TransactionId xid)
@@ -177,62 +171,6 @@ SubTransGetTopmostTransaction(TransactionId xid)
return previousXid;
}
-/*
- * SubTransGetTopmostTransactionPrecedes
- *
- * Different form of SubTransGetTopmostTransaction() that minimizes the number
- * of iterations required to get the parent or stop if it is earlier than cutoff.
- */
-bool
-SubTransGetTopmostTransactionPrecedes(TransactionId *xid, TransactionId cutoff_xid, bool standby)
-{
- TransactionId parentXid = *xid,
- previousXid = *xid;
-
- /* Can't ask about stuff that might not be around anymore */
- Assert(TransactionIdFollowsOrEquals(cutoff_xid, TransactionXmin));
- Assert(TransactionIdFollowsOrEquals(*xid, cutoff_xid));
-
- while (TransactionIdIsValid(parentXid))
- {
- previousXid = parentXid;
-
- /*
- * Stop as soon as we are earlier than the cutoff. This saves multiple
- * lookups against subtrans when we have a deeply nested subxid with
- * a later snapshot with an xmin much higher than TransactionXmin.
- */
- if (TransactionIdPrecedes(parentXid, cutoff_xid))
- {
- *xid = previousXid;
- return true;
- }
- parentXid = SubTransGetParent(parentXid);
-
- /*
- * By convention the parent xid gets allocated first, so should always
- * precede the child xid. Anything else points to a corrupted data
- * structure that could lead to an infinite loop, so exit.
- */
- if (!TransactionIdPrecedes(parentXid, previousXid))
- elog(ERROR, "pg_subtrans contains invalid entry: xid %u points to parent xid %u",
- previousXid, parentXid);
-
- /*
- * subxids always point directly at parent on standby, so we can avoid
- * multiple loops in that case.
- */
- if (standby)
- break;
- }
-
- Assert(TransactionIdIsValid(previousXid));
-
- *xid = previousXid;
-
- return false;
-}
-
/*
* Initialization of shared memory for SUBTRANS
*/
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 5577e5b8df..04e597b5ec 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -713,7 +713,7 @@ AssignTransactionId(TransactionState s)
* 2. When IsolationIsSerializable() we sometimes need to access topxid
* This occurs only when SERIALIZABLE is requested by app user.
*
- * 3. When TransactionIdSetStatus will use a status of SUB_COMMITTED,
+ * 3. When TransactionIdSetTreeStatus will use a status of SUB_COMMITTED,
* which then requires us to consult subtrans to find parent, which
* is needed to avoid race condition. In this case we ask Clog/Xact
* module if TransactionIdsAreOnSameXactPage(). Since we start a new
@@ -726,14 +726,14 @@ AssignTransactionId(TransactionState s)
/*
* Insert entries into subtrans for this xid, noting that the entry
* points directly to the topxid, not the immediate parent. This is
- * done for two reasons, (1) so it is faster in a long chain of subxids
+ * done for two reasons:
+ * (1) so it is faster in a long chain of subxids, because the
+ * algorithm is then O(1), no matter how many subxids are assigned.
* (2) so that we don't need to set subxids for unregistered parents.
- * This has the downside that anyone waiting for a lock on aborted
- * subtransactions would not be released immediately; that may or
- * may not be an acceptable compromise. If not acceptable, this
- * simple call needs to be replaced with a loop to register the
- * parent for the current subxid stack, so we can walk back up it to
- * the topxid.
+ * Previously when we set the parent to be the immediate parent,
+ * we then had to set the parent in all cases to maintain the chain
+ * of values to reach the topxid. If all subxids point to topxid,
+ * then they are independent of each other and we can skip some.
*/
SubTransSetParent(subxid, GetTopTransactionId());
}
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0067fa327f..dae3832e87 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1316,14 +1316,14 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
/*
* Notice that we update pg_subtrans with the top-level xid, rather than
- * the parent xid. This is a difference between normal processing and
- * recovery, yet is still correct in all cases. The reason is that
+ * the parent xid, which is correct in all cases. The reason is that
* subtransaction commit is not marked in clog until commit processing, so
* all aborted subtransactions have already been clearly marked in clog.
* As a result we are able to refer directly to the top-level
* transaction's state rather than skipping through all the intermediate
* states in the subtransaction tree. This should be the first time we
- * have attempted to SubTransSetParent().
+ * have attempted to SubTransSetParent() for this xid in recovery.
+ * XXX this may be optimized later, but we don't have good test coverage.
*/
for (i = 0; i < nsubxids; i++)
SubTransSetParent(subxids[i], topxid);
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index ae9dc0dd92..e50a640231 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -2310,13 +2310,13 @@ retry_search:
{
if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
return true;
- }
- /*
- * If we have the parent xid, then the xid is not in snapshot
- */
- if (have_parent)
- return false;
+ /*
+ * If we have the parent xid, then the xid is not in snapshot
+ */
+ if (have_parent)
+ return false;
+ }
/*
* Now search subxip, which contains first 64 subxids of each topxid.
@@ -2326,6 +2326,8 @@ retry_search:
if (!have_parent && snapshot->suboverflowed)
{
+ TransactionId pxid;
+
/* TESTED-BY src/test/isolation/specs/subx-overflow.spec test1 and test2 */
/*
@@ -2335,26 +2337,34 @@ retry_search:
*
* It is important we do this step last because it is expensive,
* and if everybody does this then SubTransSLRU glows white hot.
- *
- * Use SubTransGetTopmostTransactionPrecedes(), which has been
- * specially provided to help here. This does two things for us:
- *
- * 1. On standby, get the parent directly, since in a standby SUBTRANS
- * always stores the direct parent only. Doing this avoids
- * one lookup of subtrans, since SubTransGetTopmostTransaction()
- * always does at least 2 SUBTRANS lookups, the last lookup is
- * how the loop knows it has found the parent in normal running.
- *
- * 2. Stops the iteration to find the parent as soon as we find an
- * xid earlier than snapshot->xmin, so we do the minimum lookups.
*/
- if (SubTransGetTopmostTransactionPrecedes(&xid,
- snapshot->xmin,
- snapshot->takenDuringRecovery))
+
+ /*
+ * Snapshot overflowed, so convert xid to top-level. This is safe
+ * because we eliminated too-old XIDs above. Every overflowed subxid
+ * will always have a parent recorded during AssignTransactionId()
+ * so this should always return a valid TransactionId, if a subxact.
+ */
+ pxid = SubTransGetTopmostTransaction(xid);
+
+ /*
+ * If xid was indeed a subxact, we might now have an xid < xmin,
+ * so recheck to avoid an array scan. No point in rechecking
+ * xmax. If it wasn't a subxact, pxid will be invalid, so this test
+ * will do the right thing also.
+ */
+ if (TransactionIdPrecedes(pxid, snapshot->xmin))
return false;
- have_parent = true;
- goto retry_search; /* search arrays again, now we have parent */
+ /*
+ * Check whether xid was a subxact. If we now have a parent xid, loop.
+ */
+ if (TransactionIdPrecedes(pxid, xid))
+ {
+ xid = pxid;
+ have_parent = true;
+ goto retry_search; /* search arrays again, now we have parent */
+ }
}
return false;
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index fc4103b769..f94e116640 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -17,7 +17,6 @@
extern void SubTransSetParent(TransactionId xid, TransactionId parent);
extern TransactionId SubTransGetParent(TransactionId xid);
extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
-extern bool SubTransGetTopmostTransactionPrecedes(TransactionId *xid, TransactionId cutoff_xid, bool standby);
extern Size SUBTRANSShmemSize(void);
extern void SUBTRANSShmemInit(void);
On Tue, 13 Sept 2022 at 11:56, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Tue, 6 Sept 2022 at 13:14, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
I will update the patch, thanks for your scrutiny.
I attach a diff showing what has changed between v8 and v9, and will
reattach a full set of new patches in the next post, so patchtester
doesn't squeal.
Full set of v9 patches
--
Simon Riggs http://www.EnterpriseDB.com/
Attachments:
001_new_isolation_tests_for_subxids.v3.patchapplication/octet-stream; name=001_new_isolation_tests_for_subxids.v3.patchDownload
diff --git a/src/test/isolation/expected/subxid-overflow.out b/src/test/isolation/expected/subxid-overflow.out
new file mode 100644
index 0000000000..9b0396f451
--- /dev/null
+++ b/src/test/isolation/expected/subxid-overflow.out
@@ -0,0 +1,82 @@
+Parsed test spec with 3 sessions
+
+starting permutation: ins subxov xmax s2sel s1c
+step ins: TRUNCATE subxids; INSERT INTO subxids VALUES (0, 0);
+step subxov: BEGIN; SELECT gen_subxids(100);
+gen_subxids
+-----------
+
+(1 row)
+
+step xmax: BEGIN; INSERT INTO subxids VALUES (99, 0); COMMIT;
+step s2sel: SELECT val FROM subxids WHERE subx = 0;
+val
+---
+ 0
+(1 row)
+
+step s1c: COMMIT;
+
+starting permutation: ins subxov sub3 xmax s2brr s2s3 s3c s2s3 s2c s1c
+step ins: TRUNCATE subxids; INSERT INTO subxids VALUES (0, 0);
+step subxov: BEGIN; SELECT gen_subxids(100);
+gen_subxids
+-----------
+
+(1 row)
+
+step sub3: BEGIN; SAVEPOINT s; INSERT INTO subxids VALUES (1, 0);
+step xmax: BEGIN; INSERT INTO subxids VALUES (99, 0); COMMIT;
+step s2brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2s3: SELECT val FROM subxids WHERE subx = 1;
+val
+---
+(0 rows)
+
+step s3c: COMMIT;
+step s2s3: SELECT val FROM subxids WHERE subx = 1;
+val
+---
+(0 rows)
+
+step s2c: COMMIT;
+step s1c: COMMIT;
+
+starting permutation: ins subxov sub3 xmax s2brc s2s3 s3c s2s3 s2c s1c
+step ins: TRUNCATE subxids; INSERT INTO subxids VALUES (0, 0);
+step subxov: BEGIN; SELECT gen_subxids(100);
+gen_subxids
+-----------
+
+(1 row)
+
+step sub3: BEGIN; SAVEPOINT s; INSERT INTO subxids VALUES (1, 0);
+step xmax: BEGIN; INSERT INTO subxids VALUES (99, 0); COMMIT;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2s3: SELECT val FROM subxids WHERE subx = 1;
+val
+---
+(0 rows)
+
+step s3c: COMMIT;
+step s2s3: SELECT val FROM subxids WHERE subx = 1;
+val
+---
+ 0
+(1 row)
+
+step s2c: COMMIT;
+step s1c: COMMIT;
+
+starting permutation: ins subxov xmax s2upd s1c
+step ins: TRUNCATE subxids; INSERT INTO subxids VALUES (0, 0);
+step subxov: BEGIN; SELECT gen_subxids(100);
+gen_subxids
+-----------
+
+(1 row)
+
+step xmax: BEGIN; INSERT INTO subxids VALUES (99, 0); COMMIT;
+step s2upd: UPDATE subxids SET val = 1 WHERE subx = 0; <waiting ...>
+step s1c: COMMIT;
+step s2upd: <... completed>
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 529a4cbd4d..7e10cfe022 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -1,3 +1,4 @@
+test: subxid-overflow
test: read-only-anomaly
test: read-only-anomaly-2
test: read-only-anomaly-3
diff --git a/src/test/isolation/specs/subxid-overflow.spec b/src/test/isolation/specs/subxid-overflow.spec
new file mode 100644
index 0000000000..9a69db45e8
--- /dev/null
+++ b/src/test/isolation/specs/subxid-overflow.spec
@@ -0,0 +1,79 @@
+# Subtransaction overflow
+#
+# This test is designed to cover some code paths which only occur when
+# one transaction has overflowed the subtransaction cache.
+
+setup
+{
+DROP TABLE IF EXISTS subxids;
+CREATE TABLE subxids (subx integer, val integer);
+
+CREATE OR REPLACE FUNCTION gen_subxids (n integer)
+ RETURNS VOID
+ LANGUAGE plpgsql
+AS $$
+BEGIN
+ IF n <= 0 THEN
+ UPDATE subxids SET val = 1 WHERE subx = 0;
+ RETURN;
+ ELSE
+ PERFORM gen_subxids(n - 1);
+ RETURN;
+ END IF;
+EXCEPTION /* generates a subxid */
+ WHEN raise_exception THEN NULL;
+END;
+$$;
+}
+
+teardown
+{
+ DROP TABLE subxids;
+ DROP FUNCTION gen_subxids(integer);
+}
+
+session s1
+# setup step for each test
+step ins { TRUNCATE subxids; INSERT INTO subxids VALUES (0, 0); }
+# long running transaction with overflowed subxids
+step subxov { BEGIN; SELECT gen_subxids(100); }
+# commit should always come last to make this long running
+step s1c { COMMIT; }
+
+session s2
+# move xmax forwards
+step xmax { BEGIN; INSERT INTO subxids VALUES (99, 0); COMMIT;}
+
+# step for test1
+step s2sel { SELECT val FROM subxids WHERE subx = 0; }
+
+# steps for test2
+step s2brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s2brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+# look for data written by sub3
+step s2s3 { SELECT val FROM subxids WHERE subx = 1; }
+step s2c { COMMIT; }
+
+# step for test3
+step s2upd { UPDATE subxids SET val = 1 WHERE subx = 0; }
+
+session s3
+# transaction with subxids that can commit before s1c
+step sub3 { BEGIN; SAVEPOINT s; INSERT INTO subxids VALUES (1, 0); }
+step s3c { COMMIT; }
+
+# test1
+# s2sel will see subxid as still running
+# designed to test XidInMVCCSnapshot() when overflows, xid is found
+permutation ins subxov xmax s2sel s1c
+
+# test2
+# designed to test XidInMVCCSnapshot() when overflows, xid is not found
+# both SELECTs invisible
+permutation ins subxov sub3 xmax s2brr s2s3 s3c s2s3 s2c s1c
+# 2nd SELECT visible after commit
+permutation ins subxov sub3 xmax s2brc s2s3 s3c s2s3 s2c s1c
+
+# test3
+# designed to test XactLockTableWait() for overflows
+permutation ins subxov xmax s2upd s1c
002_minimize_calls_to_SubTransSetParent.v9.patchapplication/octet-stream; name=002_minimize_calls_to_SubTransSetParent.v9.patchDownload
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3d9088a704..d690774f33 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -107,7 +107,18 @@ static bool TransactionGroupUpdateXidStatus(TransactionId xid,
static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
TransactionId *subxids, XidStatus status,
XLogRecPtr lsn, int pageno);
+/*
+ * Run locally by a backend to establish whether or not it needs to call
+ * SubTransSetParent for subxid.
+ */
+bool
+TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid)
+{
+ int toppageno = TransactionIdToPage(topxid);
+ int subpageno = TransactionIdToPage(subxid);
+ return (toppageno == subpageno);
+}
/*
* TransactionIdSetTreeStatus
@@ -133,7 +144,7 @@ static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
* only once, and the status will be set to committed directly. Otherwise
* we must
* 1. set sub-committed all subxids that are not on the same page as the
- * main xid
+ * main xid (see TransactionIdsAreOnSameXactPage())
* 2. atomically set committed the main xid and the subxids on the same page
* 3. go over the first bunch again and set them committed
* Note that as far as concurrent checkers are concerned, main transaction
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 66d3548155..140ad78530 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -137,14 +137,8 @@ SubTransGetParent(TransactionId xid)
/*
* SubTransGetTopmostTransaction
*
- * Returns the topmost transaction of the given transaction id.
- *
- * Because we cannot look back further than TransactionXmin, it is possible
- * that this function will lie and return an intermediate subtransaction ID
- * instead of the true topmost parent ID. This is OK, because in practice
- * we only care about detecting whether the topmost parent is still running
- * or is part of a current snapshot's list of still-running transactions.
- * Therefore, any XID before TransactionXmin is as good as any other.
+ * Returns the topmost transaction of the given transaction id, if one has
+ * been recorded in pg_subtrans.
*/
TransactionId
SubTransGetTopmostTransaction(TransactionId xid)
@@ -177,7 +171,6 @@ SubTransGetTopmostTransaction(TransactionId xid)
return previousXid;
}
-
/*
* Initialization of shared memory for SUBTRANS
*/
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 50f092d7eb..04e597b5ec 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -693,8 +693,51 @@ AssignTransactionId(TransactionState s)
XactTopFullTransactionId = s->fullTransactionId;
if (isSubXact)
- SubTransSetParent(XidFromFullTransactionId(s->fullTransactionId),
- XidFromFullTransactionId(s->parent->fullTransactionId));
+ {
+ TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId);
+
+ /*
+ * Subtrans entries are only required in specific circumstances:
+ *
+ * 1. When there's no room in PG_PROC, as mentioned above.
+ * During XactLockTableWait() we sometimes need to know the topxid.
+ * If there is room in PG_PROC we can get a subxid's topxid direct
+ * from the procarray if the topxid is still running, using
+ * GetTopmostTransactionIdFromProcArray(). So we only ever need to
+ * call SubTransGetTopMostTransaction() if that xact overflowed;
+ * since that is our current transaction, we know whether or not to
+ * log the xid for future use.
+ * This occurs only when large number of subxids are requested by
+ * app user.
+ *
+ * 2. When IsolationIsSerializable() we sometimes need to access topxid
+ * This occurs only when SERIALIZABLE is requested by app user.
+ *
+ * 3. When TransactionIdSetTreeStatus will use a status of SUB_COMMITTED,
+ * which then requires us to consult subtrans to find parent, which
+ * is needed to avoid race condition. In this case we ask Clog/Xact
+ * module if TransactionIdsAreOnSameXactPage(). Since we start a new
+ * clog page every 32000 xids, this is usually <<1% of subxids.
+ */
+ if (MyProc->subxidStatus.overflowed ||
+ IsolationIsSerializable() ||
+ !TransactionIdsAreOnSameXactPage(GetTopTransactionId(), subxid))
+ {
+ /*
+ * Insert entries into subtrans for this xid, noting that the entry
+ * points directly to the topxid, not the immediate parent. This is
+ * done for two reasons:
+ * (1) so it is faster in a long chain of subxids, because the
+ * algorithm is then O(1), no matter how many subxids are assigned.
+ * (2) so that we don't need to set subxids for unregistered parents.
+ * Previously when we set the parent to be the immediate parent,
+ * we then had to set the parent in all cases to maintain the chain
+ * of values to reach the topxid. If all subxids point to topxid,
+ * then they are independent of each other and we can skip some.
+ */
+ SubTransSetParent(subxid, GetTopTransactionId());
+ }
+ }
/*
* If it's a top-level transaction, the predicate locking system needs to
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0555b02a8d..dae3832e87 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -261,6 +261,13 @@ static ProcArrayStruct *procArray;
static PGPROC *allProcs;
+/*
+ * Remember the last call to TransactionIdIsInProgress() to avoid need to call
+ * SubTransGetTopMostTransaction() when the subxid is present in the procarray.
+ */
+static TransactionId LastCallXidIsInProgressSubXid = InvalidTransactionId;
+static TransactionId LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
/*
* Cache to reduce overhead of repeated calls to TransactionIdIsInProgress()
*/
@@ -1309,14 +1316,14 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
/*
* Notice that we update pg_subtrans with the top-level xid, rather than
- * the parent xid. This is a difference between normal processing and
- * recovery, yet is still correct in all cases. The reason is that
+ * the parent xid, which is correct in all cases. The reason is that
* subtransaction commit is not marked in clog until commit processing, so
* all aborted subtransactions have already been clearly marked in clog.
* As a result we are able to refer directly to the top-level
* transaction's state rather than skipping through all the intermediate
* states in the subtransaction tree. This should be the first time we
- * have attempted to SubTransSetParent().
+ * have attempted to SubTransSetParent() for this xid in recovery.
+ * XXX this may be optimized later, but we don't have good test coverage.
*/
for (i = 0; i < nsubxids; i++)
SubTransSetParent(subxids[i], topxid);
@@ -1440,6 +1447,8 @@ TransactionIdIsInProgress(TransactionId xid)
other_xids = ProcGlobal->xids;
other_subxidstates = ProcGlobal->subxidStates;
+ LastCallXidIsInProgressSubXid = LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
LWLockAcquire(ProcArrayLock, LW_SHARED);
/*
@@ -1508,6 +1517,15 @@ TransactionIdIsInProgress(TransactionId xid)
{
LWLockRelease(ProcArrayLock);
xc_by_child_xid_inc();
+
+ /*
+ * Remember the parent xid, for use during XactLockTableWait().
+ * We do this because it is cheaper than looking up pg_subtrans,
+ * and also allows us to reduce calls to subtrans.
+ */
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = pxid;
+
return true;
}
}
@@ -1591,7 +1609,11 @@ TransactionIdIsInProgress(TransactionId xid)
for (int i = 0; i < nxids; i++)
{
if (TransactionIdEquals(xids[i], topxid))
+ {
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = topxid;
return true;
+ }
}
}
@@ -1599,6 +1621,28 @@ TransactionIdIsInProgress(TransactionId xid)
return false;
}
+/*
+ * Allow the topmost xid to be accessed from the last call to
+ * TransactionIdIsInProgress(). Specifically designed for use in
+ * XactLockTableWait().
+ */
+bool
+GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid)
+{
+ bool found = false;
+
+ Assert(TransactionIdIsNormal(xid));
+
+ if (LastCallXidIsInProgressSubXid == xid)
+ {
+ Assert(TransactionIdIsNormal(LastCallXidIsInProgressParentXid));
+ *pxid = LastCallXidIsInProgressParentXid;
+ found = true;
+ }
+
+ return found;
+}
+
/*
* TransactionIdIsActive -- is xid the top-level XID of an active backend?
*
@@ -2370,27 +2414,34 @@ GetSnapshotData(Snapshot snapshot)
*
* Again, our own XIDs are not included in the snapshot.
*/
- if (!suboverflowed)
+ if (subxidStates[pgxactoff].overflowed)
+ suboverflowed = true;
+
+ /*
+ * Include all subxids, whether or not we overflowed. This is
+ * important because it can reduce the number of accesses to SUBTRANS
+ * when we search snapshots, which is important for scalability,
+ * especially in the presence of both overflowed and long running xacts.
+ * This is true when fraction of subxids held in subxip is a large
+ * fraction of the total subxids in use. In the case where one or more
+ * transactions had more subxids in progress than the total size of
+ * the cache this might not be true, but we consider that case to be
+ * unlikely, even if it is possible.
+ */
{
+ int nsubxids = subxidStates[pgxactoff].count;
- if (subxidStates[pgxactoff].overflowed)
- suboverflowed = true;
- else
+ if (nsubxids > 0)
{
- int nsubxids = subxidStates[pgxactoff].count;
-
- if (nsubxids > 0)
- {
- int pgprocno = pgprocnos[pgxactoff];
- PGPROC *proc = &allProcs[pgprocno];
+ int pgprocno = pgprocnos[pgxactoff];
+ PGPROC *proc = &allProcs[pgprocno];
- pg_read_barrier(); /* pairs with GetNewTransactionId */
+ pg_read_barrier(); /* pairs with GetNewTransactionId */
- memcpy(snapshot->subxip + subcount,
- (void *) proc->subxids.xids,
- nsubxids * sizeof(TransactionId));
- subcount += nsubxids;
- }
+ memcpy(snapshot->subxip + subcount,
+ (void *) proc->subxids.xids,
+ nsubxids * sizeof(TransactionId));
+ subcount += nsubxids;
}
}
}
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 1043068bac..3e529e1bc2 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -694,6 +694,8 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -703,6 +705,13 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
LockRelease(&tag, ShareLock, false);
+ /*
+ * If a transaction has no lock, it might be a top-level transaction,
+ * in which case the procarray will show it as not in progress.
+ *
+ * If a transaction is a subtransaction, then it could have committed
+ * or aborted, yet the top-level transaction may still be in progress.
+ */
if (!TransactionIdIsInProgress(xid))
break;
@@ -724,7 +733,27 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /*
+ * In most cases, we can get the parent xid from our prior call to
+ * TransactionIdIsInProgress(), except in hot standby. If not, we have
+ * to ask subtrans for the parent.
+ */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid))
+ {
+ /* TESTED-BY src/test/isolation/specs/subx-overflow.spec test3 */
+ Assert(TransactionIdIsValid(pxid));
+ xid = pxid;
+ }
+ else
+ {
+ /*
+ * We can get here if RecoveryInProgress() during the last call to
+ * TransactionIdIsInProgress(), but we don't Assert that here since
+ * that would create a race window against the end of recovery.
+ */
+ xid = SubTransGetTopmostTransaction(xid);
+ }
}
if (oper != XLTW_None)
@@ -745,6 +774,8 @@ ConditionalXactLockTableWait(TransactionId xid)
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -762,7 +793,15 @@ ConditionalXactLockTableWait(TransactionId xid)
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /* See XactLockTableWait about this case */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid))
+ {
+ Assert(TransactionIdIsValid(pxid));
+ xid = pxid;
+ }
+ else
+ xid = SubTransGetTopmostTransaction(xid);
}
return true;
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 9b504c9745..e50a640231 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -639,13 +639,9 @@ CopySnapshot(Snapshot snapshot)
newsnap->xip = NULL;
/*
- * Setup subXID array. Don't bother to copy it if it had overflowed,
- * though, because it's not used anywhere in that case. Except if it's a
- * snapshot taken during recovery; all the top-level XIDs are in subxip as
- * well in that case, so we mustn't lose them.
+ * Setup subXID array.
*/
- if (snapshot->subxcnt > 0 &&
- (!snapshot->suboverflowed || snapshot->takenDuringRecovery))
+ if (snapshot->subxcnt > 0)
{
newsnap->subxip = (TransactionId *) ((char *) newsnap + subxipoff);
memcpy(newsnap->subxip, snapshot->subxip,
@@ -1239,8 +1235,10 @@ ExportSnapshot(Snapshot snapshot)
snapshot->subxcnt + nchildren > GetMaxSnapshotSubxidCount())
appendStringInfoString(&buf, "sof:1\n");
else
- {
appendStringInfoString(&buf, "sof:0\n");
+
+ /* then unconditionally, since we always include all subxids */
+ {
appendStringInfo(&buf, "sxcnt:%d\n", snapshot->subxcnt + nchildren);
for (i = 0; i < snapshot->subxcnt; i++)
appendStringInfo(&buf, "sxp:%u\n", snapshot->subxip[i]);
@@ -1491,7 +1489,7 @@ ImportSnapshot(const char *idstr)
snapshot.suboverflowed = parseIntFromText("sof:", &filebuf, path);
- if (!snapshot.suboverflowed)
+ /* then unconditionally, since we always include all subxids */
{
snapshot.subxcnt = xcnt = parseIntFromText("sxcnt:", &filebuf, path);
@@ -1505,11 +1503,6 @@ ImportSnapshot(const char *idstr)
for (i = 0; i < xcnt; i++)
snapshot.subxip[i] = parseXidFromText("sxp:", &filebuf, path);
}
- else
- {
- snapshot.subxcnt = 0;
- snapshot.subxip = NULL;
- }
snapshot.takenDuringRecovery = parseIntFromText("rec:", &filebuf, path);
@@ -2285,6 +2278,8 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
bool
XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
{
+ bool have_parent = false;
+
/*
* Make a quick range check to eliminate most XIDs without looking at the
* xip arrays. Note that this is OK even if we convert a subxact XID to
@@ -2300,80 +2295,76 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
return true;
+ /*
+ * This patch reorders the operations in XidMVCCSnapshot, so as to reduce
+ * calls to SubTransGetParent to the absolute minimum needed.
+ * The previous code was neat, but not efficient for the overflow case.
+ */
+retry_search:
+
/*
* Snapshot information is stored slightly differently in snapshots taken
- * during recovery.
+ * during recovery. xip is empty on standbys.
*/
if (!snapshot->takenDuringRecovery)
{
+ if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
+ return true;
+
/*
- * If the snapshot contains full subxact data, the fastest way to
- * check things is just to compare the given XID against both subxact
- * XIDs and top-level XIDs. If the snapshot overflowed, we have to
- * use pg_subtrans to convert a subxact XID to its parent XID, but
- * then we need only look at top-level XIDs not subxacts.
+ * If we have the parent xid, then the xid is not in snapshot
*/
- if (!snapshot->suboverflowed)
- {
- /* we have full data, so search subxip */
- if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
- return true;
-
- /* not there, fall through to search xip[] */
- }
- else
- {
- /*
- * Snapshot overflowed, so convert xid to top-level. This is safe
- * because we eliminated too-old XIDs above.
- */
- xid = SubTransGetTopmostTransaction(xid);
+ if (have_parent)
+ return false;
+ }
- /*
- * If xid was indeed a subxact, we might now have an xid < xmin,
- * so recheck to avoid an array scan. No point in rechecking
- * xmax.
- */
- if (TransactionIdPrecedes(xid, snapshot->xmin))
- return false;
- }
+ /*
+ * Now search subxip, which contains first 64 subxids of each topxid.
+ */
+ if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
+ return true;
- if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
- return true;
- }
- else
+ if (!have_parent && snapshot->suboverflowed)
{
+ TransactionId pxid;
+
+ /* TESTED-BY src/test/isolation/specs/subx-overflow.spec test1 and test2 */
+
/*
- * In recovery we store all xids in the subxact array because it is by
- * far the bigger array, and we mostly don't know which xids are
- * top-level and which are subxacts. The xip array is empty.
+ * If we haven't found xid yet, it might be because it is a subxid
+ * that is not present because we overflowed, but it might also be
+ * because the xid is not in the snapshot.
*
- * We start by searching subtrans, if we overflowed.
+ * It is important we do this step last because it is expensive,
+ * and if everybody does this then SubTransSLRU glows white hot.
*/
- if (snapshot->suboverflowed)
- {
- /*
- * Snapshot overflowed, so convert xid to top-level. This is safe
- * because we eliminated too-old XIDs above.
- */
- xid = SubTransGetTopmostTransaction(xid);
- /*
- * If xid was indeed a subxact, we might now have an xid < xmin,
- * so recheck to avoid an array scan. No point in rechecking
- * xmax.
- */
- if (TransactionIdPrecedes(xid, snapshot->xmin))
- return false;
- }
+ /*
+ * Snapshot overflowed, so convert xid to top-level. This is safe
+ * because we eliminated too-old XIDs above. Every overflowed subxid
+ * will always have a parent recorded during AssignTransactionId()
+ * so this should always return a valid TransactionId, if a subxact.
+ */
+ pxid = SubTransGetTopmostTransaction(xid);
/*
- * We now have either a top-level xid higher than xmin or an
- * indeterminate xid. We don't know whether it's top level or subxact
- * but it doesn't matter. If it's present, the xid is visible.
+ * If xid was indeed a subxact, we might now have an xid < xmin,
+ * so recheck to avoid an array scan. No point in rechecking
+ * xmax. If it wasn't a subxact, pxid will be invalid, so this test
+ * will do the right thing also.
*/
- if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
- return true;
+ if (TransactionIdPrecedes(pxid, snapshot->xmin))
+ return false;
+
+ /*
+ * Check whether xid was a subxact. If we now have a parent xid, loop.
+ */
+ if (TransactionIdPrecedes(pxid, xid))
+ {
+ xid = pxid;
+ have_parent = true;
+ goto retry_search; /* search arrays again, now we have parent */
+ }
}
return false;
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 775471d2a7..f404db552f 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -301,6 +301,9 @@ extern void AssertTransactionIdInAllowableRange(TransactionId xid);
#define AssertTransactionIdInAllowableRange(xid) ((void)true)
#endif
+/* in transam/clog.c */
+extern bool TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid);
+
/*
* Some frontend programs include this header. For compilers that emit static
* inline functions even when they're unused, that leads to unsatisfied
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 1b2cfac5ad..d7ad1da6a8 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -52,6 +52,8 @@ extern bool ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc);
extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
+extern bool GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid);
+
extern bool TransactionIdIsActive(TransactionId xid);
extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
extern TransactionId GetOldestTransactionIdConsideredRunning(void);
On 2022-Aug-30, Simon Riggs wrote:
001_new_isolation_tests_for_subxids.v3.patch
Adds new test cases to master without adding any new code, specifically
addressing the two areas of code that are not tested by existing tests.
This gives us a baseline from which we can do test driven development.
I'm hoping this can be reviewed and committed fairly smoothly.
I gave this a quick run to confirm the claimed increase of coverage. It
checks out, so pushed.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
On Wed, 14 Sept 2022 at 15:21, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Aug-30, Simon Riggs wrote:
001_new_isolation_tests_for_subxids.v3.patch
Adds new test cases to master without adding any new code, specifically
addressing the two areas of code that are not tested by existing tests.
This gives us a baseline from which we can do test driven development.
I'm hoping this can be reviewed and committed fairly smoothly.I gave this a quick run to confirm the claimed increase of coverage. It
checks out, so pushed.
Thank you.
So now we just have the main part of the patch, reattached here for
the auto patch tester's benefit.
--
Simon Riggs http://www.EnterpriseDB.com/
Attachments:
002_minimize_calls_to_SubTransSetParent.v9.patchapplication/octet-stream; name=002_minimize_calls_to_SubTransSetParent.v9.patchDownload
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3d9088a704..d690774f33 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -107,7 +107,18 @@ static bool TransactionGroupUpdateXidStatus(TransactionId xid,
static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
TransactionId *subxids, XidStatus status,
XLogRecPtr lsn, int pageno);
+/*
+ * Run locally by a backend to establish whether or not it needs to call
+ * SubTransSetParent for subxid.
+ */
+bool
+TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid)
+{
+ int toppageno = TransactionIdToPage(topxid);
+ int subpageno = TransactionIdToPage(subxid);
+ return (toppageno == subpageno);
+}
/*
* TransactionIdSetTreeStatus
@@ -133,7 +144,7 @@ static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
* only once, and the status will be set to committed directly. Otherwise
* we must
* 1. set sub-committed all subxids that are not on the same page as the
- * main xid
+ * main xid (see TransactionIdsAreOnSameXactPage())
* 2. atomically set committed the main xid and the subxids on the same page
* 3. go over the first bunch again and set them committed
* Note that as far as concurrent checkers are concerned, main transaction
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 66d3548155..140ad78530 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -137,14 +137,8 @@ SubTransGetParent(TransactionId xid)
/*
* SubTransGetTopmostTransaction
*
- * Returns the topmost transaction of the given transaction id.
- *
- * Because we cannot look back further than TransactionXmin, it is possible
- * that this function will lie and return an intermediate subtransaction ID
- * instead of the true topmost parent ID. This is OK, because in practice
- * we only care about detecting whether the topmost parent is still running
- * or is part of a current snapshot's list of still-running transactions.
- * Therefore, any XID before TransactionXmin is as good as any other.
+ * Returns the topmost transaction of the given transaction id, if one has
+ * been recorded in pg_subtrans.
*/
TransactionId
SubTransGetTopmostTransaction(TransactionId xid)
@@ -177,7 +171,6 @@ SubTransGetTopmostTransaction(TransactionId xid)
return previousXid;
}
-
/*
* Initialization of shared memory for SUBTRANS
*/
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 50f092d7eb..04e597b5ec 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -693,8 +693,51 @@ AssignTransactionId(TransactionState s)
XactTopFullTransactionId = s->fullTransactionId;
if (isSubXact)
- SubTransSetParent(XidFromFullTransactionId(s->fullTransactionId),
- XidFromFullTransactionId(s->parent->fullTransactionId));
+ {
+ TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId);
+
+ /*
+ * Subtrans entries are only required in specific circumstances:
+ *
+ * 1. When there's no room in PG_PROC, as mentioned above.
+ * During XactLockTableWait() we sometimes need to know the topxid.
+ * If there is room in PG_PROC we can get a subxid's topxid direct
+ * from the procarray if the topxid is still running, using
+ * GetTopmostTransactionIdFromProcArray(). So we only ever need to
+ * call SubTransGetTopMostTransaction() if that xact overflowed;
+ * since that is our current transaction, we know whether or not to
+ * log the xid for future use.
+ * This occurs only when large number of subxids are requested by
+ * app user.
+ *
+ * 2. When IsolationIsSerializable() we sometimes need to access topxid
+ * This occurs only when SERIALIZABLE is requested by app user.
+ *
+ * 3. When TransactionIdSetTreeStatus will use a status of SUB_COMMITTED,
+ * which then requires us to consult subtrans to find parent, which
+ * is needed to avoid race condition. In this case we ask Clog/Xact
+ * module if TransactionIdsAreOnSameXactPage(). Since we start a new
+ * clog page every 32000 xids, this is usually <<1% of subxids.
+ */
+ if (MyProc->subxidStatus.overflowed ||
+ IsolationIsSerializable() ||
+ !TransactionIdsAreOnSameXactPage(GetTopTransactionId(), subxid))
+ {
+ /*
+ * Insert entries into subtrans for this xid, noting that the entry
+ * points directly to the topxid, not the immediate parent. This is
+ * done for two reasons:
+ * (1) so it is faster in a long chain of subxids, because the
+ * algorithm is then O(1), no matter how many subxids are assigned.
+ * (2) so that we don't need to set subxids for unregistered parents.
+ * Previously when we set the parent to be the immediate parent,
+ * we then had to set the parent in all cases to maintain the chain
+ * of values to reach the topxid. If all subxids point to topxid,
+ * then they are independent of each other and we can skip some.
+ */
+ SubTransSetParent(subxid, GetTopTransactionId());
+ }
+ }
/*
* If it's a top-level transaction, the predicate locking system needs to
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0555b02a8d..dae3832e87 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -261,6 +261,13 @@ static ProcArrayStruct *procArray;
static PGPROC *allProcs;
+/*
+ * Remember the last call to TransactionIdIsInProgress() to avoid need to call
+ * SubTransGetTopMostTransaction() when the subxid is present in the procarray.
+ */
+static TransactionId LastCallXidIsInProgressSubXid = InvalidTransactionId;
+static TransactionId LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
/*
* Cache to reduce overhead of repeated calls to TransactionIdIsInProgress()
*/
@@ -1309,14 +1316,14 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
/*
* Notice that we update pg_subtrans with the top-level xid, rather than
- * the parent xid. This is a difference between normal processing and
- * recovery, yet is still correct in all cases. The reason is that
+ * the parent xid, which is correct in all cases. The reason is that
* subtransaction commit is not marked in clog until commit processing, so
* all aborted subtransactions have already been clearly marked in clog.
* As a result we are able to refer directly to the top-level
* transaction's state rather than skipping through all the intermediate
* states in the subtransaction tree. This should be the first time we
- * have attempted to SubTransSetParent().
+ * have attempted to SubTransSetParent() for this xid in recovery.
+ * XXX this may be optimized later, but we don't have good test coverage.
*/
for (i = 0; i < nsubxids; i++)
SubTransSetParent(subxids[i], topxid);
@@ -1440,6 +1447,8 @@ TransactionIdIsInProgress(TransactionId xid)
other_xids = ProcGlobal->xids;
other_subxidstates = ProcGlobal->subxidStates;
+ LastCallXidIsInProgressSubXid = LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
LWLockAcquire(ProcArrayLock, LW_SHARED);
/*
@@ -1508,6 +1517,15 @@ TransactionIdIsInProgress(TransactionId xid)
{
LWLockRelease(ProcArrayLock);
xc_by_child_xid_inc();
+
+ /*
+ * Remember the parent xid, for use during XactLockTableWait().
+ * We do this because it is cheaper than looking up pg_subtrans,
+ * and also allows us to reduce calls to subtrans.
+ */
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = pxid;
+
return true;
}
}
@@ -1591,7 +1609,11 @@ TransactionIdIsInProgress(TransactionId xid)
for (int i = 0; i < nxids; i++)
{
if (TransactionIdEquals(xids[i], topxid))
+ {
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = topxid;
return true;
+ }
}
}
@@ -1599,6 +1621,28 @@ TransactionIdIsInProgress(TransactionId xid)
return false;
}
+/*
+ * Allow the topmost xid to be accessed from the last call to
+ * TransactionIdIsInProgress(). Specifically designed for use in
+ * XactLockTableWait().
+ */
+bool
+GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid)
+{
+ bool found = false;
+
+ Assert(TransactionIdIsNormal(xid));
+
+ if (LastCallXidIsInProgressSubXid == xid)
+ {
+ Assert(TransactionIdIsNormal(LastCallXidIsInProgressParentXid));
+ *pxid = LastCallXidIsInProgressParentXid;
+ found = true;
+ }
+
+ return found;
+}
+
/*
* TransactionIdIsActive -- is xid the top-level XID of an active backend?
*
@@ -2370,27 +2414,34 @@ GetSnapshotData(Snapshot snapshot)
*
* Again, our own XIDs are not included in the snapshot.
*/
- if (!suboverflowed)
+ if (subxidStates[pgxactoff].overflowed)
+ suboverflowed = true;
+
+ /*
+ * Include all subxids, whether or not we overflowed. This is
+ * important because it can reduce the number of accesses to SUBTRANS
+ * when we search snapshots, which is important for scalability,
+ * especially in the presence of both overflowed and long running xacts.
+ * This is true when fraction of subxids held in subxip is a large
+ * fraction of the total subxids in use. In the case where one or more
+ * transactions had more subxids in progress than the total size of
+ * the cache this might not be true, but we consider that case to be
+ * unlikely, even if it is possible.
+ */
{
+ int nsubxids = subxidStates[pgxactoff].count;
- if (subxidStates[pgxactoff].overflowed)
- suboverflowed = true;
- else
+ if (nsubxids > 0)
{
- int nsubxids = subxidStates[pgxactoff].count;
-
- if (nsubxids > 0)
- {
- int pgprocno = pgprocnos[pgxactoff];
- PGPROC *proc = &allProcs[pgprocno];
+ int pgprocno = pgprocnos[pgxactoff];
+ PGPROC *proc = &allProcs[pgprocno];
- pg_read_barrier(); /* pairs with GetNewTransactionId */
+ pg_read_barrier(); /* pairs with GetNewTransactionId */
- memcpy(snapshot->subxip + subcount,
- (void *) proc->subxids.xids,
- nsubxids * sizeof(TransactionId));
- subcount += nsubxids;
- }
+ memcpy(snapshot->subxip + subcount,
+ (void *) proc->subxids.xids,
+ nsubxids * sizeof(TransactionId));
+ subcount += nsubxids;
}
}
}
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 1043068bac..3e529e1bc2 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -694,6 +694,8 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -703,6 +705,13 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
LockRelease(&tag, ShareLock, false);
+ /*
+ * If a transaction has no lock, it might be a top-level transaction,
+ * in which case the procarray will show it as not in progress.
+ *
+ * If a transaction is a subtransaction, then it could have committed
+ * or aborted, yet the top-level transaction may still be in progress.
+ */
if (!TransactionIdIsInProgress(xid))
break;
@@ -724,7 +733,27 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /*
+ * In most cases, we can get the parent xid from our prior call to
+ * TransactionIdIsInProgress(), except in hot standby. If not, we have
+ * to ask subtrans for the parent.
+ */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid))
+ {
+ /* TESTED-BY src/test/isolation/specs/subx-overflow.spec test3 */
+ Assert(TransactionIdIsValid(pxid));
+ xid = pxid;
+ }
+ else
+ {
+ /*
+ * We can get here if RecoveryInProgress() during the last call to
+ * TransactionIdIsInProgress(), but we don't Assert that here since
+ * that would create a race window against the end of recovery.
+ */
+ xid = SubTransGetTopmostTransaction(xid);
+ }
}
if (oper != XLTW_None)
@@ -745,6 +774,8 @@ ConditionalXactLockTableWait(TransactionId xid)
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -762,7 +793,15 @@ ConditionalXactLockTableWait(TransactionId xid)
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /* See XactLockTableWait about this case */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid))
+ {
+ Assert(TransactionIdIsValid(pxid));
+ xid = pxid;
+ }
+ else
+ xid = SubTransGetTopmostTransaction(xid);
}
return true;
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 9b504c9745..e50a640231 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -639,13 +639,9 @@ CopySnapshot(Snapshot snapshot)
newsnap->xip = NULL;
/*
- * Setup subXID array. Don't bother to copy it if it had overflowed,
- * though, because it's not used anywhere in that case. Except if it's a
- * snapshot taken during recovery; all the top-level XIDs are in subxip as
- * well in that case, so we mustn't lose them.
+ * Setup subXID array.
*/
- if (snapshot->subxcnt > 0 &&
- (!snapshot->suboverflowed || snapshot->takenDuringRecovery))
+ if (snapshot->subxcnt > 0)
{
newsnap->subxip = (TransactionId *) ((char *) newsnap + subxipoff);
memcpy(newsnap->subxip, snapshot->subxip,
@@ -1239,8 +1235,10 @@ ExportSnapshot(Snapshot snapshot)
snapshot->subxcnt + nchildren > GetMaxSnapshotSubxidCount())
appendStringInfoString(&buf, "sof:1\n");
else
- {
appendStringInfoString(&buf, "sof:0\n");
+
+ /* then unconditionally, since we always include all subxids */
+ {
appendStringInfo(&buf, "sxcnt:%d\n", snapshot->subxcnt + nchildren);
for (i = 0; i < snapshot->subxcnt; i++)
appendStringInfo(&buf, "sxp:%u\n", snapshot->subxip[i]);
@@ -1491,7 +1489,7 @@ ImportSnapshot(const char *idstr)
snapshot.suboverflowed = parseIntFromText("sof:", &filebuf, path);
- if (!snapshot.suboverflowed)
+ /* then unconditionally, since we always include all subxids */
{
snapshot.subxcnt = xcnt = parseIntFromText("sxcnt:", &filebuf, path);
@@ -1505,11 +1503,6 @@ ImportSnapshot(const char *idstr)
for (i = 0; i < xcnt; i++)
snapshot.subxip[i] = parseXidFromText("sxp:", &filebuf, path);
}
- else
- {
- snapshot.subxcnt = 0;
- snapshot.subxip = NULL;
- }
snapshot.takenDuringRecovery = parseIntFromText("rec:", &filebuf, path);
@@ -2285,6 +2278,8 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
bool
XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
{
+ bool have_parent = false;
+
/*
* Make a quick range check to eliminate most XIDs without looking at the
* xip arrays. Note that this is OK even if we convert a subxact XID to
@@ -2300,80 +2295,76 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
return true;
+ /*
+ * This patch reorders the operations in XidMVCCSnapshot, so as to reduce
+ * calls to SubTransGetParent to the absolute minimum needed.
+ * The previous code was neat, but not efficient for the overflow case.
+ */
+retry_search:
+
/*
* Snapshot information is stored slightly differently in snapshots taken
- * during recovery.
+ * during recovery. xip is empty on standbys.
*/
if (!snapshot->takenDuringRecovery)
{
+ if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
+ return true;
+
/*
- * If the snapshot contains full subxact data, the fastest way to
- * check things is just to compare the given XID against both subxact
- * XIDs and top-level XIDs. If the snapshot overflowed, we have to
- * use pg_subtrans to convert a subxact XID to its parent XID, but
- * then we need only look at top-level XIDs not subxacts.
+ * If we have the parent xid, then the xid is not in snapshot
*/
- if (!snapshot->suboverflowed)
- {
- /* we have full data, so search subxip */
- if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
- return true;
-
- /* not there, fall through to search xip[] */
- }
- else
- {
- /*
- * Snapshot overflowed, so convert xid to top-level. This is safe
- * because we eliminated too-old XIDs above.
- */
- xid = SubTransGetTopmostTransaction(xid);
+ if (have_parent)
+ return false;
+ }
- /*
- * If xid was indeed a subxact, we might now have an xid < xmin,
- * so recheck to avoid an array scan. No point in rechecking
- * xmax.
- */
- if (TransactionIdPrecedes(xid, snapshot->xmin))
- return false;
- }
+ /*
+ * Now search subxip, which contains first 64 subxids of each topxid.
+ */
+ if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
+ return true;
- if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
- return true;
- }
- else
+ if (!have_parent && snapshot->suboverflowed)
{
+ TransactionId pxid;
+
+ /* TESTED-BY src/test/isolation/specs/subx-overflow.spec test1 and test2 */
+
/*
- * In recovery we store all xids in the subxact array because it is by
- * far the bigger array, and we mostly don't know which xids are
- * top-level and which are subxacts. The xip array is empty.
+ * If we haven't found xid yet, it might be because it is a subxid
+ * that is not present because we overflowed, but it might also be
+ * because the xid is not in the snapshot.
*
- * We start by searching subtrans, if we overflowed.
+ * It is important we do this step last because it is expensive,
+ * and if everybody does this then SubTransSLRU glows white hot.
*/
- if (snapshot->suboverflowed)
- {
- /*
- * Snapshot overflowed, so convert xid to top-level. This is safe
- * because we eliminated too-old XIDs above.
- */
- xid = SubTransGetTopmostTransaction(xid);
- /*
- * If xid was indeed a subxact, we might now have an xid < xmin,
- * so recheck to avoid an array scan. No point in rechecking
- * xmax.
- */
- if (TransactionIdPrecedes(xid, snapshot->xmin))
- return false;
- }
+ /*
+ * Snapshot overflowed, so convert xid to top-level. This is safe
+ * because we eliminated too-old XIDs above. Every overflowed subxid
+ * will always have a parent recorded during AssignTransactionId()
+ * so this should always return a valid TransactionId, if a subxact.
+ */
+ pxid = SubTransGetTopmostTransaction(xid);
/*
- * We now have either a top-level xid higher than xmin or an
- * indeterminate xid. We don't know whether it's top level or subxact
- * but it doesn't matter. If it's present, the xid is visible.
+ * If xid was indeed a subxact, we might now have an xid < xmin,
+ * so recheck to avoid an array scan. No point in rechecking
+ * xmax. If it wasn't a subxact, pxid will be invalid, so this test
+ * will do the right thing also.
*/
- if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
- return true;
+ if (TransactionIdPrecedes(pxid, snapshot->xmin))
+ return false;
+
+ /*
+ * Check whether xid was a subxact. If we now have a parent xid, loop.
+ */
+ if (TransactionIdPrecedes(pxid, xid))
+ {
+ xid = pxid;
+ have_parent = true;
+ goto retry_search; /* search arrays again, now we have parent */
+ }
}
return false;
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 775471d2a7..f404db552f 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -301,6 +301,9 @@ extern void AssertTransactionIdInAllowableRange(TransactionId xid);
#define AssertTransactionIdInAllowableRange(xid) ((void)true)
#endif
+/* in transam/clog.c */
+extern bool TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid);
+
/*
* Some frontend programs include this header. For compilers that emit static
* inline functions even when they're unused, that leads to unsatisfied
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 1b2cfac5ad..d7ad1da6a8 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -52,6 +52,8 @@ extern bool ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc);
extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
+extern bool GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid);
+
extern bool TransactionIdIsActive(TransactionId xid);
extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
extern TransactionId GetOldestTransactionIdConsideredRunning(void);
On Thu, 15 Sep 2022 at 18:04, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Wed, 14 Sept 2022 at 15:21, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Aug-30, Simon Riggs wrote:
001_new_isolation_tests_for_subxids.v3.patch
Adds new test cases to master without adding any new code, specifically
addressing the two areas of code that are not tested by existing tests.
This gives us a baseline from which we can do test driven development.
I'm hoping this can be reviewed and committed fairly smoothly.I gave this a quick run to confirm the claimed increase of coverage. It
checks out, so pushed.Thank you.
So now we just have the main part of the patch, reattached here for
the auto patch tester's benefit.
Hi Simon,
Thanks for the updated patch, here are some comments.
There is a typo, s/SubTransGetTopMostTransaction/SubTransGetTopmostTransaction/g.
+ * call SubTransGetTopMostTransaction() if that xact overflowed;
Is there a punctuation mark missing on the following first line?
+ * 2. When IsolationIsSerializable() we sometimes need to access topxid
+ * This occurs only when SERIALIZABLE is requested by app user.
When we use function name in comments, some places we use parentheses,
but others do not use it. Why? I think, we should keep them consistent,
at least in the same commit.
+ * 3. When TransactionIdSetTreeStatus will use a status of SUB_COMMITTED,
+ * which then requires us to consult subtrans to find parent, which
+ * is needed to avoid race condition. In this case we ask Clog/Xact
+ * module if TransactionIdsAreOnSameXactPage(). Since we start a new
+ * clog page every 32000 xids, this is usually <<1% of subxids.
Maybe we declaration a topxid to avoid calling GetTopTransactionId()
twice when we should set subtrans parent?
+ TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId);
+ TransactionId topxid = GetTopTransactionId();
...
+ if (MyProc->subxidStatus.overflowed ||
+ IsolationIsSerializable() ||
+ !TransactionIdsAreOnSameXactPage(topxid, subxid))
+ {
...
+ SubTransSetParent(subxid, topxid);
+ }
--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
On Thu, 15 Sept 2022 at 12:36, Japin Li <japinli@hotmail.com> wrote:
On Thu, 15 Sep 2022 at 18:04, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Wed, 14 Sept 2022 at 15:21, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Aug-30, Simon Riggs wrote:
001_new_isolation_tests_for_subxids.v3.patch
Adds new test cases to master without adding any new code, specifically
addressing the two areas of code that are not tested by existing tests.
This gives us a baseline from which we can do test driven development.
I'm hoping this can be reviewed and committed fairly smoothly.I gave this a quick run to confirm the claimed increase of coverage. It
checks out, so pushed.Thank you.
So now we just have the main part of the patch, reattached here for
the auto patch tester's benefit.Hi Simon,
Thanks for the updated patch, here are some comments.
Thanks for your comments.
There is a typo, s/SubTransGetTopMostTransaction/SubTransGetTopmostTransaction/g.
+ * call SubTransGetTopMostTransaction() if that xact overflowed;
Is there a punctuation mark missing on the following first line?
+ * 2. When IsolationIsSerializable() we sometimes need to access topxid + * This occurs only when SERIALIZABLE is requested by app user.When we use function name in comments, some places we use parentheses,
but others do not use it. Why? I think, we should keep them consistent,
at least in the same commit.+ * 3. When TransactionIdSetTreeStatus will use a status of SUB_COMMITTED, + * which then requires us to consult subtrans to find parent, which + * is needed to avoid race condition. In this case we ask Clog/Xact + * module if TransactionIdsAreOnSameXactPage(). Since we start a new + * clog page every 32000 xids, this is usually <<1% of subxids.
I've reworded those comments, hoping to address all of your above points.
Maybe we declaration a topxid to avoid calling GetTopTransactionId()
twice when we should set subtrans parent?+ TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId); + TransactionId topxid = GetTopTransactionId(); ... + if (MyProc->subxidStatus.overflowed || + IsolationIsSerializable() || + !TransactionIdsAreOnSameXactPage(topxid, subxid)) + { ... + SubTransSetParent(subxid, topxid); + }
Seems a minor point, but I've done this anyway.
Thanks for the review.
v10 attached
--
Simon Riggs http://www.EnterpriseDB.com/
Attachments:
002_minimize_calls_to_SubTransSetParent.v10.patchapplication/octet-stream; name=002_minimize_calls_to_SubTransSetParent.v10.patchDownload
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3d9088a704..d690774f33 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -107,7 +107,18 @@ static bool TransactionGroupUpdateXidStatus(TransactionId xid,
static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
TransactionId *subxids, XidStatus status,
XLogRecPtr lsn, int pageno);
+/*
+ * Run locally by a backend to establish whether or not it needs to call
+ * SubTransSetParent for subxid.
+ */
+bool
+TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid)
+{
+ int toppageno = TransactionIdToPage(topxid);
+ int subpageno = TransactionIdToPage(subxid);
+ return (toppageno == subpageno);
+}
/*
* TransactionIdSetTreeStatus
@@ -133,7 +144,7 @@ static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
* only once, and the status will be set to committed directly. Otherwise
* we must
* 1. set sub-committed all subxids that are not on the same page as the
- * main xid
+ * main xid (see TransactionIdsAreOnSameXactPage())
* 2. atomically set committed the main xid and the subxids on the same page
* 3. go over the first bunch again and set them committed
* Note that as far as concurrent checkers are concerned, main transaction
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 66d3548155..140ad78530 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -137,14 +137,8 @@ SubTransGetParent(TransactionId xid)
/*
* SubTransGetTopmostTransaction
*
- * Returns the topmost transaction of the given transaction id.
- *
- * Because we cannot look back further than TransactionXmin, it is possible
- * that this function will lie and return an intermediate subtransaction ID
- * instead of the true topmost parent ID. This is OK, because in practice
- * we only care about detecting whether the topmost parent is still running
- * or is part of a current snapshot's list of still-running transactions.
- * Therefore, any XID before TransactionXmin is as good as any other.
+ * Returns the topmost transaction of the given transaction id, if one has
+ * been recorded in pg_subtrans.
*/
TransactionId
SubTransGetTopmostTransaction(TransactionId xid)
@@ -177,7 +171,6 @@ SubTransGetTopmostTransaction(TransactionId xid)
return previousXid;
}
-
/*
* Initialization of shared memory for SUBTRANS
*/
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 50f092d7eb..1c8eb0bbf6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -693,8 +693,53 @@ AssignTransactionId(TransactionState s)
XactTopFullTransactionId = s->fullTransactionId;
if (isSubXact)
- SubTransSetParent(XidFromFullTransactionId(s->fullTransactionId),
- XidFromFullTransactionId(s->parent->fullTransactionId));
+ {
+ TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId);
+ TransactionId topxid = GetTopTransactionId();
+
+ /*
+ * Subtrans entries are only required in specific circumstances:
+ *
+ * 1. When there's no room in PG_PROC, as mentioned above.
+ * During XactLockTableWait() we sometimes need to know the topxid.
+ * If there is room in PG_PROC we can get a subxid's topxid direct
+ * from the procarray if the topxid is still running, using
+ * GetTopmostTransactionIdFromProcArray(). So we only ever need to
+ * call SubTransGetTopmostTransaction() if that xact overflowed;
+ * since that is our current transaction, we know whether or not to
+ * log the xid for future use.
+ * This occurs only when large number of subxids are requested by
+ * app user.
+ *
+ * 2. When IsolationIsSerializable() we sometimes need to access topxid.
+ * This occurs only when SERIALIZABLE is requested by app user.
+ *
+ * 3. When TransactionIdSetTreeStatus() will use status SUB_COMMITTED,
+ * which then requires us to consult subtrans to find parent, which
+ * is needed to avoid race condition. In this case we ask Clog/Xact
+ * module if TransactionIdsAreOnSameXactPage(). Since we start a new
+ * clog page every 32000 xids, this is usually <<1% of subxids,
+ * depending upon how far apart the subxacts assign subxids.
+ */
+ if (MyProc->subxidStatus.overflowed ||
+ IsolationIsSerializable() ||
+ !TransactionIdsAreOnSameXactPage(topxid, subxid))
+ {
+ /*
+ * Insert entries into subtrans for this xid, noting that the entry
+ * points directly to the topxid, not the immediate parent. This is
+ * done for two reasons:
+ * (1) so it is faster in a long chain of subxids, because the
+ * algorithm is then O(1), no matter how many subxids are assigned.
+ * (2) so that we don't need to set subxids for unregistered parents.
+ * Previously when we set the parent to be the immediate parent,
+ * we then had to set the parent in all cases to maintain the chain
+ * of values to reach the topxid. If all subxids point to topxid,
+ * then they are independent of each other and we can skip some.
+ */
+ SubTransSetParent(subxid, topxid);
+ }
+ }
/*
* If it's a top-level transaction, the predicate locking system needs to
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0555b02a8d..dae3832e87 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -261,6 +261,13 @@ static ProcArrayStruct *procArray;
static PGPROC *allProcs;
+/*
+ * Remember the last call to TransactionIdIsInProgress() to avoid need to call
+ * SubTransGetTopMostTransaction() when the subxid is present in the procarray.
+ */
+static TransactionId LastCallXidIsInProgressSubXid = InvalidTransactionId;
+static TransactionId LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
/*
* Cache to reduce overhead of repeated calls to TransactionIdIsInProgress()
*/
@@ -1309,14 +1316,14 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
/*
* Notice that we update pg_subtrans with the top-level xid, rather than
- * the parent xid. This is a difference between normal processing and
- * recovery, yet is still correct in all cases. The reason is that
+ * the parent xid, which is correct in all cases. The reason is that
* subtransaction commit is not marked in clog until commit processing, so
* all aborted subtransactions have already been clearly marked in clog.
* As a result we are able to refer directly to the top-level
* transaction's state rather than skipping through all the intermediate
* states in the subtransaction tree. This should be the first time we
- * have attempted to SubTransSetParent().
+ * have attempted to SubTransSetParent() for this xid in recovery.
+ * XXX this may be optimized later, but we don't have good test coverage.
*/
for (i = 0; i < nsubxids; i++)
SubTransSetParent(subxids[i], topxid);
@@ -1440,6 +1447,8 @@ TransactionIdIsInProgress(TransactionId xid)
other_xids = ProcGlobal->xids;
other_subxidstates = ProcGlobal->subxidStates;
+ LastCallXidIsInProgressSubXid = LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
LWLockAcquire(ProcArrayLock, LW_SHARED);
/*
@@ -1508,6 +1517,15 @@ TransactionIdIsInProgress(TransactionId xid)
{
LWLockRelease(ProcArrayLock);
xc_by_child_xid_inc();
+
+ /*
+ * Remember the parent xid, for use during XactLockTableWait().
+ * We do this because it is cheaper than looking up pg_subtrans,
+ * and also allows us to reduce calls to subtrans.
+ */
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = pxid;
+
return true;
}
}
@@ -1591,7 +1609,11 @@ TransactionIdIsInProgress(TransactionId xid)
for (int i = 0; i < nxids; i++)
{
if (TransactionIdEquals(xids[i], topxid))
+ {
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = topxid;
return true;
+ }
}
}
@@ -1599,6 +1621,28 @@ TransactionIdIsInProgress(TransactionId xid)
return false;
}
+/*
+ * Allow the topmost xid to be accessed from the last call to
+ * TransactionIdIsInProgress(). Specifically designed for use in
+ * XactLockTableWait().
+ */
+bool
+GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid)
+{
+ bool found = false;
+
+ Assert(TransactionIdIsNormal(xid));
+
+ if (LastCallXidIsInProgressSubXid == xid)
+ {
+ Assert(TransactionIdIsNormal(LastCallXidIsInProgressParentXid));
+ *pxid = LastCallXidIsInProgressParentXid;
+ found = true;
+ }
+
+ return found;
+}
+
/*
* TransactionIdIsActive -- is xid the top-level XID of an active backend?
*
@@ -2370,27 +2414,34 @@ GetSnapshotData(Snapshot snapshot)
*
* Again, our own XIDs are not included in the snapshot.
*/
- if (!suboverflowed)
+ if (subxidStates[pgxactoff].overflowed)
+ suboverflowed = true;
+
+ /*
+ * Include all subxids, whether or not we overflowed. This is
+ * important because it can reduce the number of accesses to SUBTRANS
+ * when we search snapshots, which is important for scalability,
+ * especially in the presence of both overflowed and long running xacts.
+ * This is true when fraction of subxids held in subxip is a large
+ * fraction of the total subxids in use. In the case where one or more
+ * transactions had more subxids in progress than the total size of
+ * the cache this might not be true, but we consider that case to be
+ * unlikely, even if it is possible.
+ */
{
+ int nsubxids = subxidStates[pgxactoff].count;
- if (subxidStates[pgxactoff].overflowed)
- suboverflowed = true;
- else
+ if (nsubxids > 0)
{
- int nsubxids = subxidStates[pgxactoff].count;
-
- if (nsubxids > 0)
- {
- int pgprocno = pgprocnos[pgxactoff];
- PGPROC *proc = &allProcs[pgprocno];
+ int pgprocno = pgprocnos[pgxactoff];
+ PGPROC *proc = &allProcs[pgprocno];
- pg_read_barrier(); /* pairs with GetNewTransactionId */
+ pg_read_barrier(); /* pairs with GetNewTransactionId */
- memcpy(snapshot->subxip + subcount,
- (void *) proc->subxids.xids,
- nsubxids * sizeof(TransactionId));
- subcount += nsubxids;
- }
+ memcpy(snapshot->subxip + subcount,
+ (void *) proc->subxids.xids,
+ nsubxids * sizeof(TransactionId));
+ subcount += nsubxids;
}
}
}
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 1043068bac..3e529e1bc2 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -694,6 +694,8 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -703,6 +705,13 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
LockRelease(&tag, ShareLock, false);
+ /*
+ * If a transaction has no lock, it might be a top-level transaction,
+ * in which case the procarray will show it as not in progress.
+ *
+ * If a transaction is a subtransaction, then it could have committed
+ * or aborted, yet the top-level transaction may still be in progress.
+ */
if (!TransactionIdIsInProgress(xid))
break;
@@ -724,7 +733,27 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /*
+ * In most cases, we can get the parent xid from our prior call to
+ * TransactionIdIsInProgress(), except in hot standby. If not, we have
+ * to ask subtrans for the parent.
+ */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid))
+ {
+ /* TESTED-BY src/test/isolation/specs/subx-overflow.spec test3 */
+ Assert(TransactionIdIsValid(pxid));
+ xid = pxid;
+ }
+ else
+ {
+ /*
+ * We can get here if RecoveryInProgress() during the last call to
+ * TransactionIdIsInProgress(), but we don't Assert that here since
+ * that would create a race window against the end of recovery.
+ */
+ xid = SubTransGetTopmostTransaction(xid);
+ }
}
if (oper != XLTW_None)
@@ -745,6 +774,8 @@ ConditionalXactLockTableWait(TransactionId xid)
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -762,7 +793,15 @@ ConditionalXactLockTableWait(TransactionId xid)
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /* See XactLockTableWait about this case */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid))
+ {
+ Assert(TransactionIdIsValid(pxid));
+ xid = pxid;
+ }
+ else
+ xid = SubTransGetTopmostTransaction(xid);
}
return true;
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 9b504c9745..e50a640231 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -639,13 +639,9 @@ CopySnapshot(Snapshot snapshot)
newsnap->xip = NULL;
/*
- * Setup subXID array. Don't bother to copy it if it had overflowed,
- * though, because it's not used anywhere in that case. Except if it's a
- * snapshot taken during recovery; all the top-level XIDs are in subxip as
- * well in that case, so we mustn't lose them.
+ * Setup subXID array.
*/
- if (snapshot->subxcnt > 0 &&
- (!snapshot->suboverflowed || snapshot->takenDuringRecovery))
+ if (snapshot->subxcnt > 0)
{
newsnap->subxip = (TransactionId *) ((char *) newsnap + subxipoff);
memcpy(newsnap->subxip, snapshot->subxip,
@@ -1239,8 +1235,10 @@ ExportSnapshot(Snapshot snapshot)
snapshot->subxcnt + nchildren > GetMaxSnapshotSubxidCount())
appendStringInfoString(&buf, "sof:1\n");
else
- {
appendStringInfoString(&buf, "sof:0\n");
+
+ /* then unconditionally, since we always include all subxids */
+ {
appendStringInfo(&buf, "sxcnt:%d\n", snapshot->subxcnt + nchildren);
for (i = 0; i < snapshot->subxcnt; i++)
appendStringInfo(&buf, "sxp:%u\n", snapshot->subxip[i]);
@@ -1491,7 +1489,7 @@ ImportSnapshot(const char *idstr)
snapshot.suboverflowed = parseIntFromText("sof:", &filebuf, path);
- if (!snapshot.suboverflowed)
+ /* then unconditionally, since we always include all subxids */
{
snapshot.subxcnt = xcnt = parseIntFromText("sxcnt:", &filebuf, path);
@@ -1505,11 +1503,6 @@ ImportSnapshot(const char *idstr)
for (i = 0; i < xcnt; i++)
snapshot.subxip[i] = parseXidFromText("sxp:", &filebuf, path);
}
- else
- {
- snapshot.subxcnt = 0;
- snapshot.subxip = NULL;
- }
snapshot.takenDuringRecovery = parseIntFromText("rec:", &filebuf, path);
@@ -2285,6 +2278,8 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
bool
XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
{
+ bool have_parent = false;
+
/*
* Make a quick range check to eliminate most XIDs without looking at the
* xip arrays. Note that this is OK even if we convert a subxact XID to
@@ -2300,80 +2295,76 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
return true;
+ /*
+ * This patch reorders the operations in XidMVCCSnapshot, so as to reduce
+ * calls to SubTransGetParent to the absolute minimum needed.
+ * The previous code was neat, but not efficient for the overflow case.
+ */
+retry_search:
+
/*
* Snapshot information is stored slightly differently in snapshots taken
- * during recovery.
+ * during recovery. xip is empty on standbys.
*/
if (!snapshot->takenDuringRecovery)
{
+ if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
+ return true;
+
/*
- * If the snapshot contains full subxact data, the fastest way to
- * check things is just to compare the given XID against both subxact
- * XIDs and top-level XIDs. If the snapshot overflowed, we have to
- * use pg_subtrans to convert a subxact XID to its parent XID, but
- * then we need only look at top-level XIDs not subxacts.
+ * If we have the parent xid, then the xid is not in snapshot
*/
- if (!snapshot->suboverflowed)
- {
- /* we have full data, so search subxip */
- if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
- return true;
-
- /* not there, fall through to search xip[] */
- }
- else
- {
- /*
- * Snapshot overflowed, so convert xid to top-level. This is safe
- * because we eliminated too-old XIDs above.
- */
- xid = SubTransGetTopmostTransaction(xid);
+ if (have_parent)
+ return false;
+ }
- /*
- * If xid was indeed a subxact, we might now have an xid < xmin,
- * so recheck to avoid an array scan. No point in rechecking
- * xmax.
- */
- if (TransactionIdPrecedes(xid, snapshot->xmin))
- return false;
- }
+ /*
+ * Now search subxip, which contains first 64 subxids of each topxid.
+ */
+ if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
+ return true;
- if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
- return true;
- }
- else
+ if (!have_parent && snapshot->suboverflowed)
{
+ TransactionId pxid;
+
+ /* TESTED-BY src/test/isolation/specs/subx-overflow.spec test1 and test2 */
+
/*
- * In recovery we store all xids in the subxact array because it is by
- * far the bigger array, and we mostly don't know which xids are
- * top-level and which are subxacts. The xip array is empty.
+ * If we haven't found xid yet, it might be because it is a subxid
+ * that is not present because we overflowed, but it might also be
+ * because the xid is not in the snapshot.
*
- * We start by searching subtrans, if we overflowed.
+ * It is important we do this step last because it is expensive,
+ * and if everybody does this then SubTransSLRU glows white hot.
*/
- if (snapshot->suboverflowed)
- {
- /*
- * Snapshot overflowed, so convert xid to top-level. This is safe
- * because we eliminated too-old XIDs above.
- */
- xid = SubTransGetTopmostTransaction(xid);
- /*
- * If xid was indeed a subxact, we might now have an xid < xmin,
- * so recheck to avoid an array scan. No point in rechecking
- * xmax.
- */
- if (TransactionIdPrecedes(xid, snapshot->xmin))
- return false;
- }
+ /*
+ * Snapshot overflowed, so convert xid to top-level. This is safe
+ * because we eliminated too-old XIDs above. Every overflowed subxid
+ * will always have a parent recorded during AssignTransactionId()
+ * so this should always return a valid TransactionId, if a subxact.
+ */
+ pxid = SubTransGetTopmostTransaction(xid);
/*
- * We now have either a top-level xid higher than xmin or an
- * indeterminate xid. We don't know whether it's top level or subxact
- * but it doesn't matter. If it's present, the xid is visible.
+ * If xid was indeed a subxact, we might now have an xid < xmin,
+ * so recheck to avoid an array scan. No point in rechecking
+ * xmax. If it wasn't a subxact, pxid will be invalid, so this test
+ * will do the right thing also.
*/
- if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
- return true;
+ if (TransactionIdPrecedes(pxid, snapshot->xmin))
+ return false;
+
+ /*
+ * Check whether xid was a subxact. If we now have a parent xid, loop.
+ */
+ if (TransactionIdPrecedes(pxid, xid))
+ {
+ xid = pxid;
+ have_parent = true;
+ goto retry_search; /* search arrays again, now we have parent */
+ }
}
return false;
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 775471d2a7..f404db552f 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -301,6 +301,9 @@ extern void AssertTransactionIdInAllowableRange(TransactionId xid);
#define AssertTransactionIdInAllowableRange(xid) ((void)true)
#endif
+/* in transam/clog.c */
+extern bool TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid);
+
/*
* Some frontend programs include this header. For compilers that emit static
* inline functions even when they're unused, that leads to unsatisfied
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 1b2cfac5ad..d7ad1da6a8 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -52,6 +52,8 @@ extern bool ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc);
extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
+extern bool GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid);
+
extern bool TransactionIdIsActive(TransactionId xid);
extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
extern TransactionId GetOldestTransactionIdConsideredRunning(void);
On Fri, 16 Sept 2022 at 13:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
Thanks for the review.
v10 attached
v11 attached, corrected for recent commit
14ff44f80c09718d43d853363941457f5468cc03.
--
Simon Riggs http://www.EnterpriseDB.com/
Attachments:
002_minimize_calls_to_SubTransSetParent.v11.patchapplication/octet-stream; name=002_minimize_calls_to_SubTransSetParent.v11.patchDownload
commit 6a8e6f99b08a0a03ea8e415b5eaf24ceb398f4b4
Author: Simon Riggs <simon.riggs@enterprisedb.com>
Date: Mon Sep 26 14:51:49 2022 +0100
v11
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3d9088a704..d690774f33 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -107,7 +107,18 @@ static bool TransactionGroupUpdateXidStatus(TransactionId xid,
static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
TransactionId *subxids, XidStatus status,
XLogRecPtr lsn, int pageno);
+/*
+ * Run locally by a backend to establish whether or not it needs to call
+ * SubTransSetParent for subxid.
+ */
+bool
+TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid)
+{
+ int toppageno = TransactionIdToPage(topxid);
+ int subpageno = TransactionIdToPage(subxid);
+ return (toppageno == subpageno);
+}
/*
* TransactionIdSetTreeStatus
@@ -133,7 +144,7 @@ static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
* only once, and the status will be set to committed directly. Otherwise
* we must
* 1. set sub-committed all subxids that are not on the same page as the
- * main xid
+ * main xid (see TransactionIdsAreOnSameXactPage())
* 2. atomically set committed the main xid and the subxids on the same page
* 3. go over the first bunch again and set them committed
* Note that as far as concurrent checkers are concerned, main transaction
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 66d3548155..140ad78530 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -137,14 +137,8 @@ SubTransGetParent(TransactionId xid)
/*
* SubTransGetTopmostTransaction
*
- * Returns the topmost transaction of the given transaction id.
- *
- * Because we cannot look back further than TransactionXmin, it is possible
- * that this function will lie and return an intermediate subtransaction ID
- * instead of the true topmost parent ID. This is OK, because in practice
- * we only care about detecting whether the topmost parent is still running
- * or is part of a current snapshot's list of still-running transactions.
- * Therefore, any XID before TransactionXmin is as good as any other.
+ * Returns the topmost transaction of the given transaction id, if one has
+ * been recorded in pg_subtrans.
*/
TransactionId
SubTransGetTopmostTransaction(TransactionId xid)
@@ -177,7 +171,6 @@ SubTransGetTopmostTransaction(TransactionId xid)
return previousXid;
}
-
/*
* Initialization of shared memory for SUBTRANS
*/
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 2bb975943c..ff58d19595 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -693,8 +693,53 @@ AssignTransactionId(TransactionState s)
XactTopFullTransactionId = s->fullTransactionId;
if (isSubXact)
- SubTransSetParent(XidFromFullTransactionId(s->fullTransactionId),
- XidFromFullTransactionId(s->parent->fullTransactionId));
+ {
+ TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId);
+ TransactionId topxid = GetTopTransactionId();
+
+ /*
+ * Subtrans entries are only required in specific circumstances:
+ *
+ * 1. When there's no room in PG_PROC, as mentioned above.
+ * During XactLockTableWait() we sometimes need to know the topxid.
+ * If there is room in PG_PROC we can get a subxid's topxid direct
+ * from the procarray if the topxid is still running, using
+ * GetTopmostTransactionIdFromProcArray(). So we only ever need to
+ * call SubTransGetTopmostTransaction() if that xact overflowed;
+ * since that is our current transaction, we know whether or not to
+ * log the xid for future use.
+ * This occurs only when large number of subxids are requested by
+ * app user.
+ *
+ * 2. When IsolationIsSerializable() we sometimes need to access topxid.
+ * This occurs only when SERIALIZABLE is requested by app user.
+ *
+ * 3. When TransactionIdSetTreeStatus() will use status SUB_COMMITTED,
+ * which then requires us to consult subtrans to find parent, which
+ * is needed to avoid race condition. In this case we ask Clog/Xact
+ * module if TransactionIdsAreOnSameXactPage(). Since we start a new
+ * clog page every 32000 xids, this is usually <<1% of subxids,
+ * depending upon how far apart the subxacts assign subxids.
+ */
+ if (MyProc->subxidStatus.overflowed ||
+ IsolationIsSerializable() ||
+ !TransactionIdsAreOnSameXactPage(topxid, subxid))
+ {
+ /*
+ * Insert entries into subtrans for this xid, noting that the entry
+ * points directly to the topxid, not the immediate parent. This is
+ * done for two reasons:
+ * (1) so it is faster in a long chain of subxids, because the
+ * algorithm is then O(1), no matter how many subxids are assigned.
+ * (2) so that we don't need to set subxids for unregistered parents.
+ * Previously when we set the parent to be the immediate parent,
+ * we then had to set the parent in all cases to maintain the chain
+ * of values to reach the topxid. If all subxids point to topxid,
+ * then they are independent of each other and we can skip some.
+ */
+ SubTransSetParent(subxid, topxid);
+ }
+ }
/*
* If it's a top-level transaction, the predicate locking system needs to
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 207c4b27fd..fc42d865be 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -262,6 +262,13 @@ static ProcArrayStruct *procArray;
static PGPROC *allProcs;
+/*
+ * Remember the last call to TransactionIdIsInProgress() to avoid need to call
+ * SubTransGetTopMostTransaction() when the subxid is present in the procarray.
+ */
+static TransactionId LastCallXidIsInProgressSubXid = InvalidTransactionId;
+static TransactionId LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
/*
* Cache to reduce overhead of repeated calls to TransactionIdIsInProgress()
*/
@@ -1310,14 +1317,14 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
/*
* Notice that we update pg_subtrans with the top-level xid, rather than
- * the parent xid. This is a difference between normal processing and
- * recovery, yet is still correct in all cases. The reason is that
+ * the parent xid, which is correct in all cases. The reason is that
* subtransaction commit is not marked in clog until commit processing, so
* all aborted subtransactions have already been clearly marked in clog.
* As a result we are able to refer directly to the top-level
* transaction's state rather than skipping through all the intermediate
* states in the subtransaction tree. This should be the first time we
- * have attempted to SubTransSetParent().
+ * have attempted to SubTransSetParent() for this xid in recovery.
+ * XXX this may be optimized later, but we don't have good test coverage.
*/
for (i = 0; i < nsubxids; i++)
SubTransSetParent(subxids[i], topxid);
@@ -1441,6 +1448,8 @@ TransactionIdIsInProgress(TransactionId xid)
other_xids = ProcGlobal->xids;
other_subxidstates = ProcGlobal->subxidStates;
+ LastCallXidIsInProgressSubXid = LastCallXidIsInProgressParentXid = InvalidTransactionId;
+
LWLockAcquire(ProcArrayLock, LW_SHARED);
/*
@@ -1509,6 +1518,15 @@ TransactionIdIsInProgress(TransactionId xid)
{
LWLockRelease(ProcArrayLock);
xc_by_child_xid_inc();
+
+ /*
+ * Remember the parent xid, for use during XactLockTableWait().
+ * We do this because it is cheaper than looking up pg_subtrans,
+ * and also allows us to reduce calls to subtrans.
+ */
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = pxid;
+
return true;
}
}
@@ -1589,12 +1607,38 @@ TransactionIdIsInProgress(TransactionId xid)
Assert(TransactionIdIsValid(topxid));
if (!TransactionIdEquals(topxid, xid) &&
pg_lfind32(topxid, xids, nxids))
+ {
+ LastCallXidIsInProgressSubXid = xid;
+ LastCallXidIsInProgressParentXid = topxid;
return true;
+ }
cachedXidIsNotInProgress = xid;
return false;
}
+/*
+ * Allow the topmost xid to be accessed from the last call to
+ * TransactionIdIsInProgress(). Specifically designed for use in
+ * XactLockTableWait().
+ */
+bool
+GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid)
+{
+ bool found = false;
+
+ Assert(TransactionIdIsNormal(xid));
+
+ if (LastCallXidIsInProgressSubXid == xid)
+ {
+ Assert(TransactionIdIsNormal(LastCallXidIsInProgressParentXid));
+ *pxid = LastCallXidIsInProgressParentXid;
+ found = true;
+ }
+
+ return found;
+}
+
/*
* TransactionIdIsActive -- is xid the top-level XID of an active backend?
*
@@ -2366,27 +2410,34 @@ GetSnapshotData(Snapshot snapshot)
*
* Again, our own XIDs are not included in the snapshot.
*/
- if (!suboverflowed)
+ if (subxidStates[pgxactoff].overflowed)
+ suboverflowed = true;
+
+ /*
+ * Include all subxids, whether or not we overflowed. This is
+ * important because it can reduce the number of accesses to SUBTRANS
+ * when we search snapshots, which is important for scalability,
+ * especially in the presence of both overflowed and long running xacts.
+ * This is true when fraction of subxids held in subxip is a large
+ * fraction of the total subxids in use. In the case where one or more
+ * transactions had more subxids in progress than the total size of
+ * the cache this might not be true, but we consider that case to be
+ * unlikely, even if it is possible.
+ */
{
+ int nsubxids = subxidStates[pgxactoff].count;
- if (subxidStates[pgxactoff].overflowed)
- suboverflowed = true;
- else
+ if (nsubxids > 0)
{
- int nsubxids = subxidStates[pgxactoff].count;
-
- if (nsubxids > 0)
- {
- int pgprocno = pgprocnos[pgxactoff];
- PGPROC *proc = &allProcs[pgprocno];
+ int pgprocno = pgprocnos[pgxactoff];
+ PGPROC *proc = &allProcs[pgprocno];
- pg_read_barrier(); /* pairs with GetNewTransactionId */
+ pg_read_barrier(); /* pairs with GetNewTransactionId */
- memcpy(snapshot->subxip + subcount,
- (void *) proc->subxids.xids,
- nsubxids * sizeof(TransactionId));
- subcount += nsubxids;
- }
+ memcpy(snapshot->subxip + subcount,
+ (void *) proc->subxids.xids,
+ nsubxids * sizeof(TransactionId));
+ subcount += nsubxids;
}
}
}
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 1043068bac..3e529e1bc2 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -694,6 +694,8 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -703,6 +705,13 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
LockRelease(&tag, ShareLock, false);
+ /*
+ * If a transaction has no lock, it might be a top-level transaction,
+ * in which case the procarray will show it as not in progress.
+ *
+ * If a transaction is a subtransaction, then it could have committed
+ * or aborted, yet the top-level transaction may still be in progress.
+ */
if (!TransactionIdIsInProgress(xid))
break;
@@ -724,7 +733,27 @@ XactLockTableWait(TransactionId xid, Relation rel, ItemPointer ctid,
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /*
+ * In most cases, we can get the parent xid from our prior call to
+ * TransactionIdIsInProgress(), except in hot standby. If not, we have
+ * to ask subtrans for the parent.
+ */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid))
+ {
+ /* TESTED-BY src/test/isolation/specs/subx-overflow.spec test3 */
+ Assert(TransactionIdIsValid(pxid));
+ xid = pxid;
+ }
+ else
+ {
+ /*
+ * We can get here if RecoveryInProgress() during the last call to
+ * TransactionIdIsInProgress(), but we don't Assert that here since
+ * that would create a race window against the end of recovery.
+ */
+ xid = SubTransGetTopmostTransaction(xid);
+ }
}
if (oper != XLTW_None)
@@ -745,6 +774,8 @@ ConditionalXactLockTableWait(TransactionId xid)
for (;;)
{
+ TransactionId pxid = InvalidTransactionId;
+
Assert(TransactionIdIsValid(xid));
Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -762,7 +793,15 @@ ConditionalXactLockTableWait(TransactionId xid)
if (!first)
pg_usleep(1000L);
first = false;
- xid = SubTransGetTopmostTransaction(xid);
+
+ /* See XactLockTableWait about this case */
+ if (GetTopmostTransactionIdFromProcArray(xid, &pxid))
+ {
+ Assert(TransactionIdIsValid(pxid));
+ xid = pxid;
+ }
+ else
+ xid = SubTransGetTopmostTransaction(xid);
}
return true;
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index f1f2ddac17..b8d4e1d8dc 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -639,13 +639,9 @@ CopySnapshot(Snapshot snapshot)
newsnap->xip = NULL;
/*
- * Setup subXID array. Don't bother to copy it if it had overflowed,
- * though, because it's not used anywhere in that case. Except if it's a
- * snapshot taken during recovery; all the top-level XIDs are in subxip as
- * well in that case, so we mustn't lose them.
+ * Setup subXID array.
*/
- if (snapshot->subxcnt > 0 &&
- (!snapshot->suboverflowed || snapshot->takenDuringRecovery))
+ if (snapshot->subxcnt > 0)
{
newsnap->subxip = (TransactionId *) ((char *) newsnap + subxipoff);
memcpy(newsnap->subxip, snapshot->subxip,
@@ -1240,8 +1236,10 @@ ExportSnapshot(Snapshot snapshot)
snapshot->subxcnt + nchildren > GetMaxSnapshotSubxidCount())
appendStringInfoString(&buf, "sof:1\n");
else
- {
appendStringInfoString(&buf, "sof:0\n");
+
+ /* then unconditionally, since we always include all subxids */
+ {
appendStringInfo(&buf, "sxcnt:%d\n", snapshot->subxcnt + nchildren);
for (i = 0; i < snapshot->subxcnt; i++)
appendStringInfo(&buf, "sxp:%u\n", snapshot->subxip[i]);
@@ -1492,7 +1490,7 @@ ImportSnapshot(const char *idstr)
snapshot.suboverflowed = parseIntFromText("sof:", &filebuf, path);
- if (!snapshot.suboverflowed)
+ /* then unconditionally, since we always include all subxids */
{
snapshot.subxcnt = xcnt = parseIntFromText("sxcnt:", &filebuf, path);
@@ -1506,11 +1504,6 @@ ImportSnapshot(const char *idstr)
for (i = 0; i < xcnt; i++)
snapshot.subxip[i] = parseXidFromText("sxp:", &filebuf, path);
}
- else
- {
- snapshot.subxcnt = 0;
- snapshot.subxip = NULL;
- }
snapshot.takenDuringRecovery = parseIntFromText("rec:", &filebuf, path);
@@ -2286,6 +2279,8 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
bool
XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
{
+ bool have_parent = false;
+
/*
* Make a quick range check to eliminate most XIDs without looking at the
* xip arrays. Note that this is OK even if we convert a subxact XID to
@@ -2301,80 +2296,76 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
return true;
+ /*
+ * This patch reorders the operations in XidMVCCSnapshot, so as to reduce
+ * calls to SubTransGetParent to the absolute minimum needed.
+ * The previous code was neat, but not efficient for the overflow case.
+ */
+retry_search:
+
/*
* Snapshot information is stored slightly differently in snapshots taken
- * during recovery.
+ * during recovery. xip is empty on standbys.
*/
if (!snapshot->takenDuringRecovery)
{
+ if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
+ return true;
+
/*
- * If the snapshot contains full subxact data, the fastest way to
- * check things is just to compare the given XID against both subxact
- * XIDs and top-level XIDs. If the snapshot overflowed, we have to
- * use pg_subtrans to convert a subxact XID to its parent XID, but
- * then we need only look at top-level XIDs not subxacts.
+ * If we have the parent xid, then the xid is not in snapshot
*/
- if (!snapshot->suboverflowed)
- {
- /* we have full data, so search subxip */
- if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
- return true;
-
- /* not there, fall through to search xip[] */
- }
- else
- {
- /*
- * Snapshot overflowed, so convert xid to top-level. This is safe
- * because we eliminated too-old XIDs above.
- */
- xid = SubTransGetTopmostTransaction(xid);
+ if (have_parent)
+ return false;
+ }
- /*
- * If xid was indeed a subxact, we might now have an xid < xmin,
- * so recheck to avoid an array scan. No point in rechecking
- * xmax.
- */
- if (TransactionIdPrecedes(xid, snapshot->xmin))
- return false;
- }
+ /*
+ * Now search subxip, which contains first 64 subxids of each topxid.
+ */
+ if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
+ return true;
- if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
- return true;
- }
- else
+ if (!have_parent && snapshot->suboverflowed)
{
+ TransactionId pxid;
+
+ /* TESTED-BY src/test/isolation/specs/subx-overflow.spec test1 and test2 */
+
/*
- * In recovery we store all xids in the subxact array because it is by
- * far the bigger array, and we mostly don't know which xids are
- * top-level and which are subxacts. The xip array is empty.
+ * If we haven't found xid yet, it might be because it is a subxid
+ * that is not present because we overflowed, but it might also be
+ * because the xid is not in the snapshot.
*
- * We start by searching subtrans, if we overflowed.
+ * It is important we do this step last because it is expensive,
+ * and if everybody does this then SubTransSLRU glows white hot.
*/
- if (snapshot->suboverflowed)
- {
- /*
- * Snapshot overflowed, so convert xid to top-level. This is safe
- * because we eliminated too-old XIDs above.
- */
- xid = SubTransGetTopmostTransaction(xid);
- /*
- * If xid was indeed a subxact, we might now have an xid < xmin,
- * so recheck to avoid an array scan. No point in rechecking
- * xmax.
- */
- if (TransactionIdPrecedes(xid, snapshot->xmin))
- return false;
- }
+ /*
+ * Snapshot overflowed, so convert xid to top-level. This is safe
+ * because we eliminated too-old XIDs above. Every overflowed subxid
+ * will always have a parent recorded during AssignTransactionId()
+ * so this should always return a valid TransactionId, if a subxact.
+ */
+ pxid = SubTransGetTopmostTransaction(xid);
/*
- * We now have either a top-level xid higher than xmin or an
- * indeterminate xid. We don't know whether it's top level or subxact
- * but it doesn't matter. If it's present, the xid is visible.
+ * If xid was indeed a subxact, we might now have an xid < xmin,
+ * so recheck to avoid an array scan. No point in rechecking
+ * xmax. If it wasn't a subxact, pxid will be invalid, so this test
+ * will do the right thing also.
*/
- if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
- return true;
+ if (TransactionIdPrecedes(pxid, snapshot->xmin))
+ return false;
+
+ /*
+ * Check whether xid was a subxact. If we now have a parent xid, loop.
+ */
+ if (TransactionIdPrecedes(pxid, xid))
+ {
+ xid = pxid;
+ have_parent = true;
+ goto retry_search; /* search arrays again, now we have parent */
+ }
}
return false;
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 775471d2a7..f404db552f 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -301,6 +301,9 @@ extern void AssertTransactionIdInAllowableRange(TransactionId xid);
#define AssertTransactionIdInAllowableRange(xid) ((void)true)
#endif
+/* in transam/clog.c */
+extern bool TransactionIdsAreOnSameXactPage(TransactionId topxid, TransactionId subxid);
+
/*
* Some frontend programs include this header. For compilers that emit static
* inline functions even when they're unused, that leads to unsatisfied
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 9761f5374c..8f344dab6f 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -52,6 +52,8 @@ extern bool ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc);
extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
+extern bool GetTopmostTransactionIdFromProcArray(TransactionId xid, TransactionId *pxid);
+
extern bool TransactionIdIsActive(TransactionId xid);
extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
extern TransactionId GetOldestTransactionIdConsideredRunning(void);
Hi,
Le lun. 26 sept. 2022 à 15:57, Simon Riggs
<simon.riggs@enterprisedb.com> a écrit :
On Fri, 16 Sept 2022 at 13:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
Thanks for the review.
v10 attached
v11 attached, corrected for recent commit
14ff44f80c09718d43d853363941457f5468cc03.
Please find below the performance tests results I have produced for this patch.
Attaching some charts and the scripts used to reproduce these tests.
1. Assumption
The number of sub-transaction issued by only one long running
transaction may affect global TPS throughput if the number of
sub-transaction exceeds 64 (sub-overflow)
2. Testing scenario
Based on pgbench, 2 different types of DB activity are applied concurrently:
- 1 long running transaction, including N sub-transactions
- X pgbench clients running read-only workload
Tests are executed with a varying number of sub-transactions: from 0 to 128
Key metric is the TPS rate reported by pgbench runs in read-only mode
Tests are executed against
- HEAD (14a737)
- HEAD (14a737) + 002_minimize_calls_to_SubTransSetParent.v11.patch
3. Long transaction anatomy
Two different long transactions are tested because they don't have the
exact same impact on performance.
Transaction number 1 includes one UPDATE affecting each row of
pgbench_accounts, plus an additional UPDATE affecting only one row but
executed in its own rollbacked sub-transaction:
BEGIN;
SAVEPOINT s1;
SAVEPOINT s2;
-- ...
SAVEPOINT sN - 1;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid > 0;
SAVEPOINT sN;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 12345;
ROLLBACK TO SAVEPOINT sN;
-- sleeping until the end of the test
ROLLBACK;
Transaction 2 includes one UPDATE affecting each row of pgbench_accounts:
BEGIN;
SAVEPOINT s1;
SAVEPOINT s2;
-- ...
SAVEPOINT sN;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid > 0;
-- sleeping until the end of the test
ROLLBACK;
4. Test results with transaction 1
TPS vs number of sub-transaction
nsubx HEAD patched
--------------------
0 441109 439474
8 439045 438103
16 439123 436993
24 436269 434194
32 439707 437429
40 439997 437220
48 439388 437422
56 439409 437210
64 439748 437366
72 92869 434448
80 66577 434100
88 61243 434255
96 57016 434419
104 52132 434917
112 49181 433755
120 46581 434044
128 44067 434268
Perf profiling on HEAD with 80 sub-transactions:
Overhead Symbol
51.26% [.] LWLockAttemptLock
24.59% [.] LWLockRelease
0.36% [.] base_yyparse
0.35% [.] PinBuffer
0.34% [.] AllocSetAlloc
0.33% [.] hash_search_with_hash_value
0.22% [.] LWLockAcquire
0.20% [.] UnpinBuffer
0.15% [.] SimpleLruReadPage_ReadOnly
0.15% [.] _bt_compare
Perf profiling on patched with 80 sub-transactions:
Overhead Symbol
2.64% [.] AllocSetAlloc
2.09% [.] base_yyparse
1.76% [.] hash_search_with_hash_value
1.62% [.] LWLockAttemptLock
1.26% [.] MemoryContextAllocZeroAligned
0.93% [.] _bt_compare
0.92% [.] expression_tree_walker_impl.part.4
0.84% [.] SearchCatCache1
0.79% [.] palloc
0.64% [.] core_yylex
5. Test results with transaction 2
nsubx HEAD patched
--------------------
0 440145 443816
8 438867 443081
16 438634 441786
24 436406 440187
32 439203 442447
40 439819 443574
48 439314 442941
56 439801 443736
64 439074 441970
72 439833 444132
80 148737 439941
88 413714 443343
96 251098 442021
104 70190 443488
112 405507 438866
120 177827 443202
128 399431 441842
From the performance point of view, this patch clearly fixes the
dramatic TPS collapse shown in these tests.
Regards,
--
Julien Tachoires
EDB
Attachments:
tps-head-patched-xact-2.pngimage/png; name=tps-head-patched-xact-2.pngDownload
�PNG
IHDR ���� AiCCPICC Profile H��WXS��[��@h�H � ��B l�$@(1��YTp-�X����(������"��bAAYu�+oR@�}�{�}s�������sg����i�H����#�G�3���g�t ���+bFE�X����w7"m��I����_���� ��8������0 x%W$��(�Mg���V�%�B�T����R�S����&6�q J*�8 ��3���PC�b!O @��ON�t�)[A�R}��t����:����c�\dE)@�+����?���KN�d���*��h��a�ngM�b�����kB�A���C�R2$!qr{T����9t�x��0��!fG�+��4Ab�B�Y�<v,�:/���(l���G+|��ibS�_��e~��J���
�7|�BS-��M���Y� >bU��s�b�6�2XC6bI�4~3����`�>��&�V�������!`G(p]^Fl�<?X�#�����qC:�����������c=|a\�B��(�?Z>������ ?;X��@��������)���DyQ��8��LNh�<|, @k*�2�������{� �A:�;34"A�#��P ���r����z� �_�Y����z�e#��3�s@����(���x�2�x�����f�*������aB&\�H�<2��,��� b1�h���>��~�:�����<���: � 7]�;���QN ]P?H���s�[@M����P��z�w�~��/��Y�"niV#��6������@F���~d��#UmT]�U���1?�XS�������C�y�
i�-�a���"vk ����a'�xxu=���!o��x��������4��5�_�}y�Y�w4`M��3�L�E�3�B������ �������.�n �K������>88x�;�@�T��[�s�������\�8_����%��N���X��8W��@ � $��0����`&��bP
V�u`�
v�=`?�
�88����� ����/A?x>#BB�
�E�s�qB�$ G��$$IG����,FJ�2d��F~E�"g��H'ry��"o�O(���Z�j��E�Q&���S�ttZ��+�
h��G����h�� ���1c�s�XX$���abl>V��cUX-���5���>�D��3p;��C�8��������M��o�����~��J�'�< lB"!�0�PL('�"!��{����H$���D7�����9��������N�� �D�%���I�$)�TL�H�G:E�J�&}PRV2RrR
RJV**�+�U:�tU���g�:���I�$����+�;�M�+�n�g����M��dRQ6Pj)�(�)o���M�=�'*�*oP>�|A���GM��d��
��*�U����R�T?j25���ZM=K}H��JS�We��T�V���^U}�FV3Wc�MU+P+W;�vE�O��n��R���W�P?�~K}@��������\c��E�M���f�&O�Hs��Y�'4�fJc�������s�n-���[+S�Tk�V�V�����v��,�
��]t�nAg���+�u���O�F1G�G-U;����:�u�t�:%:tn�|�e��f���m�}�����M����E��^�h��^���KF��������G�������?``hl 2�hp����n�g�i����a����H`������6���fl`�0����C�%�����?�X����0y`J1u7M3]k�l�ofd6�l�Y��]s���y��z�V��� K,,z,u,���5����V�V3����[�����7[w��6.666WlQ[W[��f��1�1c�c����S�c�����=������7��k66y����c�9�8d;�t�����X���������T�t}u\�����v�u�;oq��Bs������������������-�������{��r���=>z�z�y�y��e������g��x�����x�xs��{w�0|R|��t��r|�|�����v�=gZ33��������G���<Y�X��������@����M��L���j���]���!������6`s����P��y�-a*a1a��������&�B'��p?�<B� "��k"DYF��:6�81jb��g���s�[ch1�b������]{/�*N��?9�:�}B@BYBW���y�����I������]��'���=�er���S,���rq�����'��M�L;�BHIH���������S+S��,�z�K�o-����/�?O�N+K�I�N_�����Q��'` 6 ^g�dn�|���;k0;!�@�RNJ�Q��0K�2�p����"[Q��k���u3��a�]�H����<-�#�&���$y���_��af��C�4f g�����l�����_��s�s���]4��<����������.(Z��0x��E�EY�~+t(,+�kq���"���EO~
���X�X\|k����K�������-���[ ��R�Ciy������~v�y���+�V��t]�eq�p�������i��=Y3aM�Z��������b�s�������]�74n4��j��M�nT�W���\V�~3o��-~[j�l-��i�`�������,��ww��x�3~g�/��T���U���n���=�{Z�������]Y��Hjz�M���?`c�]�����A�����z�.����������+�����#����2��;��mn�j:r�������+Nh�Xy�r������S�E�����y�<�������[&���;w�|������S�/��y��%�K
�]/��������#����W��4vxt4u��<y����k��_g_�|#�F�����oM��u�w��N���w��~���>�~����V�n���.����=�y|� �����O�t=�>+n������xoPo��I/�_�^~�+�C���WV����g[b�k���7��������_�Q����������=�?�~J�����/�/�Zm����`����#��~0X��4 ����
��(���?YA�gV� �����
@-l����� ���U ������q�u��&;WJ��H��5S�E~��!��-��:���� �d|(R'� �eXIfMM * > F( �i N � � �� x� � ASCII Screenshot�\'� pHYs % %IR$� �iTXtXML:com.adobe.xmp <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 6.0.0">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:exif="http://ns.adobe.com/exif/1.0/">
<exif:PixelYDimension>1042</exif:PixelYDimension>
<exif:PixelXDimension>1812</exif:PixelXDimension>
<exif:UserComment>Screenshot</exif:UserComment>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
�t]} iDOT ( �e�7c @ IDATx���$E����p��8� 9JF$c 1
���������(�GEQ�EyP0!
(��""Y��H� ��������������������/������ywOOO}��{����@ @ @ @ @ @ @ �=�-Tx@ @ @ @ @ @ L�@�
@ @ @ @ @ @ rsi�� � � � � � � �l � � � � � � �+@��K�@ @ @ @ @ @ Pd@ @ @ @ @ @ @ �\�\f � � � � � � � �"� � � � � � � �
(��0@ @ @ @ @ @ �@ @ @ @ @ @ @ W�@1�� � � � � � � @��6� � � � � � � ���4�@ @ @ @ @ @ @ E�@ @ @ @ @ @ � P��a � � � � � � (�
� � � � � � �@� �b.
3@ @ @ @ @ @ @�@�m @ @ @ @ @ @ rsi�� � � � � � � �l � � � � � � �+@��K�@ @ @ @ @ @ Pd@ @ @ @ @ @ @ �\�\f � � � � � � � �"� � � � � � � �
(��0@ @ @ @ @ @ �@ @ @ @ @ @ @ W�@1�� � � � � � � @��6� � � � � � � ���4�@ @ @ @ @ @ @ E�@ @ @ @ @ @ � P��a � � � � � � (�
� � � � � � �@� �b.
3@ @ @ @ @ @ @�@�m @ @ @ @ @ @ rsi�� � � � � � � �l � � � � � � �+@��K�@ @ @ @ @ @ Pd@ @ @ @ @ @ @ �\�\f � � � � � � � �"� � � � � � � �
(��0@ @ @ @ @ @ �@ @ @ @ @ @ @ W�@1�� � � � � � � @��6� � � 0&�����Y���I�\OO��x��H@ @ @�
�*�@ @ �1$p��������m�������_x��n��y��o~��>}�p�[n���������;�y�k�J+�4�����{����5�\�M�0����Nn���#������������)S�����en��WwS�Nuz����
�w�}�����w�}��v_��~�;��s���m��z(� � � ,����b�@ @ X�;�0�����}��pGqD��������^p
_B������;�����m�~�{�s;�����/������K�QG�.N�6�)�\w�uG�
-����'�e�]����z�u�{�{�k_�VYe+C�v����>����/�6q��7�%�\r�\!��w������n�
@ @ @ �ED�@qY, � � �"Pu������F����?��O����jD������~�����.�0q��N����|D��@[-�B�����)2}�Q7g���^{���������n��q�n�@���r�l��-�B@�;(.� �G @ @ �n P��5�r"� � ��T(x������[�W700����?���>�N��S��{��n��6r?���lDa�k��@�yt�T��Y���Z�_~�����T�������s�^O�&@����@ @ @`(����KD @ X4Z
�_��D��_U���(>�����.p'�t��v�m� '�`� u���?�m��&v=�f��@������w�c�4~�K_roz���Q����9^�F��z��#��T�v��]���:���K��K/���g��/u��I�&���'�\oz��f�r3g�ts���S�j��ij��0�T��x� ��c���w��V��a]S�r�-g���}�[-����iR��O��l�~fv����F{
Z.�n��������&@ @ @���y��� � � ]-0g��{��A7sn� h�I=n��^���hq��E-PT�������|��v=��:������+_��������;��c,4j�: 5
���w�]v��w�}�>���j�-<�B�00�%7��g����YS��{|���O���t����*���y��n����@��k���p�UWuox��{���Xc��@N�[#G/��R[����e��y����]�R�����-o�]d]���~������������}�����k��v�yg��w��-���
�jx
���O~���R�k�it��-���:0�.3@ @ @ �E\�@q_A, � ���+��G���2��xWm$T�W��������/QM���
��_��}����P��SO�k2^���]�z��@������y(j�(h�����_�u�Q-��k=����MuO��\7�w�Y������x�[v������>_C�������P8[��@�,��p7�|��Y�x������������;Z��m����V�o7P���[�)������Ja��W�%_����o}�[�#5?�������ns
Dm�h�����f�����n����V�}@ @ @���ju�� � � �� ����Yg�e�$\�k(>�������v?���[��V��<���n�wt����~��������Q'#������)<��uZU�?�S��'N��|��k_N�0��9]����r[o����%�X��5����O���|���u�{�[r�%��������E]�4�Ta��>�)�?c���#�X �J�����S�������/|�]x���)��~��-XT���o�F+~��p����Wy�����z���n�����b;e��_l�����e��=�@ @ @ �n P��5�r#� � t�@��
q��lR��Q[������n8\<�y
�t��v�o~��
����?���v�S����C�SS�q]O� Z��y����R�������p���[][�=�y��ZU�TU�4�S��+���SP������a��j�����e�����F-�[o��}�s�so{���2�,c�6T��mM����������3�^=:��6�lc�y��6b5���S��jW!�.��b���?��SNA������-���Nk��������m��V�j{W�������G?r���L � � �@�
(v��c�@ @ �^���NW@^��I=�zw�9s��q��s�]/Q�������V^ye���I�V���Ti���k��p-�N�����}����p�
v�r����n����k�S�*�{���-�� 5�T��&=>o�<�u*R� �W�������a*
O<�Dw�y��F��Z�0���
�y�7~�x;�m�E]�S��]v�0���S�y��n�M7�Prx&w@ @ @ �. P����"� � ,>�(���~N��&��u��E�W�������;��2L����k��R*���g>�����4)�z���l����w����w���
#�:
uj��������ow_��W�N�9�S��E*�pJ����W\q���G>b�R�(��I�RuJ�{���N�z�=��;���=��B~���~JQ�x�GX���4B1��
W��NvjD��0)���o�:� ��F��.�,��@ @ �:��[e,0 � ���"����NS� �lZ���/��(�{��V�����t��v�J/{�������PL�:�}]kO�)���4P�P�(9���i:G{��@Q�'�c___������g;��Zk�e��
n�� �Z�
�����
"�iTc����>�!w���Z � �yYB���!P�iV5"2���(~��tl���������"� � � ]'@��u��F @ X\?o���oEI��0G!R�����l����w(*R��kG}���5�yMY���w{���b�Z�adi ��Ku}�|��)������r�)���n��o�u�u��������^����K:
5BV��:U�8n�����!P�����v�q�e G�� � � t� �b��8@ @���~l����Y��{�G}1o~����&���,����4x�a�9]�p�G(�x����3�p��z�{��_���v���������t�M�����5��_~y+�I��Qo*��Z ��I�d���������gn�]��(=������p��������saa�NE��^{���Z��:��V��<���v�m7<�0��t�m��O���i�����z���G�PT�F
��TA�/~������i��2�,��,y��@��'�� � � ��(..k��� � �u��4��r�����Q�}����j/�w��T���(*�;��s��g��V[m5�e*
/���T��Otn��M����lt�O<�>���Z��|��J�;�d��t��������A��^7i������r=M�&�����!���=�P������/���X0x�u��u
u}CM�����O>�BC�<
����
U�F�����@Q�JU�F=G��NM�v��b�P��3g�]�Qu����6�1���@1�p� � � �� ����fy] � � DX�'�|��t�Iv�;�n�2) ��t��8�=����_���MS�L�=�������Y�f9�u����0��Moz����>�42.Ry��x�@�t���������.n�5��'�{�����/v?����
+�`��T�����~�S8��O�)���u���������T�w�auzZM?��]Gs���6����|���Z�
-�=�������s�=�Z��t
���?���W�r������
���k P"�"� � ���*@����Y^ � � ����+�]\�8��w�;������3�4�
z����?�i�����j#�^����A��S��=������mT�f�mf����rK7y���&�_ ��M6��,5�o����g�u�]f>��^���}���?h�K�L�jW��F�*\��k�*����_|�=���N�j4j8-����B���4JU�dj���g���=�\;m��[o��\sM�m@�����������nfx
�+�Y � � �X(.��� � �@:���N�\�9U���������e�������p�
��#��`J������;��u�Y-����k��Nk�Qp�x�+l$�N��(����Bw��!���c������Q�8}�t5�P#P�]v��Wv�����no��6��3��HR��UVY�F%jD���^j��G?�Q��w���������O?�����tO?��=o�� "*@���{l��W\a���Im���:��c�!n�eN�^����� � � ,������e!� � K@��5�\c�lgd�?��;��c�4�a��;�<w����?��=������w�M7��w�����N
!O^E:��Nk�������w�o����O�rtLG��0q�����i
���R^[<^.�
�4B�������k���}��V[9��4i������w��W�HU����[�i��������n��F���g{�����`��k�y�x�!�����m<��F;j]+���idb��h����m\��<id�g�a���2!� � � �*@���k��F @ @ �X (����]�JXt@ @ @`� P\��1�@ @ X��U�!� � � ���4�@ @ @ �%@�K�z@ @ @ ���7�F@ @ @ �� f#� � � �� ��"�2X@ @ @`�(��5��D @ @ ��A�@qqX��@ @ @��f���������q��l�Y\@ @ @`l (�����E @ @ @ @ @ @�#���(� � � � � � ��� P[��W� � � � � � �@G�qQ@ @ @ @ @ @ ��%@�8��7�@ @ @ @ @ @ ��;��0 � � � � � � cK�@ql�o^- � � � � � � (v�Ea@ @ @ @ @ @ �� ���Z��Z@ @ @ @ @ @ : P���� � � � � � � �-����y� � � � � � � t$@���@ @ @ @ @ @ [�ck}�j@ @ @ @ @ @ �H�@�#.
#� � � � � � 0������"� � � � � � �� �bG\F @ @ @ @ @ @`l (�����E @ @ @ @ @ @�#���(� � � � � � ��� P[��W� � � � � � �@G�qQ@ @ @ @ @ @ ��%@�8��7�@ @ @ @ @ @ ��;��0 � � � � � � cK�@ql�o^- � � � � � � (v�Ea@ @ @ @ @ @ �� ���Z��Z@ @ @ @ @ @ : P���� � � � � � � �-����y� � � � � � � t$@���@ @ @ @ @ @ [�ck}�j@ @ @ @ @ @ �H�@�#.
#� � � � � � 0������"� � � � � � �� �bG\F @ @ @ @ @ @`l (�����E @ @ @ @ @ @�#���(� � � � � � ��� P[��W� � � � � � �@G�qQ@ @ @ @ @ @ ��%@�8��7�@ @ @ @ @ @ ��;��0 � � � � � � cK�@ql�o^- � � � � � � (v�Ea@ @ @ @ @ @ �� ���Z���v``�=�����.(,�����m��
ef������~����=��Sn���n��I��/��h����+����������/�{�������vO?���7o��2e�[i����o��_~y���;�y�<�������}�����y788��\rI��j����[�-��r#������>������O���Xc
��:����^Z��{�1�����=�������~�����s���S��x � � � � � � �.@���k����9s����+�.��RX��g������
�y�����_�~�������/���v/���[b�%,p�n���^{�e������� ����v���/��
������M��^��W�v������v?F��o~�w���[o��BB��
�^���ov���������;
/��B{��w�i�gOO����n���q���;��V^y����=����;�<w���[��0S���*��-������~����2�@ @ @ @ @ @ �H�@��VV�E����|�;���l��Zk��rt��>�1��7��I#&~�K_r�]v��q����
5�P�����w��}�sn���v
� ��w�s_��W�
7�`������;�������������s�,��=o�����s�q'�|�{��GlD�
+�`����hG�h��G?�>���/��w�)��SO=�i4��Q�������T�~�;��C�y���{����3������k�(F��|���,��H�O~��n��vn/���
@ @ @ @ @ @ � �bl�.�_#��:�(��_����8�M�0��U�t�������?�y�>���]w���:�?������m �������n���6:��O� s�M6qx���T5:Q�����o}������TF�U������?�7���6Rr�m��y����?���,���t�TMzm{����6|�;����o?��f���R���Z{
8��zkw����hG��<��{�k_k�p�s������^�*@=����-��b���/|���� � � � � � ��b @�����^�F�����|�;mD����g����W��m���N:�$;U��D�n�N
���O�z�E����NetJ�k���F5�}�����s�������iYtm�=���]u�UV����o�:��N�>�h-�s��vjx����7��zH�M
3�9�;}���-��b�ye���~�S�rs��u_����^�~Ua�Ga�Q����W_��y
KU^�h��l��6���� � � � � � t� �b���
�_����K5�O��}]SQ���)Ju*��S�Z��S��I��N8�;���:=�y���<�He���z�����c���~��t���8�y��{�;��Cm��N�������O?�F�:=�������N����*�|����N;�4����
u
�jf��S�����s7uzV���|�#������O�>�]w�u��_���|T��6�@ @ @ @ @ @ ��E�@qqY�����{����� M��N�y���u��W���N�@�y���t�a���A]�P�$��?u��������nT��y����������>]�Q�_T���p�~��n����Q�:���C�����l��F5.��������K.���U�B�iO5���.�����]Q�0uj�����O�iV$��^�FI���N�/��b����[��}��?�t��/������-2!� � � � � � �� ����&+x=��]7��_���D��y���w���'l�DM
5�O��y��7"�S���z�B�Gy��:�.����O>��������K,���Oj_�[g�u���
7�h#��~px���O�4i�9��S��t�o~��-���CM:��7�l�W�q�m�����o��YN#):� S���J��Cn�����Gu7�t��x�����B���?�}����@���2�G @ @ @ @ @ �U�@�[�\�����;-,�)L�[n9u���K-�����k���N7z�=�8���u5������55�P�Fm��m���v�F��u�]�:�
>�`�S� ����������v�S�j�������v}D�^4����5o��8_����'�x�f)�k��ok�"����:�����.�������iV��_�rk������-�*�<����N�����w����}@ @ @ @ @ @���z�U��
��c�`��3�p����������q����=WZi%;��NWz��������*�<)��|���m��fev�m7/O9����~�FF6?Omj��2�,�^x�]�����2�]Q�f���r�-���K/�t*SMk������{�F������zu:T]�Q�(�}-��T��l~��)P�(E�Z�N�c�9���N��iY���)[�@ @ @ @ @ � ��A7� P$@�X�3����=�=�����u]���'�P �������t_����:�
uT�F4o����E�~T�����v;M�E]1o����������7�FHn��&n����������-�k>���o��F(jt��q�i���sw�u��P�5"�w�
Lu
I���j���_���I�f�~U���NZ6�_��W�>����d:���,���9��U��%��s)� � � � � �x �l��}���+��T-@�X��b^�K/�d��W\a!�G>�w��gZ8��������n���#4�O�{�1�������~��v
�������8e����i����Q�'Nt������p�����*������Oa�BC��TA�������-#N�:g�;<�@�S�^~��v
E��������7������${
E���L����������>�X0�S�n��m=O!�,��;���5:U#V����9���� Z�W�,��kE?��p��W�=���)�5_?p�>}z���>�4������V?�
���V@����]8���vjk%�cb]\��:kS}7�et����7i�`��������]��cG"�+��~g��a���MK��nM��:v�~E����c�g[�g�>C���U�_��:f�@���������}��w���+^1b�Kj}�4��/�2�^w (v����
�����}�kN������>��O��=��S�l����e�]�CJA�;������Z(�pO�51;��7�lT��NG�����hH�s��G���;H���]v��w�}mT��W_m���mosW]u���x�;Ft�+0R��zU��.�k'*\������-��W�z����?n��U�C9��������N�z����k��[����^��>`WXa��
�� ��_�:���Z���Z��2��D��\r�>���PE�.fJ#��{m�����GPi�b���f��n���'�M���?��3�I���3�P��~�,��5�����H�Fmd��q[�v h��v>~�x~l�p���\��%�\�+$rg���������?��.�+���[������;������V� P����k��?�a��~���/}���x��Cu�B��S�����]z��v�O�-
_����pP��S�������O��u��s���zL��u��>�.�������F�v�in��W����k~�c�N���4���?�����>g��Yg�e�u�F�Gy��ll0x�����g�m���t@�S��T����7ltfs�����/~�����~�����T}�@�j�����b� P��Z^'�b�QU%��������*M�X�d��(�oUeI�*5������'�\�X���(����>&�j~���61�(��m]7�bk���YW�����o� M���������s�m*<<��c���/o�5BNA�I'�d�=��C�G1�y����:]����S�������
)���I_T*<�����B��/�����'u�T��4;��ZjO��T�Z&M
������N8�)��N�\s�;��c�}���>���[X��������Q�
CWYe�������~�F:h���%s"P����n��.�%P�-��~��.1%P��Z^'�b�Q��1T��$P,��5�@1���z��i����#� Pi��1���'Pl�H��b*��(.��^w
(v�z���
�N?�t���:%�����[�N��/
�>�F��t�|����(?}�h�F�i��F�i��N���^��������z��6�P��t=E��T�u}E�.t�-���wh�F5~�_�:~���9��:���>�>����n����(I��F�����~����iMU���.ro��yi���{o��OXx���>������L��U��T��S����:5�Fc��
o�S�}���S���?:]���;����F��Uf�N�Z�
#P�R����s���b����G���S��P������x�b���A���WU� ��,����r�%c���I�XnTu �1U��G�X�k.�b,��z �m�������,�:6o��F��Q�i�kh���
��i���nk�7�l3��)4�hAuD+�S0�s�������=��R#�B���3]wP���Le
$u��'�|�i�����w���N�����N��s�����on��E�������v�����)X���
�t�
u!�����)T�����^��x��6�P���<=�}~��/�K��J+�u5JQ^���:�t]=��T�9n�8Um"P�F�[1�b.M��Qys+'P���|�b��mUH��S��+'-��@��(J�(�-+���%K� ��l�@�%K���D�Q9���$(&anh�@���?�P�@�WZ�EV�W\������.p��|
S4�o��W�N��[la��e��B����?���d�Rx�NS����r�o����y:���O���s�{��gmT�?�Vu�M6�S�n���l��)���F-�BD]�Y���Sm95BR�(�e�n��F*�T�9c�)5br��V���p�n����>�]}��v�S�H|��G�B�
�O�n#u:T]�Qu��c���@q�I�GS(�l�@q�I�Gc��K�X�k.�b,��z �mb�!P���X7�����"PL%���b�G�����P^������Sj��"PLoN��
(V��X�6w�\Qx��W������N� �(C��k5
Oa�������:���
*dS�����S��Z*���G����O��4�jW��'���x�w���z�f���:M��5�\��)����x�����G-��^n��,��f�mF\#1T�S�&���n���|��I���?��b"PL����b�G��SI7�C����/����u(����C�S�u���]b?J�[xA�t�/�Hy�@1����X���>&�t�����5� ���@1�5-� P��J�D P�F�[1�b.M��Qys+'P���|�b��mUH��S��+'-��@��(J�(�-+���%K� ��l�@�%K���D�Q9���$(&anh�@���?�P�@�W�<������j�@qt� ��(����D���Hw�@1�uh�@1H��%PL�Mg:�lK�Y�t� �Y�����4��i��[!Pl��7�b|cZ�+@�����\�@�r��
K�� P��ZZ)�b)Qe+���"���*+L�Xe�(�MUiA�J9+����'�L�h��(�D��>&
kn���4Qg(F�mY9�bK�"�.ZY,* PL�(�7W����N����@1�u�%��F�����CK�A"�-�b:o:��Yg["P�j��O���:��>&H��%PL���
�b�H�� ��B\�����@�����VH�XJ� �b��J K�*+@�XeG(v�UYa��(���@�m�J(V�YX���<�f(F�-��@��'�L�1QXs+%P���:�@1*o�� [��` (v��bQ� �b���@1��Z$Pw�t�����-(f5��'PLgZ"Pio �y����:��bV#�}�t��%�1A"�-�b��V�E��M����
(���v* P����B�R�(���VJ�XJTY��(;��@�#��
(VF�vE�mSUZ�@�R�����/��6�@1ma���<Qf�����[)�b.M��Qy[VN�����H�@��V����o����"����(�s'PLg�m�@1���>�b:���b�H{K������t������(��-�� in �87�B��,�o����W�@1�/�#P� �b���(�E)@����R�R��
(VF�QE�qUV�@�2��+"Pl������rVFg!O����h+&P,��2�}L��J si�� P����r��,<�E�]��XT$@��~; PLo� G��@1�;�b:�lK�Y�t� �Y���D�[�t�t�����D���Hw�@1�uh�}L�HsK������f��(�7����q}���+'-��@��(J�(���(�UV�@�2��*"P�������Q�]�b�T�$P����2:�y��$P�F[X1�b!O���c���VJ��Ku�bT���(�d��. P�����" ����bzs�H�8:���� �Yg["P�j��O���:�D�$��(����?�u�%��F�����CK�c�D�[�4���(6����@1�1-� P��K�T.@�X9ii���DQ
(Fa-��@������QvT�bG\�&P��������� �b������_�m&�b4��� y��d�5�R�\��3�����@�%v� �b�, (�����E��q'PL�N���:��bV#�}�t��%� ���@1�7�����-(f5��'PLgZb$��(�qnn�@�Y$�����i!� �b\_jG�r��IK+$P,%�R�@1
ki���D� P�����;���0�be�mWD��6U� +�,����B�h3 ��VL�X�e&��(���(��D�A���e��-Yx���he��H�@1�v@���\-(��;�b:w�t������(��-(��������Og�m�@1���>�b:���� ���@1�ss+��"��&P�oLq��R;�(VNZZ!�b)Q��QXK+%P,%�� �be�UD��We� +�l�"���*-H�X)gaet��D�I����b�B�(3��Da���@1�&����-+'Pl���]$@��E+�EE@������j�@qt� ��(����D���Hw�@1�uh�@1H��%PL�Mg:�lK�Y�t� �Y�����4��i��[!Pl��7�b|cZ�+@�����\�@�r��
K�� P��ZZ)�b)Qe+���"���*+L�Xe�(�MUiA�J9+����'�L�h��(�D��>&
kn���4Qg(F�mY9�bK�"�.ZY,* PL�(�7W����N����@1�u�%��F�����CK�A"�-�b:o:��Yg["P�j��O���:��>&H��%PL���
�b�H�� ��B\�����@�����VH�XJ� �b��J K�*+@�XeG(v�UYa��(���@�m�J(V�YX���<�f(F�-��@��'�L�1QXs+%P���:�@1*o�� [��` (v��bQ� �b���@1��Z$Pw�t�����-(f5��'PLgZ"Pio �y����:��bV#�}�t��%�1A"�-�b��V�E��M����
(���I� @ IDAT�v* P����B�R�(���VJ�XJTY��(;��@�#��
(VF�vE�mSUZ�@�R�����/��6�@1ma���<Qf�����[)�b.M��Qy[VN�����H�@��V����o����"����(�s'PLg�m�@1���>�b:���b�H{K������t������(��-�� in �87�B��,�o����W�@1�/�#P� �b���(�E)@����R�R��
(VF�QE�qUV�@�2��+"Pl������rVFg!O����h+&P,��2�}L��J si�� P����r��,<�E�]��XT$@��~; PLo� G��@1�;�b:�lK�Y�t� �Y���D�[�t�t�����D���Hw�@1�uh�}L�HsK������f��(�7����q}���+'-��@��(J�(���(�UV�@�2��*"P�������Q�]�b�T�$P����2:�y��$P�F[X1�b!O���c���VJ��Ku�bT���(�d��. P�����" ����bzs�H�8:���� �Yg["P�j��O���:�D�$��(����?�u�%��F�����CK�c�D�[�4���(6����@1�1-� P��K�T.@�X9ii���DQ
(Fa-��@������QvT�bG\�&P��������� �b������_�m&�b4��� y��d�5�R�\��3�����@�%v� �b�, (�����E��q'PL�N���:��bV#�}�t��%� ���@1�7�����-(f5��'PLgZb$��(�qnn�@�Y$�����i!� �b\_jG�r��IK+$P,%�R�@1
ki���D� P�����;���0�be�mWD��6U� +�,����B�h3 ��VL�X�e&��(���(��D�A���e��-Yx���he��H�@1�v@���\-(��;�b:w�t������(��-(��������Og�m�@1���>�b:���� ���@1�ss+��"��&P�oLq��R;�(VNZZ!�b)Q��QXK+%P,%�� �be�UD��We� +�l�"���*-H�X)gaet��D�I����b�B�(3��Da���@1�&����-+'Pl���]$@��E+�EE@������j�@qt� ��(����D���Hw�@1�uh�@1H��%PL�Mg:�lK�Y�t� �Y�����4��i��[!Pl��7�b|cZ�+@�����\�@�r��
K�� P��ZZ)�b)Qe+���"���*+L�Xe�(�MUiA�J9+����'�L�h��(�D��>&
kn���4Qg(F�mY9�bK�"�.ZY,* PL�(�7W����N����@1�u�%��F�����CK�A"�-�b:o:��Yg["P�j��O���:��>&H��%PL���
�b�H�� ��B\�����@�����VH�XJ� �b��J K�*+@�XeG(v�UYa��(���@�m�J(V�YX���<�f(F�-��@��'�L�1QXs+%P���:�@1*o�� [��` (v��bQ� �b���@1��Z$Pw�t�����-(f5��'PLgZ"Pio �y����:��bV#�}�t��%�1A"�-�b��V�E��M����
(���v* P����B�R�(���VJ�XJTY��(;��@�#��
(VF�vE�mSUZ�@�R�����/��6�@1ma���<Qf�����[)�b.M��Qy[VN�����H�@��V����o����"����(�s'PLg�m�@1���>�b:���b�H{K������t������(��-�� in �87�B��,�o����W�@1�/�#P� �b���(�E)@����R�R��
(VF�QE�qUV�@�2��+"Pl������rVFg!O����h+&P,��2�}L��J si�� P����r��,<�E�]��X��%�������>#^�W'�Zk���Xb���{���;��C���rVQ��WH
m :����q}}}m=�B/�������������5�o)��6S�s������h
Qq���Y��� &�� ��1�l�j�/�c�����-'N�c��j��<���s��'�q�_�����>�L����[����&�-�hUSV���w�4i��j*]�Z�Yf��-��X|*��X\�����/�_��Wu��x�;���;z�@ @ @ @ @ �����/O�8�V�B�Z��&���tIW]u���?�pw����N8�m��F
�W[m5����7<�/�S������RK-��RC[�5�~���rS�Li�9Zx��z���;����
�/<ga
r��}4w�%�,,���d.{�fY~Z�+[�k����l���+��4J�)��N�>c�� �6mZ�i�t:H��J�����Q�(�Q�^�i�UWu������/���om�:s���S�6F����k��c�0�`�8��������Y������>/�p��[-h_���Zc������J�]��L�+@����XD��b��5���E��8:�\C1���4?����!��r��kx���5Gg������bzs��5��s}�t�����bV#�}����:��>&H����i��[���"������i!� �b\_jG�r��IK+$P,%�R�@1
ki���D� P�����;���0�be�mWD��6U� +�,����B�h3 ��VL�X�e&��(���(��D�A���e��-Yx���he��H�@1�v@���\-(��;�b:w�t������(��-(��������Og�m�@1���>�b:���� ���@1�ss+��"��&P�oLq��R;�(VNZZ!�b)Q��QXK+%P,%�� �be�UD��We� +�l�"���*-H�X)gaet��D�I����b�B�(3��Da���@1�&����-+'Pl���]$@��E+�EE@������j�@qt� ��(����D���Hw�@1�uh�@1H��%PL�Mg:�lK�Y�t� �Y�����4��i��[!Pl��7�b|cZ�+@�����\�@�r��
K�� P��ZZ)�b)Qe+���"���*+L�Xe�(�MUiA�J9+����'�L�h��(�D��>&
kn���4Qg(F�mY9�bK�"�.ZY,* PL�(�7W����N����@1�u�%��F�����CK�A"�-�b:o:��Yg["P�j��O���:��>&H��%PL���
�b�H�� ��B\�����@�����VH�XJ� �b��J K�*+@�XeG(v�UYa��(���@�m�J(V�YX���<�f(F�-��@��'�L�1QXs+%P���:�@1*o�� [��` (v��bQ� �b���@1��Z$Pw�t�����-(f5��'PLgZ"Pio �y����:��bV#�}�t��%�1A"�-�b��V�E��M����
(���v* P����B�R�(���VJ�XJTY��(;��@�#��
(VF�vE�mSUZ�@�R�����/��6�@1ma���<Qf�����[)�b.M��Qy[VN�����H�@��V����o����"����(�s'PLg�m�@1���>�b:���b�H{K������t������(��-�� in �87�B��,�o����W�@1�/�#P� �b���(�E)@����R�R��
(VF�QE�qUV�@�2��+"Pl������rVFg!O����h+&P,��2�}L��J si�� P����r��,<�E�]��XT$@��~; PLo� G��@1�;�b:�lK�Y�t� �Y���D�[�t�t�����D���Hw�@1�uh�}L�HsK������f��(�7����q}���+'-��@��(J�(���(�UV�@�2��*"P�������Q�]�b�T�$P����2:�y��$P�F[X1�b!O���c���VJ��Ku�bT���(�d��. P�����" ����bzs�H�8:���� �Yg["P�j��O���:�D�$��(����?�u�%��F�����CK�c�D�[�4���(6����@1�1-� P��K�T.@�X9ii���DQ
(Fa-��@������QvT�bG\�&P��������� �b������_�m&�b4��� y��d�5�R�\��3�����@�%v� �b�, (�����E��q'PL�N���:��bV#�}�t��%� ���@1�7�����-(f5��'PLgZb$��(�qnn�@�Y$�����i!� �b\_jG�r��IK+$P,%�R�@1
ki���D� P�����;���0�be�mWD��6U� +�,����B�h3 ��VL�X�e&��(���(��D�A���e��-Yx���he��H�@1�v@���\-(��;�b:w�t������(��-(��������Og�m�@1���>�b:���� ���@1�ss+��"��&P�oLq��R;�(VNZZ!�b)Q��QXK+%P,%�� �be�UD��We� +�l�"���*-H�X)gaet��D�I����b�B�(3��Da���@1�&����-+'Pl���]$@��E+�EE@������j�@qt� ��(����D���Hw�@1�uh�@1H��%PL�Mg:�lK�Y�t� �Y�����4��i��[!Pl��7�b|cZ�+@�����\�@�r��
K�� P��ZZ)�b)Qe+���"���*+L�Xe�(�MUiA�J9+����'�L�h��(�D��>&
kn���4Qg(F�mY9�bK�"�.ZY,* PL�(�7W����N����@1�u�%��F�����CK�A"�-�b:o:��Yg["P�j��O���:��>&H��%PL���
�b�H�� ��B\�����@�����VH�XJ� �b��J K�*+@�XeG(v�UYa��(���@�m�J(V�YX���<�f(F�-��@��'�L�1QXs+%P���:�@1*o�� [��` (v��bQ� �b���@1��Z$Pw�t�����-(f5��'PLgZ"Pio �y����:��bV#�}�t��%�1A"�-�b��V�E��M����
(���v* P����B�R�(���VJ�XJTY��(;��@�#��
(VF�vE�mSUZ�@�R�����/��6�@1ma���<Qf�����[)�b.M��Qy[VN�����H�@��V����o����"����(�s'PLg�m�@1���>�b:���b�H{K������t������(��-�� in �87�B��,�o����W�@1�/�#P� �b���(�E)@����R�R��
(VF�QE�qUV�@�2��+"Pl������rVFg!O����h+&P,��2�}L��J si�� P����r��,<�E�]��XT$@��~; PLo� G��@1�;�b:�lK�Y�t� �Y���D�[�t�t�����D���Hw�@1�uh�}L�HsK������f��(�7����q}���B�x��'��l����{z{\����w���F<�����,�z�O1��������2����y�b�7����b��q'PL�N���:��bV#���%P���78k�sCC�xC�np�l70cF~���������$��g�����>=6���,S����#H�<���������3�69���!740��?������^t���Mz`��n_tC������������7��z�|i���O|�����I�]��;��I�o�%]���_�1��o�}KL��������%������>���:��K��|��B��O'P�/��i�c��x*���V�S+@��
��(�� (.r��B�X �?�i;�����(����B�T����/�����V��q�A!d��;y���0�||]���7��J�|G�u���+-����������-o�&/:���a�e�.������'��f���]�8��e
O�v� ���$(&a�F���;�;��|�e�18K��s�w
7Lz��<�AI���=�������u�����4��epV�e�r�Nj7����y��{�����������/�?������A���s��}�>��~[?�iwR���;�;������"���&�\������6;��y�_G���_G�������C�y��a���;~��u�C������������+�q�3�4=i���@�l���
�O!G�����6��mg�����B�MC~��o��"~��C�v��5T�k��� ����j�?���?
e�'��l ��m0L
nt��7z{h�/�r}~{�:�?W��|���w?���I?B���c7�������O���
�'����@Qa���|�W��l;��f�V��:��=�5) ������O����������_{��}c�[m�~�}������mQ���m�y�S��r�Oe����q�\ �~���>CT�v��X��{��J����=�����k����~_�2mw}��r�3�������~u�����i>|�1����N�}N*h�|�������
�������=���#����~�8(��������f�����!���������m�z�]���}����Y�
M�G�/�m�v���������{�.�wk�>����(�p"�*���������5<i����7������\�q�l�%���?��Ny�A����t\1�,��6�`�^�I��O����xP?P�_j)�
�}���������P?����}����mg�)�}�T����f��}�|�����?�����}�m{h��cL=�c�V�Um����1��o7L�X���C����A����* P���
�+�s^�J���#���m���u ����WU����PXH3�;(|�ZY�j_���QR�:���������Z��u���|���u�4=��eu��$;���K��=:��_T&�|�k�i��_~�F.46^����_N�����LGd����c���%����/�����T���U��W���,���a!X���B/������r?8� �AU�4�@M����i��
�|���g���}z�$�Y����<�����/u�����_�!��-'�)laH����ES�qN d���Pn| �g�y���l�CCj/�y-�Smh�r�%[[���m�6���y�Lh��+l��h�~OO�a�*��#,�%}�.���L}>�����>��>�j�_�TT��y�G���}�n�/7�������m*S��L
��L�!��s��r�IWc�/2����q���5n
pj?(�m�C�fPw��������O���:^�/8nhu���O��B7=�����K�_���Bz���suRO����d���F��C~�j�(Mg���oUO�B�da�Tp?)���V���]Z��c=
>��d�x?�i���I���;�j�/�;��>p��T���������sl����~���?;|�:�TW=0��2�e�k�mw���kj;F����Q��>4�(�o��A�\�k�x�-�"����6���Z���C��l���m�W��l�\�Zm~{��R{|���9�����5�p�4
�?�m�Oe�I�;�|h�P��?���QE����}������=�N�y
��C$������_)��Nh����;�W��$�<�����A�>���~�|�_;�������V�������dZ��}p}�gKY{��}��q������)�G�����g}�7���m�@:���������q
����o������~���yR���hCy���R��i��j�������]S��l��^��S�g���~�[H���z��i�o
����~���Ma\�����
��4�(���]����d�nk��(L�Nt�m����A��<��~�1�-��O�#�����~����0����=������}p�p��b7��U_�c�A����4��/�k5�u�}��C��c?�?��a������H�O}�� �������M���&��l�?�vf�~�]�D�X�dg�,t���������J�e;�����Nz?��6�>|@?��{\���~^����@���}�M�~�{#o?�mP��vF����������6_?��}r��_�oKX]����k���~�c�0�X������I������?p����u7��/~���m�C�g���������E`�G �f�o�"P�o_�Xx�:����E��7���I_�Zu
�o;�h>j�ih��o 2b�/z������;u����@k�����eX���w��(l�����;�n�b_Pt �������#F&�_�.��rC�U���e�>}���'s�������@�FgX�����/�
��� �+s�J����CI�8�}�VG��N������wB9���u�Z02�~��/�����oVg��}�K��c�
� �r���_�F� ����?Oo���_�:(����:�����5�B ��C���;^����@�c�X������S����w�1� �-�T���~�����;?-����@@�~;����:�W�V'V��Y[����A��o��3e����Y���������������E9���[n��F������;����o���y�cQ��:�����W���k��������v���g
���Nu�U'��������u�W����KO�n��:F��PlT�}~d���8��46��w*�=T������e��=48���,��:�������ja��gzo���tV��x,���*C��}��I����}�-�/�yM��l�I���z�����dZ����y���:���� �{�yR��USq+6�C?�H.�wk�m�o/���P���������~Z�������m���n(�����������W��&��|&�_���_o|���bq[��*t����t_��W����`m?�}���y��f��@��������D���\���>��X%u�k;~iI�8:[R�����~�c?��PpN�G�����v�t��
���\��~T����w�{���������}c�Uc�����k���[��Ei�>�����U��k������}m�~:m�a\������:~���Lv�����6�k�����������D#����6"Y��
�k?�����e�{x����(���K�dE���(���nn�UW�Z���`��zcxn�}��;�Z�h���:���$bB@��0��������=�`('��h��_W����q�z������z�gOU�"�����d����a��lT����@T��:����[��I��E(;"��k���� �<����`��_�����ZZ���v��w
�S���{�:��M������uu�~��������S����zG�:���W�~�_7�#�&�=(��� ���zL3���������+f�S�����cJ��F�|A���>)x��5���E $��4��j�ph�����h�~v5h����62.���~I�0�������
��q�W���wB[��9�W�bS�o�6
4���J#�����+���,;��a�cu4�c8t�������4�@�����a���k�F���?�I
���Fm�1���0�:��Y���>l��_�8��q
��in���F���M� ���1�/F���yO��
)������3
g����k#l����L�ntZp[
����t���,�S����������;�{��
j��������t����@�n��\;f�p����������2K���E�B����k���������������?��T!��}j}�i�!����II�;��?5R����Q���������l�o�?���K/��H\�c��*F����c�M�)�m�������������&��Fu�����3����G��g�?�B�s��������t�kgq��[�u��5_?�>�i�����Q��������?�~6�
�������5O�����C�M�6�(���"�~������#���R �n�7�?��&�����/��������0�'�T�qu&���mL�����|�@�Tk���&y����mv���j����O���W�B��t0��O�P��,~��'��q�������_�6?�Y�Y��R���5u05�����k:�N�ef��U9�\���>���:���� 3��VN����a�����eQ�>�NV(�o}G���H�e�I_�j����/���������%@�;8���}����\��e�^�M����q�J\���_�/
�������{�>�������p�����G�s���i��y�}-�������_z�o�BBFv��[m���2�Ax�m��/�Fl��X�mJ_�[N�9��udGL�>�y�jRg�_��?@��}��u�f�'�U�/>
�I������!L����]�������'�*L�����t-�0����D����1���[lmL�N�a}���z����c�[u
�s!�G��vz�I��a�>�:�2��V������mb �#��V�=<I�i[l���2Z>y��p9��������S}~[�����yt�]���2/��S����*F<��\����mB�nj�5!P����>{l���F�dFQ��������<������
�����u�����v,���Km�@
����@a9�[�y��&l?��e�8���I�>���)��~�
���1�a�
�d��������9|��,��m\#G�d���d�o��o���:�k�4
��P���d�����S
��]g�m�o��V?��mS�9����_���D���c��h��?c���u������l;
�1a�S��;�7�����C�?j����M�����j���������1�BI�Su�k}���x�������fm;���LW�������as��|��B���kp��B���0�>V���BB}6h������)���:���1�4�D�mZ�z�~���-�/�� �
�\�������:�m���:��{}�nA��j���������i����[6��cP�S�c���t��F]6>����A��w�������E�'j?;�o��U�����/�S��r�e���������PV��������s�ok���������V����`k?���h?����{�w��X�[�������D�?������F���wb���o������}��x������6�\�U_W�D�c�etL�m���)A
�4����F{��)��~��/l����bT���/k�[�����u����J�{��c�����1�}�����iw��M?z�� �^��ZA�>/�9���N����6��}�������>����N��W3�><V����?�d��|^�B(.OE`4B���/^�6|��|��zL:��� �1��i\���X^�B���/:q|�A�/�&����U�n{�x�n�3�!������v�`�}�]�%����� �oc�qpS@B�Z�����@Q�e����E9���_@}�|����V�M���5_�u�X��yX"��9���u�5�����zm�������V����8�R'�F�6���B�:�l����em>��A��X(Db�#���%f���Nc�/T��[���6&2�Wh�_����d-x������:P�ix[����h�Qp�_�����s�6�/�|�~9l�Q��mD���.��4&u�������}���6e�5��C!�X��v��;�0��_�����ZP7�y�N����!�Dl���B�N2g&���skg��V��:��4qz=~�?0��(W�e�KE}Qo9����#�_^��e�e����kA��������6�0Y}��X_����z�y}C��0��U5�fe��(f�N�ce���'x��n����A����W6V��/u����j?Jc�������~}�_�h}���s���=�F�z�>;�Uj��g{�o�&�rr����do�l��R�|�\���F1�_���s�����>�o=~���_oy�������w>�Xq���@�����j������j��=����I��n�
e+����}������N�];
w�6�W�m^��3���:f�im�mO�g��7��@�s|��~���~L��QS)�zt��74��x�i��b�����������~T������
��k�������B�1�F.�����?�G�����#�V/�_�>�=�
���I������<;�=��Y'���w_�>w��e���;�;y�f���>^�b}[S�����vX;�Bm�o���G��������6�p"l�s���� N��m�F�|Y]���@�87�-8����s_���o������=����~�?����mUY�������)Hi����k�(�]��h�6���<�/���M�q�<�M����<�WXW�����]�L������;I���Bp�m���w�qg�t��
�`��)�B�{���@*��$��BB���Fz� �H�w�mz3���b�M1��%����oV+�NWv�vFw�o>��vg�������|�y����5� <o�F]�6��,���~����) ���*���������hn2hc�/�k��^��a�s�S�s����f��#?N2���q�o��y��}�O=<�E��q�Q}� +.��? E���1��_��s��=�E_c �~�����w�je~����e�{�����``���u�z��e �������j��6,��MT����)z��}fSV��=`�����yK����O��e=_^���k��{(n���t �%�K����5��U#��\x�����h�)��)���4�����&���v9�%%�=F��%���������_+~F|��K���F7�����&������+1�����k���n������o:e���� �S��5/�<�a�N���Am�c��A�X�>���s�����E� ���~����y����^�
��i3����y�9�6���5c�^�%��� �����l}�������m#�s�EN���]���v��40�������_3/�5 ����J�m��
<�G��#�����y1��s��F56�O���(����q��!�����Q��4���q";?'�F�=�}sM���{��j�y����u�hw�� �>��s1����d��Q�5�{cj���TG{��l7��=hYo�n�z����:����5�37�6��h�3�� ��=�ej�Ju�t��\i�/�6�9,wA_�o\&P���_�
�@q�S���3�u(��j���X7���C��������|5U��X���{������c *=@f��k*IR
����]���
��Nj�.b�.�@���A�����~��{���z�j�^&S�LI5nY{��N R������>���d��&�cZ'�� �Y��Q���c��i�)bR;�� �
�}]�l�f�<,��`9�[������w���p���C7l���'O������s`;ToT����>�D�i��@�2���$��a�����l����Q �AhL����
�����p��p�@+�p��v� n���qoBzu��,�`m���P��������O�h�>����_\�}�1����/�}��s�zPaJ�j���o��� (�'��^��eRR�T����mx03�)&�O���)��-":mg�������� +�/h���E�x�
Y��$�1��/a0����F���S`��GQ�Y:f1������'DI�����F)��`&�dJx��PX���0����U�V���_&N���k��&�8��k�h��� �q�FI�nq��:��`�-PD?���a���/�S`��|�\��}���Kx9�Zk�%��p�g��r���ZO��� ����~_F5K��J��1��{���O*������$�]�h ��j �� L��W�~b T��oF�����SB_nK�� �������~m5�m���U5^��s��C��6���J3x�������]�j �Mmb��@^'�=i�b�yq��k]��G����$S.�{��`p;_
�N��l�T-���f�~�=��������e����x��
�]w���6o��n�|v�S�����T��G��|5�����M��!�<xg�� ���7}P�g�1`��P���r}������r�M�4���AAuZ ��W��q����~���M�
�9�wq�2���@��p���o��W X�<P�7H�}B�R���&��[�I�u������F�v���DjT��3�����i �������_���S����So�z]6���M�zLI����Y�������p^S!�iO�-���U 8�r^3Eq1��FH�l2�p���m���Qj�D/����3�g\�d��k�l����M����O����>j��a1�~m^TRXe��q
ixL��@"�5�L 1 H\_�����v��~��^��o�F��ET�T����0���o���0���G�����j]g<uY��^r���V����k��~T��=P��]p����Es��xyF�M&T���8t�>q��(r�Ym���To�H����3��]��^0��������G�6}"x�j�)��G�����-�D��S��7���������v�G�8���@L��Pp��}
�-�se^T1��>�_�53�6c�
��5� ��C�v��JM����M��$���_�z����~<x�D;�� C� �n|�Sk�2� �b ���
�� ��9�����f$D(>D*���_�7��
����O�a +}
����=61�� ��V�M3���Y��M�<��B���xp8�c�v@z5���Nv !�!�Q�a�ce��������z�!_�2��)�Uo���_�t���v��P�����v�
[����Vt�=��e�u[���F��"V�p�� g�7�����72�����N�b��{ 2���$��O}F6o��#:d��A�0���b�$��7��{<���?}�O<�� x8 ��Mb��4/�EH7$����3Eu�Wm��@�_N������p��C�+D���P@
���_���������y1��r�:��3����T��S�A�>��A�6�q-����7�� i]�����4�Q�`��3�N���n��T�
dP /���2d3������d K��wk?����}�����K ��&i��[��}���n $�%�n��$��]g� �����(�M�AE�S7<��o�{��'��m��F�S��sd��B�������@j���f� ����G@I}0�D������_���S�Ux����e3���V����x�-�<���e�]�?0��4E&OB�U���~�+`a5B���Z� =��%4d��7�� ~��f�F�w�n7� �M�"����� ���&�G��H>a�=C�i��B<,��
����������Fj{��0��VC��r 4�J�H�N��4���~������W��2�r }67��hK�.�����7� %�f54i;������#���/��Z�1
���6�.�ua��7^��������yC����fkc��U���:��@�Z_V�D�m���p�h�5Z >�
n�w=v����J�`lZ���a�Z}x9d���f���n;=%z^"$�D74��7^��-5��b����L���~���,\�� P$3Q��Q���Z����#����`�;����9 � �� N�����rEy��8���XW�V��[��?��������|U��L;��x��G*����_C�J���sd�<���)�w��I��HB�&���]��Q���5��r�� �f�F3��yS�Lz|�
����e��R�C����J]��)\�$/�RF:u�F�P`P�����?�6�`��N 5� a:�x�7\-����dFraF������z���]�jo9� �aK� � �g^��[��vg���f������[A�0��?���ITi�C��z��z�i#�lXFj�O����[������ @ IDAT ����4F�"������G� �T�
��Y�d�YU�<��(����~��o��2��d���'����p������}4�gh ���r?�dx1
����`���i����a@}����=�
����pJ��-���0pl������L�
����2�u�_+W�W�}Y3`�rx�h��Q�0�8��&\�`�P���Z����M�2e���3�E�
�"P����.��S,u����=��
���G_NN�Y+���R��*}�7�A�-E�j@�iG��TXg�SmK�g�T�W�����y�����m�K�l�M;�AA������ux�C�w�b^�����uY��PP���k�H� ���pp���=z��0i����})��j�������P�t��w*P�
�@���$o��78���p���yWo�c0==a�D
�0Y�������L�������D�g��a������.S��}�w*P� �����L�7�e@��Ah�W�/�������W�l��z/o���TjX`���|G\;M([}��f�.�P2�u!���������03 �f�`o�����I�����'~���xI����!k <���:8�
��3s���Ok0�S�>]�7�\!I���6c�z��'��(���r�iBX�`����'�"!�!��^5���X���|��u
�{x�%�������O��]a���Pl^^c|��3�7�1\Wa���������r�z�R~C����&�w��%���U��R�e��2�
��r��
�q��3J��9��g��*�pt��U��~;�e�<�:����9^ ��r��u��K��6�g�������ph�3%�5��@�O3�7I����(h��k=SY�� �/�������d���ynGhV�c�������'1�;�C��� YMD�|;�c��:��kZ��}�?�����>X+
����~��`�V��m�����|lm1/�(��q����%5� @%<&�E��@ss�I0�~ ����3a>��P����Ox{o?kr��,���n�Po@�E�v�Z3������-f+T�B_�T���GV�%�4�uKK��
b�� �v��8��1�n��?x�=x1��������!j��j�����_�Q;$&4/��X���y�W���D�<7y5O��������Mxv������92��u�3�����B}����K!
�����d�3OdB_��<*(D{�A�5��d_��{�it�.inn&P�.S�[c���FA�Q�UR�B���]w�%���o(S�'����0�B8���;<}�M� �&�z``�z(a}R�NO�O�k(m� !PG�jS����nH�����[����v���yXBg5Ul7�.�C����=���i�"�X0��f��Z,� zV�E��T��8�1���?�\��|� hS�p��(g�&�uP �k=�>3P�T�(��
I_1��Os]����V��m!d��6;�_M��@\o��J�:���� ���]x"�'�q�������/��j�6���.���=M�]�<0f����gi��n�m��� Y`�BSSs��S��Z.WMZ���\S�a4���?eS�E\w���L�l�H�(S�e�8L�u�:p=�*����!�����]��:9p?�U6�6@�cS������} s�e�c/=��>�;��-�nz����=����mh�p���kD�`+&6���:Xp5q�u�~������I_� t q
3�yq
FI�S#9�K#�! ��fm�3%s��v6w;������ymo&[��pM��uK���1������</'���YR_����^�S ��� )c�.SA5��Y�d`�:���D����9[{�j�O$+u�J�}�BX�e�:DX���G��� s�����b�d�K�;������>��I��6���:e����{nS)'�]/��|���x�G�V��F�x��7G=�9����@���YL��6&�����\@1���������zsC���b/���E������xD���,E(u��/r�G; ����|���S7Q���>���{�&�3`���Q�<�����5M�m��u��F���E\�z�!�(�,���1�H�� �������@x
��k+���kt0-}�Q�M���
1�@k��Rf?N�5��Ux�?�a80 g�u�^���j����o������bp �r��e���\�c��Wd��|������6����_��~>/���� Hx���qc��e�?��b�}8�f�������U3F������,�>1�*��~�z� �>��f�Wat��XF�F����[��3%lo�]�� �eW�V/U�+���^�f~���L@���o�6����T�(��O�>g��X��8��� 4�B�N��=cliQ�t�S=���>��������+���{iP�Q��`(�,;
�- ����XL ��=�!�A##�Y����G���i��o(�'�C�cV��a�@1�X1ee���� ���
!�{��1�����UOH��I���t�A��?�C0P`��m���� �� � �H�B"Pt��k��U�@1^=i�
XW`�E�g��[�a�Z�J�A�L�>=C��KP O#
Y��j
���1�=J�j��P�����`��m���; Gx�����T�c�r�z��)}@2��TH���yaq���*����' ���<�5e����L��S�b0���z��~`@�@=]Fyxx�VzPob#`�z���}��T7
�@��r�.cP8�^�N���.���@qH�K�.���*%��^{5����@��wz��[���{�7Nl�9j�6G7X���t�@�����(J�l�m�3�ME�����F��^E����x P�WOZ�� P�.��
GH�d��u��U�b���H�C!�xo���d������H�I�M@N�N�\���`!y� tm��%�D~l�L*�:��1X�����a�~�R��` B�"T�7g��������_��A�0P@a3y�y9K�?����C����@�����5G��(�/���N��Z��~���
��U&D����'�m���6�bvmln!P��nf�lc2�bk-��-es�%P������6T�M�
(�T�uQ� P�A��&C
SvW@1��
d�]R�"����j4���:UC�BM�a>x�u�����:���
�C�������)����������{j���� ����k���fZ�v���w�z�'HcC�����P0^����^�5���N���z��_����A�hC��6 �kw��
fo<