Fix race condition in SSI when reading PredXact->SxactGlobalXmin
DropAllPredicateLocksFromTable, PredicateLockPageSplit,
and CheckTableForSerializableConflictIn all assume that if
PredXact->SxactGlobalXmin is invalid there are no active serializable
transactions and that they can return early as there's no work to do. They
don't acquire a LWLock on SerializableXactHashLock because they already
have locks on the relevant buffer page or relation, which should protect
them from conflicts with new concurrent transactions. However, this
justification doesn't protect against concurrent updates of SxactGlobalXmin
which SetNewSxactGlobalXmin can do.
SetNewSxactGlobalXmin sets SxactGlobalXmin to InvalidTransactionId then
iterates over the active serializable transactions searching for a new
xmin.
static void
SetNewSxactGlobalXmin(void)
{
...
PredXact->SxactGlobalXmin = InvalidTransactionId;
PredXact->SxactGlobalXminCount = 0;
dlist_foreach(iter, &PredXact->activeList)
{
// find new xmin
}
...
}
If SetNewSxactGlobalXmin is interrupted before a new xmin has been set,
processes that read SxactGlobalXmin during this time will see
InvalidTransactionId and might incorrectly conclude there are no concurrent
serializable transactions. In the case of PredicateLockPageSplit, it will
return early and fail to transfer over information about SIRead locks on
the old page as a result.
I have a small repo that demonstrates serialization anomalies when
PredicateLockPageSplit is incorrectly skipped (they will show up much more
frequently with a small sleep after SetNewSxactGlobalXmin sets
SxactGlobalXmin to invalid, but they still show up eventually on normal
builds). [1]https://github.com/joshcurtis/ssi_race_condition_reproduction
I see a couple of options to fix this
1. Guard the reads with a lock. I've attached a patch to do this.
2. Update SetNewSxactGlobalXmin so concurrent lockless reads
of SxactGlobalXmin will never see an invalid transaction ID unless there
genuinely are no more active serializable transactions. To do this, delay
assigning PredXact->SxactGlobalXmin until all active transactions have been
examined.
static void
SetNewSxactGlobalXmin(void)
{
...
Assert(TransactionIdIsValid(PredXact->SxactGlobalXmin))
TransactionId xmin = InvalidTransactionId;
....
dlist_foreach(iter, &PredXact->activeList)
{
// iterate over transactions and find new xmin
}
PredXact->SxactGlobalXmin = xmin;
....
}
This is definitely a bit more complex. It requires that
SetNewSxactGlobalXmin is never called when SxactGlobalXmin is invalid to
prevent readers from seeing an invalid transaction ID when they should see
a valid one -- I think this is the case now since before
SetNewSxactGlobalXmin is called postgres checks that
PredXact->SxactGlobalXminCount > 0. I assume this entails that
SxactGlobalXmin is valid, but I have not checked every place the two
variables are modified.
I imagine the vast majority of postgres instances aren't using serializable
isolation, so I figured I'd throw this out as a possible optimization. I'm
not sure how the project trades off between simplicity and (very minor?)
performance optimizations, so I'll defer to the community.
Best,
Josh Curtis
[1]: https://github.com/joshcurtis/ssi_race_condition_reproduction
Attachments:
v1-0001-Fix-race-condition-when-reading-PredXact-SxactGlo.patchapplication/octet-stream; name=v1-0001-Fix-race-condition-when-reading-PredXact-SxactGlo.patchDownload
From a05d9741c3161a894decd1b88e148c4bbac60b0e Mon Sep 17 00:00:00 2001
From: Josh Curtis <jcurtis825@gmail.com>
Date: Mon, 1 Sep 2025 21:07:52 -0700
Subject: [PATCH v1] Fix race condition when reading
`PredXact->SxactGlobalXmin`
`DropAllPredicateLocksFromTable`, `PredicateLockPageSplit`, and
`CheckTableForSerializableConflictIn` all assume that
`!TransactionIdIsValid(PredXact->SxactGlobalXmin)` implies there are
no active serializable transactions.
This assumption is not true due to a race with
`SetNewSxactGlobalXmin`, which first sets `PredXact->SxactGlobalXmin`
to `InvalidTransactionId` then iterates over the active serializable
transactions to find the new xmin. If `SetNewSxactGlobalXmin` is
interrupted before setting a new xmin and other processes check
`!TransactionIdIsValid(PredXact->SxactGlobalXmin)` without taking a
lock during this time, they will incorrectly assume it is safe to
proceed as if there are no concurrent serializable transactions.
---
src/backend/storage/lmgr/predicate.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index c1d8511ad17..50e9f62b566 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -2949,12 +2949,14 @@ DropAllPredicateLocksFromTable(Relation relation, bool transfer)
/*
* Bail out quickly if there are no serializable transactions running.
- * It's safe to check this without taking locks because the caller is
- * holding an ACCESS EXCLUSIVE lock on the relation. No new locks which
- * would matter here can be acquired while that is held.
*/
+ LWLockAcquire(SerializableXactHashLock, LW_SHARED);
if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
+ {
+ LWLockRelease(SerializableXactHashLock);
return;
+ }
+ LWLockRelease(SerializableXactHashLock);
if (!PredicateLockingNeededForRelation(relation))
return;
@@ -3150,16 +3152,14 @@ PredicateLockPageSplit(Relation relation, BlockNumber oldblkno,
/*
* Bail out quickly if there are no serializable transactions running.
- *
- * It's safe to do this check without taking any additional locks. Even if
- * a serializable transaction starts concurrently, we know it can't take
- * any SIREAD locks on the page being split because the caller is holding
- * the associated buffer page lock. Memory reordering isn't an issue; the
- * memory barrier in the LWLock acquisition guarantees that this read
- * occurs while the buffer page lock is held.
*/
+ LWLockAcquire(SerializableXactHashLock, LW_SHARED);
if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
+ {
+ LWLockRelease(SerializableXactHashLock);
return;
+ }
+ LWLockRelease(SerializableXactHashLock);
if (!PredicateLockingNeededForRelation(relation))
return;
@@ -4426,12 +4426,14 @@ CheckTableForSerializableConflictIn(Relation relation)
/*
* Bail out quickly if there are no serializable transactions running.
- * It's safe to check this without taking locks because the caller is
- * holding an ACCESS EXCLUSIVE lock on the relation. No new locks which
- * would matter here can be acquired while that is held.
*/
+ LWLockAcquire(SerializableXactHashLock, LW_SHARED);
if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
+ {
+ LWLockRelease(SerializableXactHashLock);
return;
+ }
+ LWLockRelease(SerializableXactHashLock);
if (!SerializationNeededForWrite(relation))
return;
--
2.49.0
Hello!
Haven't checked the patch yet, but I'd recommend to add .spec reproducer
with injection_points to simulate the race in reproducible way.
Best regards,
Mikhail.
To prevent it being lost forever - I registered commitfest entry for it:
Hello!
Josh Curtis <jcurtis825@gmail.com>:
This is definitely a bit more complex. It requires that SetNewSxactGlobalXmin is never called when SxactGlobalXmin is invalid to prevent readers from seeing an invalid transaction ID when they should see a valid one -- I think this is the case now since before SetNewSxactGlobalXmin is called postgres checks that PredXact->SxactGlobalXminCount > 0. I assume this entails that SxactGlobalXmin is valid, but I have not checked every place the two variables are modified.
Such logic feels fragile to me. Maybe add a special flag like
'PredXact->SxactGlobalSkipAllowed' which will be updated to 'true' by
SetNewSxactGlobalXmin, and dropped to 'false' by functions affecting
PredXact->activeList?
But I prefer the first solution anyway.
Best regards,
Mikhail.
Thanks for taking a look.
I tried adding a reproduction with injection points, but had some trouble
with them. I'll give it another go this weekend.
I guess one of the commitfest entries should be deleted? I had already made
one already. I don't see a way to do that though.
https://commitfest.postgresql.org/patch/6037/
Josh
On Mon, Oct 20, 2025 at 5:55 PM Mihail Nikalayeu <mihailnikalayeu@gmail.com>
wrote:
Show quoted text
Hello!
Josh Curtis <jcurtis825@gmail.com>:
This is definitely a bit more complex. It requires that
SetNewSxactGlobalXmin is never called when SxactGlobalXmin is invalid to
prevent readers from seeing an invalid transaction ID when they should see
a valid one -- I think this is the case now since before
SetNewSxactGlobalXmin is called postgres checks that
PredXact->SxactGlobalXminCount > 0. I assume this entails that
SxactGlobalXmin is valid, but I have not checked every place the two
variables are modified.Such logic feels fragile to me. Maybe add a special flag like
'PredXact->SxactGlobalSkipAllowed' which will be updated to 'true' by
SetNewSxactGlobalXmin, and dropped to 'false' by functions affecting
PredXact->activeList?
But I prefer the first solution anyway.Best regards,
Mikhail.