Experimenting with transactional memory for SERIALIZABLE
Hello hackers,
Here's a *highly* experimental patch set that tries to skip the LWLock
protocol in predicate.c and use HTM[1]https://en.wikipedia.org/wiki/Transactional_memory instead. HTM is itself a sort
of hardware-level implementation of SSI for shared memory. My
thinking was that if your workload already suits the optimistic nature
of SSI, perhaps it could make sense to go all-in and remove the rather
complicated pessimistic locking it's built on top of. It falls back
to an LWLock-based path at compile time if you don't build with
--enable-htm, or at runtime if a startup test discovered that your CPU
doesn't have the Intel TSX instruction set (microarchitectures older
than Skylake, and some mobile and low power variants of current ones),
or if a hardware transaction is aborted for various reasons.
The good news is that it seems to produce correct results in simple
tests (well, some lock-held-by-me assertions can fail in an
--enable-cassert build, that's trivial to fix). The bad news is that
it doesn't perform very well yet, and I think the reason for that is
that there are some inherently serial parts of the current design that
cause frequent conflicts. In particular, the
FinishedSerializableTransactions list, protected by
SerializableFinishedListLock, produces a stream of conflicts, and
falls back to the traditional behaviour which involves long lock wait
queues and thereby more HTM conflicts. I think we probably need a
more concurrent way to release SSI transactions, entirely independent
of this HTM experiment. There's another point of serialisation at
snapshot acquisition time, which may be less avoidable; I don't know.
For much of the code that runs between snapshot acquisition and
transaction release, we really only care about touching memory
directly related to the SQL objects we touch in our SQL transaction,
and the other SQL transactions which have also touched them. The
question is whether it's possible to get to a situation where
non-overlapping read/write sets at the SQL level don't cause conflicts
at the memory level and everything goes faster, or whether the SSI
algorithm is somehow inherently unsuitable for running on top of, erm,
SSI-like technology. It seems like a potentially interesting research
project.
Here's my one paragraph introduction to HTM programming: Using the
wrapper macros from my 0001 patch, you call pg_htm_begin(), and if
that returns true you're in a memory transaction and should eventually
call pg_htm_commit() or pg_htm_abort(), and if it returns false your
transaction has aborted and you need to fall back to some other
strategy. (Retrying is also an option, but the reason codes are
complicated, and progress is not guaranteed, so introductions to the
topic often advise going straight to a fallback.) No other thread is
allowed to see your changes to memory until you commit, and if you
abort (explicitly, due to lack of cache for uncommitted changes, due
to a serialisation conflict, or due to other internal details possibly
known only to Intel), all queued changes to memory are abandoned, and
control returns at pg_htm_begin(), a bit like the way setjmp() returns
non-locally when you call longjmp(). There are plenty of sources to
read about this stuff in detail, but for a very gentle introduction I
recommend Maurice Herlihy's 2-part talk[2]https://www.youtube.com/watch?v=S3Fx-7avfs4[3]https://www.youtube.com/watch?v=94ieceVxSHs (the inventor of this
stuff at DEC in the early 90s), despite some strange claims he makes
about database hackers.
In theory this should work on POWER and future ARM systems too, with a
bit more work, but I haven't looked into that. There are doubtless
many other applications for this type of technology within PostgreSQL.
Perhaps some more fruitful.
[1]: https://en.wikipedia.org/wiki/Transactional_memory
[2]: https://www.youtube.com/watch?v=S3Fx-7avfs4
[3]: https://www.youtube.com/watch?v=94ieceVxSHs
Attachments:
0001-Add-infrastruction-for-hardware-transactional-memory.patchtext/x-patch; charset=US-ASCII; name=0001-Add-infrastruction-for-hardware-transactional-memory.patchDownload
From c80a75a51ae4dd5a67ac801deefe61fdd112279a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 20 Feb 2020 11:31:45 +1300
Subject: [PATCH 1/2] Add infrastruction for hardware transactional memory.
If --enable-htm is provided to configure, add support for
using the Intel TSX RTM facilities. Provide a very simple
abstraction that tries once but requires the caller to
provide a fallback. In later patches, it should be
possible to map this to the builtins required for other
ISAs including ARM TME and POWER8.
Perform a runtime test at startup to check if the chip
we're running on has those instructions (eg Skylake and
later server-class chips; many mobile/laptop chips don't
have them, unfortunately).
This facility will be used by later patches to perform
hardware memory transactions, to replace LWLocks and
spinlocks optimistically, with a fallback. If compiled
without --enable-htm, a constant value is provide so that
the HTM path can be removed by constant folding leaving
only the fallback path.
WORK IN PROGRESS: solution looking for a problem
Author: Thomas Munro
---
configure | 40 +++++++++++++++
configure.in | 15 ++++++
src/backend/port/Makefile | 1 +
src/backend/port/htm.c | 77 +++++++++++++++++++++++++++++
src/backend/postmaster/postmaster.c | 5 ++
src/include/pg_config.h.in | 3 ++
src/include/port/htm.h | 60 ++++++++++++++++++++++
7 files changed, 201 insertions(+)
create mode 100644 src/backend/port/htm.c
create mode 100644 src/include/port/htm.h
diff --git a/configure b/configure
index 37aa82dcd4..e44bceb5b0 100755
--- a/configure
+++ b/configure
@@ -829,6 +829,7 @@ with_pgport
enable_rpath
enable_spinlocks
enable_atomics
+enable_htm
enable_debug
enable_profiling
enable_coverage
@@ -1514,6 +1515,7 @@ Optional Features:
executables
--disable-spinlocks do not use spinlocks
--disable-atomics do not use atomic operations
+ --enable-htm use hardware transactional memory operations
--enable-debug build with debugging symbols (-g)
--enable-profiling build with profiling enabled
--enable-coverage build with coverage testing instrumentation
@@ -3279,6 +3281,33 @@ fi
+#
+# Hardware transactional memory
+#
+
+
+# Check whether --enable-htm was given.
+if test "${enable_htm+set}" = set; then :
+ enableval=$enable_htm;
+ case $enableval in
+ yes)
+ :
+ ;;
+ no)
+ :
+ ;;
+ *)
+ as_fn_error $? "no argument expected for --enable-htm option" "$LINENO" 5
+ ;;
+ esac
+
+else
+ enable_htm=no
+
+fi
+
+
+
#
# --enable-debug adds -g to compiler flags
#
@@ -6885,6 +6914,11 @@ $as_echo "#define PROFILE_PID_DIR 1" >>confdefs.h
fi
fi
+# enable HTM if --enable-htm
+if test "$enable_htm" = yes; then
+ CFLAGS="$CFLAGS -mrtm"
+fi
+
# We already have this in Makefile.win32, but configure needs it too
if test "$PORTNAME" = "win32"; then
CPPFLAGS="$CPPFLAGS -I$srcdir/src/include/port/win32 -DEXEC_BACKEND"
@@ -11819,6 +11853,12 @@ $as_echo "$as_me: WARNING:
*** Not using atomic operations will cause poor performance." >&2;}
fi
+if test "$enable_htm" = yes; then
+
+$as_echo "#define HAVE_HTM 1" >>confdefs.h
+
+fi
+
if test "$with_gssapi" = yes ; then
if test "$PORTNAME" != "win32"; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing gss_init_sec_context" >&5
diff --git a/configure.in b/configure.in
index 8adb409558..a1365d0986 100644
--- a/configure.in
+++ b/configure.in
@@ -193,6 +193,12 @@ PGAC_ARG_BOOL(enable, spinlocks, yes,
PGAC_ARG_BOOL(enable, atomics, yes,
[do not use atomic operations])
+#
+# Hardware transactional memory
+#
+PGAC_ARG_BOOL(enable, htm, no,
+ [use hardware transactional memory operations])
+
#
# --enable-debug adds -g to compiler flags
#
@@ -594,6 +600,11 @@ if test "$enable_profiling" = yes && test "$ac_cv_prog_cc_g" = yes; then
fi
fi
+# enable HTM if --enable-htm
+if test "$enable_htm" = yes; then
+ CFLAGS="$CFLAGS -mrtm"
+fi
+
# We already have this in Makefile.win32, but configure needs it too
if test "$PORTNAME" = "win32"; then
CPPFLAGS="$CPPFLAGS -I$srcdir/src/include/port/win32 -DEXEC_BACKEND"
@@ -1169,6 +1180,10 @@ else
*** Not using atomic operations will cause poor performance.])
fi
+if test "$enable_htm" = yes; then
+ AC_DEFINE(HAVE_HTM, 1, [Define to 1 if you want to use HTM.])
+fi
+
if test "$with_gssapi" = yes ; then
if test "$PORTNAME" != "win32"; then
AC_SEARCH_LIBS(gss_init_sec_context, [gssapi_krb5 gss 'gssapi -lkrb5 -lcrypto'], [],
diff --git a/src/backend/port/Makefile b/src/backend/port/Makefile
index 2d00b4f05a..d81b03d196 100644
--- a/src/backend/port/Makefile
+++ b/src/backend/port/Makefile
@@ -24,6 +24,7 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
$(TAS) \
atomics.o \
+ htm.o \
pg_sema.o \
pg_shmem.o
diff --git a/src/backend/port/htm.c b/src/backend/port/htm.c
new file mode 100644
index 0000000000..aae14ee272
--- /dev/null
+++ b/src/backend/port/htm.c
@@ -0,0 +1,77 @@
+/*-------------------------------------------------------------------------
+ *
+ * htm.c
+ * Code to decide whether HTM is available on this micro-architecture.
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/port/htm.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <setjmp.h>
+#include <signal.h>
+
+#include "port/htm.h"
+
+#ifdef HAVE_HTM
+
+/* Global variable to advertise whether HTM is available. */
+bool have_htm_support;
+
+static sigjmp_buf illegal_instruction_jump;
+
+static void
+illegal_instruction_handler(SIGNAL_ARGS)
+{
+ siglongjmp(illegal_instruction_jump, 1);
+}
+
+static bool
+test_memory_transaction(void)
+{
+ /*
+ * We don't really care if this transaction commits or aborts, we just want
+ * to exercise the instructions and trigger a SIGILL if they aren't there.
+ */
+ if (!pg_htm_begin())
+ return false;
+ pg_htm_commit();
+ return true;
+}
+
+/*
+ * Test whether we have HTM support in every backend process.
+ */
+void
+htm_init(void)
+{
+ /*
+ * You could use the Intel CPUID feature test for this, but perhaps a
+ * SIGILL-based approach will eventually work on other ISAs that grow HTM
+ * support.
+ *
+ * TODO: Is this going to work on Windows/MSVC?
+ */
+ pqsignal(SIGILL, illegal_instruction_handler);
+ if (sigsetjmp(illegal_instruction_jump, 1) == 0)
+ {
+ /* Try to use HTM instructions */
+ test_memory_transaction();
+ have_htm_support = true;
+ }
+ else
+ {
+ /* We got the SIGILL trap */
+ have_htm_support = false;
+ }
+ pqsignal(SIGILL, SIG_DFL);
+}
+
+#endif
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b3986bee75..e3fabaeeb7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -109,6 +109,7 @@
#include "pg_getopt.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
+#include "port/htm.h"
#include "postmaster/autovacuum.h"
#include "postmaster/bgworker_internals.h"
#include "postmaster/fork_process.h"
@@ -2621,6 +2622,10 @@ InitProcessGlobals(void)
((uint64) MyStartTimestamp >> 20);
}
srandom(rseed);
+
+#ifdef HAVE_HTM
+ htm_init();
+#endif
}
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 60dcf42974..287165c595 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -295,6 +295,9 @@
/* Define to 1 if you have the `history_truncate_file' function. */
#undef HAVE_HISTORY_TRUNCATE_FILE
+/* Define to 1 if you want to use HTM. */
+#undef HAVE_HTM
+
/* Define to 1 if you have the <ieeefp.h> header file. */
#undef HAVE_IEEEFP_H
diff --git a/src/include/port/htm.h b/src/include/port/htm.h
new file mode 100644
index 0000000000..7a1bc199ea
--- /dev/null
+++ b/src/include/port/htm.h
@@ -0,0 +1,60 @@
+/*-------------------------------------------------------------------------
+ *
+ * htm.h
+ * Hardware transaction memory operations.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/port/htm.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef HTM_H
+#define HTM_H
+
+#ifdef FRONTEND
+#error "htm.h may not be included from frontend code"
+#endif
+
+#ifndef HAVE_HTM
+
+/*
+ * Make this a constant, so that if you build without --enable-htm, all
+ * relevant branches are removed by constant folding.
+ */
+#define have_htm_support false
+
+#else
+
+/*
+ * We have to check if the microarchitecture supports HTM instructions
+ * with a runtime check. We'll store the result in a global variable.
+ */
+extern bool have_htm_support;
+
+/* Function called at startup to set the above variable. */
+extern void htm_init(void);
+
+/*
+ * A future version of the C programming language might standardize
+ * the interface to transactional memory (see eg N1961), but for now we
+ * must use compiler builtins that vary.
+ */
+#if (defined(__GNUC__) || defined(__INTEL_COMPILER) || defined(_MSC_VER)) && !(defined(__IBMC__) || defined(__IBMCPP__))
+/* ICC, GCC, MSVC use Intel's _xbegin() interfaces for x86 instructions. */
+/* TODO: Only tested on GCC; is the header the same on the others? Is there a minimum version for each compiler? */
+#include <immintrin.h>
+#define pg_htm_begin() (_xbegin() == _XBEGIN_STARTED)
+#define pg_htm_commit() _xend()
+#define pg_htm_abort() _xabort(0)
+#elif (defined(__IBMC__) || defined(__IBMCPP__))
+/* IBM XLC uses __TM_begin() etc for POWER instructions. */
+#error "IBM compiler support for HTM not yet implemented"
+#else
+#error "no hardware transactional memory support"
+#endif
+
+#endif /* HAVE_HTM */
+
+#endif /* HTM_H */
--
2.20.1
0002-Use-hardware-transactional-memory-for-SSI.patchtext/x-patch; charset=US-ASCII; name=0002-Use-hardware-transactional-memory-for-SSI.patchDownload
From ae81c614605f5139ab4100f888d483b484b4de8d Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 20 Feb 2020 11:40:38 +1300
Subject: [PATCH 2/2] Use hardware transactional memory for SSI.
Replace several of the workhorse hot functions in predicate.c
with versions that first optimistically try to run in a hardware
memory transaction, but fall back to the LWLock-based versions
if the memory transaction fails.
If the optimism pays off, then we skip a whole bunch of LWLock
churn. If it doesn't (that is, if you're frequently hitting the
same predicate lock table buckets from concurrent backends),
then it's probably slower due to frequent need to fall back
to the LWLock path, after already wasting energy on a failed
HTM transaction.
WORK IN PROGRESS -- highly experimental
Author: Thomas Munro
---
src/backend/storage/lmgr/predicate.c | 387 +++++++++++++++++++--------
1 file changed, 275 insertions(+), 112 deletions(-)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 654584b77a..f6c4f72bb3 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -202,6 +202,7 @@
#include "access/xlog.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "port/htm.h"
#include "storage/bufmgr.h"
#include "storage/predicate.h"
#include "storage/predicate_internals.h"
@@ -313,6 +314,61 @@
<< LOG2_NUM_PREDICATELOCK_PARTITIONS)
+#ifdef HAVE_HTM
+
+/*
+ * If this microarchitecture doesn't support HTM, or if our HTM transaction
+ * aborts for any reason, we'll return false. This macro expects to be used in
+ * a function that has a use_memory_transaction parameter and returns false if
+ * an HTM transaction abort.
+ */
+#define MaybeBeginMemoryTransaction() \
+ if (use_memory_transaction && (!have_htm_support || !pg_htm_begin())) \
+ return false
+#define MaybeCommitMemoryTransaction() \
+ if (use_memory_transaction) \
+ pg_htm_commit()
+/*
+ * When in a hardware memory transaction, we don't acquire any locks at all.
+ * We still need to read them though, so that we can abort explicitly if we
+ * see that some other process has the lock and must therefore be in the
+ * fallback path, and so that the memory hardware aborts our transaction if
+ * that changes underneath us.
+ *
+ * XXX: lwlock.h should probably give us a tidy interface for the read, rather
+ * than accessing its internal state directly like this!
+ *
+ * XXX: It's probably not necessary to abort explicitly if only a share lock is
+ * held? Not sure if it's worth worrying about since we'll get (spurious)
+ * conflicts from the shared counter changing, but for now abort only on
+ * 1 << 24 which is the sentinel value LW_VAL_EXCLUSIVE.
+ */
+#define MaybeLWLockAcquire(lock, mode) \
+ do { \
+ if (!use_memory_transaction) \
+ LWLockAcquire((lock), (mode)); \
+ else if (pg_atomic_read_u32(&(lock)->state) == (1 << 24)) \
+ pg_htm_abort(); \
+ } while (0)
+#define MaybeLWLockRelease(lock) \
+ do { \
+ if (!use_memory_transaction) \
+ LWLockRelease((lock)); \
+ } while (0)
+
+#else
+
+/*
+ * If HTM is isn't supported in this build, just return false. This should
+ * remove the HTM path entirely from the generated code, given enough inlining.
+ */
+#define MaybeBeginMemoryTransaction() if (use_memory_transaction) return false
+#define MaybeCommitMemoryTransaction()
+#define MaybeLWLockAcquire(lock, mode) LWLockAcquire((lock), (mode))
+#define MaybeLWLockRelease(lock) LWLockRelease((lock))
+
+#endif
+
/*
* The SLRU buffer area through which we access the old xids.
*/
@@ -473,14 +529,17 @@ static void PredicateLockAcquire(const PREDICATELOCKTARGETTAG *targettag);
static void DropAllPredicateLocksFromTable(Relation relation,
bool transfer);
static void SetNewSxactGlobalXmin(void);
-static void ClearOldPredicateLocks(void);
+static pg_attribute_always_inline void ClearOldPredicateLocks(bool use_memory_transaction);
static void ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
- bool summarize);
+ bool summarize,
+ bool use_memory_transaction);
static bool XidIsConcurrent(TransactionId xid);
static void CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag);
-static void FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer);
+static void FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer,
+ bool use_memory_transaction);
static void OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
- SERIALIZABLEXACT *writer);
+ SERIALIZABLEXACT *writer,
+ bool use_memory_transaction);
static void CreateLocalPredicateLockHash(void);
static void ReleasePredicateLocksLocal(void);
@@ -1466,7 +1525,7 @@ SummarizeOldestCommittedSxact(void)
? sxact->SeqNo.earliestOutConflictCommit : InvalidSerCommitSeqNo);
/* Summarize and release the detail. */
- ReleaseOneSerializableXact(sxact, false, true);
+ ReleaseOneSerializableXact(sxact, false, true, false);
LWLockRelease(SerializableFinishedListLock);
}
@@ -2372,10 +2431,11 @@ DecrementParentLocks(const PREDICATELOCKTARGETTAG *targettag)
* granularity promotion or the local lock table. See
* PredicateLockAcquire for that.
*/
-static void
-CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
- uint32 targettaghash,
- SERIALIZABLEXACT *sxact)
+static pg_attribute_always_inline bool
+CreatePredicateLockImpl(const PREDICATELOCKTARGETTAG *targettag,
+ uint32 targettaghash,
+ SERIALIZABLEXACT *sxact,
+ bool use_memory_transaction)
{
PREDICATELOCKTARGET *target;
PREDICATELOCKTAG locktag;
@@ -2385,10 +2445,12 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
partitionLock = PredicateLockHashPartitionLock(targettaghash);
- LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+ MaybeBeginMemoryTransaction();
+
+ MaybeLWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
if (IsInParallelMode())
- LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
- LWLockAcquire(partitionLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(partitionLock, LW_EXCLUSIVE);
/* Make sure that the target is represented. */
target = (PREDICATELOCKTARGET *)
@@ -2396,10 +2458,13 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
targettag, targettaghash,
HASH_ENTER_NULL, &found);
if (!target)
+ {
+ MaybeCommitMemoryTransaction();
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of shared memory"),
errhint("You might need to increase max_pred_locks_per_transaction.")));
+ }
if (!found)
SHMQueueInit(&(target->predicateLocks));
@@ -2411,10 +2476,13 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
PredicateLockHashCodeFromTargetHashCode(&locktag, targettaghash),
HASH_ENTER_NULL, &found);
if (!lock)
+ {
+ MaybeCommitMemoryTransaction();
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of shared memory"),
errhint("You might need to increase max_pred_locks_per_transaction.")));
+ }
if (!found)
{
@@ -2424,10 +2492,23 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
lock->commitSeqNo = InvalidSerCommitSeqNo;
}
- LWLockRelease(partitionLock);
+ MaybeLWLockRelease(partitionLock);
if (IsInParallelMode())
- LWLockRelease(&sxact->predicateLockListLock);
- LWLockRelease(SerializablePredicateLockListLock);
+ MaybeLWLockRelease(&sxact->predicateLockListLock);
+ MaybeLWLockRelease(SerializablePredicateLockListLock);
+
+ MaybeCommitMemoryTransaction();
+
+ return true;
+}
+
+static void
+CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
+ uint32 targettaghash,
+ SERIALIZABLEXACT *sxact)
+{
+ if (!CreatePredicateLockImpl(targettag, targettaghash, sxact, true))
+ CreatePredicateLockImpl(targettag, targettaghash, sxact, false);
}
/*
@@ -3259,14 +3340,16 @@ SetNewSxactGlobalXmin(void)
* MySerializableXact variable and benefit from the optimization in its own
* time.
*/
-void
-ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
+static pg_attribute_always_inline bool
+ReleasePredicateLocksImpl(bool isCommit, bool isReadOnlySafe,
+ bool use_memory_transaction)
{
bool needToClear;
RWConflict conflict,
nextConflict,
possibleUnsafeConflict;
SERIALIZABLEXACT *roXact;
+ TransactionId approximateNextXid;
/*
* We can't trust XactReadOnly here, because a transaction which started
@@ -3293,7 +3376,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
if (IsParallelWorker())
{
ReleasePredicateLocksLocal();
- return;
+ return true;
}
/*
@@ -3320,10 +3403,17 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
if (MySerializableXact == InvalidSerializableXact)
{
Assert(LocalPredicateLockHash == NULL);
- return;
+ return true;
}
- LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ /*
+ * We might need an unsynchronized read of the next xid, but we don't want
+ * that read to be considered part of our memory transaction. So read it first.
+ */
+ approximateNextXid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+
+ MaybeBeginMemoryTransaction();
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
/*
* If the transaction is committing, but it has been partially released
@@ -3355,9 +3445,10 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
*/
if (SxactIsPartiallyReleased(MySerializableXact))
{
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeCommitMemoryTransaction();
ReleasePredicateLocksLocal();
- return;
+ return true;
}
else
{
@@ -3390,7 +3481,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
* transaction to complete before freeing some RAM; correctness of visible
* behavior is not affected.
*/
- MySerializableXact->finishedBefore = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+ MySerializableXact->finishedBefore = approximateNextXid;
/*
* If it's not a commit it's either a rollback or a read-only transaction
@@ -3624,9 +3715,9 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
}
}
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
- LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
/* Add this to the list of transactions to check for later cleanup. */
if (isCommit)
@@ -3642,14 +3733,26 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
if (!isCommit)
ReleaseOneSerializableXact(MySerializableXact,
isReadOnlySafe && IsInParallelMode(),
- false);
+ false,
+ use_memory_transaction);
- LWLockRelease(SerializableFinishedListLock);
+ MaybeLWLockRelease(SerializableFinishedListLock);
if (needToClear)
- ClearOldPredicateLocks();
+ ClearOldPredicateLocks(use_memory_transaction);
+
+ MaybeCommitMemoryTransaction();
ReleasePredicateLocksLocal();
+
+ return true;
+}
+
+void
+ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
+{
+ if (!ReleasePredicateLocksImpl(isCommit, isReadOnlySafe, true))
+ ReleasePredicateLocksImpl(isCommit, isReadOnlySafe, false);
}
static void
@@ -3670,8 +3773,8 @@ ReleasePredicateLocksLocal(void)
* Clear old predicate locks, belonging to committed transactions that are no
* longer interesting to any in-progress transaction.
*/
-static void
-ClearOldPredicateLocks(void)
+static pg_attribute_always_inline void
+ClearOldPredicateLocks(bool use_memory_transaction)
{
SERIALIZABLEXACT *finishedSxact;
PREDICATELOCK *predlock;
@@ -3680,12 +3783,12 @@ ClearOldPredicateLocks(void)
* Loop through finished transactions. They are in commit order, so we can
* stop as soon as we find one that's still interesting.
*/
- LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
finishedSxact = (SERIALIZABLEXACT *)
SHMQueueNext(FinishedSerializableTransactions,
FinishedSerializableTransactions,
offsetof(SERIALIZABLEXACT, finishedLink));
- LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_SHARED);
while (finishedSxact)
{
SERIALIZABLEXACT *nextSxact;
@@ -3702,10 +3805,11 @@ ClearOldPredicateLocks(void)
* This transaction committed before any in-progress transaction
* took its snapshot. It's no longer interesting.
*/
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
SHMQueueDelete(&(finishedSxact->finishedLink));
- ReleaseOneSerializableXact(finishedSxact, false, false);
- LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+ ReleaseOneSerializableXact(finishedSxact, false, false,
+ use_memory_transaction);
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_SHARED);
}
else if (finishedSxact->commitSeqNo > PredXact->HavePartialClearedThrough
&& finishedSxact->commitSeqNo <= PredXact->CanPartialClearThrough)
@@ -3715,13 +3819,14 @@ ClearOldPredicateLocks(void)
* transaction committed are read-only, so we can clear part of
* its state.
*/
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
if (SxactIsReadOnly(finishedSxact))
{
/* A read-only transaction can be removed entirely */
SHMQueueDelete(&(finishedSxact->finishedLink));
- ReleaseOneSerializableXact(finishedSxact, false, false);
+ ReleaseOneSerializableXact(finishedSxact, false, false,
+ use_memory_transaction);
}
else
{
@@ -3730,11 +3835,12 @@ ClearOldPredicateLocks(void)
* need to keep the SERIALIZABLEXACT but can release the
* SIREAD locks and conflicts in.
*/
- ReleaseOneSerializableXact(finishedSxact, true, false);
+ ReleaseOneSerializableXact(finishedSxact, true, false,
+ use_memory_transaction);
}
PredXact->HavePartialClearedThrough = finishedSxact->commitSeqNo;
- LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_SHARED);
}
else
{
@@ -3743,12 +3849,12 @@ ClearOldPredicateLocks(void)
}
finishedSxact = nextSxact;
}
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
/*
* Loop through predicate locks on dummy transaction for summarized data.
*/
- LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+ MaybeLWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
predlock = (PREDICATELOCK *)
SHMQueueNext(&OldCommittedSxact->predicateLocks,
&OldCommittedSxact->predicateLocks,
@@ -3763,11 +3869,11 @@ ClearOldPredicateLocks(void)
&predlock->xactLink,
offsetof(PREDICATELOCK, xactLink));
- LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_SHARED);
Assert(predlock->commitSeqNo != 0);
Assert(predlock->commitSeqNo != InvalidSerCommitSeqNo);
canDoPartialCleanup = (predlock->commitSeqNo <= PredXact->CanPartialClearThrough);
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
/*
* If this lock originally belonged to an old enough transaction, we
@@ -3787,7 +3893,7 @@ ClearOldPredicateLocks(void)
targettaghash = PredicateLockTargetTagHashCode(&targettag);
partitionLock = PredicateLockHashPartitionLock(targettaghash);
- LWLockAcquire(partitionLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(partitionLock, LW_EXCLUSIVE);
SHMQueueDelete(&(predlock->targetLink));
SHMQueueDelete(&(predlock->xactLink));
@@ -3798,14 +3904,14 @@ ClearOldPredicateLocks(void)
HASH_REMOVE, NULL);
RemoveTargetIfNoLongerUsed(target, targettaghash);
- LWLockRelease(partitionLock);
+ MaybeLWLockRelease(partitionLock);
}
predlock = nextpredlock;
}
- LWLockRelease(SerializablePredicateLockListLock);
- LWLockRelease(SerializableFinishedListLock);
+ MaybeLWLockRelease(SerializablePredicateLockListLock);
+ MaybeLWLockRelease(SerializableFinishedListLock);
}
/*
@@ -3829,7 +3935,8 @@ ClearOldPredicateLocks(void)
*/
static void
ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
- bool summarize)
+ bool summarize,
+ bool use_memory_transaction)
{
PREDICATELOCK *predlock;
SERIALIZABLEXIDTAG sxidtag;
@@ -3839,15 +3946,15 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
Assert(sxact != NULL);
Assert(SxactIsRolledBack(sxact) || SxactIsCommitted(sxact));
Assert(partial || !SxactIsOnFinishedList(sxact));
- Assert(LWLockHeldByMe(SerializableFinishedListLock));
+ Assert(LWLockHeldByMe(SerializableFinishedListLock) ^ use_memory_transaction);
/*
* First release all the predicate locks held by this xact (or transfer
* them to OldCommittedSxact if summarize is true)
*/
- LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+ MaybeLWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
if (IsInParallelMode())
- LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
predlock = (PREDICATELOCK *)
SHMQueueNext(&(sxact->predicateLocks),
&(sxact->predicateLocks),
@@ -3874,7 +3981,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
targettaghash = PredicateLockTargetTagHashCode(&targettag);
partitionLock = PredicateLockHashPartitionLock(targettaghash);
- LWLockAcquire(partitionLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(partitionLock, LW_EXCLUSIVE);
SHMQueueDelete(targetLink);
@@ -3916,7 +4023,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
else
RemoveTargetIfNoLongerUsed(target, targettaghash);
- LWLockRelease(partitionLock);
+ MaybeLWLockRelease(partitionLock);
predlock = nextpredlock;
}
@@ -3928,11 +4035,11 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
SHMQueueInit(&sxact->predicateLocks);
if (IsInParallelMode())
- LWLockRelease(&sxact->predicateLockListLock);
- LWLockRelease(SerializablePredicateLockListLock);
+ MaybeLWLockRelease(&sxact->predicateLockListLock);
+ MaybeLWLockRelease(SerializablePredicateLockListLock);
sxidtag.xid = sxact->topXid;
- LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
/* Release all outConflicts (unless 'partial' is true) */
if (!partial)
@@ -3979,7 +4086,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
ReleasePredXact(sxact);
}
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
}
/*
@@ -4045,15 +4152,17 @@ CheckForSerializableConflictOutNeeded(Relation relation, Snapshot snapshot)
* transactions overlap (i.e., they cannot see each other's writes), then we
* have a conflict out.
*/
-void
-CheckForSerializableConflictOut(Relation relation, TransactionId xid, Snapshot snapshot)
+static pg_attribute_always_inline bool
+CheckForSerializableConflictOutImpl(Relation relation, TransactionId xid,
+ Snapshot snapshot,
+ bool use_memory_transaction)
{
SERIALIZABLEXIDTAG sxidtag;
SERIALIZABLEXID *sxid;
SERIALIZABLEXACT *sxact;
if (!SerializationNeededForRead(relation, snapshot))
- return;
+ return true;
/* Check if someone else has already decided that we need to die */
if (SxactIsDoomed(MySerializableXact))
@@ -4067,13 +4176,15 @@ CheckForSerializableConflictOut(Relation relation, TransactionId xid, Snapshot s
Assert(TransactionIdIsValid(xid));
if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
- return;
+ return true;
+
+ MaybeBeginMemoryTransaction();
/*
* Find sxact or summarized info for the top level xid.
*/
sxidtag.xid = xid;
- LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
sxid = (SERIALIZABLEXID *)
hash_search(SerializableXidHash, &sxidtag, HASH_FIND, NULL);
if (!sxid)
@@ -4091,34 +4202,42 @@ CheckForSerializableConflictOut(Relation relation, TransactionId xid, Snapshot s
&& (!SxactIsReadOnly(MySerializableXact)
|| conflictCommitSeqNo
<= MySerializableXact->SeqNo.lastCommitBeforeSnapshot))
+ {
+ MaybeCommitMemoryTransaction();
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("could not serialize access due to read/write dependencies among transactions"),
errdetail_internal("Reason code: Canceled on conflict out to old pivot %u.", xid),
errhint("The transaction might succeed if retried.")));
+ }
if (SxactHasSummaryConflictIn(MySerializableXact)
|| !SHMQueueEmpty(&MySerializableXact->inConflicts))
+ {
+ MaybeCommitMemoryTransaction();
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("could not serialize access due to read/write dependencies among transactions"),
errdetail_internal("Reason code: Canceled on identification as a pivot, with conflict out to old committed transaction %u.", xid),
errhint("The transaction might succeed if retried.")));
+ }
MySerializableXact->flags |= SXACT_FLAG_SUMMARY_CONFLICT_OUT;
}
/* It's not serializable or otherwise not important. */
- LWLockRelease(SerializableXactHashLock);
- return;
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeCommitMemoryTransaction();
+ return true;
}
sxact = sxid->myXact;
Assert(TransactionIdEquals(sxact->topXid, xid));
if (sxact == MySerializableXact || SxactIsDoomed(sxact))
{
/* Can't conflict with ourself or a transaction that will roll back. */
- LWLockRelease(SerializableXactHashLock);
- return;
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeCommitMemoryTransaction();
+ return true;
}
/*
@@ -4132,12 +4251,14 @@ CheckForSerializableConflictOut(Relation relation, TransactionId xid, Snapshot s
if (!SxactIsPrepared(sxact))
{
sxact->flags |= SXACT_FLAG_DOOMED;
- LWLockRelease(SerializableXactHashLock);
- return;
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeCommitMemoryTransaction();
+ return true;
}
else
{
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeCommitMemoryTransaction();
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("could not serialize access due to read/write dependencies among transactions"),
@@ -4158,38 +4279,49 @@ CheckForSerializableConflictOut(Relation relation, TransactionId xid, Snapshot s
|| MySerializableXact->SeqNo.lastCommitBeforeSnapshot < sxact->SeqNo.earliestOutConflictCommit))
{
/* Read-only transaction will appear to run first. No conflict. */
- LWLockRelease(SerializableXactHashLock);
- return;
+ MaybeLWLockRelease(SerializableXactHashLock);
+ return true;
}
if (!XidIsConcurrent(xid))
{
/* This write was already in our snapshot; no conflict. */
- LWLockRelease(SerializableXactHashLock);
- return;
+ MaybeLWLockRelease(SerializableXactHashLock);
+ return true;
}
if (RWConflictExists(MySerializableXact, sxact))
{
/* We don't want duplicate conflict records in the list. */
- LWLockRelease(SerializableXactHashLock);
- return;
+ MaybeLWLockRelease(SerializableXactHashLock);
+ return true;
}
/*
* Flag the conflict. But first, if this conflict creates a dangerous
* structure, ereport an error.
*/
- FlagRWConflict(MySerializableXact, sxact);
- LWLockRelease(SerializableXactHashLock);
+ FlagRWConflict(MySerializableXact, sxact, use_memory_transaction);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeCommitMemoryTransaction();
+
+ return true;
+}
+
+void
+CheckForSerializableConflictOut(Relation relation, TransactionId xid, Snapshot snapshot)
+{
+ if (!CheckForSerializableConflictOutImpl(relation, xid, snapshot, true))
+ CheckForSerializableConflictOutImpl(relation, xid, snapshot, false);
}
/*
* Check a particular target for rw-dependency conflict in. A subroutine of
* CheckForSerializableConflictIn().
*/
-static void
-CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
+static pg_attribute_always_inline bool
+CheckTargetForConflictsInImpl(PREDICATELOCKTARGETTAG *targettag,
+ bool use_memory_transaction)
{
uint32 targettaghash;
LWLock *partitionLock;
@@ -4198,6 +4330,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
PREDICATELOCK *mypredlock = NULL;
PREDICATELOCKTAG mypredlocktag;
+ MaybeBeginMemoryTransaction();
+
Assert(MySerializableXact != InvalidSerializableXact);
/*
@@ -4205,7 +4339,7 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
*/
targettaghash = PredicateLockTargetTagHashCode(targettag);
partitionLock = PredicateLockHashPartitionLock(targettaghash);
- LWLockAcquire(partitionLock, LW_SHARED);
+ MaybeLWLockAcquire(partitionLock, LW_SHARED);
target = (PREDICATELOCKTARGET *)
hash_search_with_hash_value(PredicateLockTargetHash,
targettag, targettaghash,
@@ -4213,8 +4347,9 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
if (!target)
{
/* Nothing has this target locked; we're done here. */
- LWLockRelease(partitionLock);
- return;
+ MaybeLWLockRelease(partitionLock);
+ MaybeCommitMemoryTransaction();
+ return true;
}
/*
@@ -4225,7 +4360,7 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
SHMQueueNext(&(target->predicateLocks),
&(target->predicateLocks),
offsetof(PREDICATELOCK, targetLink));
- LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_SHARED);
while (predlock)
{
SHM_QUEUE *predlocktargetlink;
@@ -4264,8 +4399,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
sxact->finishedBefore))
&& !RWConflictExists(sxact, MySerializableXact))
{
- LWLockRelease(SerializableXactHashLock);
- LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
/*
* Re-check after getting exclusive lock because the other
@@ -4277,17 +4412,17 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
sxact->finishedBefore))
&& !RWConflictExists(sxact, MySerializableXact))
{
- FlagRWConflict(sxact, MySerializableXact);
+ FlagRWConflict(sxact, MySerializableXact, use_memory_transaction);
}
- LWLockRelease(SerializableXactHashLock);
- LWLockAcquire(SerializableXactHashLock, LW_SHARED);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_SHARED);
}
predlock = nextpredlock;
}
- LWLockRelease(SerializableXactHashLock);
- LWLockRelease(partitionLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(partitionLock);
/*
* If we found one of our own SIREAD locks to remove, remove it now.
@@ -4302,11 +4437,11 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
uint32 predlockhashcode;
PREDICATELOCK *rmpredlock;
- LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
+ MaybeLWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
if (IsInParallelMode())
- LWLockAcquire(&MySerializableXact->predicateLockListLock, LW_EXCLUSIVE);
- LWLockAcquire(partitionLock, LW_EXCLUSIVE);
- LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(&MySerializableXact->predicateLockListLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(partitionLock, LW_EXCLUSIVE);
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
/*
* Remove the predicate lock from shared memory, if it wasn't removed
@@ -4337,11 +4472,11 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
RemoveTargetIfNoLongerUsed(target, targettaghash);
}
- LWLockRelease(SerializableXactHashLock);
- LWLockRelease(partitionLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(partitionLock);
if (IsInParallelMode())
- LWLockRelease(&MySerializableXact->predicateLockListLock);
- LWLockRelease(SerializablePredicateLockListLock);
+ MaybeLWLockRelease(&MySerializableXact->predicateLockListLock);
+ MaybeLWLockRelease(SerializablePredicateLockListLock);
if (rmpredlock != NULL)
{
@@ -4357,6 +4492,15 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
DecrementParentLocks(targettag);
}
}
+ MaybeCommitMemoryTransaction();
+ return true;
+}
+
+static void
+CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
+{
+ if (!CheckTargetForConflictsInImpl(targettag, true))
+ CheckTargetForConflictsInImpl(targettag, false);
}
/*
@@ -4524,7 +4668,7 @@ CheckTableForSerializableConflictIn(Relation relation)
if (predlock->tag.myXact != MySerializableXact
&& !RWConflictExists(predlock->tag.myXact, MySerializableXact))
{
- FlagRWConflict(predlock->tag.myXact, MySerializableXact);
+ FlagRWConflict(predlock->tag.myXact, MySerializableXact, false);
}
predlock = nextpredlock;
@@ -4546,12 +4690,14 @@ CheckTableForSerializableConflictIn(Relation relation)
* the transaction hash table.
*/
static void
-FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer)
+FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer,
+ bool use_memory_transaction)
{
Assert(reader != writer);
/* First, see if this conflict causes failure. */
- OnConflict_CheckForSerializationFailure(reader, writer);
+ OnConflict_CheckForSerializationFailure(reader, writer,
+ use_memory_transaction);
/* Actually do the conflict flagging. */
if (reader == OldCommittedSxact)
@@ -4582,12 +4728,13 @@ FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer)
*/
static void
OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
- SERIALIZABLEXACT *writer)
+ SERIALIZABLEXACT *writer,
+ bool use_memory_transaction)
{
bool failure;
RWConflict conflict;
- Assert(LWLockHeldByMe(SerializableXactHashLock));
+ Assert(use_memory_transaction ^ LWLockHeldByMe(SerializableXactHashLock));
failure = false;
@@ -4716,7 +4863,8 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
*/
if (MySerializableXact == writer)
{
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeCommitMemoryTransaction();
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("could not serialize access due to read/write dependencies among transactions"),
@@ -4725,7 +4873,8 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
}
else if (SxactIsPrepared(writer))
{
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeCommitMemoryTransaction();
/* if we're not the writer, we have to be the reader */
Assert(MySerializableXact == reader);
@@ -4755,23 +4904,26 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
* committing writes, so letting it commit ensures progress. If we
* canceled the far conflict, it might immediately fail again on retry.
*/
-void
-PreCommit_CheckForSerializationFailure(void)
+static pg_attribute_always_inline bool
+PreCommit_CheckForSerializationFailureImpl(bool use_memory_transaction)
{
RWConflict nearConflict;
if (MySerializableXact == InvalidSerializableXact)
- return;
+ return true;
Assert(IsolationIsSerializable());
- LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
+ MaybeBeginMemoryTransaction();
+
+ MaybeLWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
/* Check if someone else has already decided that we need to die */
if (SxactIsDoomed(MySerializableXact))
{
Assert(!SxactIsPartiallyReleased(MySerializableXact));
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeCommitMemoryTransaction();
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("could not serialize access due to read/write dependencies among transactions"),
@@ -4809,7 +4961,8 @@ PreCommit_CheckForSerializationFailure(void)
*/
if (SxactIsPrepared(nearConflict->sxactOut))
{
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeCommitMemoryTransaction();
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("could not serialize access due to read/write dependencies among transactions"),
@@ -4835,7 +4988,17 @@ PreCommit_CheckForSerializationFailure(void)
MySerializableXact->prepareSeqNo = ++(PredXact->LastSxactCommitSeqNo);
MySerializableXact->flags |= SXACT_FLAG_PREPARED;
- LWLockRelease(SerializableXactHashLock);
+ MaybeLWLockRelease(SerializableXactHashLock);
+ MaybeCommitMemoryTransaction();
+
+ return true;
+}
+
+void
+PreCommit_CheckForSerializationFailure(void)
+{
+ if (!PreCommit_CheckForSerializationFailureImpl(true))
+ PreCommit_CheckForSerializationFailureImpl(false);
}
/*------------------------------------------------------------------------*/
--
2.20.1
On Thu, Feb 20, 2020 at 04:55:12PM +1300, Thomas Munro wrote:
Hello hackers,Here's a *highly* experimental patch set that tries to skip the LWLock
protocol in predicate.c and use HTM[1] instead. HTM is itself a sort
of hardware-level implementation of SSI for shared memory. My
thinking was that if your workload already suits the optimistic nature
of SSI, perhaps it could make sense to go all-in and remove the rather
complicated pessimistic locking it's built on top of. It falls back
to an LWLock-based path at compile time if you don't build with
--enable-htm, or at runtime if a startup test discovered that your CPU
doesn't have the Intel TSX instruction set (microarchitectures older
than Skylake, and some mobile and low power variants of current ones),
or if a hardware transaction is aborted for various reasons.
Thanks, that sounds cool!
The good news is that it seems to produce correct results in simple
tests (well, some lock-held-by-me assertions can fail in an
--enable-cassert build, that's trivial to fix). The bad news is that
it doesn't perform very well yet, and I think the reason for that is
that there are some inherently serial parts of the current design that
cause frequent conflicts.
Can you share some numbers about how not well it perform and how many
hardware transactions were aborted with a fallback? I'm curious because
from this paper [1]https://db.in.tum.de/~leis/papers/HTM.pdf I've got an impression that the bigger (in terms of
memory) and longer transaction is, the higher changes for it to get
aborted. So I wonder if it needs to be taken into account, or using it
for SSI as presented in the patch somehow implicitely minimize those
chances? Otherwise not only conflicting transactions will cause
fallback, but also those that e.g. span too much memory.
Another interesting for me question is how much is it affected by TAA
vulnerability [2]https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html, and what are the prospects of this approach in the
view of many suggests to disable TSX due to that (there are mitigations
ofcourse, but if I understand correctly e.g. for Linux it's similar to
MDS, where a mitigation is done via flushing cpu buffers on entering the
kernel space, but in between speculative access still could be
performed).
[1]: https://db.in.tum.de/~leis/papers/HTM.pdf
[2]: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html
On Thu, Feb 20, 2020 at 11:38 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
Can you share some numbers about how not well it perform and how many
hardware transactions were aborted with a fallback? I'm curious because
from this paper [1] I've got an impression that the bigger (in terms of
memory) and longer transaction is, the higher changes for it to get
aborted. So I wonder if it needs to be taken into account, or using it
for SSI as presented in the patch somehow implicitely minimize those
chances? Otherwise not only conflicting transactions will cause
fallback, but also those that e.g. span too much memory.
Good questions, and I don't have good enough numbers to share right
now; to be clear, the stage this work is at is: "wow, I think this new
alien technology might actually be producing the right answers at
least some of the time, now maybe we could start to think about
analysing its behaviour some more", and I wanted to share early and
see if anyone else was interested in the topic too :-)
Thanks for that paper, added to my reading list. The HTM
transactions' size is not linked to the size of database transactions,
which would certainly be too large. It's just used for lower level
operations that need to be atomic and serializable, replacing a bunch
of LWLocks. I see from skimming the final paragraph of that paper
that they're also not mapping database transactions directly to HTM.
So, the amount of memory you touch depends on the current size of
various lists in SSI's internal book keeping, and I haven't done the
work to figure out at which point space runs out (_XABORT_CAPACITY) in
any test workloads etc, or to consider whether some operations that I
covered with one HTM transaction could be safely broken up into
multiple transactions to minimise transaction size, though I am aware
of at least one opportunity like that.
Another interesting for me question is how much is it affected by TAA
vulnerability [2], and what are the prospects of this approach in the
view of many suggests to disable TSX due to that (there are mitigations
ofcourse, but if I understand correctly e.g. for Linux it's similar to
MDS, where a mitigation is done via flushing cpu buffers on entering the
kernel space, but in between speculative access still could be
performed).
Yeah, the rollout of TSX has been a wild ride since the beginning. I
didn't want to comment on that aspect because I just don't know enough
about it and at this point it's frankly pretty confusing. As far as
I know from limited reading, as of late last year a few well known
OSes are offering easy ways to disable TSX due to Zombieload v2 if you
would like to, but not doing so by default. I tested with the Debian
intel-microcode package version 3.20191115.2~deb10u1 installed which I
understand to the be latest and greatest, and made no relevant
modifications, and the instructions were available. I haven't read
anywhere that TSX itself is ending. Other ISAs have comparable
technology[1]https://developer.arm.com/docs/101028/0008/transactional-memory-extension-tme-intrinsics[2]https://www.ibm.com/developerworks/aix/library/au-aix-ibm-xl-compiler-built-in-functions/index.html, and the concept has been worked on for over 20
years, so... I just don't know.
[1]: https://developer.arm.com/docs/101028/0008/transactional-memory-extension-tme-intrinsics
[2]: https://www.ibm.com/developerworks/aix/library/au-aix-ibm-xl-compiler-built-in-functions/index.html