Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish()
Hi,
While working on something else, I noticed that
WaitXLogInsertionsToFinish() goes the LWLockWaitForVar() route even
for a process that's holding a WAL insertion lock. Typically, a
process holding a WAL insertion lock reaches
WaitXLogInsertionsToFinish() when it needs WAL buffer pages for its
insertion and waits for other, older in-progress WAL insertions to
finish. This guarantees that a process holding a WAL insertion
lock will never have its insertingAt less than 'upto'.
With that said, here's a small improvement I can think of, that is, to
avoid calling LWLockWaitForVar() for the WAL insertion lock the caller
of WaitXLogInsertionsToFinish() currently holds. Since
LWLockWaitForVar() does a bunch of things - holds interrupts, does
atomic reads, acquires and releases wait list lock and so on, avoiding
it may be a good idea IMO.
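In rough terms, the proposed skip looks like this (a simplified standalone model of the WaitXLogInsertionsToFinish() loop, not the actual xlog.c code; the array, parameter names, and model_wait_insertions() itself are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define NUM_XLOGINSERT_LOCKS 8

/*
 * Simplified model: insertingat[i] holds each lock holder's progress,
 * 0 meaning "no in-progress insertion"; my_lock is the lock index this
 * process holds (-1 if none).  The real code reads insertingAt through
 * LWLockWaitForVar() and may sleep; this model assumes final values.
 */
static uint64_t
model_wait_insertions(const uint64_t *insertingat, int my_lock,
                      uint64_t upto, uint64_t reservedUpto)
{
    uint64_t finishedUpto = reservedUpto;

    for (int i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
    {
        uint64_t ins = insertingat[i];

        if (i == my_lock)
        {
            /*
             * We hold this lock ourselves, so our own insertingAt can
             * never be behind 'upto': read it directly and skip the
             * LWLockWaitForVar() machinery entirely.
             */
            assert(ins >= upto);
            if (ins < finishedUpto)
                finishedUpto = ins;
            continue;
        }

        /* Stand-in for the LWLockWaitForVar() wait loop. */
        if (ins != 0 && ins < finishedUpto)
            finishedUpto = ins;
    }

    return finishedUpto;
}
```

The point of the model is only the `i == my_lock` branch: one lock out of NUM_XLOGINSERT_LOCKS is read directly instead of going through the slow path.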
I'm attaching a patch herewith. Here are the cirrus-ci test runs -
https://github.com/BRupireddy/postgres/tree/avoid_LWLockWaitForVar_for_currently_held_wal_ins_lock_v1.
Thoughts?
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Avoid-LWLockWaitForVar-for-currently-held-WAL-ins.patch (application/octet-stream)
From 82296a73d48451b3c16266cb5df529ee60ffe04f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Thu, 24 Nov 2022 12:08:38 +0000
Subject: [PATCH v1] Avoid LWLockWaitForVar() for currently held WAL insertion
lock in WaitXLogInsertionsToFinish()
---
src/backend/access/transam/xlog.c | 40 ++++++++++++++++++++++++++++++-
1 file changed, 39 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a31fbbff78..cffcbda941 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -631,7 +631,7 @@ static TimeLineID LocalMinRecoveryPointTLI;
static bool updateMinRecoveryPoint = true;
/* For WALInsertLockAcquire/Release functions */
-static int MyLockNo = 0;
+static int MyLockNo = -1;
static bool holdingAllLocks = false;
#ifdef WAL_DEBUG
@@ -1321,6 +1321,8 @@ WALInsertLockAcquire(void)
if (lockToTry == -1)
lockToTry = MyProc->pgprocno % NUM_XLOGINSERT_LOCKS;
+
+ Assert(MyLockNo == -1);
MyLockNo = lockToTry;
/*
@@ -1351,6 +1353,8 @@ WALInsertLockAcquireExclusive(void)
{
int i;
+ Assert(MyLockNo == -1);
+
/*
* When holding all the locks, all but the last lock's insertingAt
* indicator is set to 0xFFFFFFFFFFFFFFFF, which is higher than any real
@@ -1382,6 +1386,8 @@ WALInsertLockRelease(void)
{
int i;
+ Assert(MyLockNo == -1);
+
for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
&WALInsertLocks[i].l.insertingAt,
@@ -1394,6 +1400,8 @@ WALInsertLockRelease(void)
LWLockReleaseClearVar(&WALInsertLocks[MyLockNo].l.lock,
&WALInsertLocks[MyLockNo].l.insertingAt,
0);
+
+ MyLockNo = -1;
}
}
@@ -1406,6 +1414,8 @@ WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt)
{
if (holdingAllLocks)
{
+ Assert(MyLockNo == -1);
+
/*
* We use the last lock to mark our actual position, see comments in
* WALInsertLockAcquireExclusive.
@@ -1482,6 +1492,34 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto)
{
XLogRecPtr insertingat = InvalidXLogRecPtr;
+ /*
+ * If this process is holding the current WAL insert lock, then it can
+ * never happen that its insertion point is behind 'upto'. When the
+ * process doesn't get enough WAL buffer pages for its insertion, it
+ * comes here requesting the older in progress insertions to finish.
+ *
+ * In this case, we avoid going the LWLockWaitForVar route and quickly
+ * continue.
+ */
+ if (MyLockNo == i)
+ {
+ Assert(LWLockHeldByMe(&WALInsertLocks[i].l.lock));
+
+ /*
+ * Since we're holding the WAL insert lock, we can safely read
+ * insertingAt here.
+ */
+ insertingat = WALInsertLocks[i].l.insertingAt;
+
+ Assert(!XLogRecPtrIsInvalid(insertingat));
+ Assert(insertingat >= upto);
+
+ if (insertingat < finishedUpto)
+ finishedUpto = insertingat;
+
+ continue;
+ }
+
do
{
/*
--
2.34.1
Hi,
On 2022-11-24 18:13:10 +0530, Bharath Rupireddy wrote:
With that said, here's a small improvement I can think of, that is, to
avoid calling LWLockWaitForVar() for the WAL insertion lock the caller
of WaitXLogInsertionsToFinish() currently holds. Since
LWLockWaitForVar() does a bunch of things - holds interrupts, does
atomic reads, acquires and releases wait list lock and so on, avoiding
it may be a good idea IMO.
That doesn't seem like a big win. We're still going to call LWLockWaitForVar()
for all the other locks.
I think we could improve this code more significantly by avoiding the call to
LWLockWaitForVar() for all locks that aren't acquired or don't have a
conflicting insertingAt, that'd require just a bit more work to handle systems
without tear-free 64bit writes/reads.
The easiest way would probably be to just make insertingAt a 64bit atomic,
that transparently does the required work to make even non-atomic read/writes
tear free. Then we could trivially avoid the spinlock in
LWLockConflictsWithVar(), LWLockReleaseClearVar() and with just a bit more
work add a fastpath to LWLockUpdateVar(). We don't need to acquire the wait
list lock if there aren't any waiters.
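The no-waiters fastpath described here can be sketched with C11 atomics standing in for pg_atomic_* (ModelLWLock, the LW_FLAG_HAS_WAITERS bit value, and model_update_var are illustrative stand-ins, not the actual lwlock.c definitions):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LW_FLAG_HAS_WAITERS (1u << 30)   /* illustrative flag bit */

typedef struct
{
    _Atomic uint32_t state;       /* lock state bits, incl. waiter flag */
    _Atomic uint64_t insertingAt; /* the protected 64-bit variable */
} ModelLWLock;

/*
 * Fastpath sketch of LWLockUpdateVar(): once the variable is a true
 * 64-bit atomic, the store is tear-free on its own, so when nobody is
 * waiting we can publish the new value without taking the wait list lock.
 */
static void
model_update_var(ModelLWLock *lock, uint64_t val)
{
    if ((atomic_load(&lock->state) & LW_FLAG_HAS_WAITERS) == 0)
    {
        /* No waiters: publish the value and return immediately. */
        atomic_store(&lock->insertingAt, val);
        return;
    }

    /*
     * Slow path: the real code would take the wait list lock here and
     * wake any LW_WAIT_UNTIL_FREE waiters; this model just stores.
     */
    atomic_store(&lock->insertingAt, val);
}
```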
I'd bet that'd start to have visible effects in a workload with many small
records.
Greetings,
Andres Freund
On Fri, Nov 25, 2022 at 12:16 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2022-11-24 18:13:10 +0530, Bharath Rupireddy wrote:
With that said, here's a small improvement I can think of, that is, to
avoid calling LWLockWaitForVar() for the WAL insertion lock the caller
of WaitXLogInsertionsToFinish() currently holds. Since
LWLockWaitForVar() does a bunch of things - holds interrupts, does
atomic reads, acquires and releases wait list lock and so on, avoiding
it may be a good idea IMO.

That doesn't seem like a big win. We're still going to call LWLockWaitForVar()
for all the other locks.

I think we could improve this code more significantly by avoiding the call to
LWLockWaitForVar() for all locks that aren't acquired or don't have a
conflicting insertingAt, that'd require just a bit more work to handle systems
without tear-free 64bit writes/reads.

The easiest way would probably be to just make insertingAt a 64bit atomic,
that transparently does the required work to make even non-atomic read/writes
tear free. Then we could trivially avoid the spinlock in
LWLockConflictsWithVar(), LWLockReleaseClearVar() and with just a bit more
work add a fastpath to LWLockUpdateVar(). We don't need to acquire the wait
list lock if there aren't any waiters.

I'd bet that'd start to have visible effects in a workload with many small
records.
Thanks Andres! I quickly came up with the attached patch. I also ran
an insert test [1], below are the results. I also attached the results
graph. The cirrus-ci is happy with the patch -
https://github.com/BRupireddy/postgres/tree/wal_insertion_lock_improvements_v1_2.
Please let me know if the direction of the patch seems right.
clients HEAD PATCHED
1 1354 1499
2 1451 1464
4 3069 3073
8 5712 5797
16 11331 11157
32 22020 22074
64 41742 42213
128 71300 76638
256 103652 118944
512 111250 161582
768 99544 161987
1024 96743 164161
2048 72711 156686
4096 54158 135713
[1]:
./configure --prefix=$PWD/inst/ CFLAGS="-O3" > install.log && make -j
8 install > install.log 2>&1 &
cd inst/bin
./pg_ctl -D data -l logfile stop
rm -rf data logfile
free -m
sudo su -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
free -m
./initdb -D data
./pg_ctl -D data -l logfile start
./psql -d postgres -c 'ALTER SYSTEM SET shared_buffers = "8GB";'
./psql -d postgres -c 'ALTER SYSTEM SET max_wal_size = "16GB";'
./psql -d postgres -c 'ALTER SYSTEM SET max_connections = "4096";'
./pg_ctl -D data -l logfile restart
./pgbench -i -s 1 -d postgres
./psql -d postgres -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT
pgbench_accounts_pkey;"
cat << EOF >> insert.sql
\set aid random(1, 10 * :scale)
\set delta random(1, 100000 * :scale)
INSERT INTO pgbench_accounts (aid, bid, abalance) VALUES (:aid, :aid, :delta);
EOF
ulimit -S -n 5000
for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do echo -n
"$c ";./pgbench -n -M prepared -U ubuntu postgres -f insert.sql -c$c
-j$c -T5 2>&1|grep '^tps'|awk '{print $3}';done
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-WAL-Insertion-Lock-Improvements.patch (application/octet-stream)
From 293e789f9c1a63748147acd613c556961f1dc5c4 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 25 Nov 2022 10:53:56 +0000
Subject: [PATCH v1] WAL Insertion Lock Improvements
---
src/backend/access/transam/xlog.c | 8 +++--
src/backend/storage/lmgr/lwlock.c | 56 +++++++++++++++++--------------
src/include/storage/lwlock.h | 7 ++--
3 files changed, 41 insertions(+), 30 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a31fbbff78..b3f758abb3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -376,7 +376,7 @@ typedef struct XLogwrtResult
typedef struct
{
LWLock lock;
- XLogRecPtr insertingAt;
+ pg_atomic_uint64 insertingAt;
XLogRecPtr lastImportantAt;
} WALInsertLock;
@@ -1482,6 +1482,10 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto)
{
XLogRecPtr insertingat = InvalidXLogRecPtr;
+ /* Quickly check and continue if no one holds the lock. */
+ if (!IsLWLockHeld(&WALInsertLocks[i].l.lock))
+ continue;
+
do
{
/*
@@ -4602,7 +4606,7 @@ XLOGShmemInit(void)
for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
{
LWLockInitialize(&WALInsertLocks[i].l.lock, LWTRANCHE_WAL_INSERT);
- WALInsertLocks[i].l.insertingAt = InvalidXLogRecPtr;
+ pg_atomic_init_u64(&WALInsertLocks[i].l.insertingAt, InvalidXLogRecPtr);
WALInsertLocks[i].l.lastImportantAt = InvalidXLogRecPtr;
}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index a5ad36ca78..3c2048224b 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1536,6 +1536,15 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
return !mustwait;
}
+bool
+IsLWLockHeld(LWLock *lock)
+{
+ uint32 state;
+
+ state = pg_atomic_read_u32(&lock->state);
+ return ((state & LW_VAL_EXCLUSIVE) != 0) || ((state & LW_VAL_SHARED) != 0);
+}
+
/*
* Does the lwlock in its current state need to wait for the variable value to
* change?
@@ -1546,9 +1555,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
* *result is set to true if the lock was free, and false otherwise.
*/
static bool
-LWLockConflictsWithVar(LWLock *lock,
- uint64 *valptr, uint64 oldval, uint64 *newval,
- bool *result)
+LWLockConflictsWithVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 oldval,
+ uint64 *newval, bool *result)
{
bool mustwait;
uint64 value;
@@ -1570,14 +1578,7 @@ LWLockConflictsWithVar(LWLock *lock,
*result = false;
- /*
- * Read value using the lwlock's wait list lock, as we can't generally
- * rely on atomic 64 bit reads/stores. TODO: On platforms with a way to
- * do atomic 64 bit reads/writes the spinlock should be optimized away.
- */
- LWLockWaitListLock(lock);
- value = *valptr;
- LWLockWaitListUnlock(lock);
+ value = pg_atomic_read_u64(valptr);
if (value != oldval)
{
@@ -1606,7 +1607,8 @@ LWLockConflictsWithVar(LWLock *lock,
* in shared mode, returns 'true'.
*/
bool
-LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
+LWLockWaitForVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 oldval,
+ uint64 *newval)
{
PGPROC *proc = MyProc;
int extraWaits = 0;
@@ -1741,21 +1743,32 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
* The caller must be holding the lock in exclusive mode.
*/
void
-LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
+LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
{
proclist_head wakeup;
proclist_mutable_iter iter;
PRINT_LWDEBUG("LWLockUpdateVar", lock, LW_EXCLUSIVE);
+ /*
+ * When there are no waiters, quickly update the variable and exit without
+ * taking wait list lock.
+ */
+ if ((pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) == 0)
+ {
+ /* Update the lock's value atomically */
+ pg_atomic_write_u64(valptr, val);
+ return;
+ }
+
proclist_init(&wakeup);
LWLockWaitListLock(lock);
Assert(pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE);
- /* Update the lock's value */
- *valptr = val;
+ /* Update the lock's value atomically */
+ pg_atomic_write_u64(valptr, val);
/*
* See if there are any LW_WAIT_UNTIL_FREE waiters that need to be woken
@@ -1872,17 +1885,10 @@ LWLockRelease(LWLock *lock)
* LWLockReleaseClearVar - release a previously acquired lock, reset variable
*/
void
-LWLockReleaseClearVar(LWLock *lock, uint64 *valptr, uint64 val)
+LWLockReleaseClearVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
{
- LWLockWaitListLock(lock);
-
- /*
- * Set the variable's value before releasing the lock, that prevents race
- * a race condition wherein a new locker acquires the lock, but hasn't yet
- * set the variables value.
- */
- *valptr = val;
- LWLockWaitListUnlock(lock);
+ /* Update the lock's value atomically */
+ pg_atomic_write_u64(valptr, val);
LWLockRelease(lock);
}
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index a494cb598f..3fae217d26 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -125,14 +125,15 @@ extern bool LWLockAcquire(LWLock *lock, LWLockMode mode);
extern bool LWLockConditionalAcquire(LWLock *lock, LWLockMode mode);
extern bool LWLockAcquireOrWait(LWLock *lock, LWLockMode mode);
extern void LWLockRelease(LWLock *lock);
-extern void LWLockReleaseClearVar(LWLock *lock, uint64 *valptr, uint64 val);
+extern bool IsLWLockHeld(LWLock *lock);
+extern void LWLockReleaseClearVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val);
extern void LWLockReleaseAll(void);
extern bool LWLockHeldByMe(LWLock *lock);
extern bool LWLockAnyHeldByMe(LWLock *lock, int nlocks, size_t stride);
extern bool LWLockHeldByMeInMode(LWLock *lock, LWLockMode mode);
-extern bool LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval);
-extern void LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val);
+extern bool LWLockWaitForVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 oldval, uint64 *newval);
+extern void LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val);
extern Size LWLockShmemSize(void);
extern void CreateLWLocks(void);
--
2.34.1
WAL Insertion Lock Improvements.png (image/png) - graph of the results table above