Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish()
Hi,
While working on something else, I noticed that
WaitXLogInsertionsToFinish() goes the LWLockWaitForVar() route even
for a process that's holding a WAL insertion lock. Typically, a
process holding WAL insertion lock reaches
WaitXLogInsertionsToFinish() when it's in need of WAL buffer pages for
its insertion and waits for other older in-progress WAL insertions to
finish. This fact guarantees that the process holding a WAL insertion
lock will never have its insertingAt less than 'upto'.
With that said, here's a small improvement I can think of, that is, to
avoid calling LWLockWaitForVar() for the WAL insertion lock the caller
of WaitXLogInsertionsToFinish() currently holds. Since
LWLockWaitForVar() does a bunch of things - holds interrupts, does
atomic reads, acquires and releases wait list lock and so on, avoiding
it may be a good idea IMO.
I'm attaching a patch herewith. Here's the cirrus-ci tests -
https://github.com/BRupireddy/postgres/tree/avoid_LWLockWaitForVar_for_currently_held_wal_ins_lock_v1.
Thoughts?
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Avoid-LWLockWaitForVar-for-currently-held-WAL-ins.patch (application/octet-stream)
Hi,
On 2022-11-24 18:13:10 +0530, Bharath Rupireddy wrote:
With that said, here's a small improvement I can think of, that is, to
avoid calling LWLockWaitForVar() for the WAL insertion lock the caller
of WaitXLogInsertionsToFinish() currently holds. Since
LWLockWaitForVar() does a bunch of things - holds interrupts, does
atomic reads, acquires and releases wait list lock and so on, avoiding
it may be a good idea IMO.
That doesn't seem like a big win. We're still going to call LWLockWaitForVar()
for all the other locks.
I think we could improve this code more significantly by avoiding the call to
LWLockWaitForVar() for all locks that aren't acquired or don't have a
conflicting insertingAt, that'd require just a bit more work to handle systems
without tear-free 64bit writes/reads.
The easiest way would probably be to just make insertingAt a 64bit atomic,
that transparently does the required work to make even non-atomic read/writes
tear free. Then we could trivially avoid the spinlock in
LWLockConflictsWithVar(), LWLockReleaseClearVar() and with just a bit more
work add a fastpath to LWLockUpdateVar(). We don't need to acquire the wait
list lock if there aren't any waiters.
I'd bet that starts to have visible effects in a workload with many small
records.
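To make the tearing point concrete, here is a toy model in C11 stdatomic, used as a rough stand-in for pg_atomic (the ToyWALInsertLock type and helper names are invented for this sketch, not PostgreSQL code): wrapping the 64-bit field in an atomic type makes even relaxed reads and writes tear-free on platforms without native 64-bit loads/stores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A plain uint64_t access may be split into two 32-bit accesses on
 * platforms without native 64-bit loads/stores, so a concurrent reader
 * could observe a "torn" half-old/half-new value.  Declaring the field
 * _Atomic obliges the implementation to make every access tear-free,
 * even with relaxed ordering (barriers are a separate concern). */
typedef struct
{
	_Atomic uint64_t insertingAt;	/* stand-in for pg_atomic_uint64 */
} ToyWALInsertLock;

static void
toy_set_inserting_at(ToyWALInsertLock *lock, uint64_t ptr)
{
	/* relaxed is enough for tear-freedom */
	atomic_store_explicit(&lock->insertingAt, ptr, memory_order_relaxed);
}

static uint64_t
toy_read_inserting_at(ToyWALInsertLock *lock)
{
	return atomic_load_explicit(&lock->insertingAt, memory_order_relaxed);
}
```

This is only the tear-freedom half of the proposal; the ordering guarantees come from the full-barrier operations discussed further down the thread.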
Greetings,
Andres Freund
On Fri, Nov 25, 2022 at 12:16 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2022-11-24 18:13:10 +0530, Bharath Rupireddy wrote:
With that said, here's a small improvement I can think of, that is, to
avoid calling LWLockWaitForVar() for the WAL insertion lock the caller
of WaitXLogInsertionsToFinish() currently holds. Since
LWLockWaitForVar() does a bunch of things - holds interrupts, does
atomic reads, acquires and releases wait list lock and so on, avoiding
it may be a good idea IMO.
That doesn't seem like a big win. We're still going to call LWLockWaitForVar()
for all the other locks.
I think we could improve this code more significantly by avoiding the call to
LWLockWaitForVar() for all locks that aren't acquired or don't have a
conflicting insertingAt, that'd require just a bit more work to handle systems
without tear-free 64bit writes/reads.
The easiest way would probably be to just make insertingAt a 64bit atomic,
that transparently does the required work to make even non-atomic read/writes
tear free. Then we could trivially avoid the spinlock in
LWLockConflictsWithVar(), LWLockReleaseClearVar() and with just a bit more
work add a fastpath to LWLockUpdateVar(). We don't need to acquire the wait
list lock if there aren't any waiters.
I'd bet that starts to have visible effects in a workload with many small
records.
Thanks Andres! I quickly came up with the attached patch. I also ran
an insert test [1], below are the results. I also attached the results
graph. The cirrus-ci is happy with the patch -
https://github.com/BRupireddy/postgres/tree/wal_insertion_lock_improvements_v1_2.
Please let me know if the direction of the patch seems right.
clients HEAD PATCHED
1 1354 1499
2 1451 1464
4 3069 3073
8 5712 5797
16 11331 11157
32 22020 22074
64 41742 42213
128 71300 76638
256 103652 118944
512 111250 161582
768 99544 161987
1024 96743 164161
2048 72711 156686
4096 54158 135713
[1]:
./configure --prefix=$PWD/inst/ CFLAGS="-O3" > install.log && make -j
8 install > install.log 2>&1 &
cd inst/bin
./pg_ctl -D data -l logfile stop
rm -rf data logfile
free -m
sudo su -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
free -m
./initdb -D data
./pg_ctl -D data -l logfile start
./psql -d postgres -c 'ALTER SYSTEM SET shared_buffers = "8GB";'
./psql -d postgres -c 'ALTER SYSTEM SET max_wal_size = "16GB";'
./psql -d postgres -c 'ALTER SYSTEM SET max_connections = "4096";'
./pg_ctl -D data -l logfile restart
./pgbench -i -s 1 -d postgres
./psql -d postgres -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT
pgbench_accounts_pkey;"
cat << EOF >> insert.sql
\set aid random(1, 10 * :scale)
\set delta random(1, 100000 * :scale)
INSERT INTO pgbench_accounts (aid, bid, abalance) VALUES (:aid, :aid, :delta);
EOF
ulimit -S -n 5000
for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do echo -n
"$c ";./pgbench -n -M prepared -U ubuntu postgres -f insert.sql -c$c
-j$c -T5 2>&1|grep '^tps'|awk '{print $3}';done
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 2022-11-25 16:54:19 +0530, Bharath Rupireddy wrote:
On Fri, Nov 25, 2022 at 12:16 AM Andres Freund <andres@anarazel.de> wrote:
I think we could improve this code more significantly by avoiding the call to
LWLockWaitForVar() for all locks that aren't acquired or don't have a
conflicting insertingAt, that'd require just a bit more work to handle systems
without tear-free 64bit writes/reads.
The easiest way would probably be to just make insertingAt a 64bit atomic,
that transparently does the required work to make even non-atomic read/writes
tear free. Then we could trivially avoid the spinlock in
LWLockConflictsWithVar(), LWLockReleaseClearVar() and with just a bit more
work add a fastpath to LWLockUpdateVar(). We don't need to acquire the wait
list lock if there aren't any waiters.
I'd bet that starts to have visible effects in a workload with many small
records.
Thanks Andres! I quickly came up with the attached patch. I also ran
an insert test [1], below are the results. I also attached the results
graph. The cirrus-ci is happy with the patch -
https://github.com/BRupireddy/postgres/tree/wal_insertion_lock_improvements_v1_2.
Please let me know if the direction of the patch seems right.
clients HEAD PATCHED
1 1354 1499
2 1451 1464
4 3069 3073
8 5712 5797
16 11331 11157
32 22020 22074
64 41742 42213
128 71300 76638
256 103652 118944
512 111250 161582
768 99544 161987
1024 96743 164161
2048 72711 156686
4096 54158 135713
Nice.
From 293e789f9c1a63748147acd613c556961f1dc5c4 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 25 Nov 2022 10:53:56 +0000
Subject: [PATCH v1] WAL Insertion Lock Improvements
---
src/backend/access/transam/xlog.c | 8 +++--
src/backend/storage/lmgr/lwlock.c | 56 +++++++++++++++++--------------
src/include/storage/lwlock.h | 7 ++--
3 files changed, 41 insertions(+), 30 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a31fbbff78..b3f758abb3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -376,7 +376,7 @@ typedef struct XLogwrtResult
 typedef struct
 {
 	LWLock		lock;
-	XLogRecPtr	insertingAt;
+	pg_atomic_uint64 insertingAt;
 	XLogRecPtr	lastImportantAt;
 } WALInsertLock;
@@ -1482,6 +1482,10 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto)
{
 		XLogRecPtr	insertingat = InvalidXLogRecPtr;
+		/* Quickly check and continue if no one holds the lock. */
+		if (!IsLWLockHeld(&WALInsertLocks[i].l.lock))
+			continue;
I'm not sure this is quite right - don't we need a memory barrier. But I don't
see a reason to not just leave this code as-is. I think this should be
optimized entirely in lwlock.c
I'd probably split the change to an atomic from other changes either way.
Greetings,
Andres Freund
On Fri, Dec 2, 2022 at 6:10 AM Andres Freund <andres@anarazel.de> wrote:
On 2022-11-25 16:54:19 +0530, Bharath Rupireddy wrote:
On Fri, Nov 25, 2022 at 12:16 AM Andres Freund <andres@anarazel.de> wrote:
I think we could improve this code more significantly by avoiding the call to
LWLockWaitForVar() for all locks that aren't acquired or don't have a
conflicting insertingAt, that'd require just a bit more work to handle systems
without tear-free 64bit writes/reads.
The easiest way would probably be to just make insertingAt a 64bit atomic,
that transparently does the required work to make even non-atomic read/writes
tear free. Then we could trivially avoid the spinlock in
LWLockConflictsWithVar(), LWLockReleaseClearVar() and with just a bit more
work add a fastpath to LWLockUpdateVar(). We don't need to acquire the wait
list lock if there aren't any waiters.
I'd bet that starts to have visible effects in a workload with many small
records.
Thanks Andres! I quickly came up with the attached patch. I also ran
an insert test [1], below are the results. I also attached the results
graph. The cirrus-ci is happy with the patch -
https://github.com/BRupireddy/postgres/tree/wal_insertion_lock_improvements_v1_2.
Please let me know if the direction of the patch seems right.
clients HEAD PATCHED
1 1354 1499
2 1451 1464
4 3069 3073
8 5712 5797
16 11331 11157
32 22020 22074
64 41742 42213
128 71300 76638
256 103652 118944
512 111250 161582
768 99544 161987
1024 96743 164161
2048 72711 156686
4096 54158 135713
Nice.
Thanks for taking a look at it.
From 293e789f9c1a63748147acd613c556961f1dc5c4 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Fri, 25 Nov 2022 10:53:56 +0000
Subject: [PATCH v1] WAL Insertion Lock Improvements
---
src/backend/access/transam/xlog.c | 8 +++--
src/backend/storage/lmgr/lwlock.c | 56 +++++++++++++++++--------------
src/include/storage/lwlock.h | 7 ++--
3 files changed, 41 insertions(+), 30 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a31fbbff78..b3f758abb3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -376,7 +376,7 @@ typedef struct XLogwrtResult
 typedef struct
 {
 	LWLock		lock;
-	XLogRecPtr	insertingAt;
+	pg_atomic_uint64 insertingAt;
 	XLogRecPtr	lastImportantAt;
 } WALInsertLock;
@@ -1482,6 +1482,10 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto)
{
 		XLogRecPtr	insertingat = InvalidXLogRecPtr;
+		/* Quickly check and continue if no one holds the lock. */
+		if (!IsLWLockHeld(&WALInsertLocks[i].l.lock))
+			continue;
I'm not sure this is quite right - don't we need a memory barrier. But I don't
see a reason to not just leave this code as-is. I think this should be
optimized entirely in lwlock.c
Actually, we don't need that at all as LWLockWaitForVar() will return
immediately if the lock is free. So, I removed it.
I'd probably split the change to an atomic from other changes either way.
Done. I've added commit messages to each of the patches.
I've also brought the patch from [1] here as 0003.
Thoughts?
[1]: /messages/by-id/CALj2ACXtQdrGXtb=rbUOXddm1wU1vD9z6q_39FQyX0166dq==A@mail.gmail.com
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-Make-insertingAt-64-bit-atomic.patch (application/octet-stream)
v2-0002-Add-fastpath-to-LWLockUpdateVar.patch (application/octet-stream)
v2-0003-Make-lastImportantAt-64-bit-atomic.patch (application/octet-stream)
On Fri, Dec 02, 2022 at 04:32:38PM +0530, Bharath Rupireddy wrote:
On Fri, Dec 2, 2022 at 6:10 AM Andres Freund <andres@anarazel.de> wrote:
I'm not sure this is quite right - don't we need a memory barrier. But I don't
see a reason to not just leave this code as-is. I think this should be
optimized entirely in lwlock.c
Actually, we don't need that at all as LWLockWaitForVar() will return
immediately if the lock is free. So, I removed it.
I briefly looked at the latest patch set, and I'm curious how this change
avoids introducing memory ordering bugs. Perhaps I am missing something
obvious.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Hi,
FWIW, I don't see an advantage in 0003. If it allows us to make something else
simpler / faster, cool, but on its own it doesn't seem worthwhile.
On 2022-12-02 16:31:58 -0800, Nathan Bossart wrote:
On Fri, Dec 02, 2022 at 04:32:38PM +0530, Bharath Rupireddy wrote:
On Fri, Dec 2, 2022 at 6:10 AM Andres Freund <andres@anarazel.de> wrote:
I'm not sure this is quite right - don't we need a memory barrier. But I don't
see a reason to not just leave this code as-is. I think this should be
optimized entirely in lwlock.c
Actually, we don't need that at all as LWLockWaitForVar() will return
immediately if the lock is free. So, I removed it.
I briefly looked at the latest patch set, and I'm curious how this change
avoids introducing memory ordering bugs. Perhaps I am missing something
obvious.
I'm a bit confused too - the comment above talks about LWLockWaitForVar(), but
the patches seem to optimize LWLockUpdateVar().
I think it'd be safe to optimize LWLockConflictsWithVar(), due to some
pre-existing, quite crufty, code. LWLockConflictsWithVar() says:
* Test first to see if it the slot is free right now.
*
* XXX: the caller uses a spinlock before this, so we don't need a memory
* barrier here as far as the current usage is concerned. But that might
* not be safe in general.
which happens to be true in the single, indirect, caller:
/* Read the current insert position */
SpinLockAcquire(&Insert->insertpos_lck);
bytepos = Insert->CurrBytePos;
SpinLockRelease(&Insert->insertpos_lck);
reservedUpto = XLogBytePosToEndRecPtr(bytepos);
I think at the very least we ought to have a comment in
WaitXLogInsertionsToFinish() highlighting this.
It's not at all clear to me that the proposed fast-path for LWLockUpdateVar()
is safe. I think at the very least we could end up missing waiters that we
should have woken up.
I think it ought to be safe to do something like
pg_atomic_exchange_u64()..
if (!(pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS))
return;
because the pg_atomic_exchange_u64() will provide the necessary memory
barrier.
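A minimal sketch of that fastpath, modeled with C11 atomics rather than the real lwlock.c internals (the ToyLWLock type, the flag bit value, and the wait-list step are invented for illustration):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_LW_FLAG_HAS_WAITERS (1u << 30)	/* stand-in for the real flag bit */

typedef struct
{
	_Atomic uint32_t state;		/* stand-in for LWLock->state */
} ToyLWLock;

static bool wait_list_walked;	/* records whether the slow path ran */

/* Sketch of LWLockUpdateVar() with the proposed fastpath: publish the
 * variable with a full-barrier exchange, then take the wait list lock
 * only if the HAS_WAITERS flag is set.  The exchange's barrier keeps the
 * flag read from being reordered before the variable update, so a waiter
 * that queued itself before the exchange is always noticed. */
static void
toy_lwlock_update_var(ToyLWLock *lock, _Atomic uint64_t *valptr, uint64_t val)
{
	(void) atomic_exchange(valptr, val);	/* seq_cst: full barrier */

	if (!(atomic_load(&lock->state) & TOY_LW_FLAG_HAS_WAITERS))
		return;					/* fast path: nobody to wake */

	/* slow path: acquire wait list lock and wake waiters (elided) */
	wait_list_walked = true;
}
```

In the real patch the slow path is the LWLockWaitListLock()/wake-up sequence; the point of the sketch is only the ordering, where the exchange's barrier pairs with the barrier a waiter performs when it queues itself.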
Greetings,
Andres Freund
On Tue, Dec 6, 2022 at 12:00 AM Andres Freund <andres@anarazel.de> wrote:
FWIW, I don't see an advantage in 0003. If it allows us to make something else
simpler / faster, cool, but on its own it doesn't seem worthwhile.
Thanks. I will discard it.
I think it'd be safe to optimize LWLockConflictsWithVar(), due to some
pre-existing, quite crufty, code. LWLockConflictsWithVar() says:
* Test first to see if it the slot is free right now.
*
* XXX: the caller uses a spinlock before this, so we don't need a memory
* barrier here as far as the current usage is concerned. But that might
* not be safe in general.
which happens to be true in the single, indirect, caller:
/* Read the current insert position */
SpinLockAcquire(&Insert->insertpos_lck);
bytepos = Insert->CurrBytePos;
SpinLockRelease(&Insert->insertpos_lck);
reservedUpto = XLogBytePosToEndRecPtr(bytepos);
I think at the very least we ought to have a comment in
WaitXLogInsertionsToFinish() highlighting this.
So, using a spinlock ensures no memory ordering occurs while reading
lock->state in LWLockConflictsWithVar()? How does spinlock that gets
acquired and released in the caller WaitXLogInsertionsToFinish()
itself and the memory barrier in the called function
LWLockConflictsWithVar() relate here? Can you please help me
understand this a bit?
It's not at all clear to me that the proposed fast-path for LWLockUpdateVar()
is safe. I think at the very least we could end up missing waiters that we
should have woken up.
I think it ought to be safe to do something like
pg_atomic_exchange_u64()..
if (!(pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS))
return;
pg_atomic_exchange_u64(&lock->state, exchange_with_what_?. Exchange
will change the value no?
because the pg_atomic_exchange_u64() will provide the necessary memory
barrier.
I'm reading some comments [1], are these also true for 64-bit atomic
CAS? Does it mean that an atomic CAS operation inherently provides a
memory barrier? Can you please point me if it's described better
somewhere else?
[1]:
* Full barrier semantics.
*/
static inline uint32
pg_atomic_exchange_u32(volatile pg_atomic_uint32 *ptr,
/*
* Get and clear the flags that are set for this backend. Note that
* pg_atomic_exchange_u32 is a full barrier, so we're guaranteed that the
* read of the barrier generation above happens before we atomically
* extract the flags, and that any subsequent state changes happen
* afterward.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On 2022-12-08 12:29:54 +0530, Bharath Rupireddy wrote:
On Tue, Dec 6, 2022 at 12:00 AM Andres Freund <andres@anarazel.de> wrote:
I think it'd be safe to optimize LWLockConflictsWithVar(), due to some
pre-existing, quite crufty, code. LWLockConflictsWithVar() says:
* Test first to see if it the slot is free right now.
*
* XXX: the caller uses a spinlock before this, so we don't need a memory
* barrier here as far as the current usage is concerned. But that might
* not be safe in general.
which happens to be true in the single, indirect, caller:
/* Read the current insert position */
SpinLockAcquire(&Insert->insertpos_lck);
bytepos = Insert->CurrBytePos;
SpinLockRelease(&Insert->insertpos_lck);
reservedUpto = XLogBytePosToEndRecPtr(bytepos);
I think at the very least we ought to have a comment in
WaitXLogInsertionsToFinish() highlighting this.
So, using a spinlock ensures no memory ordering occurs while reading
lock->state in LWLockConflictsWithVar()?
No, a spinlock *does* imply ordering. But your patch does remove several of
the spinlock acquisitions (via LWLockWaitListLock()). And moved the assignment
in LWLockUpdateVar() out from under the spinlock.
If you remove spinlock operations (or other barrier primitives), you need to
make sure that such modifications don't break the required memory ordering.
How does spinlock that gets acquired and released in the caller
WaitXLogInsertionsToFinish() itself and the memory barrier in the called
function LWLockConflictsWithVar() relate here? Can you please help me
understand this a bit?
The caller's barrier means that we'll see values that are at least as "up to
date" as at the time of the barrier (it's a bit more complicated than that, a
barrier always needs to be paired with another barrier).
It's not at all clear to me that the proposed fast-path for LWLockUpdateVar()
is safe. I think at the very least we could end up missing waiters that we
should have woken up.
I think it ought to be safe to do something like
pg_atomic_exchange_u64()..
if (!(pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS))
return;
pg_atomic_exchange_u64(&lock->state, exchange_with_what_?. Exchange will
change the value no?
Not lock->state, but the atomic passed to LWLockUpdateVar(), which we do want
to update. An pg_atomic_exchange_u64() includes a memory barrier.
because the pg_atomic_exchange_u64() will provide the necessary memory
barrier.
I'm reading some comments [1], are these also true for 64-bit atomic
CAS?
Yes. See
/* ----
* The 64 bit operations have the same semantics as their 32bit counterparts
* if they are available. Check the corresponding 32bit function for
* documentation.
* ----
*/
Does it mean that an atomic CAS operation inherently provides a
memory barrier?
Yes.
Can you please point me if it's described better somewhere else?
I'm not sure what you'd like to have described more extensively, tbh.
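As a runnable reminder of the semantics in question, C11's atomic_exchange behaves the way pg_atomic_exchange_u64 is described here: it atomically stores the new value, returns the old one, and is a full barrier under the default seq_cst ordering (swap_value is just an illustrative wrapper, not a PostgreSQL function):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* atomic_exchange stores the new value and hands back the previous one in
 * a single atomic step; with the default memory_order_seq_cst it is a
 * full barrier, so no load or store can be reordered across it.  That is
 * the property the fastpath relies on: the variable update is globally
 * visible before the subsequent read of lock->state. */
static uint64_t
swap_value(_Atomic uint64_t *ptr, uint64_t newval)
{
	return atomic_exchange(ptr, newval);
}
```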
Greetings,
Andres Freund
On Tue, Dec 6, 2022 at 12:00 AM Andres Freund <andres@anarazel.de> wrote:
Hi
Thanks for reviewing.
FWIW, I don't see an advantage in 0003. If it allows us to make something else
simpler / faster, cool, but on its own it doesn't seem worthwhile.
I've discarded this change.
On 2022-12-02 16:31:58 -0800, Nathan Bossart wrote:
On Fri, Dec 02, 2022 at 04:32:38PM +0530, Bharath Rupireddy wrote:
On Fri, Dec 2, 2022 at 6:10 AM Andres Freund <andres@anarazel.de> wrote:
I'm not sure this is quite right - don't we need a memory barrier. But I don't
see a reason to not just leave this code as-is. I think this should be
optimized entirely in lwlock.c
Actually, we don't need that at all as LWLockWaitForVar() will return
immediately if the lock is free. So, I removed it.
I briefly looked at the latest patch set, and I'm curious how this change
avoids introducing memory ordering bugs. Perhaps I am missing something
obvious.
I'm a bit confused too - the comment above talks about LWLockWaitForVar(), but
the patches seem to optimize LWLockUpdateVar().
I think it'd be safe to optimize LWLockConflictsWithVar(), due to some
pre-existing, quite crufty, code. LWLockConflictsWithVar() says:
* Test first to see if it the slot is free right now.
*
* XXX: the caller uses a spinlock before this, so we don't need a memory
* barrier here as far as the current usage is concerned. But that might
* not be safe in general.
which happens to be true in the single, indirect, caller:
/* Read the current insert position */
SpinLockAcquire(&Insert->insertpos_lck);
bytepos = Insert->CurrBytePos;
SpinLockRelease(&Insert->insertpos_lck);
reservedUpto = XLogBytePosToEndRecPtr(bytepos);
I think at the very least we ought to have a comment in
WaitXLogInsertionsToFinish() highlighting this.
Done.
It's not at all clear to me that the proposed fast-path for LWLockUpdateVar()
is safe. I think at the very least we could end up missing waiters that we
should have woken up.
I think it ought to be safe to do something like
pg_atomic_exchange_u64()..
if (!(pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS))
return;
because the pg_atomic_exchange_u64() will provide the necessary memory
barrier.
Done.
I'm attaching the v3 patch with the above review comments addressed.
Hopefully, no memory ordering issues now. FWIW, I've added it to CF
https://commitfest.postgresql.org/42/4141/.
Test results with the v3 patch and insert workload are the same as
that of the earlier run - TPS starts to scale at higher clients as
expected after 512 clients and peaks at 2X with 2048 and 4096 clients.
HEAD:
1 1380.411086
2 1358.378988
4 2701.974332
8 5925.380744
16 10956.501237
32 20877.513953
64 40838.046774
128 70251.744161
256 108114.321299
512 120478.988268
768 99140.425209
1024 93645.984364
2048 70111.159909
4096 55541.804826
v3 PATCHED:
1 1493.800209
2 1569.414953
4 3154.186605
8 5965.578904
16 11912.587645
32 22720.964908
64 42001.094528
128 78361.158983
256 110457.926232
512 148941.378393
768 167256.590308
1024 155510.675372
2048 147499.376882
4096 119375.457779
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0001-Optimize-WAL-insertion-lock-acquisition-and-relea.patch (application/octet-stream)
On Tue, Jan 24, 2023 at 7:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
I'm attaching the v3 patch with the above review comments addressed.
Hopefully, no memory ordering issues now. FWIW, I've added it to CF
https://commitfest.postgresql.org/42/4141/.
Test results with the v3 patch and insert workload are the same as
that of the earlier run - TPS starts to scale at higher clients as
expected after 512 clients and peaks at 2X with 2048 and 4096 clients.
HEAD:
1 1380.411086
2 1358.378988
4 2701.974332
8 5925.380744
16 10956.501237
32 20877.513953
64 40838.046774
128 70251.744161
256 108114.321299
512 120478.988268
768 99140.425209
1024 93645.984364
2048 70111.159909
4096 55541.804826
v3 PATCHED:
1 1493.800209
2 1569.414953
4 3154.186605
8 5965.578904
16 11912.587645
32 22720.964908
64 42001.094528
128 78361.158983
256 110457.926232
512 148941.378393
768 167256.590308
1024 155510.675372
2048 147499.376882
4096 119375.457779
I slightly modified the comments and attached the v4 patch for further
review. I also took a perf report - there's a clear reduction in the
functions that are affected by the patch - LWLockWaitListLock,
WaitXLogInsertionsToFinish, LWLockWaitForVar and
LWLockConflictsWithVar. Note that I compiled the source code with
-ggdb for capturing symbols for perf, still the benefit stands at > 2X
for a higher number of clients.
HEAD:
+ 16.87% 0.01% postgres [.] CommitTransactionCommand
+ 16.86% 0.00% postgres [.] finish_xact_command
+ 16.81% 0.01% postgres [.] CommitTransaction
+ 15.09% 0.20% postgres [.] LWLockWaitListLock
+ 14.53% 0.01% postgres [.] WaitXLogInsertionsToFinish
+ 14.51% 0.02% postgres [.] LWLockWaitForVar
+ 11.70% 11.63% postgres [.] pg_atomic_read_u32_impl
+ 11.66% 0.08% postgres [.] pg_atomic_read_u32
+ 9.96% 0.03% postgres [.] LWLockConflictsWithVar
+ 4.78% 0.00% postgres [.] LWLockQueueSelf
+ 1.91% 0.01% postgres [.] pg_atomic_fetch_or_u32
+ 1.91% 1.89% postgres [.] pg_atomic_fetch_or_u32_impl
+ 1.73% 0.00% postgres [.] XLogInsert
+ 1.69% 0.01% postgres [.] XLogInsertRecord
+ 1.41% 0.01% postgres [.] LWLockRelease
+ 1.37% 0.47% postgres [.] perform_spin_delay
+ 1.11% 1.11% postgres [.] spin_delay
+ 1.10% 0.03% postgres [.] exec_bind_message
+ 0.91% 0.00% postgres [.] WALInsertLockRelease
+ 0.91% 0.00% postgres [.] LWLockReleaseClearVar
+ 0.72% 0.02% postgres [.] LWLockAcquire
+ 0.60% 0.00% postgres [.] LWLockDequeueSelf
+ 0.58% 0.00% postgres [.] GetTransactionSnapshot
0.58% 0.49% postgres [.] GetSnapshotData
+ 0.58% 0.00% postgres [.] WALInsertLockAcquire
+ 0.55% 0.00% postgres [.] XactLogCommitRecord
TPS (compiled with -ggdb for capturing symbols for perf)
1 1392.512967
2 1435.899119
4 3104.091923
8 6159.305522
16 11477.641780
32 22701.000718
64 41662.425880
128 23743.426209
256 89837.651619
512 65164.221500
768 66015.733370
1024 56421.223080
2048 52909.018072
4096 40071.146985
PATCHED:
+ 2.19% 0.05% postgres [.] LWLockWaitListLock
+ 2.10% 0.01% postgres [.] LWLockQueueSelf
+ 1.73% 1.71% postgres [.] pg_atomic_read_u32_impl
+ 1.73% 0.02% postgres [.] pg_atomic_read_u32
+ 1.72% 0.02% postgres [.] LWLockRelease
+ 1.65% 0.04% postgres [.] exec_bind_message
+ 1.43% 0.00% postgres [.] XLogInsert
+ 1.42% 0.01% postgres [.] WaitXLogInsertionsToFinish
+ 1.40% 0.03% postgres [.] LWLockWaitForVar
+ 1.38% 0.02% postgres [.] XLogInsertRecord
+ 0.93% 0.03% postgres [.] LWLockAcquireOrWait
+ 0.91% 0.00% postgres [.] GetTransactionSnapshot
+ 0.91% 0.79% postgres [.] GetSnapshotData
+ 0.91% 0.00% postgres [.] WALInsertLockRelease
+ 0.91% 0.00% postgres [.] LWLockReleaseClearVar
+ 0.53% 0.02% postgres [.] ExecInitModifyTable
TPS (compiled with -ggdb for capturing symbols for perf)
1 1295.296611
2 1459.079162
4 2865.688987
8 5533.724983
16 10771.697842
32 20557.499312
64 39436.423783
128 42555.639048
256 73139.060227
512 124649.665196
768 131162.826976
1024 132185.160007
2048 117377.586644
4096 88240.336940
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v4-0001-Optimize-WAL-insertion-lock-acquisition-and-relea.patch (application/x-patch)
+ pg_atomic_exchange_u64(valptr, val);
nitpick: I'd add a (void) at the beginning of these calls to
pg_atomic_exchange_u64() so that it's clear that we are discarding the
return value.
+ /*
+ * Update the lock variable atomically first without having to acquire wait
+ * list lock, so that if anyone looking for the lock will have chance to
+ * grab it a bit quickly.
+ *
+ * NB: Note the use of pg_atomic_exchange_u64 as opposed to just
+ * pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64 is
+ * a full barrier, we're guaranteed that the subsequent atomic read of lock
+ * state to check if it has any waiters happens after we set the lock
+ * variable to new value here. Without a barrier, we could end up missing
+ * waiters that otherwise should have been woken up.
+ */
+ pg_atomic_exchange_u64(valptr, val);
+
+ /*
+ * Quick exit when there are no waiters. This avoids unnecessary lwlock's
+ * wait list lock acquisition and release.
+ */
+ if ((pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) == 0)
+ return;
I think this makes sense. A waiter could queue itself after the exchange,
but it'll recheck after queueing. IIUC this is basically how this works
today. We update the value and release the lock before waking up any
waiters, so the same principle applies.
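That no-lost-wakeup argument can be modeled as two halves of a protocol. This is a toy simulation with invented names, not the real LWLockWaitForVar()/LWLockUpdateVar() code (among other simplifications, the real waiter also checks whether the lock is held at all before deciding to wait):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_LW_FLAG_HAS_WAITERS (1u << 30)	/* stand-in for the real flag bit */

typedef struct
{
	_Atomic uint32_t state;
	_Atomic uint64_t insertingAt;
} ToyLock;

/* Waiter side: queue first (sets HAS_WAITERS), then recheck the variable
 * before sleeping.  Returns true if it would actually block. */
static bool
toy_waiter_would_sleep(ToyLock *lock, uint64_t waitfor)
{
	atomic_fetch_or(&lock->state, TOY_LW_FLAG_HAS_WAITERS);	/* queue self */
	/* recheck after queueing: if the updater already published a value at
	 * or past what we're waiting for, bail out instead of sleeping */
	return atomic_load(&lock->insertingAt) < waitfor;
}

/* Updater side: publish the value first (full-barrier exchange), then look
 * for waiters.  Returns true if it would walk the wait list to wake them. */
static bool
toy_updater_would_wake(ToyLock *lock, uint64_t val)
{
	(void) atomic_exchange(&lock->insertingAt, val);
	return (atomic_load(&lock->state) & TOY_LW_FLAG_HAS_WAITERS) != 0;
}
```

Whichever side goes first, the wakeup is never lost: if the update lands first, the waiter's recheck sees the new value and never sleeps; if the waiter queues first, the updater's flag read (ordered after the exchange) sees HAS_WAITERS and takes the slow path.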
Overall, I think this patch is in reasonable shape.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Feb 9, 2023 at 3:36 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
+ pg_atomic_exchange_u64(valptr, val);
nitpick: I'd add a (void) at the beginning of these calls to
pg_atomic_exchange_u64() so that it's clear that we are discarding the
return value.
I did that in the attached v5 patch, although usage is mixed elsewhere;
some call sites cast the return value to (void) explicitly and some do not.
+ /*
+ * Update the lock variable atomically first without having to acquire wait
+ * list lock, so that if anyone looking for the lock will have chance to
+ * grab it a bit quickly.
+ *
+ * NB: Note the use of pg_atomic_exchange_u64 as opposed to just
+ * pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64 is
+ * a full barrier, we're guaranteed that the subsequent atomic read of lock
+ * state to check if it has any waiters happens after we set the lock
+ * variable to new value here. Without a barrier, we could end up missing
+ * waiters that otherwise should have been woken up.
+ */
+ pg_atomic_exchange_u64(valptr, val);
+
+ /*
+ * Quick exit when there are no waiters. This avoids unnecessary lwlock's
+ * wait list lock acquisition and release.
+ */
+ if ((pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) == 0)
+ return;
I think this makes sense. A waiter could queue itself after the exchange,
but it'll recheck after queueing. IIUC this is basically how this works
today. We update the value and release the lock before waking up any
waiters, so the same principle applies.
Yes, right after self-queuing (LWLockQueueSelf), a waiter re-checks the
value (LWLockConflictsWithVar) before it goes to sleep in
LWLockWaitForVar until awakened. A waiter added to the queue is
guaranteed to be woken up by LWLockUpdateVar, but before that the lock
value is set, and pg_atomic_exchange_u64 acts as a memory barrier, so
there is no memory reordering. Essentially, the order of these
operations isn't changed. The benefit we're seeing comes from avoiding
LWLock's waitlist lock for reading and updating the lock value,
relying on 64-bit atomics instead.
Overall, I think this patch is in reasonable shape.
Thanks for reviewing. Please see the attached v5 patch.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v5-0001-Optimize-WAL-insertion-lock-acquisition-and-relea.patchapplication/octet-stream; name=v5-0001-Optimize-WAL-insertion-lock-acquisition-and-relea.patchDownload+55-32
On Thu, Feb 09, 2023 at 11:51:28AM +0530, Bharath Rupireddy wrote:
On Thu, Feb 9, 2023 at 3:36 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
Overall, I think this patch is in reasonable shape.
Thanks for reviewing. Please see the attached v5 patch.
I'm marking this as ready-for-committer. I think a couple of the comments
could use some small adjustments, but that probably doesn't need to hold up
this patch.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Hi Andres Freund
This patch improves performance significantly, and Commitfest 2023-03 is coming to an end. Has it not been committed yet because the patch still needs further work?
Best wishes
________________________________
From: Nathan Bossart <nathandbossart@gmail.com>
Sent: February 21, 2023 13:49
To: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: WAL Insertion Lock Improvements
On Thu, Feb 09, 2023 at 11:51:28AM +0530, Bharath Rupireddy wrote:
On Thu, Feb 9, 2023 at 3:36 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
Overall, I think this patch is in reasonable shape.
Thanks for reviewing. Please see the attached v5 patch.
I'm marking this as ready-for-committer. I think a couple of the comments
could use some small adjustments, but that probably doesn't need to hold up
this patch.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Mon, Feb 20, 2023 at 09:49:48PM -0800, Nathan Bossart wrote:
I'm marking this as ready-for-committer. I think a couple of the comments
could use some small adjustments, but that probably doesn't need to hold up
this patch.
Apologies. I was planning to have a thorough look at this patch but
life got in the way and I have not been able to study what's happening
on this thread this close to the feature freeze.
Anyway, I am attaching two modules I have written for the sake of this
thread while beginning my lookup of the patch:
- lwlock_test.tar.gz, validation module for LWLocks with variable
waits. This module can be loaded with shared_preload_libraries to
have two LWLocks and two variables in shmem, then have 2 backends play
ping-pong with each other's locks. An isolation test may be possible,
though I have not thought hard about it. Just use a SQL sequence like
that, for example, with N > 1 (see README):
Backend 1: SELECT lwlock_test_acquire();
Backend 2: SELECT lwlock_test_wait(N);
Backend 1: SELECT lwlock_test_update(N);
Backend 1: SELECT lwlock_test_release();
- custom_wal.tar.gz, thin wrapper for LogLogicalMessage() able to
generate N records of size M bytes in a single SQL call. This can be
used to generate records of various sizes for benchmarking, limiting
the overhead of individual calls to pg_logical_emit_message_bytea().
I have begun gathering numbers with WAL records of various size and
length, using pgbench like:
$ cat script.sql
\set record_size 1
\set record_number 5000
SELECT custom_wal(:record_size, :record_number);
$ pgbench -n -c 500 -t 100 -f script.sql
So this limits most of the overhead coming from parsing, planning, and
most of the INSERT logic.
I have been trying to get some reproducible numbers, but I think that
I am going to need a bigger machine than what I have been using for
the last few days, up to 400 connections. It is worth noting that
00d1e02b may influence a bit the results, so we may want to have more
numbers with that in place particularly with INSERTs, and one of the
tests used upthread uses single row INSERTs.
Another question I had: would it be worth having some tests with
pg_wal/ mounted to a tmpfs so as I/O would not be a bottleneck? It
should be instructive to get more measurement with a fixed number of
transactions and a rather high amount of concurrent connections (1k at
least?), where the contention would be on the variable waits. My
first impression is that records should not be too small if you want
to see more the effects of this patch, either.
Looking at the patch.. LWLockConflictsWithVar() and
LWLockReleaseClearVar() are the trivial bits. These are OK.
+ * NB: Note the use of pg_atomic_exchange_u64 as opposed to just
+ * pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64 is
+ * a full barrier, we're guaranteed that the subsequent shared memory
+ * reads/writes, if any, happen after we reset the lock variable.
This mentions that the subsequent read/write operations are safe, so
this refers to anything happening after the variable is reset. As
a full barrier, should be also mention that this is also ordered with
respect to anything that the caller did before clearing the variable?
From this perspective using pg_atomic_exchange_u64() makes sense to me
in LWLockReleaseClearVar().
+ * XXX: Use of a spinlock at the beginning of this function to read
+ * current insert position implies memory ordering. That means that
+ * the immediate loads and stores to shared memory (for instance,
+ * in LWLockUpdateVar called via LWLockWaitForVar) don't need an
+ * explicit memory barrier as far as the current usage is
+ * concerned. But that might not be safe in general.
*/
What's the part where this is not safe? Based on what I see, this
code path is safe because of the previous spinlock. This is the same
comment as at the beginning of LWLockConflictsWithVar(). Is that
something that we ought to document at the top of LWLockWaitForVar()
as well? We have one caller of this function currently, but there may
be more in the future.
- * you're about to write out.
+ * you're about to write out. Using an atomic variable for insertingAt avoids
+ * taking any explicit lock for reads and writes.
Hmm. Not sure that we need to comment at all.
-LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
+LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
[...]
Assert(pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE);
- /* Update the lock's value */
- *valptr = val;
The sensitive change is in LWLockUpdateVar(). I am not completely
sure to understand this removal, though. Does that influence the case
where there are waiters?
Another thing I was wondering about: how much does the fast-path used
in LWLockUpdateVar() influence the performance numbers? Am I right to
guess that it counts for most of the gain seen? Or could it be that
the removal of the spin lock in
LWLockConflictsWithVar()/LWLockWaitForVar() is the point that has the
highest effect?
--
Michael
On Mon, Apr 10, 2023 at 9:38 AM Michael Paquier <michael@paquier.xyz> wrote:
I have been trying to get some reproducible numbers, but I think that
I am going to need a bigger maching than what I have been using for
the last few days, up to 400 connections. It is worth noting that
00d1e02b may influence a bit the results, so we may want to have more
numbers with that in place particularly with INSERTs, and one of the
tests used upthread uses single row INSERTs.
I ran performance tests on the patch with different use-cases. Clearly
the patch reduces burden on LWLock's waitlist lock (evident from perf
reports [1]). However, to see visible impact in the output, the txns
must be generating small (between 16 bytes to 2 KB) amounts of WAL in
a highly concurrent manner, check the results below (FWIW, I've zipped
and attached perf images for better illustration along with test
setup).
When the txns are generating a small amount of WAL i.e. between 16
bytes to 2 KB in a highly concurrent manner, the benefit is clearly
visible in the TPS more than 2.3X improvement. When the txns are
generating more WAL i.e. more than 2 KB, the gain from reduced burden
on waitlist lock is offset by increase in the wait/release for WAL
insertion locks and no visible benefit is seen.
As the amount of WAL each txn generates increases, it looks like the
benefit gained from reduced burden on waitlist lock is offset by
increase in the wait for WAL insertion locks.
Note that I've used pg_logical_emit_message() for ease of
understanding about the txns generating various amounts of WAL, but
the pattern is the same if txns are generating various amounts of WAL
say with inserts.
test-case 1: -T5, WAL ~16 bytes
clients HEAD PATCHED
1 1437 1352
2 1376 1419
4 2919 2774
8 5875 6371
16 11148 12242
32 22108 23532
64 41414 46478
128 85304 85235
256 83771 152901
512 61970 141021
768 56514 118899
1024 51784 110960
2048 39141 84150
4096 16901 45759
test-case 1: -t1000, WAL ~16 bytes
clients HEAD PATCHED
1 1417 1333
2 1363 1791
4 2978 2970
8 5954 6198
16 11179 11164
32 23742 24043
64 45537 44103
128 84683 91762
256 80369 146293
512 61080 132079
768 57236 118046
1024 53497 114574
2048 46423 93588
4096 42067 85790
test-case 2: -T5, WAL ~256 bytes
clients HEAD PATCHED
1 1521 1386
2 1647 1637
4 3088 3270
8 6011 5631
16 12778 10317
32 24117 20006
64 43966 38199
128 72660 67936
256 93096 121261
512 57247 142418
768 53782 126218
1024 50279 109153
2048 35109 91602
4096 21184 39848
test-case 2: -t1000, WAL ~256 bytes
clients HEAD PATCHED
1 1265 1389
2 1522 1258
4 2802 2775
8 5875 5422
16 11664 10853
32 21961 22145
64 44304 40851
128 73278 80494
256 91172 122287
512 60966 136734
768 56590 125050
1024 52481 124341
2048 47878 104760
4096 42838 94121
test-case 3: -T5, WAL 512 bytes
clients HEAD PATCHED
1 1464 1284
2 1520 1381
4 2985 2877
8 6237 5261
16 11296 10621
32 22257 20789
64 40548 37243
128 66507 59891
256 92516 97506
512 56404 119716
768 51127 112482
1024 48463 103484
2048 38079 81424
4096 18977 40942
test-case 3: -t1000, WAL 512 bytes
clients HEAD PATCHED
1 1452 1434
2 1604 1649
4 3051 2971
8 5967 5650
16 10471 10702
32 20257 20899
64 39412 36750
128 62767 61110
256 81050 89768
512 56888 122786
768 51238 114444
1024 48972 106867
2048 43451 98847
4096 40018 111079
test-case 4: -T5, WAL 1024 bytes
clients HEAD PATCHED
1 1405 1395
2 1638 1607
4 3176 3207
8 6271 6024
16 11653 11103
32 20530 20260
64 34313 32367
128 55939 52079
256 74355 76420
512 56506 90983
768 50088 100410
1024 44589 99025
2048 39640 90931
4096 20942 36035
test-case 4: -t1000, WAL 1024 bytes
clients HEAD PATCHED
1 1330 1304
2 1615 1366
4 3117 2667
8 6179 5390
16 10524 10426
32 19819 18620
64 34844 29731
128 52180 48869
256 73284 71396
512 55714 96014
768 49336 108100
1024 46113 102789
2048 44627 104721
4096 44979 106189
test-case 5: -T5, WAL 2048 bytes
clients HEAD PATCHED
1 1407 1377
2 1518 1559
4 2589 2870
8 4883 5493
16 9075 9201
32 15957 16295
64 27471 25029
128 37493 38642
256 46369 45787
512 61755 62836
768 59144 68419
1024 52495 68933
2048 48608 72500
4096 26463 61252
test-case 5: -t1000, WAL 2048 bytes
clients HEAD PATCHED
1 1289 1366
2 1489 1628
4 2960 3036
8 5536 5965
16 9248 10399
32 15770 18140
64 27626 27800
128 36817 39483
256 48533 52105
512 64453 64007
768 59146 64160
1024 57637 61756
2048 59063 62109
4096 58268 61206
test-case 6: -T5, WAL 4096 bytes
clients HEAD PATCHED
1 1322 1325
2 1504 1551
4 2811 2880
8 5330 5159
16 8625 8315
32 12820 13534
64 19737 19965
128 26298 24633
256 34630 29939
512 34382 36669
768 33421 33316
1024 33525 32821
2048 37053 37752
4096 37334 39114
test-case 6: -t1000, WAL 4096 bytes
clients HEAD PATCHED
1 1212 1371
2 1383 1566
4 2858 2967
8 5092 5035
16 8233 8486
32 13353 13678
64 19052 20072
128 24803 24726
256 34065 33139
512 31590 32029
768 31432 31404
1024 31357 31366
2048 31465 31508
4096 32157 32180
test-case 7: -T5, WAL 8192 bytes
clients HEAD PATCHED
1 1287 1233
2 1552 1521
4 2658 2617
8 4680 4532
16 6732 7110
32 9649 9198
64 13276 12042
128 17100 17187
256 17408 17448
512 16595 16358
768 16599 16500
1024 16975 17300
2048 19073 19137
4096 21368 21735
test-case 7: -t1000, WAL 8192 bytes
clients HEAD PATCHED
1 1144 1190
2 1414 1395
4 2618 2438
8 4645 4485
16 6766 7001
32 9620 9804
64 12943 13023
128 15904 17148
256 16645 16035
512 15800 15796
768 15788 15810
1024 15814 15817
2048 17775 17771
4096 31715 31682
Looking at the patch.. LWLockConflictsWithVar() and
LWLockReleaseClearVar() are the trivial bits. These are OK.
Hm, the crux of the patch is avoiding LWLock's waitlist lock for
reading/writing the lock variable. Essentially, they are important
bits.
+ * NB: Note the use of pg_atomic_exchange_u64 as opposed to just
+ * pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64 is
+ * a full barrier, we're guaranteed that the subsequent shared memory
+ * reads/writes, if any, happen after we reset the lock variable.
This mentions that the subsequent read/write operations are safe, so
this refers to anything happening after the variable is reset. As
a full barrier, should be also mention that this is also ordered with
respect to anything that the caller did before clearing the variable?
From this perspective using pg_atomic_exchange_u64() makes sense to me
in LWLockReleaseClearVar().
Wordsmithed that comment a bit.
+ * XXX: Use of a spinlock at the beginning of this function to read
+ * current insert position implies memory ordering. That means that
+ * the immediate loads and stores to shared memory (for instance,
+ * in LWLockUpdateVar called via LWLockWaitForVar) don't need an
+ * explicit memory barrier as far as the current usage is
+ * concerned. But that might not be safe in general.
 */
What's the part where this is not safe? Based on what I see, this
code path is safe because of the previous spinlock. This is the same
comment as at the beginning of LWLockConflictsWithVar(). Is that
something that we ought to document at the top of LWLockWaitForVar()
as well? We have one caller of this function currently, but there may
be more in the future.
'But that might not be safe in general' applies only for
LWLockWaitForVar not for WaitXLogInsertionsToFinish for sure. My bad.
If there's another caller for LWLockWaitForVar without any spinlock,
that's when the LWLockWaitForVar needs to have an explicit memory
barrier.
Per a comment upthread
/messages/by-id/20221205183007.s72oygp63s43dqyz@awork3.anarazel.de,
I had a note in WaitXLogInsertionsToFinish before LWLockWaitForVar. I
now have modified that comment.
- * you're about to write out.
+ * you're about to write out. Using an atomic variable for insertingAt avoids
+ * taking any explicit lock for reads and writes.
Hmm. Not sure that we need to comment at all.
Removed. I was being verbose. One who understands pg_atomic_uint64 can
get to that point easily.
-LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
+LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
[...]
 Assert(pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE);
- /* Update the lock's value */
- *valptr = val;
The sensitive change is in LWLockUpdateVar(). I am not completely
sure to understand this removal, though. Does that influence the case
where there are waiters?
I'll send about this in a follow-up email to not overload this
response with too much data.
Another thing I was wondering about: how much does the fast-path used
in LWLockUpdateVar() influence the performance numbers? Am I right to
guess that it counts for most of the gain seen?
I'll send about this in a follow-up email to not overload this
response with too much data.
Or could it be that
the removal of the spin lock in
LWLockConflictsWithVar()/LWLockWaitForVar() is the point that has the
highest effect?
I'll send about this in a follow-up email to not overload this
response with too much data.
I've addressed the above review comments and attached the v6 patch.
[1]
test-case 1: -T5, WAL ~16 bytes HEAD:
+ 81.52% 0.03% postgres [.] __vstrfmon_l_internal
+ 81.52% 0.00% postgres [.] startup_hacks
+ 81.52% 0.00% postgres [.] PostmasterMain
+ 63.95% 1.01% postgres [.] LWLockWaitListLock
+ 61.93% 0.02% postgres [.] WaitXLogInsertionsToFinish
+ 61.89% 0.05% postgres [.] LWLockWaitForVar
+ 48.83% 48.33% postgres [.] pg_atomic_read_u32_impl
+ 48.78% 0.40% postgres [.] pg_atomic_read_u32
+ 43.19% 0.12% postgres [.] LWLockConflictsWithVar
+ 19.81% 0.01% postgres [.] LWLockQueueSelf
+ 7.86% 2.46% postgres [.] perform_spin_delay
+ 6.14% 6.06% postgres [.] spin_delay
+ 5.82% 0.01% postgres [.] pg_atomic_fetch_or_u32
+ 5.81% 5.76% postgres [.] pg_atomic_fetch_or_u32_impl
+ 4.00% 0.01% postgres [.] XLogInsert
+ 3.93% 0.03% postgres [.] XLogInsertRecord
+ 2.13% 0.02% postgres [.] LWLockRelease
+ 2.10% 0.03% postgres [.] LWLockAcquire
+ 1.92% 0.00% postgres [.] LWLockDequeueSelf
+ 1.87% 0.01% postgres [.] WALInsertLockAcquire
+ 1.68% 0.04% postgres [.] LWLockAcquireOrWait
+ 1.64% 0.01% postgres [.] pg_analyze_and_rewrite_fixedparams
+ 1.62% 0.00% postgres [.] WALInsertLockRelease
+ 1.62% 0.00% postgres [.] LWLockReleaseClearVar
+ 1.55% 0.01% postgres [.] parse_analyze_fixedparams
+ 1.51% 0.00% postgres [.] transformTopLevelStmt
+ 1.50% 0.00% postgres [.] transformOptionalSelectInto
+ 1.50% 0.01% postgres [.] transformStmt
+ 1.47% 0.02% postgres [.] transformSelectStmt
+ 1.29% 0.01% postgres [.] XactLogCommitRecord
test-case 1: -T5, WAL ~16 bytes PATCHED:
+ 74.49% 0.04% postgres [.] __vstrfmon_l_internal
+ 74.49% 0.00% postgres [.] startup_hacks
+ 74.49% 0.00% postgres [.] PostmasterMain
+ 51.60% 0.01% postgres [.] finish_xact_command
+ 51.60% 0.02% postgres [.] CommitTransactionCommand
+ 51.37% 0.03% postgres [.] CommitTransaction
+ 49.43% 0.05% postgres [.] RecordTransactionCommit
+ 46.55% 0.05% postgres [.] XLogFlush
+ 46.37% 0.85% postgres [.] LWLockWaitListLock
+ 43.79% 0.02% postgres [.] LWLockQueueSelf
+ 38.87% 0.03% postgres [.] WaitXLogInsertionsToFinish
+ 38.79% 0.11% postgres [.] LWLockWaitForVar
+ 34.99% 34.49% postgres [.] pg_atomic_read_u32_impl
+ 34.93% 0.35% postgres [.] pg_atomic_read_u32
+ 6.99% 2.12% postgres [.] perform_spin_delay
+ 6.64% 0.01% postgres [.] XLogInsert
+ 6.54% 0.06% postgres [.] XLogInsertRecord
+ 6.26% 0.08% postgres [.] LWLockAcquireOrWait
+ 5.31% 5.22% postgres [.] spin_delay
+ 4.23% 0.04% postgres [.] LWLockRelease
+ 3.74% 0.01% postgres [.] pg_atomic_fetch_or_u32
+ 3.73% 3.68% postgres [.] pg_atomic_fetch_or_u32_impl
+ 3.33% 0.06% postgres [.] LWLockAcquire
+ 2.97% 0.01% postgres [.] pg_plan_queries
+ 2.95% 0.01% postgres [.] WALInsertLockAcquire
+ 2.94% 0.02% postgres [.] planner
+ 2.94% 0.01% postgres [.] pg_plan_query
+ 2.92% 0.01% postgres [.] LWLockDequeueSelf
+ 2.89% 0.04% postgres [.] standard_planner
+ 2.81% 0.00% postgres [.] WALInsertLockRelease
+ 2.80% 0.00% postgres [.] LWLockReleaseClearVar
+ 2.38% 0.07% postgres [.] subquery_planner
+ 2.35% 0.01% postgres [.] XactLogCommitRecord
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mon, May 8, 2023 at 5:57 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Apr 10, 2023 at 9:38 AM Michael Paquier <michael@paquier.xyz> wrote:
-LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
+LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
[...]
 Assert(pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE);
- /* Update the lock's value */
- *valptr = val;
The sensitive change is in LWLockUpdateVar(). I am not completely
sure to understand this removal, though. Does that influence the case
where there are waiters?
I'll send about this in a follow-up email to not overload this
response with too much data.
It helps the case when there are no waiters. IOW, it updates the value
without the waitlist lock when there are no waiters, so there is no
extra waitlist lock acquisition/release just to update the value. In
turn, it helps any other backend wanting to flush the WAL that is
looking for the newly updated value of insertingAt in
WaitXLogInsertionsToFinish(); the flushing backend can get the new
value faster.
Another thing I was wondering about: how much does the fast-path used
in LWLockUpdateVar() influence the performance numbers? Am I right to
guess that it counts for most of the gain seen?
I'll send about this in a follow-up email to not overload this
response with too much data.
The fastpath exit in LWLockUpdateVar() doesn't seem to influence the
results much, see below results. However, it avoids waitlist lock
acquisition when there are no waiters.
test-case 1: -T5, WAL ~16 bytes
clients HEAD PATCHED with fastpath PATCHED no fast path
1 1482 1486 1457
2 1617 1620 1569
4 3174 3233 3031
8 6136 6365 5725
16 12566 12269 11685
32 24284 23621 23177
64 50135 45528 46653
128 94903 89791 89103
256 82289 152915 152835
512 62498 138838 142084
768 57083 125074 126768
1024 51308 113593 115930
2048 41084 88764 85110
4096 19939 42257 43917
Or could it be that
the removal of the spin lock in
LWLockConflictsWithVar()/LWLockWaitForVar() is the point that has the
highest effect?
I'll send about this in a follow-up email to not overload this
response with too much data.
Out of 3 functions that got rid of waitlist lock
LWLockConflictsWithVar/LWLockWaitForVar, LWLockUpdateVar,
LWLockReleaseClearVar, perf reports tell that the biggest gain (for
the use-cases that I've tried) is for
LWLockConflictsWithVar/LWLockWaitForVar:
test-case 1: -T5, WAL ~16 bytes
HEAD:
+ 61.89% 0.05% postgres [.] LWLockWaitForVar
+ 43.19% 0.12% postgres [.] LWLockConflictsWithVar
+ 1.62% 0.00% postgres [.] LWLockReleaseClearVar
PATCHED:
+ 38.79% 0.11% postgres [.] LWLockWaitForVar
0.40% 0.02% postgres [.] LWLockConflictsWithVar
+ 2.80% 0.00% postgres [.] LWLockReleaseClearVar
test-case 6: -T5, WAL 4096 bytes
HEAD:
+ 29.66% 0.07% postgres [.] LWLockWaitForVar
+ 20.94% 0.08% postgres [.] LWLockConflictsWithVar
0.19% 0.03% postgres [.] LWLockUpdateVar
PATCHED:
+ 3.95% 0.08% postgres [.] LWLockWaitForVar
0.19% 0.03% postgres [.] LWLockConflictsWithVar
+ 1.73% 0.00% postgres [.] LWLockReleaseClearVar
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mon, May 08, 2023 at 05:57:09PM +0530, Bharath Rupireddy wrote:
I ran performance tests on the patch with different use-cases. Clearly
the patch reduces burden on LWLock's waitlist lock (evident from perf
reports [1]). However, to see visible impact in the output, the txns
must be generating small (between 16 bytes to 2 KB) amounts of WAL in
a highly concurrent manner, check the results below (FWIW, I've zipped
and attached perf images for better illustration along with test
setup).
When the txns are generating a small amount of WAL i.e. between 16
bytes to 2 KB in a highly concurrent manner, the benefit is clearly
visible in the TPS more than 2.3X improvement. When the txns are
generating more WAL i.e. more than 2 KB, the gain from reduced burden
on waitlist lock is offset by increase in the wait/release for WAL
insertion locks and no visible benefit is seen.
As the amount of WAL each txn generates increases, it looks like the
benefit gained from reduced burden on waitlist lock is offset by
increase in the wait for WAL insertion locks.
Nice.
test-case 1: -T5, WAL ~16 bytes
test-case 1: -t1000, WAL ~16 bytes
I wonder if it's worth doing a couple of long-running tests, too.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Mon, May 08, 2023 at 04:04:10PM -0700, Nathan Bossart wrote:
On Mon, May 08, 2023 at 05:57:09PM +0530, Bharath Rupireddy wrote:
test-case 1: -T5, WAL ~16 bytes
test-case 1: -t1000, WAL ~16 bytes
I wonder if it's worth doing a couple of long-running tests, too.
Yes, 5s or 1000 transactions per client is too small, though it shows
that things are going in the right direction.
(Will reply to the rest in a bit..)
--
Michael