Latches vs lwlock contention

Started by Thomas Munro, about 3 years ago, 12 messages
#1 Thomas Munro
thomas.munro@gmail.com
6 attachment(s)

Hi,

We usually want to release LWLocks, and definitely spinlocks, before
calling SetLatch(), to avoid putting a system call inside the locked
region and so minimise the time the lock is held. There are a few
places where we don't do that, possibly because there isn't a single
latch we could hold a pointer to, but rather a set of them that has to
be collected from some data structure, and we don't have infrastructure
to help with that. There are also cases where we semi-reliably create
lock contention, because the backends that wake up immediately try to
acquire the very same lock.
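
To illustrate the pattern, here is a minimal usage sketch of the
batching API added by the 0001 patch below; SomeLock and waiter are
placeholder names, not real ones:

    LatchBatch  wakeups = {0};

    LWLockAcquire(SomeLock, LW_EXCLUSIVE);
    /* ... update shared state, find the backend(s) to wake ... */
    AddLatch(&wakeups, &waiter->procLatch);
    LWLockRelease(SomeLock);

    /* kill()/SetEvent() calls happen here, outside the locked region */
    SetLatches(&wakeups);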

One example is heavyweight lock wakeups. If you run BEGIN; LOCK TABLE
t; ... and then N other sessions wait in SELECT * FROM t;, and then
you run ... COMMIT;, you'll see the first session wake all the others
while it still holds the partition lock itself. They'll all wake up
and begin to re-acquire the same partition lock in exclusive mode,
immediately go back to sleep on *that* wait list, and then wake each
other up one at a time in a chain. We could avoid the first
double-bounce by not setting the latches until after we've released
the partition lock. We could avoid the rest of them by not
re-acquiring the partition lock at all, which ... if I'm reading right
... shouldn't actually be necessary in modern PostgreSQL? Or, if there
is another reason to re-acquire, maybe the comment should be updated.

Presumably no one really does that repeatedly while there is a long
queue of non-conflicting waiters, so I'm not claiming it's a major
improvement, but it's at least a micro-optimisation.
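
To make that concrete, with the 0003 patch below the committing
session's release path looks roughly like this (simplified; the real
code path goes through CleanUpLock()/ProcLockWakeup()):

    LatchBatch  wakeups = {0};

    LWLockAcquire(partitionLock, LW_EXCLUSIVE);
    /* ... grant the lock to compatible waiters, buffering their latches ... */
    ProcLockWakeup(lockMethodTable, lock, &wakeups);
    LWLockRelease(partitionLock);

    /* waiters only wake up after we've let go of the partition lock */
    SetLatches(&wakeups);

The 0004 patch then stops the woken backends from re-acquiring the
partition lock in ProcSleep(), removing the second source of
contention.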

There are some other, simpler mechanical changes for synchronous
replication, SERIALIZABLE DEFERRABLE and condition variables (the last
one inspired by Yura Sokolov's patches[1]). Actually, I'm not at all
sure about the CV implementation; I feel like a more ambitious change
is needed to make our CVs perform.

See attached sketch patches. I guess the main thing that may not be
good enough is the use of a fixed-size latch buffer. Memory
allocation in don't-throw-here environments like the guts of lock code
might be an issue, which is why it just gives up and flushes when
full; maybe it should try to allocate and fall back to flushing only
if that fails. These sketch patches aren't proposals, just
observations in need of more study.
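
A rough sketch of that allocate-then-fall-back idea, assuming the
batch is extended to carry a growable buffer (the field names below
follow the v2 version posted later in this thread) and using
palloc_extended() with MCXT_ALLOC_NO_OOM so an allocation failure
returns NULL instead of throwing:

    void
    AddLatch(LatchGroup *group, Latch *latch)
    {
        if (group->size == group->capacity)
        {
            /* Try to grow the buffer without risking an error here. */
            Latch     **bigger = palloc_extended(sizeof(Latch *) * group->capacity * 2,
                                                 MCXT_ALLOC_NO_OOM);

            if (bigger == NULL)
            {
                /* Couldn't allocate; flush what we have and start over. */
                SetLatches(group);
            }
            else
            {
                memcpy(bigger, group->latches, sizeof(Latch *) * group->size);
                if (group->latches != group->in_place_buffer)
                    pfree(group->latches);
                group->latches = bigger;
                group->capacity *= 2;
            }
        }
        group->latches[group->size++] = latch;
    }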

[1]: https://postgr.es/m/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru

Attachments:

0001-Provide-SetLatches-for-batched-deferred-latches.patch (text/x-patch; charset=US-ASCII)
From b64d5782e2c3a2e34274a3bf9df4449afaee94dc Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 15:51:45 +1300
Subject: [PATCH 1/8] Provide SetLatches() for batched deferred latches.

If we have a way to buffer a set of wakeup targets and process them at a
later time, we can:

* move SetLatch() system calls out from under LWLocks, so that locks can
  be released faster; this is especially interesting in cases where the
  target backends will immediately try to acquire the same lock, or
  generally when the lock is heavily contended

* possibly gain some micro-optimization from issuing only two memory
  barriers for the whole batch of latches, not two for each latch to be
  set

* provide the opportunity for potential future latch implementation
  mechanisms to deliver wakeups in a single system call

Individual users of this facility will follow in separate patches.
---
 src/backend/storage/ipc/latch.c  | 187 ++++++++++++++++++-------------
 src/include/storage/latch.h      |  13 +++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 123 insertions(+), 78 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index eb3a569aae..71fdc388c8 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -576,105 +576,136 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 }
 
 /*
- * Sets a latch and wakes up anyone waiting on it.
- *
- * This is cheap if the latch is already set, otherwise not so much.
- *
- * NB: when calling this in a signal handler, be sure to save and restore
- * errno around it.  (That's standard practice in most signal handlers, of
- * course, but we used to omit it in handlers that only set a flag.)
- *
- * NB: this function is called from critical sections and signal handlers so
- * throwing an error is not a good idea.
+ * Set multiple latches at the same time.
+ * Note: modifies input array.
  */
-void
-SetLatch(Latch *latch)
+static void
+SetLatchV(Latch **latches, int nlatches)
 {
-#ifndef WIN32
-	pid_t		owner_pid;
-#else
-	HANDLE		handle;
-#endif
-
-	/*
-	 * The memory barrier has to be placed here to ensure that any flag
-	 * variables possibly changed by this process have been flushed to main
-	 * memory, before we check/set is_set.
-	 */
+	/* Flush any other changes out to main memory just once. */
 	pg_memory_barrier();
 
-	/* Quick exit if already set */
-	if (latch->is_set)
-		return;
+	/* Keep only latches that are not already set, and set them. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch *latch = latches[i];
 
-	latch->is_set = true;
+		if (!latch->is_set)
+			latch->is_set = true;
+		else
+			latches[i] = NULL;
+	}
 
 	pg_memory_barrier();
-	if (!latch->maybe_sleeping)
-		return;
 
+	/* Wake the remaining latches that might be sleeping. */
 #ifndef WIN32
-
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler. We use the self-pipe or SIGURG to ourselves
-	 * to wake up WaitEventSetWaitBlock() without races in that case. If it's
-	 * another process, send a signal.
-	 *
-	 * Fetch owner_pid only once, in case the latch is concurrently getting
-	 * owned or disowned. XXX: This assumes that pid_t is atomic, which isn't
-	 * guaranteed to be true! In practice, the effective range of pid_t fits
-	 * in a 32 bit integer, and so should be atomic. In the worst case, we
-	 * might end up signaling the wrong process. Even then, you're very
-	 * unlucky if a process with that bogus pid exists and belongs to
-	 * Postgres; and PG database processes should handle excess SIGUSR1
-	 * interrupts without a problem anyhow.
-	 *
-	 * Another sort of race condition that's possible here is for a new
-	 * process to own the latch immediately after we look, so we don't signal
-	 * it. This is okay so long as all callers of ResetLatch/WaitLatch follow
-	 * the standard coding convention of waiting at the bottom of their loops,
-	 * not the top, so that they'll correctly process latch-setting events
-	 * that happen before they enter the loop.
-	 */
-	owner_pid = latch->owner_pid;
-	if (owner_pid == 0)
-		return;
-	else if (owner_pid == MyProcPid)
+	for (int i = 0; i < nlatches; ++i)
 	{
+		Latch *latch = latches[i];
+		pid_t		owner_pid;
+
+		if (!latch || !latch->maybe_sleeping)
+			continue;
+
+		/*
+		 * See if anyone's waiting for the latch. It can be the current process
+		 * if we're in a signal handler. We use the self-pipe or SIGURG to
+		 * ourselves to wake up WaitEventSetWaitBlock() without races in that
+		 * case. If it's another process, send a signal.
+		 *
+		 * Fetch owner_pid only once, in case the latch is concurrently getting
+		 * owned or disowned. XXX: This assumes that pid_t is atomic, which
+		 * isn't guaranteed to be true! In practice, the effective range of
+		 * pid_t fits in a 32 bit integer, and so should be atomic. In the
+		 * worst case, we might end up signaling the wrong process. Even then,
+		 * you're very unlucky if a process with that bogus pid exists and
+		 * belongs to Postgres; and PG database processes should handle excess
+		 * SIGURG interrupts without a problem anyhow.
+		 *
+		 * Another sort of race condition that's possible here is for a new
+		 * process to own the latch immediately after we look, so we don't
+		 * signal it. This is okay so long as all callers of
+		 * ResetLatch/WaitLatch follow the standard coding convention of
+		 * waiting at the bottom of their loops, not the top, so that they'll
+		 * correctly process latch-setting events that happen before they enter
+		 * the loop.
+		 */
+		owner_pid = latch->owner_pid;
+
+		if (owner_pid == MyProcPid)
+		{
+			if (waiting)
+			{
 #if defined(WAIT_USE_SELF_PIPE)
-		if (waiting)
-			sendSelfPipeByte();
+				sendSelfPipeByte();
 #else
-		if (waiting)
-			kill(MyProcPid, SIGURG);
+				kill(MyProcPid, SIGURG);
 #endif
+			}
+		}
+		else
+			kill(owner_pid, SIGURG);
 	}
-	else
-		kill(owner_pid, SIGURG);
-
 #else
-
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler.
-	 *
-	 * Use a local variable here just in case somebody changes the event field
-	 * concurrently (which really should not happen).
-	 */
-	handle = latch->event;
-	if (handle)
+	for (int i = 0; i < nlatches; ++i)
 	{
-		SetEvent(handle);
+		Latch *latch = latches[i];
 
-		/*
-		 * Note that we silently ignore any errors. We might be in a signal
-		 * handler or other critical path where it's not safe to call elog().
-		 */
+		if (latch && latch->maybe_sleeping)
+		{
+			HANDLE event = latch->event;
+
+			if (event)
+				SetEvent(event);
+		}
 	}
 #endif
 }
 
+/*
+ * Sets a latch and wakes up anyone waiting on it.
+ *
+ * This is cheap if the latch is already set, otherwise not so much.
+ *
+ * NB: when calling this in a signal handler, be sure to save and restore
+ * errno around it.  (That's standard practice in most signal handlers, of
+ * course, but we used to omit it in handlers that only set a flag.)
+ *
+ * NB: this function is called from critical sections and signal handlers so
+ * throwing an error is not a good idea.
+ */
+void
+SetLatch(Latch *latch)
+{
+	SetLatchV(&latch, 1);
+}
+
+/*
+ * Add a latch to a batch, to be set later as a group.
+ */
+void
+AddLatch(LatchBatch *batch, Latch *latch)
+{
+	if (batch->size == lengthof(batch->latches))
+		SetLatches(batch);
+
+	batch->latches[batch->size++] = latch;
+}
+
+/*
+ * Set all the latches accumulated in 'batch'.
+ */
+void
+SetLatches(LatchBatch *batch)
+{
+	if (batch->size > 0)
+	{
+		SetLatchV(batch->latches, batch->size);
+		batch->size = 0;
+	}
+}
+
 /*
  * Clear the latch. Calling WaitLatch after this will sleep, unless
  * the latch is set again before the WaitLatch call.
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 68ab740f16..0edf364637 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -118,6 +118,17 @@ typedef struct Latch
 #endif
 } Latch;
 
+#define LATCH_BATCH_SIZE 64
+
+/*
+ * Container for setting multiple latches at a time.
+ */
+typedef struct LatchBatch
+{
+	int			size;
+	Latch	   *latches[LATCH_BATCH_SIZE];
+}			LatchBatch;
+
 /*
  * Bitmasks for events that may wake-up WaitLatch(), WaitLatchOrSocket(), or
  * WaitEventSetWait().
@@ -163,6 +174,8 @@ extern void InitSharedLatch(Latch *latch);
 extern void OwnLatch(Latch *latch);
 extern void DisownLatch(Latch *latch);
 extern void SetLatch(Latch *latch);
+extern void AddLatch(LatchBatch * batch, Latch *latch);
+extern void SetLatches(LatchBatch * batch);
 extern void ResetLatch(Latch *latch);
 extern void ShutdownLatchSupport(void);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2f02cc8f42..f05c41043d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1389,6 +1389,7 @@ LagTracker
 LargeObjectDesc
 LastAttnumInfo
 Latch
+LatchBatch
 LerpFunc
 LexDescr
 LexemeEntry
-- 
2.35.1

0002-Use-SetLatches-for-condition-variables.patch (text/x-patch; charset=US-ASCII)
From 7d68b9bad4e172fc5d56df8630af2c22986b4f81 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 17:36:34 +1300
Subject: [PATCH 2/8] Use SetLatches() for condition variables.

Drain condition variable wait lists in larger batches, not one at a
time, while broadcasting.  This is an idea from Yura Sokolov, which I've
now combined with SetLatches() for slightly more efficiency.

Since we're now performing loops, change the internal spinlock to an
lwlock.

XXX There is probably a better data structure/arrangement that would
allow us to hold the lock for a shorter time.  This patch is not very
satisfying yet.

Discussion: https://postgr.es/m/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru
---
 src/backend/storage/lmgr/condition_variable.c | 97 +++++++++----------
 src/backend/storage/lmgr/lwlock.c             |  2 +
 src/include/storage/condition_variable.h      |  4 +-
 src/include/storage/lwlock.h                  |  1 +
 4 files changed, 49 insertions(+), 55 deletions(-)

diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index de65dac3ae..4b9749ecdd 100644
--- a/src/backend/storage/lmgr/condition_variable.c
+++ b/src/backend/storage/lmgr/condition_variable.c
@@ -36,7 +36,7 @@ static ConditionVariable *cv_sleep_target = NULL;
 void
 ConditionVariableInit(ConditionVariable *cv)
 {
-	SpinLockInit(&cv->mutex);
+	LWLockInitialize(&cv->mutex, LWTRANCHE_CONDITION_VARIABLE);
 	proclist_init(&cv->wakeup);
 }
 
@@ -74,9 +74,9 @@ ConditionVariablePrepareToSleep(ConditionVariable *cv)
 	cv_sleep_target = cv;
 
 	/* Add myself to the wait queue. */
-	SpinLockAcquire(&cv->mutex);
+	LWLockAcquire(&cv->mutex, LW_EXCLUSIVE);
 	proclist_push_tail(&cv->wakeup, pgprocno, cvWaitLink);
-	SpinLockRelease(&cv->mutex);
+	LWLockRelease(&cv->mutex);
 }
 
 /*
@@ -180,13 +180,13 @@ ConditionVariableTimedSleep(ConditionVariable *cv, long timeout,
 		 * by something other than ConditionVariableSignal; though we don't
 		 * guarantee not to return spuriously, we'll avoid this obvious case.
 		 */
-		SpinLockAcquire(&cv->mutex);
+		LWLockAcquire(&cv->mutex, LW_EXCLUSIVE);
 		if (!proclist_contains(&cv->wakeup, MyProc->pgprocno, cvWaitLink))
 		{
 			done = true;
 			proclist_push_tail(&cv->wakeup, MyProc->pgprocno, cvWaitLink);
 		}
-		SpinLockRelease(&cv->mutex);
+		LWLockRelease(&cv->mutex);
 
 		/*
 		 * Check for interrupts, and return spuriously if that caused the
@@ -233,12 +233,12 @@ ConditionVariableCancelSleep(void)
 	if (cv == NULL)
 		return;
 
-	SpinLockAcquire(&cv->mutex);
+	LWLockAcquire(&cv->mutex, LW_EXCLUSIVE);
 	if (proclist_contains(&cv->wakeup, MyProc->pgprocno, cvWaitLink))
 		proclist_delete(&cv->wakeup, MyProc->pgprocno, cvWaitLink);
 	else
 		signaled = true;
-	SpinLockRelease(&cv->mutex);
+	LWLockRelease(&cv->mutex);
 
 	/*
 	 * If we've received a signal, pass it on to another waiting process, if
@@ -265,10 +265,10 @@ ConditionVariableSignal(ConditionVariable *cv)
 	PGPROC	   *proc = NULL;
 
 	/* Remove the first process from the wakeup queue (if any). */
-	SpinLockAcquire(&cv->mutex);
+	LWLockAcquire(&cv->mutex, LW_EXCLUSIVE);
 	if (!proclist_is_empty(&cv->wakeup))
 		proc = proclist_pop_head_node(&cv->wakeup, cvWaitLink);
-	SpinLockRelease(&cv->mutex);
+	LWLockRelease(&cv->mutex);
 
 	/* If we found someone sleeping, set their latch to wake them up. */
 	if (proc != NULL)
@@ -287,6 +287,7 @@ ConditionVariableBroadcast(ConditionVariable *cv)
 {
 	int			pgprocno = MyProc->pgprocno;
 	PGPROC	   *proc = NULL;
+	bool		inserted_sentinel = false;
 	bool		have_sentinel = false;
 
 	/*
@@ -313,52 +314,42 @@ ConditionVariableBroadcast(ConditionVariable *cv)
 	if (cv_sleep_target != NULL)
 		ConditionVariableCancelSleep();
 
-	/*
-	 * Inspect the state of the queue.  If it's empty, we have nothing to do.
-	 * If there's exactly one entry, we need only remove and signal that
-	 * entry.  Otherwise, remove the first entry and insert our sentinel.
-	 */
-	SpinLockAcquire(&cv->mutex);
-	/* While we're here, let's assert we're not in the list. */
-	Assert(!proclist_contains(&cv->wakeup, pgprocno, cvWaitLink));
-
-	if (!proclist_is_empty(&cv->wakeup))
+	do
 	{
-		proc = proclist_pop_head_node(&cv->wakeup, cvWaitLink);
-		if (!proclist_is_empty(&cv->wakeup))
-		{
-			proclist_push_tail(&cv->wakeup, pgprocno, cvWaitLink);
-			have_sentinel = true;
-		}
-	}
-	SpinLockRelease(&cv->mutex);
-
-	/* Awaken first waiter, if there was one. */
-	if (proc != NULL)
-		SetLatch(&proc->procLatch);
+		int max_batch_size = Min(LATCH_BATCH_SIZE, 8);	/* XXX */
+		LatchBatch batch = {0};
 
-	while (have_sentinel)
-	{
-		/*
-		 * Each time through the loop, remove the first wakeup list entry, and
-		 * signal it unless it's our sentinel.  Repeat as long as the sentinel
-		 * remains in the list.
-		 *
-		 * Notice that if someone else removes our sentinel, we will waken one
-		 * additional process before exiting.  That's intentional, because if
-		 * someone else signals the CV, they may be intending to waken some
-		 * third process that added itself to the list after we added the
-		 * sentinel.  Better to give a spurious wakeup (which should be
-		 * harmless beyond wasting some cycles) than to lose a wakeup.
-		 */
-		proc = NULL;
-		SpinLockAcquire(&cv->mutex);
-		if (!proclist_is_empty(&cv->wakeup))
+		LWLockAcquire(&cv->mutex, LW_EXCLUSIVE);
+		while (!proclist_is_empty(&cv->wakeup))
+		{
+			if (batch.size == max_batch_size)
+			{
+				if (!inserted_sentinel)
+				{
+					/* First batch of many.  Add sentinel. */
+					proclist_push_tail(&cv->wakeup, pgprocno, cvWaitLink);
+					inserted_sentinel = true;
+					have_sentinel = true;
+				}
+				break;
+			}
 			proc = proclist_pop_head_node(&cv->wakeup, cvWaitLink);
-		have_sentinel = proclist_contains(&cv->wakeup, pgprocno, cvWaitLink);
-		SpinLockRelease(&cv->mutex);
+			if (proc == MyProc)
+			{
+				/* We hit our sentinel.  We're done. */
+				have_sentinel = false;
+				break;
+			}
+			else if (!proclist_contains(&cv->wakeup, pgprocno, cvWaitLink))
+			{
+				/* Someone else hit our sentinel.  We're done. */
+				have_sentinel = false;
+			}
+			AddLatch(&batch, &proc->procLatch);
+		}
+		LWLockRelease(&cv->mutex);
 
-		if (proc != NULL && proc != MyProc)
-			SetLatch(&proc->procLatch);
-	}
+		/* Awaken this batch of waiters, if there were some. */
+		SetLatches(&batch);
+	} while (have_sentinel);
 }
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index d274c9b1dc..03186da0a3 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -183,6 +183,8 @@ static const char *const BuiltinTrancheNames[] = {
 	"PgStatsHash",
 	/* LWTRANCHE_PGSTATS_DATA: */
 	"PgStatsData",
+	/* LWTRANCHE_CONDITION_VARIABLE: */
+	"ConditionVariable",
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/include/storage/condition_variable.h b/src/include/storage/condition_variable.h
index e89175ebd5..82b8eeabf5 100644
--- a/src/include/storage/condition_variable.h
+++ b/src/include/storage/condition_variable.h
@@ -22,12 +22,12 @@
 #ifndef CONDITION_VARIABLE_H
 #define CONDITION_VARIABLE_H
 
+#include "storage/lwlock.h"
 #include "storage/proclist_types.h"
-#include "storage/spin.h"
 
 typedef struct
 {
-	slock_t		mutex;			/* spinlock protecting the wakeup list */
+	LWLock		mutex;			/* lock protecting the wakeup list */
 	proclist_head wakeup;		/* list of wake-able processes */
 } ConditionVariable;
 
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index ca4eca76f4..dcbffe11a2 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -193,6 +193,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DSA,
 	LWTRANCHE_PGSTATS_HASH,
 	LWTRANCHE_PGSTATS_DATA,
+	LWTRANCHE_CONDITION_VARIABLE,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
-- 
2.35.1

0003-Use-SetLatches-for-heavyweight-locks.patch (text/x-patch; charset=US-ASCII)
From 1dca369e595f1259736a015efcf13b1041d7780d Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 16:43:31 +1300
Subject: [PATCH 3/8] Use SetLatches() for heavyweight locks.

Collect wakeups into a LatchBatch to be set at once after the
LockManager's internal partition lock is released.  This avoids holding
busy locks while issuing system calls.

Currently, waiters immediately try to acquire that lock themselves, so
deferring until it's released makes sense to avoid contention (a later
patch may remove that lock acquisition though).
---
 src/backend/storage/lmgr/deadlock.c |  4 ++--
 src/backend/storage/lmgr/lock.c     | 36 +++++++++++++++++++----------
 src/backend/storage/lmgr/proc.c     | 29 ++++++++++++++---------
 src/include/storage/lock.h          |  5 ++--
 src/include/storage/proc.h          |  5 ++--
 5 files changed, 49 insertions(+), 30 deletions(-)

diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index cd9c0418ec..1b7454a8fe 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -214,7 +214,7 @@ InitDeadLockChecking(void)
  * and (b) we are typically invoked inside a signal handler.
  */
 DeadLockState
-DeadLockCheck(PGPROC *proc)
+DeadLockCheck(PGPROC *proc, LatchBatch *wakeups)
 {
 	int			i,
 				j;
@@ -272,7 +272,7 @@ DeadLockCheck(PGPROC *proc)
 #endif
 
 		/* See if any waiters for the lock can be woken up now */
-		ProcLockWakeup(GetLocksMethodTable(lock), lock);
+		ProcLockWakeup(GetLocksMethodTable(lock), lock, wakeups);
 	}
 
 	/* Return code tells caller if we had to escape a deadlock or not */
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 3d1049cf75..f59ae20069 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -374,14 +374,14 @@ static PROCLOCK *SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner);
 static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode);
 static void FinishStrongLockAcquire(void);
-static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner);
+static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchBatch *wakeups);
 static void ReleaseLockIfHeld(LOCALLOCK *locallock, bool sessionLock);
 static void LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent);
 static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 						PROCLOCK *proclock, LockMethod lockMethodTable);
 static void CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 						LockMethod lockMethodTable, uint32 hashcode,
-						bool wakeupNeeded);
+						bool wakeupNeeded, LatchBatch *wakeups);
 static void LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 								 LOCKTAG *locktag, LOCKMODE lockmode,
 								 bool decrement_strong_lock_count);
@@ -787,6 +787,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	LatchBatch	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -1098,7 +1099,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
-		WaitOnLock(locallock, owner);
+		WaitOnLock(locallock, owner, &wakeups);
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1138,6 +1139,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
 	 * standby server.
@@ -1634,7 +1637,7 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 static void
 CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 			LockMethod lockMethodTable, uint32 hashcode,
-			bool wakeupNeeded)
+			bool wakeupNeeded, LatchBatch *wakeups)
 {
 	/*
 	 * If this was my last hold on this lock, delete my entry in the proclock
@@ -1674,7 +1677,7 @@ CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 	else if (wakeupNeeded)
 	{
 		/* There are waiters on this lock, so wake them up. */
-		ProcLockWakeup(lockMethodTable, lock);
+		ProcLockWakeup(lockMethodTable, lock, wakeups);
 	}
 }
 
@@ -1811,7 +1814,7 @@ MarkLockClear(LOCALLOCK *locallock)
  * The appropriate partition lock must be held at entry.
  */
 static void
-WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
+WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchBatch *wakeups)
 {
 	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
 	LockMethod	lockMethodTable = LockMethods[lockmethodid];
@@ -1856,7 +1859,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
 	 */
 	PG_TRY();
 	{
-		if (ProcSleep(locallock, lockMethodTable) != PROC_WAIT_STATUS_OK)
+		if (ProcSleep(locallock, lockMethodTable, wakeups) != PROC_WAIT_STATUS_OK)
 		{
 			/*
 			 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
@@ -1915,7 +1918,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
  * NB: this does not clean up any locallock object that may exist for the lock.
  */
 void
-RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
+RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchBatch *wakeups)
 {
 	LOCK	   *waitLock = proc->waitLock;
 	PROCLOCK   *proclock = proc->waitProcLock;
@@ -1957,7 +1960,7 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
 	 */
 	CleanUpLock(waitLock, proclock,
 				LockMethods[lockmethodid], hashcode,
-				true);
+				true, wakeups);
 }
 
 /*
@@ -1982,6 +1985,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchBatch	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2160,10 +2164,12 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, locallock->hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	RemoveLocalLock(locallock);
 	return true;
 }
@@ -2188,6 +2194,7 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	PROCLOCK   *proclock;
 	int			partition;
 	bool		have_fast_path_lwlock = false;
+	LatchBatch	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2434,10 +2441,12 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 			CleanUpLock(lock, proclock,
 						lockMethodTable,
 						LockTagHashCode(&lock->tag),
-						wakeupNeeded);
+						wakeupNeeded, &wakeups);
 		}						/* loop over PROCLOCKs within this partition */
 
 		LWLockRelease(partitionLock);
+
+		SetLatches(&wakeups);
 	}							/* loop over partitions */
 
 #ifdef LOCK_DEBUG
@@ -3137,6 +3146,7 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 	uint32		proclock_hashcode;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchBatch	wakeups = {0};
 
 	hashcode = LockTagHashCode(locktag);
 	partitionLock = LockHashPartitionLock(hashcode);
@@ -3190,10 +3200,12 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Decrement strong lock count.  This logic is needed only for 2PC.
 	 */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 13fa07b0ff..f0e19b74a7 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -700,6 +700,7 @@ LockErrorCleanup(void)
 {
 	LWLock	   *partitionLock;
 	DisableTimeoutParams timeouts[2];
+	LatchBatch	wakeups = {0};
 
 	HOLD_INTERRUPTS();
 
@@ -733,7 +734,7 @@ LockErrorCleanup(void)
 	if (MyProc->links.next != NULL)
 	{
 		/* We could not have been granted the lock yet */
-		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode);
+		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode, &wakeups);
 	}
 	else
 	{
@@ -750,6 +751,7 @@ LockErrorCleanup(void)
 	lockAwaited = NULL;
 
 	LWLockRelease(partitionLock);
+	SetLatches(&wakeups);
 
 	RESUME_INTERRUPTS();
 }
@@ -1042,7 +1044,7 @@ ProcQueueInit(PROC_QUEUE *queue)
  * NOTES: The process queue is now a priority queue for locking.
  */
 ProcWaitStatus
-ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
+ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchBatch *wakeups)
 {
 	LOCKMODE	lockmode = locallock->tag.mode;
 	LOCK	   *lock = locallock->lock;
@@ -1188,7 +1190,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 */
 	if (early_deadlock)
 	{
-		RemoveFromWaitQueue(MyProc, hashcode);
+		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1204,6 +1206,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 * LockErrorCleanup will clean up if cancel/die happens.
 	 */
 	LWLockRelease(partitionLock);
+	SetLatches(wakeups);
 
 	/*
 	 * Also, now that we will successfully clean up after an ereport, it's
@@ -1662,8 +1665,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
  * to twiddle the lock's request counts too --- see RemoveFromWaitQueue.
  * Hence, in practice the waitStatus parameter must be PROC_WAIT_STATUS_OK.
  */
-PGPROC *
-ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
+static PGPROC *
+ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus, LatchBatch *wakeups)
 {
 	PGPROC	   *retProc;
 
@@ -1686,8 +1689,8 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
 	proc->waitStatus = waitStatus;
 	pg_atomic_write_u64(&MyProc->waitStart, 0);
 
-	/* And awaken it */
-	SetLatch(&proc->procLatch);
+	/* Schedule it to be awoken */
+	AddLatch(wakeups, &proc->procLatch);
 
 	return retProc;
 }
@@ -1700,7 +1703,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
  * The appropriate lock partition lock must be held by caller.
  */
 void
-ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
+ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchBatch *wakeups)
 {
 	PROC_QUEUE *waitQueue = &(lock->waitProcs);
 	int			queue_size = waitQueue->size;
@@ -1728,7 +1731,7 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 		{
 			/* OK to waken */
 			GrantLock(lock, proc->waitProcLock, lockmode);
-			proc = ProcWakeup(proc, PROC_WAIT_STATUS_OK);
+			proc = ProcWakeup(proc, PROC_WAIT_STATUS_OK, wakeups);
 
 			/*
 			 * ProcWakeup removes proc from the lock's waiting process queue
@@ -1762,6 +1765,7 @@ static void
 CheckDeadLock(void)
 {
 	int			i;
+	LatchBatch	wakeups = {0};
 
 	/*
 	 * Acquire exclusive lock on the entire shared lock data structures. Must
@@ -1796,7 +1800,7 @@ CheckDeadLock(void)
 #endif
 
 	/* Run the deadlock check, and set deadlock_state for use by ProcSleep */
-	deadlock_state = DeadLockCheck(MyProc);
+	deadlock_state = DeadLockCheck(MyProc, &wakeups);
 
 	if (deadlock_state == DS_HARD_DEADLOCK)
 	{
@@ -1813,7 +1817,8 @@ CheckDeadLock(void)
 		 * return from the signal handler.
 		 */
 		Assert(MyProc->waitLock != NULL);
-		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)));
+		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)),
+							&wakeups);
 
 		/*
 		 * We're done here.  Transaction abort caused by the error that
@@ -1837,6 +1842,8 @@ CheckDeadLock(void)
 check_done:
 	for (i = NUM_LOCK_PARTITIONS; --i >= 0;)
 		LWLockRelease(LockHashPartitionLockByIndex(i));
+
+	SetLatches(&wakeups);
 }
 
 /*
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index e4e1495b24..163bf41293 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -19,6 +19,7 @@
 #endif
 
 #include "storage/backendid.h"
+#include "storage/latch.h"
 #include "storage/lockdefs.h"
 #include "storage/lwlock.h"
 #include "storage/shmem.h"
@@ -575,7 +576,7 @@ extern bool LockCheckConflicts(LockMethod lockMethodTable,
 							   LOCK *lock, PROCLOCK *proclock);
 extern void GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode);
 extern void GrantAwaitedLock(void);
-extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode);
+extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchBatch *wakeups);
 extern Size LockShmemSize(void);
 extern LockData *GetLockStatusData(void);
 extern BlockedProcsData *GetBlockerStatusData(int blocked_pid);
@@ -592,7 +593,7 @@ extern void lock_twophase_postabort(TransactionId xid, uint16 info,
 extern void lock_twophase_standby_recover(TransactionId xid, uint16 info,
 										  void *recdata, uint32 len);
 
-extern DeadLockState DeadLockCheck(PGPROC *proc);
+extern DeadLockState DeadLockCheck(PGPROC *proc, LatchBatch *wakeups);
 extern PGPROC *GetBlockingAutoVacuumPgproc(void);
 extern void DeadLockReport(void) pg_attribute_noreturn();
 extern void RememberSimpleDeadLock(PGPROC *proc1,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 8d096fdeeb..4dba46cfcb 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -449,9 +449,8 @@ extern bool HaveNFreeProcs(int n);
 extern void ProcReleaseLocks(bool isCommit);
 
 extern void ProcQueueInit(PROC_QUEUE *queue);
-extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable);
-extern PGPROC *ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
-extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
+extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchBatch *wakeups);
+extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchBatch *wakeups);
 extern void CheckDeadLockAlert(void);
 extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
-- 
2.35.1

0004-Don-t-re-acquire-LockManager-partition-lock-after-wa.patch (text/x-patch; charset=US-ASCII)
From 2cef9eb9236a19845f967de3f392124016bbbba9 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 10:56:39 +1300
Subject: [PATCH 4/8] Don't re-acquire LockManager partition lock after wakeup.

Change the contract of WaitOnLock() and ProcSleep() to return with the
partition lock released.  ProcSleep() re-acquired the lock to prevent
interrupts, which was a problem at the time the ancestor of that code
was committed in fe548629c50, because then we had signal handlers that
longjmp'd out of there.  Now, that can happen only at points where we
explicitly run CHECK_FOR_INTERRUPTS(), so we can remove the lock, which
leads to the idea of making the function always release.

While an earlier commit fixed the problem that the backend woke us up
before it had even released the lock itself, this lock reacquisition
point was still contended when multiple backends woke at the same time.

XXX Right?  Or what am I missing?
---
 src/backend/storage/lmgr/lock.c | 18 ++++++++++++------
 src/backend/storage/lmgr/proc.c | 20 ++++++++++++--------
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index f59ae20069..42c244f5fb 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -787,6 +787,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	bool		partitionLocked = false;
 	LatchBatch	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
@@ -995,6 +996,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	partitionLock = LockHashPartitionLock(hashcode);
 
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
+	partitionLocked = true;
 
 	/*
 	 * Find or create lock and proclock entries with this tag
@@ -1099,7 +1101,10 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
+		Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
 		WaitOnLock(locallock, owner, &wakeups);
+		Assert(!LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+		partitionLocked = false;
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1124,7 +1129,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 			PROCLOCK_PRINT("LockAcquire: INCONSISTENT", proclock);
 			LOCK_PRINT("LockAcquire: INCONSISTENT", lock, lockmode);
 			/* Should we retry ? */
-			LWLockRelease(partitionLock);
 			elog(ERROR, "LockAcquire failed");
 		}
 		PROCLOCK_PRINT("LockAcquire: granted", proclock);
@@ -1137,9 +1141,11 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 */
 	FinishStrongLockAcquire();
 
-	LWLockRelease(partitionLock);
-
-	SetLatches(&wakeups);
+	if (partitionLocked)
+	{
+		LWLockRelease(partitionLock);
+		SetLatches(&wakeups);
+	}
 
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
@@ -1811,7 +1817,8 @@ MarkLockClear(LOCALLOCK *locallock)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process.
  *
- * The appropriate partition lock must be held at entry.
+ * The appropriate partition lock must be held at entry.  It is not held on
+ * exit.
  */
 static void
 WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchBatch *wakeups)
@@ -1868,7 +1875,6 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchBatch *wakeups)
 			awaitedLock = NULL;
 			LOCK_PRINT("WaitOnLock: aborting on lock",
 					   locallock->lock, locallock->tag.mode);
-			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
 
 			/*
 			 * Now that we aren't holding the partition lock, we can give an
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index f0e19b74a7..2a03cd4b1f 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1033,7 +1033,7 @@ ProcQueueInit(PROC_QUEUE *queue)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process (under all XIDs).
  *
- * The lock table's partition lock must be held at entry, and will be held
+ * The lock table's partition lock must be held at entry, and will be released
  * at exit.
  *
  * Result: PROC_WAIT_STATUS_OK if we acquired the lock, PROC_WAIT_STATUS_ERROR if not (deadlock).
@@ -1062,6 +1062,12 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchBatch *wakeups)
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 	int			i;
 
+	/*
+	 * Every way out of this function will release the partition lock and send
+	 * buffered latch wakeups.
+	 */
+	Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+
 	/*
 	 * If group locking is in use, locks held by members of my locking group
 	 * need to be included in myHeldLocks.  This is not required for relation
@@ -1148,6 +1154,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchBatch *wakeups)
 					/* Skip the wait and just grant myself the lock. */
 					GrantLock(lock, proclock, lockmode);
 					GrantAwaitedLock();
+					LWLockRelease(partitionLock);
+					SetLatches(wakeups);
 					return PROC_WAIT_STATUS_OK;
 				}
 				/* Break out of loop to put myself before him */
@@ -1191,6 +1199,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchBatch *wakeups)
 	if (early_deadlock)
 	{
 		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
+		LWLockRelease(partitionLock);
+		SetLatches(wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1525,6 +1535,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchBatch *wakeups)
 			}
 
 			LWLockRelease(partitionLock);
+			SetLatches(wakeups);
 
 			if (deadlock_state == DS_SOFT_DEADLOCK)
 				ereport(LOG,
@@ -1626,13 +1637,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchBatch *wakeups)
 							standbyWaitStart, GetCurrentTimestamp(),
 							NULL, false);
 
-	/*
-	 * Re-acquire the lock table's partition lock.  We have to do this to hold
-	 * off cancel/die interrupts before we can mess with lockAwaited (else we
-	 * might have a missed or duplicated locallock update).
-	 */
-	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
-
 	/*
 	 * We no longer want LockErrorCleanup to do anything.
 	 */
-- 
2.35.1

0005-Use-SetLatches-for-SERIALIZABLE-DEFERRABLE-wakeups.patch (text/x-patch; charset=US-ASCII)
From e1ab9a2f56127c3731c0d5e0aa4fd0fc8243cdfa Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 12:40:12 +1300
Subject: [PATCH 5/8] Use SetLatches() for SERIALIZABLE DEFERRABLE wakeups.

Don't issue SetLatch()'s system call while holding a highly contended
LWLock.  Collect them in a LatchBatch to be set after the lock is
released.  Once woken, other backends will immediately try to acquire
that lock anyway so it's better to wake after releasing.

Take this opportunity to retire the confusingly named ProcSendSignal()
and replace it with something that gives the latch pointer we need.
There was only one other caller, in bufmgr.c, which is easily changed.
---
 src/backend/storage/buffer/bufmgr.c  |  2 +-
 src/backend/storage/ipc/procsignal.c |  2 --
 src/backend/storage/lmgr/predicate.c |  5 ++++-
 src/backend/storage/lmgr/proc.c      | 12 ------------
 src/include/storage/proc.h           |  4 +++-
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 6b95381481..4d754caa30 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1918,7 +1918,7 @@ UnpinBuffer(BufferDesc *buf)
 
 				buf_state &= ~BM_PIN_COUNT_WAITER;
 				UnlockBufHdr(buf, buf_state);
-				ProcSendSignal(wait_backend_pgprocno);
+				SetLatch(GetProcLatchByNumber(wait_backend_pgprocno));
 			}
 			else
 				UnlockBufHdr(buf, buf_state);
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7767657f27..280e4b91dd 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -254,8 +254,6 @@ CleanupProcSignalState(int status, Datum arg)
  *
  * On success (a signal was sent), zero is returned.
  * On error, -1 is returned, and errno is set (typically to ESRCH or EPERM).
- *
- * Not to be confused with ProcSendSignal
  */
 int
 SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index e8120174d6..a9e9fa01e7 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3339,6 +3339,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 				nextConflict,
 				possibleUnsafeConflict;
 	SERIALIZABLEXACT *roXact;
+	LatchBatch	wakeups = {0};
 
 	/*
 	 * We can't trust XactReadOnly here, because a transaction which started
@@ -3672,7 +3673,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 			 */
 			if (SxactIsDeferrableWaiting(roXact) &&
 				(SxactIsROUnsafe(roXact) || SxactIsROSafe(roXact)))
-				ProcSendSignal(roXact->pgprocno);
+				AddLatch(&wakeups, GetProcLatchByNumber(roXact->pgprocno));
 
 			possibleUnsafeConflict = nextConflict;
 		}
@@ -3698,6 +3699,8 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 
 	LWLockRelease(SerializableXactHashLock);
 
+	SetLatches(&wakeups);
+
 	LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
 
 	/* Add this to the list of transactions to check for later cleanup. */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 2a03cd4b1f..c644e7351d 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1890,18 +1890,6 @@ ProcWaitForSignal(uint32 wait_event_info)
 	CHECK_FOR_INTERRUPTS();
 }
 
-/*
- * ProcSendSignal - set the latch of a backend identified by pgprocno
- */
-void
-ProcSendSignal(int pgprocno)
-{
-	if (pgprocno < 0 || pgprocno >= ProcGlobal->allProcCount)
-		elog(ERROR, "pgprocno out of range");
-
-	SetLatch(&ProcGlobal->allProcs[pgprocno].procLatch);
-}
-
 /*
  * BecomeLockGroupLeader - designate process as lock group leader
  *
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 4dba46cfcb..735e863ed1 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -413,6 +413,9 @@ extern PGDLLIMPORT PGPROC *PreparedXactProcs;
 /* Accessor for PGPROC given a pgprocno. */
 #define GetPGProcByNumber(n) (&ProcGlobal->allProcs[(n)])
 
+/* Accessor for procLatch given a pgprocno. */
+#define GetProcLatchByNumber(n) (&(GetPGProcByNumber(n))->procLatch)
+
 /*
  * We set aside some extra PGPROC structures for auxiliary processes,
  * ie things that aren't full-fledged backends but need shmem access.
@@ -456,7 +459,6 @@ extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
 
 extern void ProcWaitForSignal(uint32 wait_event_info);
-extern void ProcSendSignal(int pgprocno);
 
 extern PGPROC *AuxiliaryPidGetProc(int pid);
 
-- 
2.35.1

0006-Use-SetLatches-for-synchronous-replication-wakeups.patch (text/x-patch; charset=US-ASCII)
From 351ce81b4d1b2a61fcfcc9902d7122e5410c8b63 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 12:59:45 +1300
Subject: [PATCH 6/8] Use SetLatches() for synchronous replication wakeups.

Don't issue SetLatch() system calls while holding the central
synchronous replication lock.
---
 src/backend/replication/syncrep.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 1a022b11a0..31cf05ed2f 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -100,7 +100,7 @@ static int	SyncRepWaitMode = SYNC_REP_NO_WAIT;
 
 static void SyncRepQueueInsert(int mode);
 static void SyncRepCancelWait(void);
-static int	SyncRepWakeQueue(bool all, int mode);
+static int	SyncRepWakeQueue(bool all, int mode, LatchBatch *wakeups);
 
 static bool SyncRepGetSyncRecPtr(XLogRecPtr *writePtr,
 								 XLogRecPtr *flushPtr,
@@ -449,6 +449,7 @@ SyncRepReleaseWaiters(void)
 	int			numwrite = 0;
 	int			numflush = 0;
 	int			numapply = 0;
+	LatchBatch	wakeups = {0};
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -518,21 +519,23 @@ SyncRepReleaseWaiters(void)
 	if (walsndctl->lsn[SYNC_REP_WAIT_WRITE] < writePtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_WRITE] = writePtr;
-		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE);
+		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < flushPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = flushPtr;
-		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH);
+		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_APPLY] < applyPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_APPLY] = applyPtr;
-		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY);
+		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY, &wakeups);
 	}
 
 	LWLockRelease(SyncRepLock);
 
+	SetLatches(&wakeups);
+
 	elog(DEBUG3, "released %d procs up to write %X/%X, %d procs up to flush %X/%X, %d procs up to apply %X/%X",
 		 numwrite, LSN_FORMAT_ARGS(writePtr),
 		 numflush, LSN_FORMAT_ARGS(flushPtr),
@@ -876,7 +879,7 @@ SyncRepGetStandbyPriority(void)
  * The caller must hold SyncRepLock in exclusive mode.
  */
 static int
-SyncRepWakeQueue(bool all, int mode)
+SyncRepWakeQueue(bool all, int mode, LatchBatch *wakeups)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	PGPROC	   *proc = NULL;
@@ -929,7 +932,7 @@ SyncRepWakeQueue(bool all, int mode)
 		/*
 		 * Wake only when we have set state and removed from queue.
 		 */
-		SetLatch(&(thisproc->procLatch));
+		AddLatch(wakeups, &thisproc->procLatch);
 
 		numprocs++;
 	}
@@ -951,6 +954,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 
 	if (sync_standbys_defined != WalSndCtl->sync_standbys_defined)
 	{
+		LatchBatch		wakeups = {0};
+
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 
 		/*
@@ -963,7 +968,7 @@ SyncRepUpdateSyncStandbysDefined(void)
 			int			i;
 
 			for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
-				SyncRepWakeQueue(true, i);
+				SyncRepWakeQueue(true, i, &wakeups);
 		}
 
 		/*
@@ -976,6 +981,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 		WalSndCtl->sync_standbys_defined = sync_standbys_defined;
 
 		LWLockRelease(SyncRepLock);
+
+		SetLatches(&wakeups);
 	}
 }
 
-- 
2.35.1

#2 Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#1)
7 attachment(s)
Re: Latches vs lwlock contention

On Fri, Oct 28, 2022 at 4:56 PM Thomas Munro <thomas.munro@gmail.com> wrote:

See attached sketch patches. I guess the main thing that may not be
good enough is the use of a fixed-size latch buffer. Memory
allocation in don't-throw-here environments like the guts of lock code
might be an issue, which is why it just gives up and flushes when
full; maybe it should try to allocate and fall back to flushing only
if that fails.

Here's an attempt at that. There aren't actually any uses of this
stuff in critical sections here, so perhaps I shouldn't bother
with that part. The part I'd most like some feedback on is the
heavyweight lock bits. I'll add this to the commitfest.

Attachments:

v2-0001-Allow-palloc_extended-NO_OOM-in-critical-sections.patch (application/octet-stream)
From 383e2d9b0d24e02ca468b32f8bd3775bbecf1e86 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 1 Nov 2022 23:29:38 +1300
Subject: [PATCH v2 1/9] Allow palloc_extended(NO_OOM) in critical sections.

Commit 4a170ee9e0e banned palloc() and similar in critical sections, because an
allocation failure would produce a panic.  Make an exception for allocation
with NULL on failure, for code that has a backup plan.

Discussion: https://postgr.es/m/CA%2BhUKGJHudZMG_vh6GiPB61pE%2BGgiBk5jxzd7inijqx5nEZLCw%40mail.gmail.com
---
 src/backend/utils/mmgr/mcxt.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index f526ca82c1..460fa9d6c0 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1112,7 +1112,8 @@ MemoryContextAllocExtended(MemoryContext context, Size size, int flags)
 	void	   *ret;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1267,7 +1268,8 @@ palloc_extended(Size size, int flags)
 	MemoryContext context = CurrentMemoryContext;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1368,7 +1370,8 @@ repalloc_extended(void *pointer, Size size, int flags)
 		  AllocSizeIsValid(size)))
 		elog(ERROR, "invalid memory alloc request size %zu", size);
 
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	/* isReset must be false already */
 	Assert(!context->isReset);
-- 
2.37.0 (Apple Git-136)

v2-0002-Provide-SetLatches-for-batched-deferred-latches.patch (application/octet-stream)
From e30aab2276b05212dbfe23d2a28636036610645d Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 15:51:45 +1300
Subject: [PATCH v2 2/9] Provide SetLatches() for batched deferred latches.

If we have a way to buffer a set of wakeup targets and process them at a
later time, we can:

* move SetLatch() system calls out from under LWLocks, so that locks can
  be released faster; this is especially interesting in cases where the
  target backends will immediately try to acquire the same lock, or
  generally when the lock is heavily contended

* possibly gain some micro-optimization from issuing only two memory
  barriers for the whole batch of latches, not two for each latch to be
  set

* prepare for future IPC mechanisms which might allow us to wake
  multiple backends in a single system call

Individual users of this facility will follow in separate patches.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c  | 223 +++++++++++++++++++++----------
 src/include/storage/latch.h      |  13 ++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 163 insertions(+), 74 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index eb3a569aae..199ccc1e65 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -575,6 +575,94 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 	return ret;
 }
 
+/*
+ * Set multiple latches at the same time.
+ * Note: modifies input array.
+ */
+static void
+SetLatchV(Latch **latches, int nlatches)
+{
+	/* Flush any other changes out to main memory just once. */
+	pg_memory_barrier();
+
+	/* Keep only latches that are not already set, and set them. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		if (!latch->is_set)
+			latch->is_set = true;
+		else
+			latches[i] = NULL;
+	}
+
+	pg_memory_barrier();
+
+	/* Wake the remaining latches that might be sleeping. */
+#ifndef WIN32
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+		pid_t		owner_pid;
+
+		if (!latch || !latch->maybe_sleeping)
+			continue;
+
+		/*
+		 * See if anyone's waiting for the latch. It can be the current
+		 * process if we're in a signal handler. We use the self-pipe or
+		 * SIGURG to ourselves to wake up WaitEventSetWaitBlock() without
+		 * races in that case. If it's another process, send a signal.
+		 *
+		 * Fetch owner_pid only once, in case the latch is concurrently
+		 * getting owned or disowned. XXX: This assumes that pid_t is atomic,
+		 * which isn't guaranteed to be true! In practice, the effective range
+		 * of pid_t fits in a 32 bit integer, and so should be atomic. In the
+		 * worst case, we might end up signaling the wrong process. Even then,
+		 * you're very unlucky if a process with that bogus pid exists and
+		 * belongs to Postgres; and PG database processes should handle excess
+		 * SIGURG interrupts without a problem anyhow.
+		 *
+		 * Another sort of race condition that's possible here is for a new
+		 * process to own the latch immediately after we look, so we don't
+		 * signal it. This is okay so long as all callers of
+		 * ResetLatch/WaitLatch follow the standard coding convention of
+		 * waiting at the bottom of their loops, not the top, so that they'll
+		 * correctly process latch-setting events that happen before they
+		 * enter the loop.
+		 */
+		owner_pid = latch->owner_pid;
+
+		if (owner_pid == MyProcPid)
+		{
+			if (waiting)
+			{
+#if defined(WAIT_USE_SELF_PIPE)
+				sendSelfPipeByte();
+#else
+				kill(MyProcPid, SIGURG);
+#endif
+			}
+		}
+		else
+			kill(owner_pid, SIGURG);
+	}
+#else
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		if (latch && latch->maybe_sleeping)
+		{
+			HANDLE		event = latch->event;
+
+			if (event)
+				SetEvent(event);
+		}
+	}
+#endif
+}
+
 /*
  * Sets a latch and wakes up anyone waiting on it.
  *
@@ -590,89 +678,76 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 void
 SetLatch(Latch *latch)
 {
-#ifndef WIN32
-	pid_t		owner_pid;
-#else
-	HANDLE		handle;
-#endif
-
-	/*
-	 * The memory barrier has to be placed here to ensure that any flag
-	 * variables possibly changed by this process have been flushed to main
-	 * memory, before we check/set is_set.
-	 */
-	pg_memory_barrier();
-
-	/* Quick exit if already set */
-	if (latch->is_set)
-		return;
-
-	latch->is_set = true;
-
-	pg_memory_barrier();
-	if (!latch->maybe_sleeping)
-		return;
+	SetLatchV(&latch, 1);
+}
 
-#ifndef WIN32
+/*
+ * Add a latch to a group, to be set later.
+ */
+void
+AddLatch(LatchGroup *group, Latch *latch)
+{
+	/* First time.  Set up the in-place buffer. */
+	if (!group->latches)
+	{
+		group->latches = group->in_place_buffer;
+		group->capacity = lengthof(group->in_place_buffer);
+		Assert(group->size == 0);
+	}
 
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler. We use the self-pipe or SIGURG to ourselves
-	 * to wake up WaitEventSetWaitBlock() without races in that case. If it's
-	 * another process, send a signal.
-	 *
-	 * Fetch owner_pid only once, in case the latch is concurrently getting
-	 * owned or disowned. XXX: This assumes that pid_t is atomic, which isn't
-	 * guaranteed to be true! In practice, the effective range of pid_t fits
-	 * in a 32 bit integer, and so should be atomic. In the worst case, we
-	 * might end up signaling the wrong process. Even then, you're very
-	 * unlucky if a process with that bogus pid exists and belongs to
-	 * Postgres; and PG database processes should handle excess SIGUSR1
-	 * interrupts without a problem anyhow.
-	 *
-	 * Another sort of race condition that's possible here is for a new
-	 * process to own the latch immediately after we look, so we don't signal
-	 * it. This is okay so long as all callers of ResetLatch/WaitLatch follow
-	 * the standard coding convention of waiting at the bottom of their loops,
-	 * not the top, so that they'll correctly process latch-setting events
-	 * that happen before they enter the loop.
-	 */
-	owner_pid = latch->owner_pid;
-	if (owner_pid == 0)
-		return;
-	else if (owner_pid == MyProcPid)
+	/* Are we full? */
+	if (group->size == group->capacity)
 	{
-#if defined(WAIT_USE_SELF_PIPE)
-		if (waiting)
-			sendSelfPipeByte();
-#else
-		if (waiting)
-			kill(MyProcPid, SIGURG);
-#endif
+		Latch	  **new_latches;
+		int			new_capacity;
+
+		/* Try to allocate more space. */
+		new_capacity = group->capacity * 2;
+		new_latches = palloc_extended(sizeof(latch) * new_capacity,
+									  MCXT_ALLOC_NO_OOM);
+		if (!new_latches)
+		{
+			/*
+			 * Allocation failed.  This is very unlikely to happen, but it
+			 * might be useful to be able to use this function in critical
+			 * sections, so we handle allocation failure by flushing instead
+			 * of throwing.
+			 */
+			SetLatches(group);
+		}
+		else
+		{
+			memcpy(new_latches, group->latches, sizeof(latch) * group->size);
+			if (group->latches != group->in_place_buffer)
+				pfree(group->latches);
+			group->latches = new_latches;
+			group->capacity = new_capacity;
+		}
 	}
-	else
-		kill(owner_pid, SIGURG);
 
-#else
+	Assert(group->size < group->capacity);
+	group->latches[group->size++] = latch;
+}
 
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler.
-	 *
-	 * Use a local variable here just in case somebody changes the event field
-	 * concurrently (which really should not happen).
-	 */
-	handle = latch->event;
-	if (handle)
+/*
+ * Set all the latches accumulated in 'group'.
+ */
+void
+SetLatches(LatchGroup *group)
+{
+	if (group->size > 0)
 	{
-		SetEvent(handle);
+		SetLatchV(group->latches, group->size);
+		group->size = 0;
 
-		/*
-		 * Note that we silently ignore any errors. We might be in a signal
-		 * handler or other critical path where it's not safe to call elog().
-		 */
+		/* If we allocated a large buffer, it's time to free it. */
+		if (group->latches != group->in_place_buffer)
+		{
+			pfree(group->latches);
+			group->latches = group->in_place_buffer;
+			group->capacity = lengthof(group->in_place_buffer);
+		}
 	}
-#endif
 }
 
 /*
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 68ab740f16..09c2b338c6 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -118,6 +118,17 @@ typedef struct Latch
 #endif
 } Latch;
 
+/*
+ * A buffer for setting multiple latches at a time.
+ */
+typedef struct LatchGroup
+{
+	int			size;
+	int			capacity;
+	Latch	  **latches;
+	Latch	   *in_place_buffer[64];
+} LatchGroup;
+
 /*
  * Bitmasks for events that may wake-up WaitLatch(), WaitLatchOrSocket(), or
  * WaitEventSetWait().
@@ -163,6 +174,8 @@ extern void InitSharedLatch(Latch *latch);
 extern void OwnLatch(Latch *latch);
 extern void DisownLatch(Latch *latch);
 extern void SetLatch(Latch *latch);
+extern void AddLatch(LatchGroup *group, Latch *latch);
+extern void SetLatches(LatchGroup *group);
 extern void ResetLatch(Latch *latch);
 extern void ShutdownLatchSupport(void);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2f02cc8f42..c86e49c758 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1389,6 +1389,7 @@ LagTracker
 LargeObjectDesc
 LastAttnumInfo
 Latch
+LatchGroup
 LerpFunc
 LexDescr
 LexemeEntry
-- 
2.37.0 (Apple Git-136)
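
To make the intended call pattern concrete, here is a sketch (not part of
the patch) of how a caller uses the API above: collect latches under the
lock with AddLatch(), release the lock, then issue all the wakeups with
SetLatches().  The lock name and queue-walking helper are invented for
illustration; the real callers appear in the follow-up patches.

	void
	WakeAllWaiters(void)
	{
		LatchGroup	wakeups = {0};	/* AddLatch() sets up the in-place buffer */
		PGPROC	   *proc;

		LWLockAcquire(SomeSharedLock, LW_EXCLUSIVE);	/* hypothetical lock */
		while ((proc = PopNextWaiter()) != NULL)	/* hypothetical queue walk */
			AddLatch(&wakeups, &proc->procLatch);	/* just remember the latch */
		LWLockRelease(SomeSharedLock);

		/* Only now issue the SIGURG/SetEvent system calls, outside the lock. */
		SetLatches(&wakeups);
	}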

v2-0003-Use-SetLatches-for-condition-variables.patchapplication/octet-stream; name=v2-0003-Use-SetLatches-for-condition-variables.patchDownload
From cc97393e11ed3027ae5b78623a2903e70134aade Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 17:36:34 +1300
Subject: [PATCH v2 3/9] Use SetLatches() for condition variables.

Drain condition variable wait lists in larger batches, not one at a
time, while broadcasting.  This is an idea from Yura Sokolov, which I've
now combined with SetLatches() for slightly more efficiency.

Since we're now performing loops, change the internal spinlock to an
lwlock.

XXX There is probably a better data structure/arrangement that would
allow us to hold the lock for a shorter time.  This patch is not very
satisfying yet.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
Discussion: https://postgr.es/m/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru
---
 src/backend/storage/lmgr/condition_variable.c | 96 +++++++++----------
 src/backend/storage/lmgr/lwlock.c             |  2 +
 src/include/storage/condition_variable.h      |  4 +-
 src/include/storage/lwlock.h                  |  1 +
 4 files changed, 49 insertions(+), 54 deletions(-)

diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index de65dac3ae..b975a91172 100644
--- a/src/backend/storage/lmgr/condition_variable.c
+++ b/src/backend/storage/lmgr/condition_variable.c
@@ -36,7 +36,7 @@ static ConditionVariable *cv_sleep_target = NULL;
 void
 ConditionVariableInit(ConditionVariable *cv)
 {
-	SpinLockInit(&cv->mutex);
+	LWLockInitialize(&cv->mutex, LWTRANCHE_CONDITION_VARIABLE);
 	proclist_init(&cv->wakeup);
 }
 
@@ -74,9 +74,9 @@ ConditionVariablePrepareToSleep(ConditionVariable *cv)
 	cv_sleep_target = cv;
 
 	/* Add myself to the wait queue. */
-	SpinLockAcquire(&cv->mutex);
+	LWLockAcquire(&cv->mutex, LW_EXCLUSIVE);
 	proclist_push_tail(&cv->wakeup, pgprocno, cvWaitLink);
-	SpinLockRelease(&cv->mutex);
+	LWLockRelease(&cv->mutex);
 }
 
 /*
@@ -180,13 +180,13 @@ ConditionVariableTimedSleep(ConditionVariable *cv, long timeout,
 		 * by something other than ConditionVariableSignal; though we don't
 		 * guarantee not to return spuriously, we'll avoid this obvious case.
 		 */
-		SpinLockAcquire(&cv->mutex);
+		LWLockAcquire(&cv->mutex, LW_EXCLUSIVE);
 		if (!proclist_contains(&cv->wakeup, MyProc->pgprocno, cvWaitLink))
 		{
 			done = true;
 			proclist_push_tail(&cv->wakeup, MyProc->pgprocno, cvWaitLink);
 		}
-		SpinLockRelease(&cv->mutex);
+		LWLockRelease(&cv->mutex);
 
 		/*
 		 * Check for interrupts, and return spuriously if that caused the
@@ -233,12 +233,12 @@ ConditionVariableCancelSleep(void)
 	if (cv == NULL)
 		return;
 
-	SpinLockAcquire(&cv->mutex);
+	LWLockAcquire(&cv->mutex, LW_EXCLUSIVE);
 	if (proclist_contains(&cv->wakeup, MyProc->pgprocno, cvWaitLink))
 		proclist_delete(&cv->wakeup, MyProc->pgprocno, cvWaitLink);
 	else
 		signaled = true;
-	SpinLockRelease(&cv->mutex);
+	LWLockRelease(&cv->mutex);
 
 	/*
 	 * If we've received a signal, pass it on to another waiting process, if
@@ -265,10 +265,10 @@ ConditionVariableSignal(ConditionVariable *cv)
 	PGPROC	   *proc = NULL;
 
 	/* Remove the first process from the wakeup queue (if any). */
-	SpinLockAcquire(&cv->mutex);
+	LWLockAcquire(&cv->mutex, LW_EXCLUSIVE);
 	if (!proclist_is_empty(&cv->wakeup))
 		proc = proclist_pop_head_node(&cv->wakeup, cvWaitLink);
-	SpinLockRelease(&cv->mutex);
+	LWLockRelease(&cv->mutex);
 
 	/* If we found someone sleeping, set their latch to wake them up. */
 	if (proc != NULL)
@@ -287,6 +287,7 @@ ConditionVariableBroadcast(ConditionVariable *cv)
 {
 	int			pgprocno = MyProc->pgprocno;
 	PGPROC	   *proc = NULL;
+	bool		inserted_sentinel = false;
 	bool		have_sentinel = false;
 
 	/*
@@ -313,52 +314,43 @@ ConditionVariableBroadcast(ConditionVariable *cv)
 	if (cv_sleep_target != NULL)
 		ConditionVariableCancelSleep();
 
-	/*
-	 * Inspect the state of the queue.  If it's empty, we have nothing to do.
-	 * If there's exactly one entry, we need only remove and signal that
-	 * entry.  Otherwise, remove the first entry and insert our sentinel.
-	 */
-	SpinLockAcquire(&cv->mutex);
-	/* While we're here, let's assert we're not in the list. */
-	Assert(!proclist_contains(&cv->wakeup, pgprocno, cvWaitLink));
-
-	if (!proclist_is_empty(&cv->wakeup))
+	do
 	{
-		proc = proclist_pop_head_node(&cv->wakeup, cvWaitLink);
-		if (!proclist_is_empty(&cv->wakeup))
-		{
-			proclist_push_tail(&cv->wakeup, pgprocno, cvWaitLink);
-			have_sentinel = true;
-		}
-	}
-	SpinLockRelease(&cv->mutex);
+		LatchGroup	wakeups = {0};
 
-	/* Awaken first waiter, if there was one. */
-	if (proc != NULL)
-		SetLatch(&proc->procLatch);
+		LWLockAcquire(&cv->mutex, LW_EXCLUSIVE);
+		while (!proclist_is_empty(&cv->wakeup))
+		{
+			/* Wake up in small sized batches.  XXX tune */
+			if (wakeups.size == 8)
+			{
+				if (!inserted_sentinel)
+				{
+					/* First batch of many to wake up.  Add sentinel. */
+					proclist_push_tail(&cv->wakeup, pgprocno, cvWaitLink);
+					inserted_sentinel = true;
+					have_sentinel = true;
+				}
+				break;
+			}
 
-	while (have_sentinel)
-	{
-		/*
-		 * Each time through the loop, remove the first wakeup list entry, and
-		 * signal it unless it's our sentinel.  Repeat as long as the sentinel
-		 * remains in the list.
-		 *
-		 * Notice that if someone else removes our sentinel, we will waken one
-		 * additional process before exiting.  That's intentional, because if
-		 * someone else signals the CV, they may be intending to waken some
-		 * third process that added itself to the list after we added the
-		 * sentinel.  Better to give a spurious wakeup (which should be
-		 * harmless beyond wasting some cycles) than to lose a wakeup.
-		 */
-		proc = NULL;
-		SpinLockAcquire(&cv->mutex);
-		if (!proclist_is_empty(&cv->wakeup))
 			proc = proclist_pop_head_node(&cv->wakeup, cvWaitLink);
-		have_sentinel = proclist_contains(&cv->wakeup, pgprocno, cvWaitLink);
-		SpinLockRelease(&cv->mutex);
+			if (proc == MyProc)
+			{
+				/* We hit our sentinel.  We're done. */
+				have_sentinel = false;
+				break;
+			}
+			else if (!proclist_contains(&cv->wakeup, pgprocno, cvWaitLink))
+			{
+				/* Someone else hit our sentinel.  We're done. */
+				have_sentinel = false;
+			}
+			AddLatch(&wakeups, &proc->procLatch);
+		}
+		LWLockRelease(&cv->mutex);
 
-		if (proc != NULL && proc != MyProc)
-			SetLatch(&proc->procLatch);
-	}
+		/* Awaken this batch of waiters, if there were some. */
+		SetLatches(&wakeups);
+	} while (have_sentinel);
 }
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 0fc0cf6ebb..77986bac7e 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -183,6 +183,8 @@ static const char *const BuiltinTrancheNames[] = {
 	"PgStatsHash",
 	/* LWTRANCHE_PGSTATS_DATA: */
 	"PgStatsData",
+	/* LWTRANCHE_CONDITION_VARIABLE: */
+	"ConditionVariable",
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/include/storage/condition_variable.h b/src/include/storage/condition_variable.h
index e89175ebd5..82b8eeabf5 100644
--- a/src/include/storage/condition_variable.h
+++ b/src/include/storage/condition_variable.h
@@ -22,12 +22,12 @@
 #ifndef CONDITION_VARIABLE_H
 #define CONDITION_VARIABLE_H
 
+#include "storage/lwlock.h"
 #include "storage/proclist_types.h"
-#include "storage/spin.h"
 
 typedef struct
 {
-	slock_t		mutex;			/* spinlock protecting the wakeup list */
+	LWLock		mutex;			/* lock protecting the wakeup list */
 	proclist_head wakeup;		/* list of wake-able processes */
 } ConditionVariable;
 
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index ca4eca76f4..dcbffe11a2 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -193,6 +193,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DSA,
 	LWTRANCHE_PGSTATS_HASH,
 	LWTRANCHE_PGSTATS_DATA,
+	LWTRANCHE_CONDITION_VARIABLE,
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
-- 
2.37.0 (Apple Git-136)
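
For context, the waiter side is unchanged by this patch.  A rough sketch
of the usual wait loop is below (the predicate and wait event name are
placeholders).  Note that each waiter woken by the broadcast re-takes
cv->mutex inside ConditionVariableSleep() to put itself back on the
wakeup list, which is the contention that draining in batches is meant
to reduce.

	ConditionVariablePrepareToSleep(&shared->cv);
	while (!state_is_ready())	/* hypothetical predicate */
		ConditionVariableSleep(&shared->cv, WAIT_EVENT_SOMETHING);	/* placeholder */
	ConditionVariableCancelSleep();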

v2-0004-Use-SetLatches-for-heavyweight-locks.patchapplication/octet-stream; name=v2-0004-Use-SetLatches-for-heavyweight-locks.patchDownload
From fda454d2246ef050465f48ba18c0144c25dd2834 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 16:43:31 +1300
Subject: [PATCH v2 4/9] Use SetLatches() for heavyweight locks.

Collect wakeups into a LatchGroup to be set at once after the
LockManager's internal partition lock is released.  This avoids holding
busy locks while issuing system calls.

Currently, waiters immediately try to acquire that lock themselves, so
deferring until it's released makes sense to avoid contention (a later
patch may remove that lock acquisition though).

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/deadlock.c |  4 ++--
 src/backend/storage/lmgr/lock.c     | 36 +++++++++++++++++++----------
 src/backend/storage/lmgr/proc.c     | 29 ++++++++++++++---------
 src/include/storage/lock.h          |  5 ++--
 src/include/storage/proc.h          |  5 ++--
 5 files changed, 49 insertions(+), 30 deletions(-)

diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index cd9c0418ec..2876be16f3 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -214,7 +214,7 @@ InitDeadLockChecking(void)
  * and (b) we are typically invoked inside a signal handler.
  */
 DeadLockState
-DeadLockCheck(PGPROC *proc)
+DeadLockCheck(PGPROC *proc, LatchGroup *wakeups)
 {
 	int			i,
 				j;
@@ -272,7 +272,7 @@ DeadLockCheck(PGPROC *proc)
 #endif
 
 		/* See if any waiters for the lock can be woken up now */
-		ProcLockWakeup(GetLocksMethodTable(lock), lock);
+		ProcLockWakeup(GetLocksMethodTable(lock), lock, wakeups);
 	}
 
 	/* Return code tells caller if we had to escape a deadlock or not */
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 3d1049cf75..4ce7d53368 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -374,14 +374,14 @@ static PROCLOCK *SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner);
 static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode);
 static void FinishStrongLockAcquire(void);
-static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner);
+static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups);
 static void ReleaseLockIfHeld(LOCALLOCK *locallock, bool sessionLock);
 static void LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent);
 static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 						PROCLOCK *proclock, LockMethod lockMethodTable);
 static void CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 						LockMethod lockMethodTable, uint32 hashcode,
-						bool wakeupNeeded);
+						bool wakeupNeeded, LatchGroup *wakeups);
 static void LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 								 LOCKTAG *locktag, LOCKMODE lockmode,
 								 bool decrement_strong_lock_count);
@@ -787,6 +787,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -1098,7 +1099,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
-		WaitOnLock(locallock, owner);
+		WaitOnLock(locallock, owner, &wakeups);
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1138,6 +1139,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
 	 * standby server.
@@ -1634,7 +1637,7 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 static void
 CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 			LockMethod lockMethodTable, uint32 hashcode,
-			bool wakeupNeeded)
+			bool wakeupNeeded, LatchGroup *wakeups)
 {
 	/*
 	 * If this was my last hold on this lock, delete my entry in the proclock
@@ -1674,7 +1677,7 @@ CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 	else if (wakeupNeeded)
 	{
 		/* There are waiters on this lock, so wake them up. */
-		ProcLockWakeup(lockMethodTable, lock);
+		ProcLockWakeup(lockMethodTable, lock, wakeups);
 	}
 }
 
@@ -1811,7 +1814,7 @@ MarkLockClear(LOCALLOCK *locallock)
  * The appropriate partition lock must be held at entry.
  */
 static void
-WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
+WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
 {
 	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
 	LockMethod	lockMethodTable = LockMethods[lockmethodid];
@@ -1856,7 +1859,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
 	 */
 	PG_TRY();
 	{
-		if (ProcSleep(locallock, lockMethodTable) != PROC_WAIT_STATUS_OK)
+		if (ProcSleep(locallock, lockMethodTable, wakeups) != PROC_WAIT_STATUS_OK)
 		{
 			/*
 			 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
@@ -1915,7 +1918,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
  * NB: this does not clean up any locallock object that may exist for the lock.
  */
 void
-RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
+RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchGroup *wakeups)
 {
 	LOCK	   *waitLock = proc->waitLock;
 	PROCLOCK   *proclock = proc->waitProcLock;
@@ -1957,7 +1960,7 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
 	 */
 	CleanUpLock(waitLock, proclock,
 				LockMethods[lockmethodid], hashcode,
-				true);
+				true, wakeups);
 }
 
 /*
@@ -1982,6 +1985,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2160,10 +2164,12 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, locallock->hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	RemoveLocalLock(locallock);
 	return true;
 }
@@ -2188,6 +2194,7 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	PROCLOCK   *proclock;
 	int			partition;
 	bool		have_fast_path_lwlock = false;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2434,10 +2441,12 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 			CleanUpLock(lock, proclock,
 						lockMethodTable,
 						LockTagHashCode(&lock->tag),
-						wakeupNeeded);
+						wakeupNeeded, &wakeups);
 		}						/* loop over PROCLOCKs within this partition */
 
 		LWLockRelease(partitionLock);
+
+		SetLatches(&wakeups);
 	}							/* loop over partitions */
 
 #ifdef LOCK_DEBUG
@@ -3137,6 +3146,7 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 	uint32		proclock_hashcode;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchGroup	wakeups = {0};
 
 	hashcode = LockTagHashCode(locktag);
 	partitionLock = LockHashPartitionLock(hashcode);
@@ -3190,10 +3200,12 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Decrement strong lock count.  This logic is needed only for 2PC.
 	 */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 13fa07b0ff..a593c11f45 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -700,6 +700,7 @@ LockErrorCleanup(void)
 {
 	LWLock	   *partitionLock;
 	DisableTimeoutParams timeouts[2];
+	LatchGroup	wakeups = {0};
 
 	HOLD_INTERRUPTS();
 
@@ -733,7 +734,7 @@ LockErrorCleanup(void)
 	if (MyProc->links.next != NULL)
 	{
 		/* We could not have been granted the lock yet */
-		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode);
+		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode, &wakeups);
 	}
 	else
 	{
@@ -750,6 +751,7 @@ LockErrorCleanup(void)
 	lockAwaited = NULL;
 
 	LWLockRelease(partitionLock);
+	SetLatches(&wakeups);
 
 	RESUME_INTERRUPTS();
 }
@@ -1042,7 +1044,7 @@ ProcQueueInit(PROC_QUEUE *queue)
  * NOTES: The process queue is now a priority queue for locking.
  */
 ProcWaitStatus
-ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
+ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 {
 	LOCKMODE	lockmode = locallock->tag.mode;
 	LOCK	   *lock = locallock->lock;
@@ -1188,7 +1190,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 */
 	if (early_deadlock)
 	{
-		RemoveFromWaitQueue(MyProc, hashcode);
+		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1204,6 +1206,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 * LockErrorCleanup will clean up if cancel/die happens.
 	 */
 	LWLockRelease(partitionLock);
+	SetLatches(wakeups);
 
 	/*
 	 * Also, now that we will successfully clean up after an ereport, it's
@@ -1662,8 +1665,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
  * to twiddle the lock's request counts too --- see RemoveFromWaitQueue.
  * Hence, in practice the waitStatus parameter must be PROC_WAIT_STATUS_OK.
  */
-PGPROC *
-ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
+static PGPROC *
+ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus, LatchGroup *wakeups)
 {
 	PGPROC	   *retProc;
 
@@ -1686,8 +1689,8 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
 	proc->waitStatus = waitStatus;
 	pg_atomic_write_u64(&MyProc->waitStart, 0);
 
-	/* And awaken it */
-	SetLatch(&proc->procLatch);
+	/* Schedule it to be awoken */
+	AddLatch(wakeups, &proc->procLatch);
 
 	return retProc;
 }
@@ -1700,7 +1703,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
  * The appropriate lock partition lock must be held by caller.
  */
 void
-ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
+ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchGroup *wakeups)
 {
 	PROC_QUEUE *waitQueue = &(lock->waitProcs);
 	int			queue_size = waitQueue->size;
@@ -1728,7 +1731,7 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 		{
 			/* OK to waken */
 			GrantLock(lock, proc->waitProcLock, lockmode);
-			proc = ProcWakeup(proc, PROC_WAIT_STATUS_OK);
+			proc = ProcWakeup(proc, PROC_WAIT_STATUS_OK, wakeups);
 
 			/*
 			 * ProcWakeup removes proc from the lock's waiting process queue
@@ -1762,6 +1765,7 @@ static void
 CheckDeadLock(void)
 {
 	int			i;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * Acquire exclusive lock on the entire shared lock data structures. Must
@@ -1796,7 +1800,7 @@ CheckDeadLock(void)
 #endif
 
 	/* Run the deadlock check, and set deadlock_state for use by ProcSleep */
-	deadlock_state = DeadLockCheck(MyProc);
+	deadlock_state = DeadLockCheck(MyProc, &wakeups);
 
 	if (deadlock_state == DS_HARD_DEADLOCK)
 	{
@@ -1813,7 +1817,8 @@ CheckDeadLock(void)
 		 * return from the signal handler.
 		 */
 		Assert(MyProc->waitLock != NULL);
-		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)));
+		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)),
+							&wakeups);
 
 		/*
 		 * We're done here.  Transaction abort caused by the error that
@@ -1837,6 +1842,8 @@ CheckDeadLock(void)
 check_done:
 	for (i = NUM_LOCK_PARTITIONS; --i >= 0;)
 		LWLockRelease(LockHashPartitionLockByIndex(i));
+
+	SetLatches(&wakeups);
 }
 
 /*
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index e4e1495b24..790c15c7ea 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -19,6 +19,7 @@
 #endif
 
 #include "storage/backendid.h"
+#include "storage/latch.h"
 #include "storage/lockdefs.h"
 #include "storage/lwlock.h"
 #include "storage/shmem.h"
@@ -575,7 +576,7 @@ extern bool LockCheckConflicts(LockMethod lockMethodTable,
 							   LOCK *lock, PROCLOCK *proclock);
 extern void GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode);
 extern void GrantAwaitedLock(void);
-extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode);
+extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchGroup *wakeups);
 extern Size LockShmemSize(void);
 extern LockData *GetLockStatusData(void);
 extern BlockedProcsData *GetBlockerStatusData(int blocked_pid);
@@ -592,7 +593,7 @@ extern void lock_twophase_postabort(TransactionId xid, uint16 info,
 extern void lock_twophase_standby_recover(TransactionId xid, uint16 info,
 										  void *recdata, uint32 len);
 
-extern DeadLockState DeadLockCheck(PGPROC *proc);
+extern DeadLockState DeadLockCheck(PGPROC *proc, LatchGroup *wakeups);
 extern PGPROC *GetBlockingAutoVacuumPgproc(void);
 extern void DeadLockReport(void) pg_attribute_noreturn();
 extern void RememberSimpleDeadLock(PGPROC *proc1,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 8d096fdeeb..0ae47325e4 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -449,9 +449,8 @@ extern bool HaveNFreeProcs(int n);
 extern void ProcReleaseLocks(bool isCommit);
 
 extern void ProcQueueInit(PROC_QUEUE *queue);
-extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable);
-extern PGPROC *ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
-extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
+extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups);
+extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchGroup *wakeups);
 extern void CheckDeadLockAlert(void);
 extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
-- 
2.37.0 (Apple Git-136)

v2-0005-Don-t-re-acquire-LockManager-partition-lock-after.patchapplication/octet-stream; name=v2-0005-Don-t-re-acquire-LockManager-partition-lock-after.patchDownload
From 162b7203e5f355649de2381ad6b3abc92acc3f3a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 10:56:39 +1300
Subject: [PATCH v2 5/9] Don't re-acquire LockManager partition lock after
 wakeup.

Change the contract of WaitOnLock() and ProcSleep() to return with the
partition lock released.  ProcSleep() re-acquired the lock to hold off
cancel/die interrupts, which mattered at the time the ancestor of that
code was committed in fe548629c50, because back then signal handlers
could longjmp out of there.  Now that can happen only at points where we
explicitly run CHECK_FOR_INTERRUPTS(), so the re-acquisition can be
removed, which leads to the idea of making the function always release
the lock.

While an earlier commit fixed the problem that the backend woke us up
before it had even released the lock itself, this lock reacquisition
point was still contended when multiple backends woke at the same time.

XXX Right?  Or what am I missing?

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/lock.c | 18 ++++++++++++------
 src/backend/storage/lmgr/proc.c | 20 ++++++++++++--------
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 4ce7d53368..cd3b84101c 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -787,6 +787,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	bool		partitionLocked = false;
 	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
@@ -995,6 +996,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	partitionLock = LockHashPartitionLock(hashcode);
 
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
+	partitionLocked = true;
 
 	/*
 	 * Find or create lock and proclock entries with this tag
@@ -1099,7 +1101,10 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
+		Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
 		WaitOnLock(locallock, owner, &wakeups);
+		Assert(!LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+		partitionLocked = false;
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1124,7 +1129,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 			PROCLOCK_PRINT("LockAcquire: INCONSISTENT", proclock);
 			LOCK_PRINT("LockAcquire: INCONSISTENT", lock, lockmode);
 			/* Should we retry ? */
-			LWLockRelease(partitionLock);
 			elog(ERROR, "LockAcquire failed");
 		}
 		PROCLOCK_PRINT("LockAcquire: granted", proclock);
@@ -1137,9 +1141,11 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 */
 	FinishStrongLockAcquire();
 
-	LWLockRelease(partitionLock);
-
-	SetLatches(&wakeups);
+	if (partitionLocked)
+	{
+		LWLockRelease(partitionLock);
+		SetLatches(&wakeups);
+	}
 
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
@@ -1811,7 +1817,8 @@ MarkLockClear(LOCALLOCK *locallock)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process.
  *
- * The appropriate partition lock must be held at entry.
+ * The appropriate partition lock must be held at entry.  It is not held on
+ * exit.
  */
 static void
 WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
@@ -1868,7 +1875,6 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
 			awaitedLock = NULL;
 			LOCK_PRINT("WaitOnLock: aborting on lock",
 					   locallock->lock, locallock->tag.mode);
-			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
 
 			/*
 			 * Now that we aren't holding the partition lock, we can give an
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index a593c11f45..281260dab3 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1033,7 +1033,7 @@ ProcQueueInit(PROC_QUEUE *queue)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process (under all XIDs).
  *
- * The lock table's partition lock must be held at entry, and will be held
+ * The lock table's partition lock must be held at entry, and will be released
  * at exit.
  *
  * Result: PROC_WAIT_STATUS_OK if we acquired the lock, PROC_WAIT_STATUS_ERROR if not (deadlock).
@@ -1062,6 +1062,12 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 	int			i;
 
+	/*
+	 * Every way out of this function will release the partition lock and send
+	 * buffered latch wakeups.
+	 */
+	Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+
 	/*
 	 * If group locking is in use, locks held by members of my locking group
 	 * need to be included in myHeldLocks.  This is not required for relation
@@ -1148,6 +1154,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 					/* Skip the wait and just grant myself the lock. */
 					GrantLock(lock, proclock, lockmode);
 					GrantAwaitedLock();
+					LWLockRelease(partitionLock);
+					SetLatches(wakeups);
 					return PROC_WAIT_STATUS_OK;
 				}
 				/* Break out of loop to put myself before him */
@@ -1191,6 +1199,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 	if (early_deadlock)
 	{
 		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
+		LWLockRelease(partitionLock);
+		SetLatches(wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1525,6 +1535,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 			}
 
 			LWLockRelease(partitionLock);
+			SetLatches(wakeups);
 
 			if (deadlock_state == DS_SOFT_DEADLOCK)
 				ereport(LOG,
@@ -1626,13 +1637,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 							standbyWaitStart, GetCurrentTimestamp(),
 							NULL, false);
 
-	/*
-	 * Re-acquire the lock table's partition lock.  We have to do this to hold
-	 * off cancel/die interrupts before we can mess with lockAwaited (else we
-	 * might have a missed or duplicated locallock update).
-	 */
-	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
-
 	/*
 	 * We no longer want LockErrorCleanup to do anything.
 	 */
-- 
2.37.0 (Apple Git-136)

v2-0006-Use-SetLatches-for-SERIALIZABLE-DEFERRABLE-wakeup.patchapplication/octet-stream; name=v2-0006-Use-SetLatches-for-SERIALIZABLE-DEFERRABLE-wakeup.patchDownload
From 0e7ac0c8ae5f7d151412fb61ba582394e60271cf Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 12:40:12 +1300
Subject: [PATCH v2 6/9] Use SetLatches() for SERIALIZABLE DEFERRABLE wakeups.

Don't issue SetLatch() system calls while holding a highly contended
LWLock.  Collect them in a LatchGroup to be set after the lock is
released.  Once woken, other backends will immediately try to acquire
that lock anyway, so it's better to wake them after releasing.

Take this opportunity to retire the confusingly named ProcSendSignal()
and replace it with something that gives the latch pointer we need.
There was only one other caller, in bufmgr.c, which is easily changed.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c  |  2 +-
 src/backend/storage/ipc/procsignal.c |  2 --
 src/backend/storage/lmgr/predicate.c |  5 ++++-
 src/backend/storage/lmgr/proc.c      | 12 ------------
 src/include/storage/proc.h           |  4 +++-
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 73d30bf619..df15f5e6c2 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1918,7 +1918,7 @@ UnpinBuffer(BufferDesc *buf)
 
 				buf_state &= ~BM_PIN_COUNT_WAITER;
 				UnlockBufHdr(buf, buf_state);
-				ProcSendSignal(wait_backend_pgprocno);
+				SetLatch(GetProcLatchByNumber(wait_backend_pgprocno));
 			}
 			else
 				UnlockBufHdr(buf, buf_state);
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7767657f27..280e4b91dd 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -254,8 +254,6 @@ CleanupProcSignalState(int status, Datum arg)
  *
  * On success (a signal was sent), zero is returned.
  * On error, -1 is returned, and errno is set (typically to ESRCH or EPERM).
- *
- * Not to be confused with ProcSendSignal
  */
 int
 SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index e8120174d6..c62adea6b3 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3339,6 +3339,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 				nextConflict,
 				possibleUnsafeConflict;
 	SERIALIZABLEXACT *roXact;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * We can't trust XactReadOnly here, because a transaction which started
@@ -3672,7 +3673,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 			 */
 			if (SxactIsDeferrableWaiting(roXact) &&
 				(SxactIsROUnsafe(roXact) || SxactIsROSafe(roXact)))
-				ProcSendSignal(roXact->pgprocno);
+				AddLatch(&wakeups, GetProcLatchByNumber(roXact->pgprocno));
 
 			possibleUnsafeConflict = nextConflict;
 		}
@@ -3698,6 +3699,8 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 
 	LWLockRelease(SerializableXactHashLock);
 
+	SetLatches(&wakeups);
+
 	LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
 
 	/* Add this to the list of transactions to check for later cleanup. */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 281260dab3..48f962e6bd 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1890,18 +1890,6 @@ ProcWaitForSignal(uint32 wait_event_info)
 	CHECK_FOR_INTERRUPTS();
 }
 
-/*
- * ProcSendSignal - set the latch of a backend identified by pgprocno
- */
-void
-ProcSendSignal(int pgprocno)
-{
-	if (pgprocno < 0 || pgprocno >= ProcGlobal->allProcCount)
-		elog(ERROR, "pgprocno out of range");
-
-	SetLatch(&ProcGlobal->allProcs[pgprocno].procLatch);
-}
-
 /*
  * BecomeLockGroupLeader - designate process as lock group leader
  *
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 0ae47325e4..ab17cb66b8 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -413,6 +413,9 @@ extern PGDLLIMPORT PGPROC *PreparedXactProcs;
 /* Accessor for PGPROC given a pgprocno. */
 #define GetPGProcByNumber(n) (&ProcGlobal->allProcs[(n)])
 
+/* Accessor for procLatch given a pgprocno. */
+#define GetProcLatchByNumber(n) (&(GetPGProcByNumber(n))->procLatch)
+
 /*
  * We set aside some extra PGPROC structures for auxiliary processes,
  * ie things that aren't full-fledged backends but need shmem access.
@@ -456,7 +459,6 @@ extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
 
 extern void ProcWaitForSignal(uint32 wait_event_info);
-extern void ProcSendSignal(int pgprocno);
 
 extern PGPROC *AuxiliaryPidGetProc(int pid);
 
-- 
2.37.0 (Apple Git-136)

v2-0007-Use-SetLatches-for-synchronous-replication-wakeup.patchapplication/octet-stream; name=v2-0007-Use-SetLatches-for-synchronous-replication-wakeup.patchDownload
From e195322f6ba9f2bfbd830b6d13f0a5d8e6f0d67c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 12:59:45 +1300
Subject: [PATCH v2 7/9] Use SetLatches() for synchronous replication wakeups.

Don't issue SetLatch() system calls while holding SyncRepLock.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/replication/syncrep.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 1a022b11a0..e2410ff494 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -100,7 +100,7 @@ static int	SyncRepWaitMode = SYNC_REP_NO_WAIT;
 
 static void SyncRepQueueInsert(int mode);
 static void SyncRepCancelWait(void);
-static int	SyncRepWakeQueue(bool all, int mode);
+static int	SyncRepWakeQueue(bool all, int mode, LatchGroup *wakeups);
 
 static bool SyncRepGetSyncRecPtr(XLogRecPtr *writePtr,
 								 XLogRecPtr *flushPtr,
@@ -449,6 +449,7 @@ SyncRepReleaseWaiters(void)
 	int			numwrite = 0;
 	int			numflush = 0;
 	int			numapply = 0;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -518,21 +519,23 @@ SyncRepReleaseWaiters(void)
 	if (walsndctl->lsn[SYNC_REP_WAIT_WRITE] < writePtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_WRITE] = writePtr;
-		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE);
+		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < flushPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = flushPtr;
-		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH);
+		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_APPLY] < applyPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_APPLY] = applyPtr;
-		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY);
+		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY, &wakeups);
 	}
 
 	LWLockRelease(SyncRepLock);
 
+	SetLatches(&wakeups);
+
 	elog(DEBUG3, "released %d procs up to write %X/%X, %d procs up to flush %X/%X, %d procs up to apply %X/%X",
 		 numwrite, LSN_FORMAT_ARGS(writePtr),
 		 numflush, LSN_FORMAT_ARGS(flushPtr),
@@ -876,7 +879,7 @@ SyncRepGetStandbyPriority(void)
  * The caller must hold SyncRepLock in exclusive mode.
  */
 static int
-SyncRepWakeQueue(bool all, int mode)
+SyncRepWakeQueue(bool all, int mode, LatchGroup *wakeups)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	PGPROC	   *proc = NULL;
@@ -929,7 +932,7 @@ SyncRepWakeQueue(bool all, int mode)
 		/*
 		 * Wake only when we have set state and removed from queue.
 		 */
-		SetLatch(&(thisproc->procLatch));
+		AddLatch(wakeups, &thisproc->procLatch);
 
 		numprocs++;
 	}
@@ -951,6 +954,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 
 	if (sync_standbys_defined != WalSndCtl->sync_standbys_defined)
 	{
+		LatchGroup		wakeups = {0};
+
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 
 		/*
@@ -963,7 +968,7 @@ SyncRepUpdateSyncStandbysDefined(void)
 			int			i;
 
 			for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
-				SyncRepWakeQueue(true, i);
+				SyncRepWakeQueue(true, i, &wakeups);
 		}
 
 		/*
@@ -976,6 +981,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 		WalSndCtl->sync_standbys_defined = sync_standbys_defined;
 
 		LWLockRelease(SyncRepLock);
+
+		SetLatches(&wakeups);
 	}
 }
 
-- 
2.37.0 (Apple Git-136)

#3vignesh C
vignesh21@gmail.com
In reply to: Thomas Munro (#2)
Re: Latches vs lwlock contention

On Tue, 1 Nov 2022 at 16:40, Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Oct 28, 2022 at 4:56 PM Thomas Munro <thomas.munro@gmail.com> wrote:

See attached sketch patches. I guess the main thing that may not be
good enough is the use of a fixed sized latch buffer. Memory
allocation in don't-throw-here environments like the guts of lock code
might be an issue, which is why it just gives up and flushes when
full; maybe it should try to allocate and fall back to flushing only
if that fails.

Here's an attempt at that. There aren't actually any uses of this
stuff in critical sections here, so perhaps I shouldn't bother
with that part. The part I'd most like some feedback on is the
heavyweight lock bits. I'll add this to the commitfest.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:

=== Applying patches on top of PostgreSQL commit ID
456fa635a909ee36f73ca84d340521bd730f265f ===
=== applying patch ./v2-0003-Use-SetLatches-for-condition-variables.patch
patching file src/backend/storage/lmgr/condition_variable.c
patching file src/backend/storage/lmgr/lwlock.c
Hunk #1 FAILED at 183.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/storage/lmgr/lwlock.c.rej
patching file src/include/storage/condition_variable.h
patching file src/include/storage/lwlock.h
Hunk #1 FAILED at 193.
1 out of 1 hunk FAILED -- saving rejects to file
src/include/storage/lwlock.h.rej

[1]: http://cfbot.cputube.org/patch_41_3998.log

Regards,
Vignesh

#4Thomas Munro
thomas.munro@gmail.com
In reply to: vignesh C (#3)
6 attachment(s)
Re: Latches vs lwlock contention

On Sat, Jan 28, 2023 at 3:39 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, 1 Nov 2022 at 16:40, Thomas Munro <thomas.munro@gmail.com> wrote:

Here's an attempt at that. There aren't actually any uses of this
stuff in critical sections here, so perhaps I shouldn't bother
with that part. The part I'd most like some feedback on is the
heavyweight lock bits. I'll add this to the commitfest.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:

Rebased. I dropped the CV patch for now.

Attachments:

v3-0001-Allow-palloc_extended-NO_OOM-in-critical-sections.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Allow-palloc_extended-NO_OOM-in-critical-sections.patchDownload
From d678e740c61ed5408ad254d2ca5f8e2fff7d2cd1 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 1 Nov 2022 23:29:38 +1300
Subject: [PATCH v3 1/6] Allow palloc_extended(NO_OOM) in critical sections.

Commit 4a170ee9e0e banned palloc() and similar in critical sections, because an
allocation failure would produce a panic.  Make an exception for allocation
with NULL on failure, for code that has a backup plan.

Discussion: https://postgr.es/m/CA%2BhUKGJHudZMG_vh6GiPB61pE%2BGgiBk5jxzd7inijqx5nEZLCw%40mail.gmail.com
---
 src/backend/utils/mmgr/mcxt.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 0b00802df7..eb18b89368 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1123,7 +1123,8 @@ MemoryContextAllocExtended(MemoryContext context, Size size, int flags)
 	void	   *ret;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1278,7 +1279,8 @@ palloc_extended(Size size, int flags)
 	MemoryContext context = CurrentMemoryContext;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1509,7 +1511,8 @@ repalloc_extended(void *pointer, Size size, int flags)
 		  AllocSizeIsValid(size)))
 		elog(ERROR, "invalid memory alloc request size %zu", size);
 
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	/* isReset must be false already */
 	Assert(!context->isReset);
-- 
2.39.2
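
To illustrate what this permits (a sketch only, with invented names):
code running between START_CRIT_SECTION() and END_CRIT_SECTION() may now
attempt an allocation, provided it checks for NULL and has a fallback.

	char	   *copy = palloc_extended(len, MCXT_ALLOC_NO_OOM);

	if (copy == NULL)
		use_preallocated_buffer();	/* hypothetical fallback, instead of PANIC */
	else
		memcpy(copy, src, len);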

v3-0002-Provide-SetLatches-for-batched-deferred-latches.patchtext/x-patch; charset=US-ASCII; name=v3-0002-Provide-SetLatches-for-batched-deferred-latches.patchDownload
From a3917505d7bd7cec7e98010d05bc016ddcd0dbac Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 15:51:45 +1300
Subject: [PATCH v3 2/6] Provide SetLatches() for batched deferred latches.

If we have a way to buffer a set of wakeup targets and process them at a
later time, we can:

* move SetLatch() system calls out from under LWLocks, so that locks can
  be released faster; this is especially interesting in cases where the
  target backends will immediately try to acquire the same lock, or
  generally when the lock is heavily contended

* possibly gain some micro-optimization from issuing only two memory
  barriers for the whole batch of latches, not two for each latch to be
  set

* prepare for future IPC mechanisms which might allow us to wake
  multiple backends in a single system call

Individual users of this facility will follow in separate patches.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c  | 214 ++++++++++++++++++++-----------
 src/include/storage/latch.h      |  13 ++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 154 insertions(+), 74 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index f4123e7de7..96dd47575a 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -591,6 +591,85 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 	return ret;
 }
 
+/*
+ * Set multiple latches at the same time.
+ * Note: modifies input array.
+ */
+static void
+SetLatchV(Latch **latches, int nlatches)
+{
+	/* Flush any other changes out to main memory just once. */
+	pg_memory_barrier();
+
+	/* Keep only latches that are not already set, and set them. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		if (!latch->is_set)
+			latch->is_set = true;
+		else
+			latches[i] = NULL;
+	}
+
+	pg_memory_barrier();
+
+	/* Wake the remaining latches that might be sleeping. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		/*
+		 * See if anyone's waiting for the latch. It can be the current
+		 * process if we're in a signal handler. We use the self-pipe or
+		 * SIGURG to ourselves to wake up WaitEventSetWaitBlock() without
+		 * races in that case. If it's another process, send a signal.
+		 *
+		 * Fetch owner_pid only once, in case the latch is concurrently
+		 * getting owned or disowned. XXX: This assumes that pid_t is atomic,
+		 * which isn't guaranteed to be true! In practice, the effective range
+		 * of pid_t fits in a 32 bit integer, and so should be atomic. In the
+		 * worst case, we might end up signaling the wrong process. Even then,
+		 * you're very unlucky if a process with that bogus pid exists and
+		 * belongs to Postgres; and PG database processes should handle excess
+		 * SIGURG interrupts without a problem anyhow.
+		 *
+		 * Another sort of race condition that's possible here is for a new
+		 * process to own the latch immediately after we look, so we don't
+		 * signal it. This is okay so long as all callers of
+		 * ResetLatch/WaitLatch follow the standard coding convention of
+		 * waiting at the bottom of their loops, not the top, so that they'll
+		 * correctly process latch-setting events that happen before they
+		 * enter the loop.
+		 */
+		if (latch && latch->maybe_sleeping)
+		{
+#ifndef WIN32
+			pid_t		owner_pid = latch->owner_pid;
+
+			if (owner_pid == MyProcPid)
+			{
+				if (waiting)
+				{
+#if defined(WAIT_USE_SELF_PIPE)
+					sendSelfPipeByte();
+#else
+					kill(MyProcPid, SIGURG);
+#endif
+				}
+			}
+			else
+				kill(owner_pid, SIGURG);
+#else
+			HANDLE		event = latch->event;
+
+			if (event)
+				SetEvent(event);
+#endif
+		}
+	}
+}
+
 /*
  * Sets a latch and wakes up anyone waiting on it.
  *
@@ -606,89 +685,76 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 void
 SetLatch(Latch *latch)
 {
-#ifndef WIN32
-	pid_t		owner_pid;
-#else
-	HANDLE		handle;
-#endif
-
-	/*
-	 * The memory barrier has to be placed here to ensure that any flag
-	 * variables possibly changed by this process have been flushed to main
-	 * memory, before we check/set is_set.
-	 */
-	pg_memory_barrier();
-
-	/* Quick exit if already set */
-	if (latch->is_set)
-		return;
-
-	latch->is_set = true;
-
-	pg_memory_barrier();
-	if (!latch->maybe_sleeping)
-		return;
+	SetLatchV(&latch, 1);
+}
 
-#ifndef WIN32
+/*
+ * Add a latch to a group, to be set later.
+ */
+void
+AddLatch(LatchGroup *group, Latch *latch)
+{
+	/* First time.  Set up the in-place buffer. */
+	if (!group->latches)
+	{
+		group->latches = group->in_place_buffer;
+		group->capacity = lengthof(group->in_place_buffer);
+		Assert(group->size == 0);
+	}
 
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler. We use the self-pipe or SIGURG to ourselves
-	 * to wake up WaitEventSetWaitBlock() without races in that case. If it's
-	 * another process, send a signal.
-	 *
-	 * Fetch owner_pid only once, in case the latch is concurrently getting
-	 * owned or disowned. XXX: This assumes that pid_t is atomic, which isn't
-	 * guaranteed to be true! In practice, the effective range of pid_t fits
-	 * in a 32 bit integer, and so should be atomic. In the worst case, we
-	 * might end up signaling the wrong process. Even then, you're very
-	 * unlucky if a process with that bogus pid exists and belongs to
-	 * Postgres; and PG database processes should handle excess SIGUSR1
-	 * interrupts without a problem anyhow.
-	 *
-	 * Another sort of race condition that's possible here is for a new
-	 * process to own the latch immediately after we look, so we don't signal
-	 * it. This is okay so long as all callers of ResetLatch/WaitLatch follow
-	 * the standard coding convention of waiting at the bottom of their loops,
-	 * not the top, so that they'll correctly process latch-setting events
-	 * that happen before they enter the loop.
-	 */
-	owner_pid = latch->owner_pid;
-	if (owner_pid == 0)
-		return;
-	else if (owner_pid == MyProcPid)
+	/* Are we full? */
+	if (group->size == group->capacity)
 	{
-#if defined(WAIT_USE_SELF_PIPE)
-		if (waiting)
-			sendSelfPipeByte();
-#else
-		if (waiting)
-			kill(MyProcPid, SIGURG);
-#endif
+		Latch	  **new_latches;
+		int			new_capacity;
+
+		/* Try to allocate more space. */
+		new_capacity = group->capacity * 2;
+		new_latches = palloc_extended(sizeof(latch) * new_capacity,
+									  MCXT_ALLOC_NO_OOM);
+		if (!new_latches)
+		{
+			/*
+			 * Allocation failed.  This is very unlikely to happen, but it
+			 * might be useful to be able to use this function in critical
+			 * sections, so we handle allocation failure by flushing instead
+			 * of throwing.
+			 */
+			SetLatches(group);
+		}
+		else
+		{
+			memcpy(new_latches, group->latches, sizeof(latch) * group->size);
+			if (group->latches != group->in_place_buffer)
+				pfree(group->latches);
+			group->latches = new_latches;
+			group->capacity = new_capacity;
+		}
 	}
-	else
-		kill(owner_pid, SIGURG);
 
-#else
+	Assert(group->size < group->capacity);
+	group->latches[group->size++] = latch;
+}
 
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler.
-	 *
-	 * Use a local variable here just in case somebody changes the event field
-	 * concurrently (which really should not happen).
-	 */
-	handle = latch->event;
-	if (handle)
+/*
+ * Set all the latches accumulated in 'group'.
+ */
+void
+SetLatches(LatchGroup *group)
+{
+	if (group->size > 0)
 	{
-		SetEvent(handle);
+		SetLatchV(group->latches, group->size);
+		group->size = 0;
 
-		/*
-		 * Note that we silently ignore any errors. We might be in a signal
-		 * handler or other critical path where it's not safe to call elog().
-		 */
+		/* If we allocated a large buffer, it's time to free it. */
+		if (group->latches != group->in_place_buffer)
+		{
+			pfree(group->latches);
+			group->latches = group->in_place_buffer;
+			group->capacity = lengthof(group->in_place_buffer);
+		}
 	}
-#endif
 }
 
 /*
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 99cc47874a..7b81806825 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -118,6 +118,17 @@ typedef struct Latch
 #endif
 } Latch;
 
+/*
+ * A buffer for setting multiple latches at a time.
+ */
+typedef struct LatchGroup
+{
+	int			size;
+	int			capacity;
+	Latch	  **latches;
+	Latch	   *in_place_buffer[64];
+} LatchGroup;
+
 /*
  * Bitmasks for events that may wake-up WaitLatch(), WaitLatchOrSocket(), or
  * WaitEventSetWait().
@@ -170,6 +181,8 @@ extern void InitSharedLatch(Latch *latch);
 extern void OwnLatch(Latch *latch);
 extern void DisownLatch(Latch *latch);
 extern void SetLatch(Latch *latch);
+extern void AddLatch(LatchGroup *group, Latch *latch);
+extern void SetLatches(LatchGroup *group);
 extern void ResetLatch(Latch *latch);
 extern void ShutdownLatchSupport(void);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 86a9303bf5..a5bf7d699d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1399,6 +1399,7 @@ LagTracker
 LargeObjectDesc
 LastAttnumInfo
 Latch
+LatchGroup
 LerpFunc
 LexDescr
 LexemeEntry
-- 
2.39.2

v3-0003-Use-SetLatches-for-synchronous-replication-wakeup.patchtext/x-patch; charset=US-ASCII; name=v3-0003-Use-SetLatches-for-synchronous-replication-wakeup.patchDownload
From 631100e021206afe23a6801737fe57fae3a356e0 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 12:59:45 +1300
Subject: [PATCH v3 3/6] Use SetLatches() for synchronous replication wakeups.

Don't issue SetLatch() system calls while holding SyncRepLock.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/replication/syncrep.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 889e20b5dd..446a9e1239 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -100,7 +100,7 @@ static int	SyncRepWaitMode = SYNC_REP_NO_WAIT;
 
 static void SyncRepQueueInsert(int mode);
 static void SyncRepCancelWait(void);
-static int	SyncRepWakeQueue(bool all, int mode);
+static int	SyncRepWakeQueue(bool all, int mode, LatchGroup *wakeups);
 
 static bool SyncRepGetSyncRecPtr(XLogRecPtr *writePtr,
 								 XLogRecPtr *flushPtr,
@@ -440,6 +440,7 @@ SyncRepReleaseWaiters(void)
 	int			numwrite = 0;
 	int			numflush = 0;
 	int			numapply = 0;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -509,21 +510,23 @@ SyncRepReleaseWaiters(void)
 	if (walsndctl->lsn[SYNC_REP_WAIT_WRITE] < writePtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_WRITE] = writePtr;
-		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE);
+		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < flushPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = flushPtr;
-		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH);
+		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_APPLY] < applyPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_APPLY] = applyPtr;
-		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY);
+		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY, &wakeups);
 	}
 
 	LWLockRelease(SyncRepLock);
 
+	SetLatches(&wakeups);
+
 	elog(DEBUG3, "released %d procs up to write %X/%X, %d procs up to flush %X/%X, %d procs up to apply %X/%X",
 		 numwrite, LSN_FORMAT_ARGS(writePtr),
 		 numflush, LSN_FORMAT_ARGS(flushPtr),
@@ -867,7 +870,7 @@ SyncRepGetStandbyPriority(void)
  * The caller must hold SyncRepLock in exclusive mode.
  */
 static int
-SyncRepWakeQueue(bool all, int mode)
+SyncRepWakeQueue(bool all, int mode, LatchGroup *wakeups)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	int			numprocs = 0;
@@ -908,7 +911,7 @@ SyncRepWakeQueue(bool all, int mode)
 		/*
 		 * Wake only when we have set state and removed from queue.
 		 */
-		SetLatch(&(proc->procLatch));
+		AddLatch(wakeups, &proc->procLatch);
 
 		numprocs++;
 	}
@@ -930,6 +933,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 
 	if (sync_standbys_defined != WalSndCtl->sync_standbys_defined)
 	{
+		LatchGroup		wakeups = {0};
+
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 
 		/*
@@ -942,7 +947,7 @@ SyncRepUpdateSyncStandbysDefined(void)
 			int			i;
 
 			for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
-				SyncRepWakeQueue(true, i);
+				SyncRepWakeQueue(true, i, &wakeups);
 		}
 
 		/*
@@ -955,6 +960,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 		WalSndCtl->sync_standbys_defined = sync_standbys_defined;
 
 		LWLockRelease(SyncRepLock);
+
+		SetLatches(&wakeups);
 	}
 }
 
-- 
2.39.2

v3-0004-Use-SetLatches-for-SERIALIZABLE-DEFERRABLE-wakeup.patch (text/x-patch)
From 0c31264ae68785095d7678e5ee60f0378dae5b93 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 12:40:12 +1300
Subject: [PATCH v3 4/6] Use SetLatches() for SERIALIZABLE DEFERRABLE wakeups.

Don't issue SetLatch()'s system call while holding a highly contended
LWLock.  Collect them in a LatchGroup to be set after the lock is
released.  Once woken, other backends will immediately try to acquire
that lock anyway so it's better to wake after releasing.

Take this opportunity to retire the confusingly named ProcSendSignal()
and replace it with something that gives the latch pointer we need.
There was only one other caller, in bufmgr.c, which is easily changed.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c  |  2 +-
 src/backend/storage/ipc/procsignal.c |  2 --
 src/backend/storage/lmgr/predicate.c |  5 ++++-
 src/backend/storage/lmgr/proc.c      | 12 ------------
 src/include/storage/proc.h           |  4 +++-
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 0a05577b68..f297958f1b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1966,7 +1966,7 @@ UnpinBuffer(BufferDesc *buf)
 
 				buf_state &= ~BM_PIN_COUNT_WAITER;
 				UnlockBufHdr(buf, buf_state);
-				ProcSendSignal(wait_backend_pgprocno);
+				SetLatch(GetProcLatchByNumber(wait_backend_pgprocno));
 			}
 			else
 				UnlockBufHdr(buf, buf_state);
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 395b2cf690..13ec2317fc 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -255,8 +255,6 @@ CleanupProcSignalState(int status, Datum arg)
  *
  * On success (a signal was sent), zero is returned.
  * On error, -1 is returned, and errno is set (typically to ESRCH or EPERM).
- *
- * Not to be confused with ProcSendSignal
  */
 int
 SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 0f7f7ba5f1..86b046999b 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3235,6 +3235,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 	bool		needToClear;
 	SERIALIZABLEXACT *roXact;
 	dlist_mutable_iter iter;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * We can't trust XactReadOnly here, because a transaction which started
@@ -3538,7 +3539,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 			 */
 			if (SxactIsDeferrableWaiting(roXact) &&
 				(SxactIsROUnsafe(roXact) || SxactIsROSafe(roXact)))
-				ProcSendSignal(roXact->pgprocno);
+				AddLatch(&wakeups, GetProcLatchByNumber(roXact->pgprocno));
 		}
 	}
 
@@ -3562,6 +3563,8 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 
 	LWLockRelease(SerializableXactHashLock);
 
+	SetLatches(&wakeups);
+
 	LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
 
 	/* Add this to the list of transactions to check for later cleanup. */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..09eda65fd1 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1802,18 +1802,6 @@ ProcWaitForSignal(uint32 wait_event_info)
 	CHECK_FOR_INTERRUPTS();
 }
 
-/*
- * ProcSendSignal - set the latch of a backend identified by pgprocno
- */
-void
-ProcSendSignal(int pgprocno)
-{
-	if (pgprocno < 0 || pgprocno >= ProcGlobal->allProcCount)
-		elog(ERROR, "pgprocno out of range");
-
-	SetLatch(&ProcGlobal->allProcs[pgprocno].procLatch);
-}
-
 /*
  * BecomeLockGroupLeader - designate process as lock group leader
  *
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 4258cd92c9..ca39ae9486 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -413,6 +413,9 @@ extern PGDLLIMPORT PGPROC *PreparedXactProcs;
 /* Accessor for PGPROC given a pgprocno. */
 #define GetPGProcByNumber(n) (&ProcGlobal->allProcs[(n)])
 
+/* Accessor for procLatch given a pgprocno. */
+#define GetProcLatchByNumber(n) (&(GetPGProcByNumber(n))->procLatch)
+
 /*
  * We set aside some extra PGPROC structures for auxiliary processes,
  * ie things that aren't full-fledged backends but need shmem access.
@@ -456,7 +459,6 @@ extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
 
 extern void ProcWaitForSignal(uint32 wait_event_info);
-extern void ProcSendSignal(int pgprocno);
 
 extern PGPROC *AuxiliaryPidGetProc(int pid);
 
-- 
2.39.2

v3-0005-Use-SetLatches-for-heavyweight-locks.patch (text/x-patch)
From 4fea49d303dfc55edef7dc61350a6ae4f160c6a7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 16:43:31 +1300
Subject: [PATCH v3 5/6] Use SetLatches() for heavyweight locks.

Collect wakeups into a LatchGroup to be set at once after the
LockManager's internal partition lock is released.  This avoids holding
busy locks while issuing system calls.

Currently, waiters immediately try to acquire that lock themselves, so
deferring until it's released makes sense to avoid contention (a later
patch may remove that lock acquisition though).

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/deadlock.c |  4 ++--
 src/backend/storage/lmgr/lock.c     | 36 +++++++++++++++++++----------
 src/backend/storage/lmgr/proc.c     | 29 ++++++++++++++---------
 src/include/storage/lock.h          |  5 ++--
 src/include/storage/proc.h          |  6 ++---
 5 files changed, 50 insertions(+), 30 deletions(-)

diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index 617ebacdd4..63192647c8 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -214,7 +214,7 @@ InitDeadLockChecking(void)
  * and (b) we are typically invoked inside a signal handler.
  */
 DeadLockState
-DeadLockCheck(PGPROC *proc)
+DeadLockCheck(PGPROC *proc, LatchGroup *wakeups)
 {
 	/* Initialize to "no constraints" */
 	nCurConstraints = 0;
@@ -266,7 +266,7 @@ DeadLockCheck(PGPROC *proc)
 #endif
 
 		/* See if any waiters for the lock can be woken up now */
-		ProcLockWakeup(GetLocksMethodTable(lock), lock);
+		ProcLockWakeup(GetLocksMethodTable(lock), lock, wakeups);
 	}
 
 	/* Return code tells caller if we had to escape a deadlock or not */
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 42595b38b2..fec6e52145 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -374,14 +374,14 @@ static PROCLOCK *SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner);
 static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode);
 static void FinishStrongLockAcquire(void);
-static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner);
+static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups);
 static void ReleaseLockIfHeld(LOCALLOCK *locallock, bool sessionLock);
 static void LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent);
 static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 						PROCLOCK *proclock, LockMethod lockMethodTable);
 static void CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 						LockMethod lockMethodTable, uint32 hashcode,
-						bool wakeupNeeded);
+						bool wakeupNeeded, LatchGroup *wakeups);
 static void LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 								 LOCKTAG *locktag, LOCKMODE lockmode,
 								 bool decrement_strong_lock_count);
@@ -787,6 +787,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -1098,7 +1099,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
-		WaitOnLock(locallock, owner);
+		WaitOnLock(locallock, owner, &wakeups);
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1138,6 +1139,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
 	 * standby server.
@@ -1629,7 +1632,7 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 static void
 CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 			LockMethod lockMethodTable, uint32 hashcode,
-			bool wakeupNeeded)
+			bool wakeupNeeded, LatchGroup *wakeups)
 {
 	/*
 	 * If this was my last hold on this lock, delete my entry in the proclock
@@ -1669,7 +1672,7 @@ CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 	else if (wakeupNeeded)
 	{
 		/* There are waiters on this lock, so wake them up. */
-		ProcLockWakeup(lockMethodTable, lock);
+		ProcLockWakeup(lockMethodTable, lock, wakeups);
 	}
 }
 
@@ -1806,7 +1809,7 @@ MarkLockClear(LOCALLOCK *locallock)
  * The appropriate partition lock must be held at entry.
  */
 static void
-WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
+WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
 {
 	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
 	LockMethod	lockMethodTable = LockMethods[lockmethodid];
@@ -1839,7 +1842,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
 	 */
 	PG_TRY();
 	{
-		if (ProcSleep(locallock, lockMethodTable) != PROC_WAIT_STATUS_OK)
+		if (ProcSleep(locallock, lockMethodTable, wakeups) != PROC_WAIT_STATUS_OK)
 		{
 			/*
 			 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
@@ -1890,7 +1893,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
  * NB: this does not clean up any locallock object that may exist for the lock.
  */
 void
-RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
+RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchGroup *wakeups)
 {
 	LOCK	   *waitLock = proc->waitLock;
 	PROCLOCK   *proclock = proc->waitProcLock;
@@ -1931,7 +1934,7 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
 	 */
 	CleanUpLock(waitLock, proclock,
 				LockMethods[lockmethodid], hashcode,
-				true);
+				true, wakeups);
 }
 
 /*
@@ -1956,6 +1959,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2134,10 +2138,12 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, locallock->hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	RemoveLocalLock(locallock);
 	return true;
 }
@@ -2161,6 +2167,7 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	LOCK	   *lock;
 	int			partition;
 	bool		have_fast_path_lwlock = false;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2399,10 +2406,12 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 			CleanUpLock(lock, proclock,
 						lockMethodTable,
 						LockTagHashCode(&lock->tag),
-						wakeupNeeded);
+						wakeupNeeded, &wakeups);
 		}						/* loop over PROCLOCKs within this partition */
 
 		LWLockRelease(partitionLock);
+
+		SetLatches(&wakeups);
 	}							/* loop over partitions */
 
 #ifdef LOCK_DEBUG
@@ -3095,6 +3104,7 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 	uint32		proclock_hashcode;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchGroup	wakeups = {0};
 
 	hashcode = LockTagHashCode(locktag);
 	partitionLock = LockHashPartitionLock(hashcode);
@@ -3148,10 +3158,12 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Decrement strong lock count.  This logic is needed only for 2PC.
 	 */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 09eda65fd1..156d293210 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -699,6 +699,7 @@ LockErrorCleanup(void)
 {
 	LWLock	   *partitionLock;
 	DisableTimeoutParams timeouts[2];
+	LatchGroup	wakeups = {0};
 
 	HOLD_INTERRUPTS();
 
@@ -732,7 +733,7 @@ LockErrorCleanup(void)
 	if (!dlist_node_is_detached(&MyProc->links))
 	{
 		/* We could not have been granted the lock yet */
-		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode);
+		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode, &wakeups);
 	}
 	else
 	{
@@ -749,6 +750,7 @@ LockErrorCleanup(void)
 	lockAwaited = NULL;
 
 	LWLockRelease(partitionLock);
+	SetLatches(&wakeups);
 
 	RESUME_INTERRUPTS();
 }
@@ -1001,7 +1003,7 @@ AuxiliaryPidGetProc(int pid)
  * NOTES: The process queue is now a priority queue for locking.
  */
 ProcWaitStatus
-ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
+ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 {
 	LOCKMODE	lockmode = locallock->tag.mode;
 	LOCK	   *lock = locallock->lock;
@@ -1137,7 +1139,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 */
 	if (early_deadlock)
 	{
-		RemoveFromWaitQueue(MyProc, hashcode);
+		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1153,6 +1155,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 * LockErrorCleanup will clean up if cancel/die happens.
 	 */
 	LWLockRelease(partitionLock);
+	SetLatches(wakeups);
 
 	/*
 	 * Also, now that we will successfully clean up after an ereport, it's
@@ -1594,7 +1597,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 
 
 /*
- * ProcWakeup -- wake up a process by setting its latch.
+ * ProcWakeup -- prepare to wake up a process by setting its latch.
  *
  *	 Also remove the process from the wait queue and set its links invalid.
  *
@@ -1606,7 +1609,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
  * Hence, in practice the waitStatus parameter must be PROC_WAIT_STATUS_OK.
  */
 void
-ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
+ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus, LatchGroup *wakeups)
 {
 	if (dlist_node_is_detached(&proc->links))
 		return;
@@ -1622,8 +1625,8 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
 	proc->waitStatus = waitStatus;
 	pg_atomic_write_u64(&MyProc->waitStart, 0);
 
-	/* And awaken it */
-	SetLatch(&proc->procLatch);
+	/* Schedule it to be awoken */
+	AddLatch(wakeups, &proc->procLatch);
 }
 
 /*
@@ -1634,7 +1637,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
  * The appropriate lock partition lock must be held by caller.
  */
 void
-ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
+ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchGroup *wakeups)
 {
 	dclist_head *waitQueue = &lock->waitProcs;
 	LOCKMASK	aheadRequests = 0;
@@ -1659,7 +1662,7 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 			/* OK to waken */
 			GrantLock(lock, proc->waitProcLock, lockmode);
 			/* removes proc from the lock's waiting process queue */
-			ProcWakeup(proc, PROC_WAIT_STATUS_OK);
+			ProcWakeup(proc, PROC_WAIT_STATUS_OK, wakeups);
 		}
 		else
 		{
@@ -1685,6 +1688,7 @@ static void
 CheckDeadLock(void)
 {
 	int			i;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * Acquire exclusive lock on the entire shared lock data structures. Must
@@ -1719,7 +1723,7 @@ CheckDeadLock(void)
 #endif
 
 	/* Run the deadlock check, and set deadlock_state for use by ProcSleep */
-	deadlock_state = DeadLockCheck(MyProc);
+	deadlock_state = DeadLockCheck(MyProc, &wakeups);
 
 	if (deadlock_state == DS_HARD_DEADLOCK)
 	{
@@ -1736,7 +1740,8 @@ CheckDeadLock(void)
 		 * return from the signal handler.
 		 */
 		Assert(MyProc->waitLock != NULL);
-		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)));
+		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)),
+							&wakeups);
 
 		/*
 		 * We're done here.  Transaction abort caused by the error that
@@ -1760,6 +1765,8 @@ CheckDeadLock(void)
 check_done:
 	for (i = NUM_LOCK_PARTITIONS; --i >= 0;)
 		LWLockRelease(LockHashPartitionLockByIndex(i));
+
+	SetLatches(&wakeups);
 }
 
 /*
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 6ae434596a..8ee2f3e855 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -20,6 +20,7 @@
 
 #include "lib/ilist.h"
 #include "storage/backendid.h"
+#include "storage/latch.h"
 #include "storage/lockdefs.h"
 #include "storage/lwlock.h"
 #include "storage/shmem.h"
@@ -583,7 +584,7 @@ extern bool LockCheckConflicts(LockMethod lockMethodTable,
 							   LOCK *lock, PROCLOCK *proclock);
 extern void GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode);
 extern void GrantAwaitedLock(void);
-extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode);
+extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchGroup *wakeups);
 extern Size LockShmemSize(void);
 extern LockData *GetLockStatusData(void);
 extern BlockedProcsData *GetBlockerStatusData(int blocked_pid);
@@ -600,7 +601,7 @@ extern void lock_twophase_postabort(TransactionId xid, uint16 info,
 extern void lock_twophase_standby_recover(TransactionId xid, uint16 info,
 										  void *recdata, uint32 len);
 
-extern DeadLockState DeadLockCheck(PGPROC *proc);
+extern DeadLockState DeadLockCheck(PGPROC *proc, LatchGroup *wakeups);
 extern PGPROC *GetBlockingAutoVacuumPgproc(void);
 extern void DeadLockReport(void) pg_attribute_noreturn();
 extern void RememberSimpleDeadLock(PGPROC *proc1,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index ca39ae9486..799db535c9 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -451,9 +451,9 @@ extern int	GetStartupBufferPinWaitBufId(void);
 extern bool HaveNFreeProcs(int n, int *nfree);
 extern void ProcReleaseLocks(bool isCommit);
 
-extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable);
-extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
-extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
+extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups);
+extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus, LatchGroup *wakeups);
+extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchGroup *wakeups);
 extern void CheckDeadLockAlert(void);
 extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
-- 
2.39.2

v3-0006-Don-t-re-acquire-LockManager-partition-lock-after.patch (text/x-patch)
From 77c6f747011c0ff8771414ded1a23306e6e953c9 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 10:56:39 +1300
Subject: [PATCH v3 6/6] Don't re-acquire LockManager partition lock after
 wakeup.

Change the contract of WaitOnLock() and ProcSleep() to return with the
partition lock released.  ProcSleep() re-acquired the lock to prevent
interrupts, which was a problem at the time the ancestor of that code
was committed in fe548629c50, because then we had signal handlers that
longjmp'd out of there.  Now, that can happen only at points where we
explicitly run CHECK_FOR_INTERRUPTS(), so we can remove the lock, which
leads to the idea of making the function always release.

While an earlier commit fixed the problem that the backend woke us up
before it had even released the lock itself, this lock reacquisition
point was still contended when multiple backends woke at the same time.

XXX Right?  Or what am I missing?

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/lock.c | 18 ++++++++++++------
 src/backend/storage/lmgr/proc.c | 20 ++++++++++++--------
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index fec6e52145..97be94dbd3 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -787,6 +787,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	bool		partitionLocked = false;
 	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
@@ -995,6 +996,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	partitionLock = LockHashPartitionLock(hashcode);
 
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
+	partitionLocked = true;
 
 	/*
 	 * Find or create lock and proclock entries with this tag
@@ -1099,7 +1101,10 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
+		Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
 		WaitOnLock(locallock, owner, &wakeups);
+		Assert(!LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+		partitionLocked = false;
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1124,7 +1129,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 			PROCLOCK_PRINT("LockAcquire: INCONSISTENT", proclock);
 			LOCK_PRINT("LockAcquire: INCONSISTENT", lock, lockmode);
 			/* Should we retry ? */
-			LWLockRelease(partitionLock);
 			elog(ERROR, "LockAcquire failed");
 		}
 		PROCLOCK_PRINT("LockAcquire: granted", proclock);
@@ -1137,9 +1141,11 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 */
 	FinishStrongLockAcquire();
 
-	LWLockRelease(partitionLock);
-
-	SetLatches(&wakeups);
+	if (partitionLocked)
+	{
+		LWLockRelease(partitionLock);
+		SetLatches(&wakeups);
+	}
 
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
@@ -1806,7 +1812,8 @@ MarkLockClear(LOCALLOCK *locallock)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process.
  *
- * The appropriate partition lock must be held at entry.
+ * The appropriate partition lock must be held at entry.  It is not held on
+ * exit.
  */
 static void
 WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
@@ -1851,7 +1858,6 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
 			awaitedLock = NULL;
 			LOCK_PRINT("WaitOnLock: aborting on lock",
 					   locallock->lock, locallock->tag.mode);
-			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
 
 			/*
 			 * Now that we aren't holding the partition lock, we can give an
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 156d293210..35720346f2 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -992,7 +992,7 @@ AuxiliaryPidGetProc(int pid)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process (under all XIDs).
  *
- * The lock table's partition lock must be held at entry, and will be held
+ * The lock table's partition lock must be held at entry, and will be released
  * at exit.
  *
  * Result: PROC_WAIT_STATUS_OK if we acquired the lock, PROC_WAIT_STATUS_ERROR if not (deadlock).
@@ -1020,6 +1020,12 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 	ProcWaitStatus myWaitStatus;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 
+	/*
+	 * Every way out of this function will release the partition lock and send
+	 * buffered latch wakeups.
+	 */
+	Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+
 	/*
 	 * If group locking is in use, locks held by members of my locking group
 	 * need to be included in myHeldLocks.  This is not required for relation
@@ -1104,6 +1110,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 					/* Skip the wait and just grant myself the lock. */
 					GrantLock(lock, proclock, lockmode);
 					GrantAwaitedLock();
+					LWLockRelease(partitionLock);
+					SetLatches(wakeups);
 					return PROC_WAIT_STATUS_OK;
 				}
 
@@ -1140,6 +1148,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 	if (early_deadlock)
 	{
 		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
+		LWLockRelease(partitionLock);
+		SetLatches(wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1469,6 +1479,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 			}
 
 			LWLockRelease(partitionLock);
+			SetLatches(wakeups);
 
 			if (deadlock_state == DS_SOFT_DEADLOCK)
 				ereport(LOG,
@@ -1570,13 +1581,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 							standbyWaitStart, GetCurrentTimestamp(),
 							NULL, false);
 
-	/*
-	 * Re-acquire the lock table's partition lock.  We have to do this to hold
-	 * off cancel/die interrupts before we can mess with lockAwaited (else we
-	 * might have a missed or duplicated locallock update).
-	 */
-	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
-
 	/*
 	 * We no longer want LockErrorCleanup to do anything.
 	 */
-- 
2.39.2

#5Peter Eisentraut
peter@eisentraut.org
In reply to: Thomas Munro (#4)
Re: Latches vs lwlock contention

On 04.03.23 20:50, Thomas Munro wrote:

Subject: [PATCH v3 1/6] Allow palloc_extended(NO_OOM) in critical sections.

Commit 4a170ee9e0e banned palloc() and similar in critical sections, because an
allocation failure would produce a panic. Make an exception for allocation
with NULL on failure, for code that has a backup plan.

I suppose this assumes that out of memory is the only possible error
condition that we are concerned about for this?

For example, we sometimes see "invalid memory alloc request size" either
because of corrupted data or because code does things we didn't expect.
This would then possibly panic? Also, the realloc code paths
potentially do more work with possibly more error conditions, and/or
they error out right away because it's not supported by the context type.

Maybe this is all ok, but it would be good to make the assumptions more
explicit.
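
For illustration, a guard along these lines in AddLatch() would keep the
growth path away from the "invalid memory alloc request size" check
entirely (just a sketch reusing the existing AllocSizeIsValid() macro,
not something I've tested):

```
	/*
	 * Sketch only: pre-check the request size so that palloc_extended()
	 * cannot reach its "invalid memory alloc request size" error, which
	 * MCXT_ALLOC_NO_OOM does not suppress.  If the doubled capacity would
	 * be too large, fall back to flushing -- the same backup plan already
	 * used for plain allocation failure.
	 */
	new_capacity = group->capacity * 2;
	if (!AllocSizeIsValid(sizeof(Latch *) * (Size) new_capacity))
		SetLatches(group);
	else
	{
		new_latches = palloc_extended(sizeof(Latch *) * new_capacity,
									  MCXT_ALLOC_NO_OOM);
		if (!new_latches)
			SetLatches(group);
		else
		{
			/* ... copy, swap buffers, bump capacity as in the patch ... */
		}
	}
```

Either way, spelling out which elog()/ereport() paths remain reachable
from a critical section would make the assumption explicit.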

#6Aleksander Alekseev
aleksander@timescale.com
In reply to: Peter Eisentraut (#5)
6 attachment(s)
Re: Latches vs lwlock contention

Hi,

Maybe this is all ok, but it would be good to make the assumptions more
explicit.

Here are my two cents.

```
static void
SetLatchV(Latch **latches, int nlatches)
{
/* Flush any other changes out to main memory just once. */
pg_memory_barrier();

/* Keep only latches that are not already set, and set them. */
for (int i = 0; i < nlatches; ++i)
{
Latch *latch = latches[i];

if (!latch->is_set)
latch->is_set = true;
else
latches[i] = NULL;
}

pg_memory_barrier();

[...]

void
SetLatches(LatchGroup *group)
{
if (group->size > 0)
{
SetLatchV(group->latches, group->size);

[...]
```

I suspect this API may be error-prone without some additional
comments. A caller (possibly an extension author) might rely on the
implementation details of SetLatches() / SetLatchV() and inspect the
modified group->latches[] entries, e.g. to figure out which latches it
attempted to set. Even worse, one could mistakenly assume that the
result indicates whether the caller was the one who actually changed
the state of a given latch. That being said, I see why this particular
implementation was chosen.
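
To spell out the intended usage for the archives, the calling pattern
the group API is designed for looks roughly like this (a sketch;
SomeLock and waiter_proc are placeholders, not identifiers from the
patch):

```
	/*
	 * Collect wakeup targets while holding the lock, then issue the
	 * system calls only after the lock has been released.
	 */
	LatchGroup	wakeups = {0};

	LWLockAcquire(SomeLock, LW_EXCLUSIVE);
	/* ... for each waiter removed from the shared queue ... */
	AddLatch(&wakeups, &waiter_proc->procLatch);
	LWLockRelease(SomeLock);

	SetLatches(&wakeups);	/* SetLatch() work happens here, lock-free */
```

In other words, the group is write-only from the caller's point of
view: fill it while holding the lock, flush it once after release, and
never read its contents back.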

I added corresponding comments to SetLatchV() and SetLatches(). Also
the patchset needed a rebase. PFA v4.

It passes `make installcheck-world` on three of my machines: macOS x64,
Linux x64 and Linux RISC-V.

--
Best regards,
Aleksander Alekseev

Attachments:

v4-0002-Provide-SetLatches-for-batched-deferred-latches.patch (application/x-patch)
From 006499ee28230e8c1b5c8d2f047276c5661f50ad Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 15:51:45 +1300
Subject: [PATCH v4 2/6] Provide SetLatches() for batched deferred latches.

If we have a way to buffer a set of wakeup targets and process them at a
later time, we can:

* move SetLatch() system calls out from under LWLocks, so that locks can
  be released faster; this is especially interesting in cases where the
  target backends will immediately try to acquire the same lock, or
  generally when the lock is heavily contended

* possibly gain some micro-optimization from issuing only two memory
  barriers for the whole batch of latches, not two for each latch to be
  set

* prepare for future IPC mechanisms which might allow us to wake
  multiple backends in a single system call

Individual users of this facility will follow in separate patches.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c  | 221 ++++++++++++++++++++-----------
 src/include/storage/latch.h      |  13 ++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 161 insertions(+), 74 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index db889385b7..5b7113dac8 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -589,6 +589,88 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 	return ret;
 }
 
+/*
+ * Sets multiple 'latches' at the same time.
+ *
+ * Modifies the input array. The nature of this modification is
+ * an implementation detail of SetLatchV(). The caller shouldn't try
+ * interpreting it and/or using the input array after the call.
+ */
+static void
+SetLatchV(Latch **latches, int nlatches)
+{
+	/* Flush any other changes out to main memory just once. */
+	pg_memory_barrier();
+
+	/* Keep only latches that are not already set, and set them. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		if (!latch->is_set)
+			latch->is_set = true;
+		else
+			latches[i] = NULL;
+	}
+
+	pg_memory_barrier();
+
+	/* Wake the remaining latches that might be sleeping. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		/*
+		 * See if anyone's waiting for the latch. It can be the current
+		 * process if we're in a signal handler. We use the self-pipe or
+		 * SIGURG to ourselves to wake up WaitEventSetWaitBlock() without
+		 * races in that case. If it's another process, send a signal.
+		 *
+		 * Fetch owner_pid only once, in case the latch is concurrently
+		 * getting owned or disowned. XXX: This assumes that pid_t is atomic,
+		 * which isn't guaranteed to be true! In practice, the effective range
+		 * of pid_t fits in a 32 bit integer, and so should be atomic. In the
+		 * worst case, we might end up signaling the wrong process. Even then,
+		 * you're very unlucky if a process with that bogus pid exists and
+		 * belongs to Postgres; and PG database processes should handle excess
+		 * SIGURG interrupts without a problem anyhow.
+		 *
+		 * Another sort of race condition that's possible here is for a new
+		 * process to own the latch immediately after we look, so we don't
+		 * signal it. This is okay so long as all callers of
+		 * ResetLatch/WaitLatch follow the standard coding convention of
+		 * waiting at the bottom of their loops, not the top, so that they'll
+		 * correctly process latch-setting events that happen before they
+		 * enter the loop.
+		 */
+		if (latch && latch->maybe_sleeping)
+		{
+#ifndef WIN32
+			pid_t		owner_pid = latch->owner_pid;
+
+			if (owner_pid == MyProcPid)
+			{
+				if (waiting)
+				{
+#if defined(WAIT_USE_SELF_PIPE)
+					sendSelfPipeByte();
+#else
+					kill(MyProcPid, SIGURG);
+#endif
+				}
+			}
+			else
+				kill(owner_pid, SIGURG);
+#else
+			HANDLE		event = latch->event;
+
+			if (event)
+				SetEvent(event);
+#endif
+		}
+	}
+}
+
 /*
  * Sets a latch and wakes up anyone waiting on it.
  *
@@ -604,89 +686,80 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 void
 SetLatch(Latch *latch)
 {
-#ifndef WIN32
-	pid_t		owner_pid;
-#else
-	HANDLE		handle;
-#endif
-
-	/*
-	 * The memory barrier has to be placed here to ensure that any flag
-	 * variables possibly changed by this process have been flushed to main
-	 * memory, before we check/set is_set.
-	 */
-	pg_memory_barrier();
-
-	/* Quick exit if already set */
-	if (latch->is_set)
-		return;
-
-	latch->is_set = true;
-
-	pg_memory_barrier();
-	if (!latch->maybe_sleeping)
-		return;
+	SetLatchV(&latch, 1);
+}
 
-#ifndef WIN32
+/*
+ * Add a latch to a group, to be set later.
+ */
+void
+AddLatch(LatchGroup *group, Latch *latch)
+{
+	/* First time.  Set up the in-place buffer. */
+	if (!group->latches)
+	{
+		group->latches = group->in_place_buffer;
+		group->capacity = lengthof(group->in_place_buffer);
+		Assert(group->size == 0);
+	}
 
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler. We use the self-pipe or SIGURG to ourselves
-	 * to wake up WaitEventSetWaitBlock() without races in that case. If it's
-	 * another process, send a signal.
-	 *
-	 * Fetch owner_pid only once, in case the latch is concurrently getting
-	 * owned or disowned. XXX: This assumes that pid_t is atomic, which isn't
-	 * guaranteed to be true! In practice, the effective range of pid_t fits
-	 * in a 32 bit integer, and so should be atomic. In the worst case, we
-	 * might end up signaling the wrong process. Even then, you're very
-	 * unlucky if a process with that bogus pid exists and belongs to
-	 * Postgres; and PG database processes should handle excess SIGUSR1
-	 * interrupts without a problem anyhow.
-	 *
-	 * Another sort of race condition that's possible here is for a new
-	 * process to own the latch immediately after we look, so we don't signal
-	 * it. This is okay so long as all callers of ResetLatch/WaitLatch follow
-	 * the standard coding convention of waiting at the bottom of their loops,
-	 * not the top, so that they'll correctly process latch-setting events
-	 * that happen before they enter the loop.
-	 */
-	owner_pid = latch->owner_pid;
-	if (owner_pid == 0)
-		return;
-	else if (owner_pid == MyProcPid)
+	/* Are we full? */
+	if (group->size == group->capacity)
 	{
-#if defined(WAIT_USE_SELF_PIPE)
-		if (waiting)
-			sendSelfPipeByte();
-#else
-		if (waiting)
-			kill(MyProcPid, SIGURG);
-#endif
+		Latch	  **new_latches;
+		int			new_capacity;
+
+		/* Try to allocate more space. */
+		new_capacity = group->capacity * 2;
+		new_latches = palloc_extended(sizeof(latch) * new_capacity,
+									  MCXT_ALLOC_NO_OOM);
+		if (!new_latches)
+		{
+			/*
+			 * Allocation failed.  This is very unlikely to happen, but it
+			 * might be useful to be able to use this function in critical
+			 * sections, so we handle allocation failure by flushing instead
+			 * of throwing.
+			 */
+			SetLatches(group);
+		}
+		else
+		{
+			memcpy(new_latches, group->latches, sizeof(latch) * group->size);
+			if (group->latches != group->in_place_buffer)
+				pfree(group->latches);
+			group->latches = new_latches;
+			group->capacity = new_capacity;
+		}
 	}
-	else
-		kill(owner_pid, SIGURG);
 
-#else
+	Assert(group->size < group->capacity);
+	group->latches[group->size++] = latch;
+}
 
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler.
-	 *
-	 * Use a local variable here just in case somebody changes the event field
-	 * concurrently (which really should not happen).
-	 */
-	handle = latch->event;
-	if (handle)
+/*
+ * Sets all the latches accumulated in the 'group'.
+ *
+ * Modifies the 'group'. The nature of this modification is
+ * an implementation detail of SetLatches(). The caller shouldn't try
+ * interpreting it and/or using the 'group' again after the call.
+ */
+void
+SetLatches(LatchGroup *group)
+{
+	if (group->size > 0)
 	{
-		SetEvent(handle);
+		SetLatchV(group->latches, group->size);
+		group->size = 0;
 
-		/*
-		 * Note that we silently ignore any errors. We might be in a signal
-		 * handler or other critical path where it's not safe to call elog().
-		 */
+		/* If we allocated a large buffer, it's time to free it. */
+		if (group->latches != group->in_place_buffer)
+		{
+			pfree(group->latches);
+			group->latches = group->in_place_buffer;
+			group->capacity = lengthof(group->in_place_buffer);
+		}
 	}
-#endif
 }
 
 /*
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 99cc47874a..7b81806825 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -118,6 +118,17 @@ typedef struct Latch
 #endif
 } Latch;
 
+/*
+ * A buffer for setting multiple latches at a time.
+ */
+typedef struct LatchGroup
+{
+	int			size;
+	int			capacity;
+	Latch	  **latches;
+	Latch	   *in_place_buffer[64];
+} LatchGroup;
+
 /*
  * Bitmasks for events that may wake-up WaitLatch(), WaitLatchOrSocket(), or
  * WaitEventSetWait().
@@ -170,6 +181,8 @@ extern void InitSharedLatch(Latch *latch);
 extern void OwnLatch(Latch *latch);
 extern void DisownLatch(Latch *latch);
 extern void SetLatch(Latch *latch);
+extern void AddLatch(LatchGroup *group, Latch *latch);
+extern void SetLatches(LatchGroup *group);
 extern void ResetLatch(Latch *latch);
 extern void ShutdownLatchSupport(void);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e941fb6c82..34f9b32b29 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1409,6 +1409,7 @@ LabelProvider
 LagTracker
 LargeObjectDesc
 Latch
+LatchGroup
 LauncherLastStartTimesEntry
 LerpFunc
 LexDescr
-- 
2.41.0

v4-0003-Use-SetLatches-for-synchronous-replication-wakeup.patch (application/x-patch)
From c4c30181ef039107d0e03068a114032eb883fd05 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 12:59:45 +1300
Subject: [PATCH v4 3/6] Use SetLatches() for synchronous replication wakeups.

Don't issue SetLatch() system calls while holding SyncRepLock.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/replication/syncrep.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 0ea71b5c43..f954c39b38 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -100,7 +100,7 @@ static int	SyncRepWaitMode = SYNC_REP_NO_WAIT;
 
 static void SyncRepQueueInsert(int mode);
 static void SyncRepCancelWait(void);
-static int	SyncRepWakeQueue(bool all, int mode);
+static int	SyncRepWakeQueue(bool all, int mode, LatchGroup *wakeups);
 
 static bool SyncRepGetSyncRecPtr(XLogRecPtr *writePtr,
 								 XLogRecPtr *flushPtr,
@@ -440,6 +440,7 @@ SyncRepReleaseWaiters(void)
 	int			numwrite = 0;
 	int			numflush = 0;
 	int			numapply = 0;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -509,21 +510,23 @@ SyncRepReleaseWaiters(void)
 	if (walsndctl->lsn[SYNC_REP_WAIT_WRITE] < writePtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_WRITE] = writePtr;
-		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE);
+		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < flushPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = flushPtr;
-		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH);
+		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_APPLY] < applyPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_APPLY] = applyPtr;
-		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY);
+		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY, &wakeups);
 	}
 
 	LWLockRelease(SyncRepLock);
 
+	SetLatches(&wakeups);
+
 	elog(DEBUG3, "released %d procs up to write %X/%X, %d procs up to flush %X/%X, %d procs up to apply %X/%X",
 		 numwrite, LSN_FORMAT_ARGS(writePtr),
 		 numflush, LSN_FORMAT_ARGS(flushPtr),
@@ -867,7 +870,7 @@ SyncRepGetStandbyPriority(void)
  * The caller must hold SyncRepLock in exclusive mode.
  */
 static int
-SyncRepWakeQueue(bool all, int mode)
+SyncRepWakeQueue(bool all, int mode, LatchGroup *wakeups)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	int			numprocs = 0;
@@ -908,7 +911,7 @@ SyncRepWakeQueue(bool all, int mode)
 		/*
 		 * Wake only when we have set state and removed from queue.
 		 */
-		SetLatch(&(proc->procLatch));
+		AddLatch(wakeups, &proc->procLatch);
 
 		numprocs++;
 	}
@@ -930,6 +933,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 
 	if (sync_standbys_defined != WalSndCtl->sync_standbys_defined)
 	{
+		LatchGroup		wakeups = {0};
+
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 
 		/*
@@ -942,7 +947,7 @@ SyncRepUpdateSyncStandbysDefined(void)
 			int			i;
 
 			for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
-				SyncRepWakeQueue(true, i);
+				SyncRepWakeQueue(true, i, &wakeups);
 		}
 
 		/*
@@ -955,6 +960,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 		WalSndCtl->sync_standbys_defined = sync_standbys_defined;
 
 		LWLockRelease(SyncRepLock);
+
+		SetLatches(&wakeups);
 	}
 }
 
-- 
2.41.0

v4-0004-Use-SetLatches-for-SERIALIZABLE-DEFERRABLE-wakeup.patch (application/x-patch)
From 1dd76ab9bfadd070958a7c077ebffb5dde0afbdc Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 12:40:12 +1300
Subject: [PATCH v4 4/6] Use SetLatches() for SERIALIZABLE DEFERRABLE wakeups.

Don't issue SetLatch()'s system call while holding a highly contended
LWLock.  Collect them in a LatchGroup to be set after the lock is
released.  Once woken, other backends will immediately try to acquire
that lock anyway so it's better to wake after releasing.

Take this opportunity to retire the confusingly named ProcSendSignal()
and replace it with something that gives the latch pointer we need.
There was only one other caller, in bufmgr.c, which is easily changed.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c  |  2 +-
 src/backend/storage/ipc/procsignal.c |  2 --
 src/backend/storage/lmgr/predicate.c |  5 ++++-
 src/backend/storage/lmgr/proc.c      | 12 ------------
 src/include/storage/proc.h           |  4 +++-
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index a7e3b9bb1d..de5908876d 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2449,7 +2449,7 @@ UnpinBuffer(BufferDesc *buf)
 
 				buf_state &= ~BM_PIN_COUNT_WAITER;
 				UnlockBufHdr(buf, buf_state);
-				ProcSendSignal(wait_backend_pgprocno);
+				SetLatch(GetProcLatchByNumber(wait_backend_pgprocno));
 			}
 			else
 				UnlockBufHdr(buf, buf_state);
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c85cb5cc18..ec4c3888b0 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -255,8 +255,6 @@ CleanupProcSignalState(int status, Datum arg)
  *
  * On success (a signal was sent), zero is returned.
  * On error, -1 is returned, and errno is set (typically to ESRCH or EPERM).
- *
- * Not to be confused with ProcSendSignal
  */
 int
 SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 1af41213b4..4d95ca94df 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3249,6 +3249,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 	bool		needToClear;
 	SERIALIZABLEXACT *roXact;
 	dlist_mutable_iter iter;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * We can't trust XactReadOnly here, because a transaction which started
@@ -3553,7 +3554,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 			 */
 			if (SxactIsDeferrableWaiting(roXact) &&
 				(SxactIsROUnsafe(roXact) || SxactIsROSafe(roXact)))
-				ProcSendSignal(roXact->pgprocno);
+				AddLatch(&wakeups, GetProcLatchByNumber(roXact->pgprocno));
 		}
 	}
 
@@ -3583,6 +3584,8 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 
 	LWLockRelease(SerializableXactHashLock);
 
+	SetLatches(&wakeups);
+
 	LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
 
 	/* Add this to the list of transactions to check for later cleanup. */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 5b663a2997..a71b34ba0b 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1802,18 +1802,6 @@ ProcWaitForSignal(uint32 wait_event_info)
 	CHECK_FOR_INTERRUPTS();
 }
 
-/*
- * ProcSendSignal - set the latch of a backend identified by pgprocno
- */
-void
-ProcSendSignal(int pgprocno)
-{
-	if (pgprocno < 0 || pgprocno >= ProcGlobal->allProcCount)
-		elog(ERROR, "pgprocno out of range");
-
-	SetLatch(&ProcGlobal->allProcs[pgprocno].procLatch);
-}
-
 /*
  * BecomeLockGroupLeader - designate process as lock group leader
  *
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index ef74f32693..56a31d2a89 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -413,6 +413,9 @@ extern PGDLLIMPORT PGPROC *PreparedXactProcs;
 /* Accessor for PGPROC given a pgprocno. */
 #define GetPGProcByNumber(n) (&ProcGlobal->allProcs[(n)])
 
+/* Accessor for procLatch given a pgprocno. */
+#define GetProcLatchByNumber(n) (&(GetPGProcByNumber(n))->procLatch)
+
 /*
  * We set aside some extra PGPROC structures for auxiliary processes,
  * ie things that aren't full-fledged backends but need shmem access.
@@ -456,7 +459,6 @@ extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
 
 extern void ProcWaitForSignal(uint32 wait_event_info);
-extern void ProcSendSignal(int pgprocno);
 
 extern PGPROC *AuxiliaryPidGetProc(int pid);
 
-- 
2.41.0

v4-0005-Use-SetLatches-for-heavyweight-locks.patch (application/x-patch)
From f488319ab14d7698793b1b7e1b9c8098063b2294 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 16:43:31 +1300
Subject: [PATCH v4 5/6] Use SetLatches() for heavyweight locks.

Collect wakeups into a LatchGroup to be set at once after the
LockManager's internal partition lock is released.  This avoids holding
busy locks while issuing system calls.

Currently, waiters immediately try to acquire that lock themselves, so
deferring until it's released makes sense to avoid contention (a later
patch may remove that lock acquisition though).

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/deadlock.c |  4 ++--
 src/backend/storage/lmgr/lock.c     | 36 +++++++++++++++++++----------
 src/backend/storage/lmgr/proc.c     | 29 ++++++++++++++---------
 src/include/storage/lock.h          |  5 ++--
 src/include/storage/proc.h          |  6 ++---
 5 files changed, 50 insertions(+), 30 deletions(-)

diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index 2bdd20b1fe..3657186d9f 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -214,7 +214,7 @@ InitDeadLockChecking(void)
  * and (b) we are typically invoked inside a signal handler.
  */
 DeadLockState
-DeadLockCheck(PGPROC *proc)
+DeadLockCheck(PGPROC *proc, LatchGroup *wakeups)
 {
 	/* Initialize to "no constraints" */
 	nCurConstraints = 0;
@@ -266,7 +266,7 @@ DeadLockCheck(PGPROC *proc)
 #endif
 
 		/* See if any waiters for the lock can be woken up now */
-		ProcLockWakeup(GetLocksMethodTable(lock), lock);
+		ProcLockWakeup(GetLocksMethodTable(lock), lock, wakeups);
 	}
 
 	/* Return code tells caller if we had to escape a deadlock or not */
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index f595bce31b..126073a0d3 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -362,14 +362,14 @@ static PROCLOCK *SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner);
 static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode);
 static void FinishStrongLockAcquire(void);
-static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner);
+static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups);
 static void ReleaseLockIfHeld(LOCALLOCK *locallock, bool sessionLock);
 static void LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent);
 static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 						PROCLOCK *proclock, LockMethod lockMethodTable);
 static void CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 						LockMethod lockMethodTable, uint32 hashcode,
-						bool wakeupNeeded);
+						bool wakeupNeeded, LatchGroup *wakeups);
 static void LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 								 LOCKTAG *locktag, LOCKMODE lockmode,
 								 bool decrement_strong_lock_count);
@@ -775,6 +775,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -1079,7 +1080,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
-		WaitOnLock(locallock, owner);
+		WaitOnLock(locallock, owner, &wakeups);
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1119,6 +1120,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
 	 * standby server.
@@ -1605,7 +1608,7 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 static void
 CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 			LockMethod lockMethodTable, uint32 hashcode,
-			bool wakeupNeeded)
+			bool wakeupNeeded, LatchGroup *wakeups)
 {
 	/*
 	 * If this was my last hold on this lock, delete my entry in the proclock
@@ -1645,7 +1648,7 @@ CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 	else if (wakeupNeeded)
 	{
 		/* There are waiters on this lock, so wake them up. */
-		ProcLockWakeup(lockMethodTable, lock);
+		ProcLockWakeup(lockMethodTable, lock, wakeups);
 	}
 }
 
@@ -1782,7 +1785,7 @@ MarkLockClear(LOCALLOCK *locallock)
  * The appropriate partition lock must be held at entry.
  */
 static void
-WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
+WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
 {
 	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
 	LockMethod	lockMethodTable = LockMethods[lockmethodid];
@@ -1815,7 +1818,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
 	 */
 	PG_TRY();
 	{
-		if (ProcSleep(locallock, lockMethodTable) != PROC_WAIT_STATUS_OK)
+		if (ProcSleep(locallock, lockMethodTable, wakeups) != PROC_WAIT_STATUS_OK)
 		{
 			/*
 			 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
@@ -1866,7 +1869,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
  * NB: this does not clean up any locallock object that may exist for the lock.
  */
 void
-RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
+RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchGroup *wakeups)
 {
 	LOCK	   *waitLock = proc->waitLock;
 	PROCLOCK   *proclock = proc->waitProcLock;
@@ -1907,7 +1910,7 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
 	 */
 	CleanUpLock(waitLock, proclock,
 				LockMethods[lockmethodid], hashcode,
-				true);
+				true, wakeups);
 }
 
 /*
@@ -1932,6 +1935,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2110,10 +2114,12 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, locallock->hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	RemoveLocalLock(locallock);
 	return true;
 }
@@ -2137,6 +2143,7 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	LOCK	   *lock;
 	int			partition;
 	bool		have_fast_path_lwlock = false;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2375,10 +2382,12 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 			CleanUpLock(lock, proclock,
 						lockMethodTable,
 						LockTagHashCode(&lock->tag),
-						wakeupNeeded);
+						wakeupNeeded, &wakeups);
 		}						/* loop over PROCLOCKs within this partition */
 
 		LWLockRelease(partitionLock);
+
+		SetLatches(&wakeups);
 	}							/* loop over partitions */
 
 #ifdef LOCK_DEBUG
@@ -3071,6 +3080,7 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 	uint32		proclock_hashcode;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchGroup	wakeups = {0};
 
 	hashcode = LockTagHashCode(locktag);
 	partitionLock = LockHashPartitionLock(hashcode);
@@ -3124,10 +3134,12 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Decrement strong lock count.  This logic is needed only for 2PC.
 	 */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index a71b34ba0b..8821517897 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -699,6 +699,7 @@ LockErrorCleanup(void)
 {
 	LWLock	   *partitionLock;
 	DisableTimeoutParams timeouts[2];
+	LatchGroup	wakeups = {0};
 
 	HOLD_INTERRUPTS();
 
@@ -732,7 +733,7 @@ LockErrorCleanup(void)
 	if (!dlist_node_is_detached(&MyProc->links))
 	{
 		/* We could not have been granted the lock yet */
-		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode);
+		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode, &wakeups);
 	}
 	else
 	{
@@ -749,6 +750,7 @@ LockErrorCleanup(void)
 	lockAwaited = NULL;
 
 	LWLockRelease(partitionLock);
+	SetLatches(&wakeups);
 
 	RESUME_INTERRUPTS();
 }
@@ -1001,7 +1003,7 @@ AuxiliaryPidGetProc(int pid)
  * NOTES: The process queue is now a priority queue for locking.
  */
 ProcWaitStatus
-ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
+ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 {
 	LOCKMODE	lockmode = locallock->tag.mode;
 	LOCK	   *lock = locallock->lock;
@@ -1137,7 +1139,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 */
 	if (early_deadlock)
 	{
-		RemoveFromWaitQueue(MyProc, hashcode);
+		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1153,6 +1155,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 * LockErrorCleanup will clean up if cancel/die happens.
 	 */
 	LWLockRelease(partitionLock);
+	SetLatches(wakeups);
 
 	/*
 	 * Also, now that we will successfully clean up after an ereport, it's
@@ -1594,7 +1597,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 
 
 /*
- * ProcWakeup -- wake up a process by setting its latch.
+ * ProcWakeup -- prepare to wake up a process by setting its latch.
  *
  *	 Also remove the process from the wait queue and set its links invalid.
  *
@@ -1606,7 +1609,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
  * Hence, in practice the waitStatus parameter must be PROC_WAIT_STATUS_OK.
  */
 void
-ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
+ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus, LatchGroup *wakeups)
 {
 	if (dlist_node_is_detached(&proc->links))
 		return;
@@ -1622,8 +1625,8 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
 	proc->waitStatus = waitStatus;
 	pg_atomic_write_u64(&MyProc->waitStart, 0);
 
-	/* And awaken it */
-	SetLatch(&proc->procLatch);
+	/* Schedule it to be awoken */
+	AddLatch(wakeups, &proc->procLatch);
 }
 
 /*
@@ -1634,7 +1637,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
  * The appropriate lock partition lock must be held by caller.
  */
 void
-ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
+ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchGroup *wakeups)
 {
 	dclist_head *waitQueue = &lock->waitProcs;
 	LOCKMASK	aheadRequests = 0;
@@ -1659,7 +1662,7 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 			/* OK to waken */
 			GrantLock(lock, proc->waitProcLock, lockmode);
 			/* removes proc from the lock's waiting process queue */
-			ProcWakeup(proc, PROC_WAIT_STATUS_OK);
+			ProcWakeup(proc, PROC_WAIT_STATUS_OK, wakeups);
 		}
 		else
 		{
@@ -1685,6 +1688,7 @@ static void
 CheckDeadLock(void)
 {
 	int			i;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * Acquire exclusive lock on the entire shared lock data structures. Must
@@ -1719,7 +1723,7 @@ CheckDeadLock(void)
 #endif
 
 	/* Run the deadlock check, and set deadlock_state for use by ProcSleep */
-	deadlock_state = DeadLockCheck(MyProc);
+	deadlock_state = DeadLockCheck(MyProc, &wakeups);
 
 	if (deadlock_state == DS_HARD_DEADLOCK)
 	{
@@ -1736,7 +1740,8 @@ CheckDeadLock(void)
 		 * return from the signal handler.
 		 */
 		Assert(MyProc->waitLock != NULL);
-		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)));
+		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)),
+							&wakeups);
 
 		/*
 		 * We're done here.  Transaction abort caused by the error that
@@ -1760,6 +1765,8 @@ CheckDeadLock(void)
 check_done:
 	for (i = NUM_LOCK_PARTITIONS; --i >= 0;)
 		LWLockRelease(LockHashPartitionLockByIndex(i));
+
+	SetLatches(&wakeups);
 }
 
 /*
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 8575bea25c..b089bf341e 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -20,6 +20,7 @@
 
 #include "lib/ilist.h"
 #include "storage/backendid.h"
+#include "storage/latch.h"
 #include "storage/lockdefs.h"
 #include "storage/lwlock.h"
 #include "storage/shmem.h"
@@ -583,7 +584,7 @@ extern bool LockCheckConflicts(LockMethod lockMethodTable,
 							   LOCK *lock, PROCLOCK *proclock);
 extern void GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode);
 extern void GrantAwaitedLock(void);
-extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode);
+extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchGroup *wakeups);
 extern Size LockShmemSize(void);
 extern LockData *GetLockStatusData(void);
 extern BlockedProcsData *GetBlockerStatusData(int blocked_pid);
@@ -600,7 +601,7 @@ extern void lock_twophase_postabort(TransactionId xid, uint16 info,
 extern void lock_twophase_standby_recover(TransactionId xid, uint16 info,
 										  void *recdata, uint32 len);
 
-extern DeadLockState DeadLockCheck(PGPROC *proc);
+extern DeadLockState DeadLockCheck(PGPROC *proc, LatchGroup *wakeups);
 extern PGPROC *GetBlockingAutoVacuumPgproc(void);
 extern void DeadLockReport(void) pg_attribute_noreturn();
 extern void RememberSimpleDeadLock(PGPROC *proc1,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 56a31d2a89..6944927222 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -451,9 +451,9 @@ extern int	GetStartupBufferPinWaitBufId(void);
 extern bool HaveNFreeProcs(int n, int *nfree);
 extern void ProcReleaseLocks(bool isCommit);
 
-extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable);
-extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
-extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
+extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups);
+extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus, LatchGroup *wakeups);
+extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchGroup *wakeups);
 extern void CheckDeadLockAlert(void);
 extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
-- 
2.41.0

v4-0001-Allow-palloc_extended-NO_OOM-in-critical-sections.patchapplication/x-patch; name=v4-0001-Allow-palloc_extended-NO_OOM-in-critical-sections.patchDownload
From fd907df99388e3694d7ea31d3ebe0dea8acbc742 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 1 Nov 2022 23:29:38 +1300
Subject: [PATCH v4 1/6] Allow palloc_extended(NO_OOM) in critical sections.

Commit 4a170ee9e0e banned palloc() and similar in critical sections, because an
allocation failure would produce a panic.  Make an exception for allocations
that return NULL on failure, for code that has a backup plan.

Discussion: https://postgr.es/m/CA%2BhUKGJHudZMG_vh6GiPB61pE%2BGgiBk5jxzd7inijqx5nEZLCw%40mail.gmail.com
---
 src/backend/utils/mmgr/mcxt.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 9fc83f11f6..440c8e684e 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1139,7 +1139,8 @@ MemoryContextAllocExtended(MemoryContext context, Size size, int flags)
 	void	   *ret;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1294,7 +1295,8 @@ palloc_extended(Size size, int flags)
 	MemoryContext context = CurrentMemoryContext;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1529,7 +1531,8 @@ repalloc_extended(void *pointer, Size size, int flags)
 		  AllocSizeIsValid(size)))
 		elog(ERROR, "invalid memory alloc request size %zu", size);
 
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	/* isReset must be false already */
 	Assert(!context->isReset);
-- 
2.41.0

v4-0006-Don-t-re-acquire-LockManager-partition-lock-after.patchapplication/x-patch; name=v4-0006-Don-t-re-acquire-LockManager-partition-lock-after.patchDownload
From d43e82290028ef61e62d55627cc5e602dd20c27d Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 10:56:39 +1300
Subject: [PATCH v4 6/6] Don't re-acquire LockManager partition lock after
 wakeup.

Change the contract of WaitOnLock() and ProcSleep() to return with the
partition lock released.  ProcSleep() re-acquired the lock to prevent
interrupts, which was a problem at the time the ancestor of that code
was committed in fe548629c50, because then we had signal handlers that
longjmp'd out of there.  Now, that can happen only at points where we
explicitly run CHECK_FOR_INTERRUPTS(), so we can remove that re-acquisition,
which leads to the idea of making the function always release the lock.

While an earlier commit fixed the problem that the backend woke us up
before it had even released the lock itself, this lock reacquisition
point was still contended when multiple backends woke at the same time.

XXX Right?  Or what am I missing?

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/lock.c | 18 ++++++++++++------
 src/backend/storage/lmgr/proc.c | 20 ++++++++++++--------
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 126073a0d3..207abf93d0 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -775,6 +775,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	bool		partitionLocked = false;
 	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
@@ -976,6 +977,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	partitionLock = LockHashPartitionLock(hashcode);
 
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
+	partitionLocked = true;
 
 	/*
 	 * Find or create lock and proclock entries with this tag
@@ -1080,7 +1082,10 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
+		Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
 		WaitOnLock(locallock, owner, &wakeups);
+		Assert(!LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+		partitionLocked = false;
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1105,7 +1110,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 			PROCLOCK_PRINT("LockAcquire: INCONSISTENT", proclock);
 			LOCK_PRINT("LockAcquire: INCONSISTENT", lock, lockmode);
 			/* Should we retry ? */
-			LWLockRelease(partitionLock);
 			elog(ERROR, "LockAcquire failed");
 		}
 		PROCLOCK_PRINT("LockAcquire: granted", proclock);
@@ -1118,9 +1122,11 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 */
 	FinishStrongLockAcquire();
 
-	LWLockRelease(partitionLock);
-
-	SetLatches(&wakeups);
+	if (partitionLocked)
+	{
+		LWLockRelease(partitionLock);
+		SetLatches(&wakeups);
+	}
 
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
@@ -1782,7 +1788,8 @@ MarkLockClear(LOCALLOCK *locallock)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process.
  *
- * The appropriate partition lock must be held at entry.
+ * The appropriate partition lock must be held at entry.  It is not held on
+ * exit.
  */
 static void
 WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
@@ -1827,7 +1834,6 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
 			awaitedLock = NULL;
 			LOCK_PRINT("WaitOnLock: aborting on lock",
 					   locallock->lock, locallock->tag.mode);
-			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
 
 			/*
 			 * Now that we aren't holding the partition lock, we can give an
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 8821517897..2c5a6ebe83 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -992,7 +992,7 @@ AuxiliaryPidGetProc(int pid)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process (under all XIDs).
  *
- * The lock table's partition lock must be held at entry, and will be held
+ * The lock table's partition lock must be held at entry, and will be released
  * at exit.
  *
  * Result: PROC_WAIT_STATUS_OK if we acquired the lock, PROC_WAIT_STATUS_ERROR if not (deadlock).
@@ -1020,6 +1020,12 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 	ProcWaitStatus myWaitStatus;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 
+	/*
+	 * Every way out of this function will release the partition lock and send
+	 * buffered latch wakeups.
+	 */
+	Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+
 	/*
 	 * If group locking is in use, locks held by members of my locking group
 	 * need to be included in myHeldLocks.  This is not required for relation
@@ -1104,6 +1110,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 					/* Skip the wait and just grant myself the lock. */
 					GrantLock(lock, proclock, lockmode);
 					GrantAwaitedLock();
+					LWLockRelease(partitionLock);
+					SetLatches(wakeups);
 					return PROC_WAIT_STATUS_OK;
 				}
 
@@ -1140,6 +1148,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 	if (early_deadlock)
 	{
 		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
+		LWLockRelease(partitionLock);
+		SetLatches(wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1469,6 +1479,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 			}
 
 			LWLockRelease(partitionLock);
+			SetLatches(wakeups);
 
 			if (deadlock_state == DS_SOFT_DEADLOCK)
 				ereport(LOG,
@@ -1570,13 +1581,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 							standbyWaitStart, GetCurrentTimestamp(),
 							NULL, false);
 
-	/*
-	 * Re-acquire the lock table's partition lock.  We have to do this to hold
-	 * off cancel/die interrupts before we can mess with lockAwaited (else we
-	 * might have a missed or duplicated locallock update).
-	 */
-	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
-
 	/*
 	 * We no longer want LockErrorCleanup to do anything.
 	 */
-- 
2.41.0

#7Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Thomas Munro (#1)
Re: Latches vs lwlock contention

On 28/10/2022 06:56, Thomas Munro wrote:

One example is heavyweight lock wakeups. If you run BEGIN; LOCK TABLE
t; ... and then N other sessions wait in SELECT * FROM t;, and then
you run ... COMMIT;, you'll see the first session wake all the others
while it still holds the partition lock itself. They'll all wake up
and begin to re-acquire the same partition lock in exclusive mode,
immediately go back to sleep on *that* wait list, and then wake each
other up one at a time in a chain. We could avoid the first
double-bounce by not setting the latches until after we've released
the partition lock. We could avoid the rest of them by not
re-acquiring the partition lock at all, which ... if I'm reading right
... shouldn't actually be necessary in modern PostgreSQL? Or if there
is another reason to re-acquire then maybe the comment should be
updated.

ISTM that the change to not re-acquire the lock in ProcSleep is
independent of the other changes. Let's split that off into a separate
patch.

I agree it should be safe. Acquiring a lock just to hold off interrupts
is overkill anyway; HOLD_INTERRUPTS() would be enough.
LockErrorCleanup() uses HOLD_INTERRUPTS() already.

There are no CHECK_FOR_INTERRUPTS() in GrantAwaitedLock(), so cancel/die
interrupts can't happen here. But could we add HOLD_INTERRUPTS(), just
pro forma, to document the assumption? It's a little awkward: you really
should hold interrupts until the caller has done "awaitedLock = NULL;".
So it's not quite enough to add a pair of HOLD_ and RESUME_INTERRUPTS()
at the end of ProcSleep(). You'd need to do the HOLD_INTERRUPTS() in
ProcSleep() and require the caller to do RESUME_INTERRUPTS(). In a
sense, ProcSleep downgrades the lock on the partition to just holding
off interrupts.
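
To make that concrete, here is a rough sketch of the asymmetric
arrangement I mean (hypothetical code, not taken from the attached
patches):

	/* At the end of ProcSleep(), instead of re-acquiring the partition lock: */
	HOLD_INTERRUPTS();		/* hold off cancel/die on behalf of the caller */
	if (MyProc->waitStatus == PROC_WAIT_STATUS_OK)
		GrantAwaitedLock();
	return MyProc->waitStatus;

	/* ... and in the caller (WaitOnLock), once ProcSleep() has returned: */
	awaitedLock = NULL;
	RESUME_INTERRUPTS();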

Overall +1 on this change to not re-acquire the partition lock.

--
Heikki Linnakangas
Neon (https://neon.tech)

#8Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Heikki Linnakangas (#7)
8 attachment(s)
Re: Latches vs lwlock contention

On 28/09/2023 12:58, Heikki Linnakangas wrote:

On 28/10/2022 06:56, Thomas Munro wrote:

One example is heavyweight lock wakeups. If you run BEGIN; LOCK TABLE
t; ... and then N other sessions wait in SELECT * FROM t;, and then
you run ... COMMIT;, you'll see the first session wake all the others
while it still holds the partition lock itself. They'll all wake up
and begin to re-acquire the same partition lock in exclusive mode,
immediately go back to sleep on *that* wait list, and then wake each
other up one at a time in a chain. We could avoid the first
double-bounce by not setting the latches until after we've released
the partition lock. We could avoid the rest of them by not
re-acquiring the partition lock at all, which ... if I'm reading right
... shouldn't actually be necessary in modern PostgreSQL? Or if there
is another reason to re-acquire then maybe the comment should be
updated.

ISTM that the change to not re-acquire the lock in ProcSleep is
independent of the other changes. Let's split that off into a separate
patch.

I spent some time on splitting that off. I had to start from scratch,
because commit 2346df6fc373df9c5ab944eebecf7d3036d727de conflicted
heavily with your patch.

I split ProcSleep() into two functions: JoinWaitQueue does the first
part of ProcSleep(), adding the process to the wait queue and checking
for the dontWait and early deadlock cases. What remains in ProcSleep()
does just the sleeping part. JoinWaitQueue is called with the partition
lock held, and ProcSleep() is called without it. This way, the partition
lock is acquired and released in the same function
(LockAcquireExtended), avoiding awkward "lock is held on enter, but
might be released on exit depending on the outcome" logic.
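
The resulting control flow in LockAcquireExtended() ends up looking
roughly like this (heavily simplified; the real thing is in patch 0008):

	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
	/* ... find or create the LOCK and PROCLOCK entries ... */
	waitResult = JoinWaitQueue(locallock, lockMethodTable, dontWait);
	if (waitResult == PROC_WAIT_STATUS_ERROR)
	{
		/* early deadlock or dontWait: undo the shared entries, release, bail */
	}
	else if (waitResult == PROC_WAIT_STATUS_WAITING)
	{
		/* queued; sleep without holding the partition lock */
		LWLockRelease(partitionLock);
		waitResult = WaitOnLock(locallock, owner);	/* calls ProcSleep() */
	}
	else
		LWLockRelease(partitionLock);	/* lock was granted immediately */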

This is actually a set of 8 patches. The first 7 are independent tiny
fixes and refactorings in these functions. See individual commit messages.

I agree it should be safe. Acquiring a lock just to hold off interrupts
is overkill anyway; HOLD_INTERRUPTS() would be enough.
LockErrorCleanup() uses HOLD_INTERRUPTS() already.

There are no CHECK_FOR_INTERRUPTS() in GrantAwaitedLock(), so cancel/die
interrupts can't happen here. But could we add HOLD_INTERRUPTS(), just
pro forma, to document the assumption? It's a little awkward: you really
should hold interrupts until the caller has done "awaitedLock = NULL;".
So it's not quite enough to add a pair of HOLD_ and RESUME_INTERRUPTS()
at the end of ProcSleep(). You'd need to do the HOLD_INTERRUPTS() in
ProcSleep() and require the caller to do RESUME_INTERRUPTS(). In a
sense, ProcSleep downgrades the lock on the partition to just holding
off interrupts.

I didn't use HOLD/RESUME_INTERRUPTS() after all. Like you did, I'm just
relying on the fact that there are no CHECK_FOR_INTERRUPTS() calls in
places where they might cause trouble. Those sections are short, so I
think it's fine.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Remove-LOCK_PRINT-call-that-could-point-to-garbage.patchtext/x-patch; charset=UTF-8; name=0001-Remove-LOCK_PRINT-call-that-could-point-to-garbage.patchDownload
From 989069eaca788cfd61010e5f88c0feb359b40180 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 01:21:20 +0300
Subject: [PATCH 1/8] Remove LOCK_PRINT() call that could point to garbage

If a deadlock is detected in ProcSleep, it removes the process from
the wait queue and the shared LOCK and PROCLOCK entries. As soon as it
does that, the other process holding the lock might release it, and
remove the LOCK entry altogether. So on return from ProcSleep(), it's
possible that the LOCK entry is no longer valid.
---
 src/backend/storage/lmgr/lock.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a507779..05b3a243b5c 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1861,8 +1861,6 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 			 * now.
 			 */
 			awaitedLock = NULL;
-			LOCK_PRINT("WaitOnLock: aborting on lock",
-					   locallock->lock, locallock->tag.mode);
 			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
 
 			/*
-- 
2.39.2

0002-Fix-comment-in-LockReleaseAll-on-when-locallock-nLoc.patchtext/x-patch; charset=UTF-8; name=0002-Fix-comment-in-LockReleaseAll-on-when-locallock-nLoc.patchDownload
From 22bc7a3b7826dfbb991b5d63ad1fba52bb297923 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 00:43:13 +0300
Subject: [PATCH 2/8] Fix comment in LockReleaseAll() on when locallock->nLock
 can be zero

We reach this case also e.g. when a deadlock is detected, not only
when we run out of memory.
---
 src/backend/storage/lmgr/lock.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 05b3a243b5c..7eb5c2e5226 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2208,9 +2208,8 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
 	{
 		/*
-		 * If the LOCALLOCK entry is unused, we must've run out of shared
-		 * memory while trying to set up this lock.  Just forget the local
-		 * entry.
+		 * If the LOCALLOCK entry is unused, something must've gone wrong
+		 * while trying to acquire this lock.  Just forget the local entry.
 		 */
 		if (locallock->nLocks == 0)
 		{
-- 
2.39.2

0003-Set-MyProc-heldLocks-in-ProcSleep.patchtext/x-patch; charset=UTF-8; name=0003-Set-MyProc-heldLocks-in-ProcSleep.patchDownload
From ae8d12bab05ede48166d15312b222e046aae14a0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 13:23:59 +0300
Subject: [PATCH 3/8] Set MyProc->heldLocks in ProcSleep

Previously, the ProcSleep() caller was responsible for setting it, and
we had comments to remind about that. But it seems simpler to make
ProcSleep() itself responsible for it. ProcSleep() already sets the
other info about the lock it's waiting for (waitLock, waitProcLock and
waitLockMode), so it seems natural for it to set heldLocks too.
---
 src/backend/storage/lmgr/lock.c |  8 --------
 src/backend/storage/lmgr/proc.c | 10 ++++++----
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 7eb5c2e5226..489073259b6 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1047,11 +1047,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	}
 	else
 	{
-		/*
-		 * Set bitmask of locks this process already holds on this object.
-		 */
-		MyProc->heldLocks = proclock->holdMask;
-
 		/*
 		 * Sleep till someone wakes me up. We do this even in the dontWait
 		 * case, because while trying to go to sleep, we may discover that we
@@ -1808,9 +1803,6 @@ MarkLockClear(LOCALLOCK *locallock)
 /*
  * WaitOnLock -- wait to acquire a lock
  *
- * Caller must have set MyProc->heldLocks to reflect locks already held
- * on the lockable object by this process.
- *
  * The appropriate partition lock must be held at entry, and will still be
  * held at exit.
  */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 1b23efb26f3..5c1d2792c3d 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1040,9 +1040,6 @@ AuxiliaryPidGetProc(int pid)
 /*
  * ProcSleep -- put a process to sleep on the specified lock
  *
- * Caller must have set MyProc->heldLocks to reflect locks already held
- * on the lockable object by this process (under all XIDs).
- *
  * It's not actually guaranteed that we need to wait when this function is
  * called, because it could be that when we try to find a position at which
  * to insert ourself into the wait queue, we discover that we must be inserted
@@ -1072,7 +1069,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
 	dclist_head *waitQueue = &lock->waitProcs;
 	PGPROC	   *insert_before = NULL;
-	LOCKMASK	myHeldLocks = MyProc->heldLocks;
+	LOCKMASK	myHeldLocks;
 	TimestampTz standbyWaitStart = 0;
 	bool		early_deadlock = false;
 	bool		allow_autovacuum_cancel = true;
@@ -1080,6 +1077,11 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	ProcWaitStatus myWaitStatus;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 
+	/*
+	 * Set bitmask of locks this process already holds on this object.
+	 */
+	myHeldLocks = MyProc->heldLocks = proclock->holdMask;
+
 	/*
 	 * If group locking is in use, locks held by members of my locking group
 	 * need to be included in myHeldLocks.  This is not required for relation
-- 
2.39.2

0004-Move-TRACE-calls-into-WaitOnLock.patchtext/x-patch; charset=UTF-8; name=0004-Move-TRACE-calls-into-WaitOnLock.patchDownload
From 05b8b4a7222795377f0a967b6408e61aa6b8c62f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 13:12:57 +0300
Subject: [PATCH 4/8] Move TRACE calls into WaitOnLock()

LockAcquire is a long and complex function. Pushing more stuff to its
subroutines makes it a little more manageable.
---
 src/backend/storage/lmgr/lock.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 489073259b6..bac950efdf3 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1052,23 +1052,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 		 * case, because while trying to go to sleep, we may discover that we
 		 * can acquire the lock immediately after all.
 		 */
-
-		TRACE_POSTGRESQL_LOCK_WAIT_START(locktag->locktag_field1,
-										 locktag->locktag_field2,
-										 locktag->locktag_field3,
-										 locktag->locktag_field4,
-										 locktag->locktag_type,
-										 lockmode);
-
 		WaitOnLock(locallock, owner, dontWait);
 
-		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
-										locktag->locktag_field2,
-										locktag->locktag_field3,
-										locktag->locktag_field4,
-										locktag->locktag_type,
-										lockmode);
-
 		/*
 		 * NOTE: do not do any material change of state between here and
 		 * return.  All required changes in locktable state must have been
@@ -1812,6 +1797,13 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
 	LockMethod	lockMethodTable = LockMethods[lockmethodid];
 
+	TRACE_POSTGRESQL_LOCK_WAIT_START(locallock->tag.lock.locktag_field1,
+									 locallock->tag.lock.locktag_field2,
+									 locallock->tag.lock.locktag_field3,
+									 locallock->tag.lock.locktag_field4,
+									 locallock->tag.lock.locktag_type,
+									 locallock->tag.mode);
+
 	LOCK_PRINT("WaitOnLock: sleeping on lock",
 			   locallock->lock, locallock->tag.mode);
 
@@ -1882,6 +1874,13 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 
 	LOCK_PRINT("WaitOnLock: wakeup on lock",
 			   locallock->lock, locallock->tag.mode);
+
+	TRACE_POSTGRESQL_LOCK_WAIT_DONE(locallock->tag.lock.locktag_field1,
+									locallock->tag.lock.locktag_field2,
+									locallock->tag.lock.locktag_field3,
+									locallock->tag.lock.locktag_field4,
+									locallock->tag.lock.locktag_type,
+									locallock->tag.mode);
 }
 
 /*
-- 
2.39.2

0005-Remove-redundant-lockAwaited-global-variable.patchtext/x-patch; charset=UTF-8; name=0005-Remove-redundant-lockAwaited-global-variable.patchDownload
From 6f70b1f3fad97a89725706ee974ad2665471c6d6 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 17:23:33 +0300
Subject: [PATCH 5/8] Remove redundant lockAwaited global variable

We had two global (static) variables to hold the LOCALLOCK that we are
currently acquiring: awaitedLock in lock.c and lockAwaited in
proc.c. They were set at slightly different times, but when they were
both set, they were the same thing. Remove the lockAwaited variable
and rely on awaitedLock everywhere.
---
 src/backend/storage/lmgr/lock.c |  9 +++++++++
 src/backend/storage/lmgr/proc.c | 30 ++++--------------------------
 src/backend/tcop/postgres.c     |  2 +-
 src/include/storage/lock.h      |  2 ++
 src/include/storage/proc.h      |  1 -
 5 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index bac950efdf3..9035145169e 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1771,6 +1771,12 @@ GrantAwaitedLock(void)
 	GrantLockLocal(awaitedLock, awaitedOwner);
 }
 
+LOCALLOCK *
+GetAwaitedLock(void)
+{
+	return awaitedLock;
+}
+
 /*
  * MarkLockClear -- mark an acquired lock as "clear"
  *
@@ -1867,6 +1873,9 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	}
 	PG_END_TRY();
 
+	/*
+	 * We no longer want LockErrorCleanup to do anything.
+	 */
 	awaitedLock = NULL;
 
 	/* reset ps display to remove the suffix */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 5c1d2792c3d..7a207d0104d 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -79,9 +79,6 @@ PROC_HDR   *ProcGlobal = NULL;
 NON_EXEC_STATIC PGPROC *AuxiliaryProcs = NULL;
 PGPROC	   *PreparedXactProcs = NULL;
 
-/* If we are waiting for a lock, this points to the associated LOCALLOCK */
-static LOCALLOCK *lockAwaited = NULL;
-
 static DeadLockState deadlock_state = DS_NOT_YET_CHECKED;
 
 /* Is a deadlock check pending? */
@@ -706,18 +703,6 @@ HaveNFreeProcs(int n, int *nfree)
 	return (*nfree == n);
 }
 
-/*
- * Check if the current process is awaiting a lock.
- */
-bool
-IsWaitingForLock(void)
-{
-	if (lockAwaited == NULL)
-		return false;
-
-	return true;
-}
-
 /*
  * Cancel any pending wait for lock, when aborting a transaction, and revert
  * any strong lock count acquisition for a lock being acquired.
@@ -729,6 +714,7 @@ IsWaitingForLock(void)
 void
 LockErrorCleanup(void)
 {
+	LOCALLOCK  *lockAwaited;
 	LWLock	   *partitionLock;
 	DisableTimeoutParams timeouts[2];
 
@@ -737,6 +723,7 @@ LockErrorCleanup(void)
 	AbortStrongLockAcquire();
 
 	/* Nothing to do if we weren't waiting for a lock */
+	lockAwaited = GetAwaitedLock();
 	if (lockAwaited == NULL)
 	{
 		RESUME_INTERRUPTS();
@@ -1212,9 +1199,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
-	/* mark that we are waiting for a lock */
-	lockAwaited = locallock;
-
 	/*
 	 * Release the lock table's partition lock.
 	 *
@@ -1639,17 +1623,11 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 							NULL, false);
 
 	/*
-	 * Re-acquire the lock table's partition lock.  We have to do this to hold
-	 * off cancel/die interrupts before we can mess with lockAwaited (else we
-	 * might have a missed or duplicated locallock update).
+	 * Re-acquire the lock table's partition lock, because the caller expects
+	 * us to still hold it.
 	 */
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-	/*
-	 * We no longer want LockErrorCleanup to do anything.
-	 */
-	lockAwaited = NULL;
-
 	/*
 	 * If we got the lock, be sure to remember it in the locallock table.
 	 */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ea517f4b9bb..fce65c7313e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3052,7 +3052,7 @@ ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
 			/*
 			 * If we aren't waiting for a lock we can never deadlock.
 			 */
-			if (!IsWaitingForLock())
+			if (GetAwaitedLock() == NULL)
 				return;
 
 			/* Intentional fall through to check wait for pin */
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index cc1f6e78c39..45752ac1408 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -583,6 +583,8 @@ extern bool LockCheckConflicts(LockMethod lockMethodTable,
 							   LOCK *lock, PROCLOCK *proclock);
 extern void GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode);
 extern void GrantAwaitedLock(void);
+extern LOCALLOCK *GetAwaitedLock(void);
+
 extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode);
 extern Size LockShmemSize(void);
 extern LockData *GetLockStatusData(void);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 7d3fc2bfa60..71e05727e54 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -476,7 +476,6 @@ extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock,
 extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
 extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
 extern void CheckDeadLockAlert(void);
-extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
 
 extern void ProcWaitForSignal(uint32 wait_event_info);
-- 
2.39.2

0006-Update-local-lock-table-in-ProcSleep-s-caller.patchtext/x-patch; charset=UTF-8; name=0006-Update-local-lock-table-in-ProcSleep-s-caller.patchDownload
From 7cb00c043d47490263aa67d2f1dfce9c40f02af0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 17:27:16 +0300
Subject: [PATCH 6/8] Update local lock table in ProcSleep's caller

ProcSleep is now responsible only for the shared state.  Seems a
little nicer that way.
---
 src/backend/storage/lmgr/lock.c | 4 +++-
 src/backend/storage/lmgr/proc.c | 7 -------
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 9035145169e..8d1d57c0533 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1043,7 +1043,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	{
 		/* No conflict with held or previously requested locks */
 		GrantLock(lock, proclock, lockmode);
-		GrantLockLocal(locallock, owner);
 	}
 	else
 	{
@@ -1123,6 +1122,9 @@ LockAcquireExtended(const LOCKTAG *locktag,
 		LOCK_PRINT("LockAcquire: granted", lock, lockmode);
 	}
 
+	/* The lock was granted to us.  Update the local lock entry accordingly */
+	GrantLockLocal(locallock, owner);
+
 	/*
 	 * Lock state is fully up-to-date now; if we error out after this, no
 	 * special error cleanup is required.
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 7a207d0104d..8e74a04b932 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1152,7 +1152,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 				{
 					/* Skip the wait and just grant myself the lock. */
 					GrantLock(lock, proclock, lockmode);
-					GrantAwaitedLock();
 					return PROC_WAIT_STATUS_OK;
 				}
 
@@ -1628,12 +1627,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	 */
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-	/*
-	 * If we got the lock, be sure to remember it in the locallock table.
-	 */
-	if (MyProc->waitStatus == PROC_WAIT_STATUS_OK)
-		GrantAwaitedLock();
-
 	/*
 	 * We don't have to do anything else, because the awaker did all the
 	 * necessary update of the lock table and MyProc.
-- 
2.39.2

0007-Release-partition-lock-a-little-earlier.patchtext/x-patch; charset=UTF-8; name=0007-Release-partition-lock-a-little-earlier.patchDownload
From 94804a2380d18107040887fa424919eeac97ba90 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 17:38:06 +0300
Subject: [PATCH 7/8] Release partition lock a little earlier

We don't need to hold the lock to prevent die/cancel interrupts, they
are only processed at explicit CHECK_FOR_INTERRUPTS() points now.
---
 src/backend/storage/lmgr/lock.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 8d1d57c0533..52f2867ffea 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1043,6 +1043,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	{
 		/* No conflict with held or previously requested locks */
 		GrantLock(lock, proclock, lockmode);
+		LWLockRelease(partitionLock);
 	}
 	else
 	{
@@ -1118,8 +1119,10 @@ LockAcquireExtended(const LOCKTAG *locktag,
 				elog(ERROR, "LockAcquire failed");
 			}
 		}
+
 		PROCLOCK_PRINT("LockAcquire: granted", proclock);
 		LOCK_PRINT("LockAcquire: granted", lock, lockmode);
+		LWLockRelease(partitionLock);
 	}
 
 	/* The lock was granted to us.  Update the local lock entry accordingly */
@@ -1131,8 +1134,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 */
 	FinishStrongLockAcquire();
 
-	LWLockRelease(partitionLock);
-
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
 	 * standby server.
-- 
2.39.2

0008-Split-ProcSleep-function-into-JoinWaitQueue-and-Proc.patchtext/x-patch; charset=UTF-8; name=0008-Split-ProcSleep-function-into-JoinWaitQueue-and-Proc.patchDownload
From 0df6db8607c5d10d14ae4b7bbc5077ad394b4e6b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 21:53:18 +0300
Subject: [PATCH 8/8] Split ProcSleep function into JoinWaitQueue and ProcSleep

Split ProcSleep into two functions: JoinWaitQueue and ProcSleep.
JoinWaitQueue is called while holding the partition lock, and inserts
the current process into the wait queue, while ProcSleep() does the
actual sleeping. ProcSleep() is now called without holding the
partition lock, and it no longer re-acquires the partition lock before
returning. That makes the wakeup a little cheaper. Once upon a time,
re-acquiring the partition lock was done to prevent a signal handler
from longjmping out at a bad time, but nowadays our signal handlers
just set flags, and longjmping can only happen at points where we
explicitly run CHECK_FOR_INTERRUPTS().

If JoinWaitQueue detects an "early deadlock" before even joining the
wait queue, it returns without changing the shared lock entry, leaving
the cleanup of the shared lock entry to the caller. This makes early
deadlock behave the same as the dontWait=true case.

One small side-effect of this refactoring is that we now only set the
'ps' title to say "waiting" when we actually enter the sleep, not when
the lock is skipped because dontWait=true, or when a deadlock is
detected early before entering the sleep.

Based on Thomas Munro's earlier patch and observation that ProcSleep
doesn't really need to re-acquire the partition lock.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/lock.c | 182 ++++++++++++++++----------------
 src/backend/storage/lmgr/proc.c | 107 +++++++++++--------
 src/include/storage/proc.h      |   6 +-
 3 files changed, 156 insertions(+), 139 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 52f2867ffea..c51c98f2665 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -360,8 +360,7 @@ static PROCLOCK *SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner);
 static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode);
 static void FinishStrongLockAcquire(void);
-static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner,
-					   bool dontWait);
+static ProcWaitStatus WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner);
 static void ReleaseLockIfHeld(LOCALLOCK *locallock, bool sessionLock);
 static void LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent);
 static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode,
@@ -1047,85 +1046,105 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	}
 	else
 	{
-		/*
-		 * Sleep till someone wakes me up. We do this even in the dontWait
-		 * case, because while trying to go to sleep, we may discover that we
-		 * can acquire the lock immediately after all.
-		 */
-		WaitOnLock(locallock, owner, dontWait);
+		ProcWaitStatus waitResult;
 
 		/*
-		 * NOTE: do not do any material change of state between here and
-		 * return.  All required changes in locktable state must have been
-		 * done when the lock was granted to us --- see notes in WaitOnLock.
+		 * Join the lock's wait queue.  We do this even in the dontWait case,
+		 * because while joining the queue, we may discover that we can
+		 * acquire the lock immediately after all.
 		 */
+		waitResult = JoinWaitQueue(locallock, lockMethodTable, dontWait);
 
-		/*
-		 * Check the proclock entry status. If dontWait = true, this is an
-		 * expected case; otherwise, it will only happen if something in the
-		 * ipc communication doesn't work correctly.
-		 */
-		if (!(proclock->holdMask & LOCKBIT_ON(lockmode)))
+		if (waitResult == PROC_WAIT_STATUS_ERROR)
 		{
+			/*
+			 * We're not getting the lock because a deadlock was detected
+			 * already while trying to join the wait queue, or because we
+			 * would have to wait but the caller requested no blocking.
+			 *
+			 * Undo the changes to shared entries before releasing the
+			 * partition lock.
+			 */
 			AbortStrongLockAcquire();
 
+			if (proclock->holdMask == 0)
+			{
+				uint32		proclock_hashcode;
+
+				proclock_hashcode = ProcLockHashCode(&proclock->tag,
+													 hashcode);
+				dlist_delete(&proclock->lockLink);
+				dlist_delete(&proclock->procLink);
+				if (!hash_search_with_hash_value(LockMethodProcLockHash,
+												 &(proclock->tag),
+												 proclock_hashcode,
+												 HASH_REMOVE,
+												 NULL))
+					elog(PANIC, "proclock table corrupted");
+			}
+			else
+				PROCLOCK_PRINT("LockAcquire: did not join wait queue", proclock);
+			lock->nRequested--;
+			lock->requested[lockmode]--;
+			LOCK_PRINT("LockAcquire: did not join wait queue",
+					   lock, lockmode);
+			Assert((lock->nRequested > 0) &&
+				   (lock->requested[lockmode] >= 0));
+			Assert(lock->nGranted <= lock->nRequested);
+			LWLockRelease(partitionLock);
+			if (locallock->nLocks == 0)
+				RemoveLocalLock(locallock);
+
 			if (dontWait)
 			{
-				/*
-				 * We can't acquire the lock immediately.  If caller specified
-				 * no blocking, remove useless table entries and return
-				 * LOCKACQUIRE_NOT_AVAIL without waiting.
-				 */
-				if (proclock->holdMask == 0)
-				{
-					uint32		proclock_hashcode;
-
-					proclock_hashcode = ProcLockHashCode(&proclock->tag,
-														 hashcode);
-					dlist_delete(&proclock->lockLink);
-					dlist_delete(&proclock->procLink);
-					if (!hash_search_with_hash_value(LockMethodProcLockHash,
-													 &(proclock->tag),
-													 proclock_hashcode,
-													 HASH_REMOVE,
-													 NULL))
-						elog(PANIC, "proclock table corrupted");
-				}
-				else
-					PROCLOCK_PRINT("LockAcquire: NOWAIT", proclock);
-				lock->nRequested--;
-				lock->requested[lockmode]--;
-				LOCK_PRINT("LockAcquire: conditional lock failed",
-						   lock, lockmode);
-				Assert((lock->nRequested > 0) &&
-					   (lock->requested[lockmode] >= 0));
-				Assert(lock->nGranted <= lock->nRequested);
-				LWLockRelease(partitionLock);
-				if (locallock->nLocks == 0)
-					RemoveLocalLock(locallock);
 				if (locallockp)
 					*locallockp = NULL;
 				return LOCKACQUIRE_NOT_AVAIL;
 			}
 			else
+			{
+				DeadLockReport();
+				/* DeadLockReport() will not return */
+			}
+		}
+
+		/*
+		 * We are now in the lock queue, or the lock was already granted.  If
+		 * queued, go to sleep.
+		 */
+		if (waitResult == PROC_WAIT_STATUS_WAITING)
+		{
+			Assert(!dontWait);
+			PROCLOCK_PRINT("LockAcquire: sleeping on lock", proclock);
+			LOCK_PRINT("LockAcquire: sleeping on lock", lock, lockmode);
+			LWLockRelease(partitionLock);
+
+			waitResult = WaitOnLock(locallock, owner);
+
+			/*
+			 * NOTE: do not do any material change of state between here and
+			 * return.  All required changes in locktable state must have been
+			 * done when the lock was granted to us --- see notes in WaitOnLock.
+			 */
+
+			if (waitResult == PROC_WAIT_STATUS_ERROR)
 			{
 				/*
-				 * We should have gotten the lock, but somehow that didn't
-				 * happen. If we get here, it's a bug.
+				 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
+				 * now.
 				 */
-				PROCLOCK_PRINT("LockAcquire: INCONSISTENT", proclock);
-				LOCK_PRINT("LockAcquire: INCONSISTENT", lock, lockmode);
-				LWLockRelease(partitionLock);
-				elog(ERROR, "LockAcquire failed");
+				Assert(!dontWait);
+				DeadLockReport();
+				/* DeadLockReport() will not return */
 			}
 		}
-
-		PROCLOCK_PRINT("LockAcquire: granted", proclock);
-		LOCK_PRINT("LockAcquire: granted", lock, lockmode);
-		LWLockRelease(partitionLock);
+		else
+			LWLockRelease(partitionLock);
+		Assert(waitResult == PROC_WAIT_STATUS_OK);
 	}
 
 	/* The lock was granted to us.  Update the local lock entry accordingly */
+	Assert((proclock->holdMask & LOCKBIT_ON(lockmode)) != 0);
 	GrantLockLocal(locallock, owner);
 
 	/*
@@ -1797,14 +1816,12 @@ MarkLockClear(LOCALLOCK *locallock)
 /*
  * WaitOnLock -- wait to acquire a lock
  *
- * The appropriate partition lock must be held at entry, and will still be
- * held at exit.
+ * This is a wrapper around ProcSleep, with extra tracing and bookkeeping.
  */
-static void
-WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
+static ProcWaitStatus
+WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
 {
-	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
-	LockMethod	lockMethodTable = LockMethods[lockmethodid];
+	ProcWaitStatus result;
 
 	TRACE_POSTGRESQL_LOCK_WAIT_START(locallock->tag.lock.locktag_field1,
 									 locallock->tag.lock.locktag_field2,
@@ -1813,12 +1830,13 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 									 locallock->tag.lock.locktag_type,
 									 locallock->tag.mode);
 
-	LOCK_PRINT("WaitOnLock: sleeping on lock",
-			   locallock->lock, locallock->tag.mode);
-
 	/* adjust the process title to indicate that it's waiting */
 	set_ps_display_suffix("waiting");
 
+	/*
+	 * Record the fact that we are waiting for a lock, so that
+	 * LockErrorCleanup will clean up if cancel/die happens.
+	 */
 	awaitedLock = locallock;
 	awaitedOwner = owner;
 
@@ -1841,28 +1859,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	 */
 	PG_TRY();
 	{
-		/*
-		 * If dontWait = true, we handle success and failure in the same way
-		 * here. The caller will be able to sort out what has happened.
-		 */
-		if (ProcSleep(locallock, lockMethodTable, dontWait) != PROC_WAIT_STATUS_OK
-			&& !dontWait)
-		{
-
-			/*
-			 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
-			 * now.
-			 */
-			awaitedLock = NULL;
-			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
-
-			/*
-			 * Now that we aren't holding the partition lock, we can give an
-			 * error report including details about the detected deadlock.
-			 */
-			DeadLockReport();
-			/* not reached */
-		}
+		result = ProcSleep(locallock);
 	}
 	PG_CATCH();
 	{
@@ -1884,15 +1881,14 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	/* reset ps display to remove the suffix */
 	set_ps_display_remove_suffix();
 
-	LOCK_PRINT("WaitOnLock: wakeup on lock",
-			   locallock->lock, locallock->tag.mode);
-
 	TRACE_POSTGRESQL_LOCK_WAIT_DONE(locallock->tag.lock.locktag_field1,
 									locallock->tag.lock.locktag_field2,
 									locallock->tag.lock.locktag_field3,
 									locallock->tag.lock.locktag_field4,
 									locallock->tag.lock.locktag_type,
 									locallock->tag.mode);
+
+	return result;
 }
 
 /*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 8e74a04b932..6d0adbb8018 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1025,7 +1025,7 @@ AuxiliaryPidGetProc(int pid)
 
 
 /*
- * ProcSleep -- put a process to sleep on the specified lock
+ * JoinWaitQueue -- join the wait queue on the specified lock
  *
  * It's not actually guaranteed that we need to wait when this function is
  * called, because it could be that when we try to find a position at which
@@ -1034,20 +1034,24 @@ AuxiliaryPidGetProc(int pid)
  * we get the lock immediately. Because of this, it's sensible for this function
  * to have a dontWait argument, despite the name.
  *
- * The lock table's partition lock must be held at entry, and will be held
- * at exit.
+ * On entry, the caller has already set up LOCK and PROCLOCK entries to
+ * reflect that we have "requested" the lock.  The caller is responsible for
+ * cleaning that up, if we end up not joining the queue after all.
  *
- * Result: PROC_WAIT_STATUS_OK if we acquired the lock, PROC_WAIT_STATUS_ERROR
- * if not (if dontWait = true, we would have had to wait; if dontWait = false,
- * this is a deadlock).
+ * The lock table's partition lock must be held at entry, and is still held
+ * at exit.  The caller must release it before calling ProcSleep().
  *
- * ASSUME: that no one will fiddle with the queue until after
- *		we release the partition lock.
+ * Result is one of the following:
+ *
+ *  PROC_WAIT_STATUS_OK       - lock was immediately granted
+ *  PROC_WAIT_STATUS_WAITING  - joined the wait queue; call ProcSleep()
+ *  PROC_WAIT_STATUS_ERROR    - immediate deadlock was detected, or would
+ *                              need to wait and dontWait == true
  *
  * NOTES: The process queue is now a priority queue for locking.
  */
 ProcWaitStatus
-ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
+JoinWaitQueue(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 {
 	LOCKMODE	lockmode = locallock->tag.mode;
 	LOCK	   *lock = locallock->lock;
@@ -1057,13 +1061,16 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	dclist_head *waitQueue = &lock->waitProcs;
 	PGPROC	   *insert_before = NULL;
 	LOCKMASK	myHeldLocks;
-	TimestampTz standbyWaitStart = 0;
 	bool		early_deadlock = false;
-	bool		allow_autovacuum_cancel = true;
-	bool		logged_recovery_conflict = false;
-	ProcWaitStatus myWaitStatus;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 
+	Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+
+	/*
+	 * Set bitmask of locks this process already holds on this object.
+	 */
+	myHeldLocks = MyProc->heldLocks = proclock->holdMask;
+
 	/*
 	 * Set bitmask of locks this process already holds on this object.
 	 */
@@ -1164,6 +1171,13 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 		}
 	}
 
+	/*
+	 * If we detected deadlock, give up without waiting.  This must agree with
+	 * CheckDeadLock's recovery code.
+	 */
+	if (early_deadlock)
+		return PROC_WAIT_STATUS_ERROR;
+
 	/*
 	 * At this point we know that we'd really need to sleep. If we've been
 	 * commanded not to do that, bail out.
@@ -1188,31 +1202,43 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 
 	MyProc->waitStatus = PROC_WAIT_STATUS_WAITING;
 
-	/*
-	 * If we detected deadlock, give up without waiting.  This must agree with
-	 * CheckDeadLock's recovery code.
-	 */
-	if (early_deadlock)
-	{
-		RemoveFromWaitQueue(MyProc, hashcode);
-		return PROC_WAIT_STATUS_ERROR;
-	}
+	return PROC_WAIT_STATUS_WAITING;
+}
 
-	/*
-	 * Release the lock table's partition lock.
-	 *
-	 * NOTE: this may also cause us to exit critical-section state, possibly
-	 * allowing a cancel/die interrupt to be accepted. This is OK because we
-	 * have recorded the fact that we are waiting for a lock, and so
-	 * LockErrorCleanup will clean up if cancel/die happens.
-	 */
-	LWLockRelease(partitionLock);
+/*
+ * ProcSleep -- put process to sleep waiting on lock
+ *
+ * This must be called when JoinWaitQueue() returns PROC_WAIT_STATUS_WAITING.
+ * Returns after the lock has been granted, or if a deadlock is detected.  Can
+ * also bail out with ereport(ERROR), if some other error condition, or a
+ * timeout or cancellation is triggered.
+ *
+ * Result is one of the following:
+ *
+ *  PROC_WAIT_STATUS_OK      - lock was granted
+ *  PROC_WAIT_STATUS_ERROR   - a deadlock was detected
+ */
+ProcWaitStatus
+ProcSleep(LOCALLOCK *locallock)
+{
+	LOCKMODE	lockmode = locallock->tag.mode;
+	LOCK	   *lock = locallock->lock;
+	uint32		hashcode = locallock->hashcode;
+	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
+	TimestampTz standbyWaitStart = 0;
+	bool		allow_autovacuum_cancel = true;
+	bool		logged_recovery_conflict = false;
+	ProcWaitStatus myWaitStatus;
+
+	/* The caller must've armed the on-error cleanup mechanism */
+	Assert(GetAwaitedLock() == locallock);
+	Assert(!LWLockHeldByMe(partitionLock));
 
 	/*
-	 * Also, now that we will successfully clean up after an ereport, it's
-	 * safe to check to see if there's a buffer pin deadlock against the
-	 * Startup process.  Of course, that's only necessary if we're doing Hot
-	 * Standby and are not the Startup process ourselves.
+	 * Now that we will successfully clean up after an ereport, it's safe to
+	 * check to see if there's a buffer pin deadlock against the Startup
+	 * process.  Of course, that's only necessary if we're doing Hot Standby
+	 * and are not the Startup process ourselves.
 	 */
 	if (RecoveryInProgress() && !InRecovery)
 		CheckRecoveryConflictDeadlock();
@@ -1621,17 +1647,12 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 							standbyWaitStart, GetCurrentTimestamp(),
 							NULL, false);
 
-	/*
-	 * Re-acquire the lock table's partition lock, because the caller expects
-	 * us to still hold it.
-	 */
-	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
-
 	/*
 	 * We don't have to do anything else, because the awaker did all the
-	 * necessary update of the lock table and MyProc.
+	 * necessary updates of the lock table and MyProc. (The caller is
+	 * responsible for updating the local lock table.)
 	 */
-	return MyProc->waitStatus;
+	return myWaitStatus;
 }
 
 
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 71e05727e54..89dd7578377 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -470,9 +470,9 @@ extern int	GetStartupBufferPinWaitBufId(void);
 extern bool HaveNFreeProcs(int n, int *nfree);
 extern void ProcReleaseLocks(bool isCommit);
 
-extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock,
-								LockMethod lockMethodTable,
-								bool dontWait);
+extern ProcWaitStatus JoinWaitQueue(LOCALLOCK *locallock,
+									LockMethod lockMethodTable, bool dontWait);
+extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock);
 extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
 extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
 extern void CheckDeadLockAlert(void);
-- 
2.39.2

#9Maxim Orlov
orlovmg@gmail.com
In reply to: Heikki Linnakangas (#8)
8 attachment(s)
Re: Latches vs lwlock contention

I looked at the patch set and found it quite useful.

The first 7 patches are just refactoring and may be committed separately if
needed.
There were a couple of minor problems: patch #5 no longer applies cleanly, and
#8 produces a compiler warning about partitionLock being unused when building
without asserts. So I added a PG_USED_FOR_ASSERTS_ONLY marker to solve the
latter issue.
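
For the archives: the warning fix is just the usual annotation for a variable
that is only read from Assert(), so that assert-disabled builds don't warn
about it being unused. Roughly, in JoinWaitQueue() (abbreviated from the
v5-0008 hunk below):

    LWLock     *partitionLock PG_USED_FOR_ASSERTS_ONLY =
        LockHashPartitionLock(hashcode);

    /* partitionLock is only referenced here; without the marker, builds
     * configured without assertions complain about an unused variable. */
    Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));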

Again, the patch set overall looks good and seems useful to me. Here is the
rebased v5 version, based on Heikki's patch set above.

--
Best regards,
Maxim Orlov.

Attachments:

v5-0004-Move-TRACE-calls-into-WaitOnLock.patchapplication/octet-stream; name=v5-0004-Move-TRACE-calls-into-WaitOnLock.patchDownload
From 6f381e04a3201c6926190b234f128377ca3f84f4 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 13:12:57 +0300
Subject: [PATCH v5 4/8] Move TRACE calls into WaitOnLock()

LockAcquire is a long and complex function. Pushing more stuff to its
subroutines makes it a little more manageable.
---
 src/backend/storage/lmgr/lock.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index abac7134da..e82a041097 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1051,23 +1051,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 		 * case, because while trying to go to sleep, we may discover that we
 		 * can acquire the lock immediately after all.
 		 */
-
-		TRACE_POSTGRESQL_LOCK_WAIT_START(locktag->locktag_field1,
-										 locktag->locktag_field2,
-										 locktag->locktag_field3,
-										 locktag->locktag_field4,
-										 locktag->locktag_type,
-										 lockmode);
-
 		WaitOnLock(locallock, owner, dontWait);
 
-		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
-										locktag->locktag_field2,
-										locktag->locktag_field3,
-										locktag->locktag_field4,
-										locktag->locktag_type,
-										lockmode);
-
 		/*
 		 * NOTE: do not do any material change of state between here and
 		 * return.  All required changes in locktable state must have been
@@ -1811,6 +1796,13 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
 	LockMethod	lockMethodTable = LockMethods[lockmethodid];
 
+	TRACE_POSTGRESQL_LOCK_WAIT_START(locallock->tag.lock.locktag_field1,
+									 locallock->tag.lock.locktag_field2,
+									 locallock->tag.lock.locktag_field3,
+									 locallock->tag.lock.locktag_field4,
+									 locallock->tag.lock.locktag_type,
+									 locallock->tag.mode);
+
 	LOCK_PRINT("WaitOnLock: sleeping on lock",
 			   locallock->lock, locallock->tag.mode);
 
@@ -1881,6 +1873,13 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 
 	LOCK_PRINT("WaitOnLock: wakeup on lock",
 			   locallock->lock, locallock->tag.mode);
+
+	TRACE_POSTGRESQL_LOCK_WAIT_DONE(locallock->tag.lock.locktag_field1,
+									locallock->tag.lock.locktag_field2,
+									locallock->tag.lock.locktag_field3,
+									locallock->tag.lock.locktag_field4,
+									locallock->tag.lock.locktag_type,
+									locallock->tag.mode);
 }
 
 /*
-- 
2.34.1

v5-0001-Remove-LOCK_PRINT-call-that-could-point-to-garbag.patchapplication/octet-stream; name=v5-0001-Remove-LOCK_PRINT-call-that-could-point-to-garbag.patchDownload
From e7c1d682a6d7ff2104a17659a0546c91b390601a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 01:21:20 +0300
Subject: [PATCH v5 1/8] Remove LOCK_PRINT() call that could point to garbage

If a deadlock is detected in ProcSleep, it removes the process from
the wait queue and the shared LOCK and PROCLOCK entries. As soon as it
does that, the other process holding the lock might release it, and
remove the LOCK entry altogether. So on return from ProcSleep(), it's
possible that the LOCK entry is no longer valid.
---
 src/backend/storage/lmgr/lock.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 83b99a98f0..4d2b510835 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1860,8 +1860,6 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 			 * now.
 			 */
 			awaitedLock = NULL;
-			LOCK_PRINT("WaitOnLock: aborting on lock",
-					   locallock->lock, locallock->tag.mode);
 			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
 
 			/*
-- 
2.34.1

v5-0003-Set-MyProc-heldLocks-in-ProcSleep.patchapplication/octet-stream; name=v5-0003-Set-MyProc-heldLocks-in-ProcSleep.patchDownload
From f1ee1178fa214cce8885b99ff76441781e0816af Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 13:23:59 +0300
Subject: [PATCH v5 3/8] Set MyProc->heldLocks in ProcSleep

Previously, the ProcSleep() caller was responsible for setting it, and
we had comments to remind about that. But it seems simpler to make
ProcSleep() itself responsible for it. ProcSleep() already sets the
other info about the lock it's waiting for (waitLock, waitProcLock and
waitLockMode), so it seems natural for it to set heldLocks too.
---
 src/backend/storage/lmgr/lock.c |  8 --------
 src/backend/storage/lmgr/proc.c | 10 ++++++----
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 68dff44495..abac7134da 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1046,11 +1046,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	}
 	else
 	{
-		/*
-		 * Set bitmask of locks this process already holds on this object.
-		 */
-		MyProc->heldLocks = proclock->holdMask;
-
 		/*
 		 * Sleep till someone wakes me up. We do this even in the dontWait
 		 * case, because while trying to go to sleep, we may discover that we
@@ -1807,9 +1802,6 @@ MarkLockClear(LOCALLOCK *locallock)
 /*
  * WaitOnLock -- wait to acquire a lock
  *
- * Caller must have set MyProc->heldLocks to reflect locks already held
- * on the lockable object by this process.
- *
  * The appropriate partition lock must be held at entry, and will still be
  * held at exit.
  */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ac66da8638..9aff57a519 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1046,9 +1046,6 @@ AuxiliaryPidGetProc(int pid)
 /*
  * ProcSleep -- put a process to sleep on the specified lock
  *
- * Caller must have set MyProc->heldLocks to reflect locks already held
- * on the lockable object by this process (under all XIDs).
- *
  * It's not actually guaranteed that we need to wait when this function is
  * called, because it could be that when we try to find a position at which
  * to insert ourself into the wait queue, we discover that we must be inserted
@@ -1078,7 +1075,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
 	dclist_head *waitQueue = &lock->waitProcs;
 	PGPROC	   *insert_before = NULL;
-	LOCKMASK	myHeldLocks = MyProc->heldLocks;
+	LOCKMASK	myHeldLocks;
 	TimestampTz standbyWaitStart = 0;
 	bool		early_deadlock = false;
 	bool		allow_autovacuum_cancel = true;
@@ -1086,6 +1083,11 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	ProcWaitStatus myWaitStatus;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 
+	/*
+	 * Set bitmask of locks this process already holds on this object.
+	 */
+	myHeldLocks = MyProc->heldLocks = proclock->holdMask;
+
 	/*
 	 * If group locking is in use, locks held by members of my locking group
 	 * need to be included in myHeldLocks.  This is not required for relation
-- 
2.34.1

v5-0005-Remove-redundant-lockAwaited-global-variable.patchapplication/octet-stream; name=v5-0005-Remove-redundant-lockAwaited-global-variable.patchDownload
From ae12e4be3b34ffac896d7c406f8a5f4ad5815579 Mon Sep 17 00:00:00 2001
From: Maxim Orlov <m.orlov@postgrespro.ru>
Date: Tue, 10 Sep 2024 18:58:09 +0300
Subject: [PATCH v5 5/8] Remove redundant lockAwaited global variable

We had two global (static) variables to hold the LOCALLOCK that we are
currently acquiring: awaitedLock in lock.c and lockAwaited in
proc.c. They were set at slightly different times, but when they were
both set, they were the same thing. Remove the lockAwaited variable
and rely on awaitedLock everywhere.
---
 src/backend/storage/lmgr/lock.c |  9 +++++++++
 src/backend/storage/lmgr/proc.c | 30 ++++--------------------------
 src/backend/tcop/postgres.c     |  2 +-
 src/include/storage/lock.h      |  2 ++
 src/include/storage/proc.h      |  1 -
 5 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index e82a041097..4e1890276f 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1770,6 +1770,12 @@ GrantAwaitedLock(void)
 	GrantLockLocal(awaitedLock, awaitedOwner);
 }
 
+LOCALLOCK *
+GetAwaitedLock(void)
+{
+	return awaitedLock;
+}
+
 /*
  * MarkLockClear -- mark an acquired lock as "clear"
  *
@@ -1866,6 +1872,9 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	}
 	PG_END_TRY();
 
+	/*
+	 * We no longer want LockErrorCleanup to do anything.
+	 */
 	awaitedLock = NULL;
 
 	/* reset ps display to remove the suffix */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 9aff57a519..e8084dd3d5 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -80,9 +80,6 @@ PROC_HDR   *ProcGlobal = NULL;
 NON_EXEC_STATIC PGPROC *AuxiliaryProcs = NULL;
 PGPROC	   *PreparedXactProcs = NULL;
 
-/* If we are waiting for a lock, this points to the associated LOCALLOCK */
-static LOCALLOCK *lockAwaited = NULL;
-
 static DeadLockState deadlock_state = DS_NOT_YET_CHECKED;
 
 /* Is a deadlock check pending? */
@@ -707,18 +704,6 @@ HaveNFreeProcs(int n, int *nfree)
 	return (*nfree == n);
 }
 
-/*
- * Check if the current process is awaiting a lock.
- */
-bool
-IsWaitingForLock(void)
-{
-	if (lockAwaited == NULL)
-		return false;
-
-	return true;
-}
-
 /*
  * Cancel any pending wait for lock, when aborting a transaction, and revert
  * any strong lock count acquisition for a lock being acquired.
@@ -730,6 +715,7 @@ IsWaitingForLock(void)
 void
 LockErrorCleanup(void)
 {
+	LOCALLOCK  *lockAwaited;
 	LWLock	   *partitionLock;
 	DisableTimeoutParams timeouts[2];
 
@@ -738,6 +724,7 @@ LockErrorCleanup(void)
 	AbortStrongLockAcquire();
 
 	/* Nothing to do if we weren't waiting for a lock */
+	lockAwaited = GetAwaitedLock();
 	if (lockAwaited == NULL)
 	{
 		RESUME_INTERRUPTS();
@@ -1218,9 +1205,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
-	/* mark that we are waiting for a lock */
-	lockAwaited = locallock;
-
 	/*
 	 * Release the lock table's partition lock.
 	 *
@@ -1645,17 +1629,11 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 							NULL, false);
 
 	/*
-	 * Re-acquire the lock table's partition lock.  We have to do this to hold
-	 * off cancel/die interrupts before we can mess with lockAwaited (else we
-	 * might have a missed or duplicated locallock update).
+	 * Re-acquire the lock table's partition lock, because the caller expects
+	 * us to still hold it.
 	 */
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-	/*
-	 * We no longer want LockErrorCleanup to do anything.
-	 */
-	lockAwaited = NULL;
-
 	/*
 	 * If we got the lock, be sure to remember it in the locallock table.
 	 */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..9b800b72bf 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3062,7 +3062,7 @@ ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
 			/*
 			 * If we aren't waiting for a lock we can never deadlock.
 			 */
-			if (!IsWaitingForLock())
+			if (GetAwaitedLock() == NULL)
 				return;
 
 			/* Intentional fall through to check wait for pin */
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 8b328a06d9..787f3db06a 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -585,6 +585,8 @@ extern bool LockCheckConflicts(LockMethod lockMethodTable,
 							   LOCK *lock, PROCLOCK *proclock);
 extern void GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode);
 extern void GrantAwaitedLock(void);
+extern LOCALLOCK *GetAwaitedLock(void);
+
 extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode);
 extern LockData *GetLockStatusData(void);
 extern BlockedProcsData *GetBlockerStatusData(int blocked_pid);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index deeb06c9e0..7ddfc8f392 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -481,7 +481,6 @@ extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock,
 extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
 extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
 extern void CheckDeadLockAlert(void);
-extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
 
 extern void ProcWaitForSignal(uint32 wait_event_info);
-- 
2.34.1

v5-0002-Fix-comment-in-LockReleaseAll-on-when-locallock-n.patchapplication/octet-stream; name=v5-0002-Fix-comment-in-LockReleaseAll-on-when-locallock-n.patchDownload
From 4660001bf25079c35e584be03c8a4dcf7b8ff1e0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 00:43:13 +0300
Subject: [PATCH v5 2/8] Fix comment in LockReleaseAll() on when
 locallock->nLock can be zero

We reach this case also e.g. when a deadlock is detected, not only
when we run out of memory.
---
 src/backend/storage/lmgr/lock.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 4d2b510835..68dff44495 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2207,9 +2207,8 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
 	{
 		/*
-		 * If the LOCALLOCK entry is unused, we must've run out of shared
-		 * memory while trying to set up this lock.  Just forget the local
-		 * entry.
+		 * If the LOCALLOCK entry is unused, something must've gone wrong
+		 * while trying to acquire this lock.  Just forget the local entry.
 		 */
 		if (locallock->nLocks == 0)
 		{
-- 
2.34.1

v5-0007-Release-partition-lock-a-little-earlier.patchapplication/octet-stream; name=v5-0007-Release-partition-lock-a-little-earlier.patchDownload
From dc32923f690390f120fccc154d656ff6529d6302 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 17:38:06 +0300
Subject: [PATCH v5 7/8] Release partition lock a little earlier

We don't need to hold the lock to prevent die/cancel interrupts, they
are only processed at explicit CHECK_FOR_INTERRUPTS() points now.
---
 src/backend/storage/lmgr/lock.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 91d2402609..fc77c88747 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1042,6 +1042,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	{
 		/* No conflict with held or previously requested locks */
 		GrantLock(lock, proclock, lockmode);
+		LWLockRelease(partitionLock);
 	}
 	else
 	{
@@ -1117,8 +1118,10 @@ LockAcquireExtended(const LOCKTAG *locktag,
 				elog(ERROR, "LockAcquire failed");
 			}
 		}
+
 		PROCLOCK_PRINT("LockAcquire: granted", proclock);
 		LOCK_PRINT("LockAcquire: granted", lock, lockmode);
+		LWLockRelease(partitionLock);
 	}
 
 	/* The lock was granted to us.  Update the local lock entry accordingly */
@@ -1130,8 +1133,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 */
 	FinishStrongLockAcquire();
 
-	LWLockRelease(partitionLock);
-
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
 	 * standby server.
-- 
2.34.1

v5-0006-Update-local-lock-table-in-ProcSleep-s-caller.patchapplication/octet-stream; name=v5-0006-Update-local-lock-table-in-ProcSleep-s-caller.patchDownload
From 7d5556cabd3955ee25afc55a145d82cefb39b663 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 17:27:16 +0300
Subject: [PATCH v5 6/8] Update local lock table in ProcSleep's caller

ProcSleep is now responsible only for the shared state.  Seems a
little nicer that way.
---
 src/backend/storage/lmgr/lock.c | 4 +++-
 src/backend/storage/lmgr/proc.c | 7 -------
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 4e1890276f..91d2402609 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1042,7 +1042,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	{
 		/* No conflict with held or previously requested locks */
 		GrantLock(lock, proclock, lockmode);
-		GrantLockLocal(locallock, owner);
 	}
 	else
 	{
@@ -1122,6 +1121,9 @@ LockAcquireExtended(const LOCKTAG *locktag,
 		LOCK_PRINT("LockAcquire: granted", lock, lockmode);
 	}
 
+	/* The lock was granted to us.  Update the local lock entry accordingly */
+	GrantLockLocal(locallock, owner);
+
 	/*
 	 * Lock state is fully up-to-date now; if we error out after this, no
 	 * special error cleanup is required.
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e8084dd3d5..f29e02e902 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1158,7 +1158,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 				{
 					/* Skip the wait and just grant myself the lock. */
 					GrantLock(lock, proclock, lockmode);
-					GrantAwaitedLock();
 					return PROC_WAIT_STATUS_OK;
 				}
 
@@ -1634,12 +1633,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	 */
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-	/*
-	 * If we got the lock, be sure to remember it in the locallock table.
-	 */
-	if (MyProc->waitStatus == PROC_WAIT_STATUS_OK)
-		GrantAwaitedLock();
-
 	/*
 	 * We don't have to do anything else, because the awaker did all the
 	 * necessary update of the lock table and MyProc.
-- 
2.34.1

v5-0008-Split-ProcSleep-function-into-JoinWaitQueue-and-P.patchapplication/octet-stream; name=v5-0008-Split-ProcSleep-function-into-JoinWaitQueue-and-P.patchDownload
From ecc3beda97fa00621998b304809a4156f2c1c3c9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 22 Jul 2024 21:53:18 +0300
Subject: [PATCH v5 8/8] Split ProcSleep function into JoinWaitQueue and
 ProcSleep

Split ProcSleep into two functions: JoinWaitQueue and ProcSleep.
JoinWaitQueue is called while holding the partition lock, and inserts
the current process to the wait queue, while ProcSleep() does the
actual sleeping. ProcSleep() is now called without holding the
partition lock, and it no longer re-acquires the partition lock before
returning. That makes the wakeup a little cheaper. Once upon a time,
re-acquiring the partition lock was ago to prevent a signal handler
from longjmping out at a bad time, but nowadays our signal handlers
just set flags, and longjmping can only happen at points where we
explicitly run CHECK_FOR_INTERRUPTS().

If JoinWaitQueue detects an "early deadlock" before even joining the
wait queue, it returns without changing the shared lock entry, leaving
the cleanup of the shared lock entry to the caller. This makes early
deadlock behave the same as the dontWait=true case.

One small side-effect of this refactoring is that we now only set the
'ps' title to say "waiting" when we actually enter the sleep, not when
the lock is skipped because dontWait=true, or when a deadlock is
detected early before entering the sleep.

Based on Thomas Munro's earlier patch and observation that ProcSleep
doesn't really need to re-acquire the partition lock.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/lock.c | 182 ++++++++++++++++----------------
 src/backend/storage/lmgr/proc.c | 109 +++++++++++--------
 src/include/storage/proc.h      |   6 +-
 3 files changed, 157 insertions(+), 140 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index fc77c88747..ede791c9ff 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -360,8 +360,7 @@ static PROCLOCK *SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner);
 static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode);
 static void FinishStrongLockAcquire(void);
-static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner,
-					   bool dontWait);
+static ProcWaitStatus WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner);
 static void ReleaseLockIfHeld(LOCALLOCK *locallock, bool sessionLock);
 static void LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent);
 static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode,
@@ -1046,85 +1045,105 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	}
 	else
 	{
-		/*
-		 * Sleep till someone wakes me up. We do this even in the dontWait
-		 * case, because while trying to go to sleep, we may discover that we
-		 * can acquire the lock immediately after all.
-		 */
-		WaitOnLock(locallock, owner, dontWait);
+		ProcWaitStatus waitResult;
 
 		/*
-		 * NOTE: do not do any material change of state between here and
-		 * return.  All required changes in locktable state must have been
-		 * done when the lock was granted to us --- see notes in WaitOnLock.
+		 * Join the lock's wait queue.  We do this even in the dontWait case,
+		 * because while joining the queue, we may discover that we can
+		 * acquire the lock immediately after all.
 		 */
+		waitResult = JoinWaitQueue(locallock, lockMethodTable, dontWait);
 
-		/*
-		 * Check the proclock entry status. If dontWait = true, this is an
-		 * expected case; otherwise, it will only happen if something in the
-		 * ipc communication doesn't work correctly.
-		 */
-		if (!(proclock->holdMask & LOCKBIT_ON(lockmode)))
+		if (waitResult == PROC_WAIT_STATUS_ERROR)
 		{
+			/*
+			 * We're not getting the lock because a deadlock was detected
+			 * already while trying to join the wait queue, or because we
+			 * would have to wait but the caller requested no blocking.
+			 *
+			 * Undo the changes to shared entries before releasing the
+			 * partition lock.
+			 */
 			AbortStrongLockAcquire();
 
+			if (proclock->holdMask == 0)
+			{
+				uint32		proclock_hashcode;
+
+				proclock_hashcode = ProcLockHashCode(&proclock->tag,
+													 hashcode);
+				dlist_delete(&proclock->lockLink);
+				dlist_delete(&proclock->procLink);
+				if (!hash_search_with_hash_value(LockMethodProcLockHash,
+												 &(proclock->tag),
+												 proclock_hashcode,
+												 HASH_REMOVE,
+												 NULL))
+					elog(PANIC, "proclock table corrupted");
+			}
+			else
+				PROCLOCK_PRINT("LockAcquire: did not join wait queue", proclock);
+			lock->nRequested--;
+			lock->requested[lockmode]--;
+			LOCK_PRINT("LockAcquire: did not join wait queue",
+					   lock, lockmode);
+			Assert((lock->nRequested > 0) &&
+				   (lock->requested[lockmode] >= 0));
+			Assert(lock->nGranted <= lock->nRequested);
+			LWLockRelease(partitionLock);
+			if (locallock->nLocks == 0)
+				RemoveLocalLock(locallock);
+
 			if (dontWait)
 			{
-				/*
-				 * We can't acquire the lock immediately.  If caller specified
-				 * no blocking, remove useless table entries and return
-				 * LOCKACQUIRE_NOT_AVAIL without waiting.
-				 */
-				if (proclock->holdMask == 0)
-				{
-					uint32		proclock_hashcode;
-
-					proclock_hashcode = ProcLockHashCode(&proclock->tag,
-														 hashcode);
-					dlist_delete(&proclock->lockLink);
-					dlist_delete(&proclock->procLink);
-					if (!hash_search_with_hash_value(LockMethodProcLockHash,
-													 &(proclock->tag),
-													 proclock_hashcode,
-													 HASH_REMOVE,
-													 NULL))
-						elog(PANIC, "proclock table corrupted");
-				}
-				else
-					PROCLOCK_PRINT("LockAcquire: NOWAIT", proclock);
-				lock->nRequested--;
-				lock->requested[lockmode]--;
-				LOCK_PRINT("LockAcquire: conditional lock failed",
-						   lock, lockmode);
-				Assert((lock->nRequested > 0) &&
-					   (lock->requested[lockmode] >= 0));
-				Assert(lock->nGranted <= lock->nRequested);
-				LWLockRelease(partitionLock);
-				if (locallock->nLocks == 0)
-					RemoveLocalLock(locallock);
 				if (locallockp)
 					*locallockp = NULL;
 				return LOCKACQUIRE_NOT_AVAIL;
 			}
 			else
+			{
+				DeadLockReport();
+				/* DeadLockReport() will not return */
+			}
+		}
+
+		/*
+		 * We are now in the lock queue, or the lock was already granted.  If
+		 * queued, go to sleep.
+		 */
+		if (waitResult == PROC_WAIT_STATUS_WAITING)
+		{
+			Assert(!dontWait);
+			PROCLOCK_PRINT("LockAcquire: sleeping on lock", proclock);
+			LOCK_PRINT("LockAcquire: sleeping on lock", lock, lockmode);
+			LWLockRelease(partitionLock);
+
+			waitResult = WaitOnLock(locallock, owner);
+
+			/*
+			 * NOTE: do not do any material change of state between here and
+			 * return.  All required changes in locktable state must have been
+			 * done when the lock was granted to us --- see notes in WaitOnLock.
+			 */
+
+			if (waitResult == PROC_WAIT_STATUS_ERROR)
 			{
 				/*
-				 * We should have gotten the lock, but somehow that didn't
-				 * happen. If we get here, it's a bug.
+				 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
+				 * now.
 				 */
-				PROCLOCK_PRINT("LockAcquire: INCONSISTENT", proclock);
-				LOCK_PRINT("LockAcquire: INCONSISTENT", lock, lockmode);
-				LWLockRelease(partitionLock);
-				elog(ERROR, "LockAcquire failed");
+				Assert(!dontWait);
+				DeadLockReport();
+				/* DeadLockReport() will not return */
 			}
 		}
-
-		PROCLOCK_PRINT("LockAcquire: granted", proclock);
-		LOCK_PRINT("LockAcquire: granted", lock, lockmode);
-		LWLockRelease(partitionLock);
+		else
+			LWLockRelease(partitionLock);
+		Assert(waitResult == PROC_WAIT_STATUS_OK);
 	}
 
 	/* The lock was granted to us.  Update the local lock entry accordingly */
+	Assert((proclock->holdMask & LOCKBIT_ON(lockmode)) != 0);
 	GrantLockLocal(locallock, owner);
 
 	/*
@@ -1796,14 +1815,12 @@ MarkLockClear(LOCALLOCK *locallock)
 /*
  * WaitOnLock -- wait to acquire a lock
  *
- * The appropriate partition lock must be held at entry, and will still be
- * held at exit.
+ * This is a wrapper around ProcSleep, with extra tracing and bookkeeping.
  */
-static void
-WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
+static ProcWaitStatus
+WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
 {
-	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
-	LockMethod	lockMethodTable = LockMethods[lockmethodid];
+	ProcWaitStatus result;
 
 	TRACE_POSTGRESQL_LOCK_WAIT_START(locallock->tag.lock.locktag_field1,
 									 locallock->tag.lock.locktag_field2,
@@ -1812,12 +1829,13 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 									 locallock->tag.lock.locktag_type,
 									 locallock->tag.mode);
 
-	LOCK_PRINT("WaitOnLock: sleeping on lock",
-			   locallock->lock, locallock->tag.mode);
-
 	/* adjust the process title to indicate that it's waiting */
 	set_ps_display_suffix("waiting");
 
+	/*
+	 * Record the fact that we are waiting for a lock, so that
+	 * LockErrorCleanup will clean up if cancel/die happens.
+	 */
 	awaitedLock = locallock;
 	awaitedOwner = owner;
 
@@ -1840,28 +1858,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	 */
 	PG_TRY();
 	{
-		/*
-		 * If dontWait = true, we handle success and failure in the same way
-		 * here. The caller will be able to sort out what has happened.
-		 */
-		if (ProcSleep(locallock, lockMethodTable, dontWait) != PROC_WAIT_STATUS_OK
-			&& !dontWait)
-		{
-
-			/*
-			 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
-			 * now.
-			 */
-			awaitedLock = NULL;
-			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
-
-			/*
-			 * Now that we aren't holding the partition lock, we can give an
-			 * error report including details about the detected deadlock.
-			 */
-			DeadLockReport();
-			/* not reached */
-		}
+		result = ProcSleep(locallock);
 	}
 	PG_CATCH();
 	{
@@ -1883,15 +1880,14 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	/* reset ps display to remove the suffix */
 	set_ps_display_remove_suffix();
 
-	LOCK_PRINT("WaitOnLock: wakeup on lock",
-			   locallock->lock, locallock->tag.mode);
-
 	TRACE_POSTGRESQL_LOCK_WAIT_DONE(locallock->tag.lock.locktag_field1,
 									locallock->tag.lock.locktag_field2,
 									locallock->tag.lock.locktag_field3,
 									locallock->tag.lock.locktag_field4,
 									locallock->tag.lock.locktag_type,
 									locallock->tag.mode);
+
+	return result;
 }
 
 /*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index f29e02e902..1ec357854e 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1031,7 +1031,7 @@ AuxiliaryPidGetProc(int pid)
 
 
 /*
- * ProcSleep -- put a process to sleep on the specified lock
+ * JoinWaitQueue -- join the wait queue on the specified lock
  *
  * It's not actually guaranteed that we need to wait when this function is
  * called, because it could be that when we try to find a position at which
@@ -1040,36 +1040,43 @@ AuxiliaryPidGetProc(int pid)
  * we get the lock immediately. Because of this, it's sensible for this function
  * to have a dontWait argument, despite the name.
  *
- * The lock table's partition lock must be held at entry, and will be held
- * at exit.
+ * On entry, the caller has already set up LOCK and PROCLOCK entries to
+ * reflect that we have "requested" the lock.  The caller is responsible for
+ * cleaning that up, if we end up not joining the queue after all.
  *
- * Result: PROC_WAIT_STATUS_OK if we acquired the lock, PROC_WAIT_STATUS_ERROR
- * if not (if dontWait = true, we would have had to wait; if dontWait = false,
- * this is a deadlock).
+ * The lock table's partition lock must be held at entry, and is still held
+ * at exit.  The caller must release it before calling ProcSleep().
  *
- * ASSUME: that no one will fiddle with the queue until after
- *		we release the partition lock.
+ * Result is one of the following:
+ *
+ *  PROC_WAIT_STATUS_OK       - lock was immediately granted
+ *  PROC_WAIT_STATUS_WAITING  - joined the wait queue; call ProcSleep()
+ *  PROC_WAIT_STATUS_ERROR    - immediate deadlock was detected, or would
+ *                              need to wait and dontWait == true
  *
  * NOTES: The process queue is now a priority queue for locking.
  */
 ProcWaitStatus
-ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
+JoinWaitQueue(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 {
 	LOCKMODE	lockmode = locallock->tag.mode;
 	LOCK	   *lock = locallock->lock;
 	PROCLOCK   *proclock = locallock->proclock;
 	uint32		hashcode = locallock->hashcode;
-	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
+	LWLock	   *partitionLock PG_USED_FOR_ASSERTS_ONLY = LockHashPartitionLock(hashcode);
 	dclist_head *waitQueue = &lock->waitProcs;
 	PGPROC	   *insert_before = NULL;
 	LOCKMASK	myHeldLocks;
-	TimestampTz standbyWaitStart = 0;
 	bool		early_deadlock = false;
-	bool		allow_autovacuum_cancel = true;
-	bool		logged_recovery_conflict = false;
-	ProcWaitStatus myWaitStatus;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 
+	Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+
+	/*
+	 * Set bitmask of locks this process already holds on this object.
+	 */
+	myHeldLocks = MyProc->heldLocks = proclock->holdMask;
+
 	/*
 	 * Set bitmask of locks this process already holds on this object.
 	 */
@@ -1170,6 +1177,13 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 		}
 	}
 
+	/*
+	 * If we detected deadlock, give up without waiting.  This must agree with
+	 * CheckDeadLock's recovery code.
+	 */
+	if (early_deadlock)
+		return PROC_WAIT_STATUS_ERROR;
+
 	/*
 	 * At this point we know that we'd really need to sleep. If we've been
 	 * commanded not to do that, bail out.
@@ -1194,31 +1208,43 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 
 	MyProc->waitStatus = PROC_WAIT_STATUS_WAITING;
 
-	/*
-	 * If we detected deadlock, give up without waiting.  This must agree with
-	 * CheckDeadLock's recovery code.
-	 */
-	if (early_deadlock)
-	{
-		RemoveFromWaitQueue(MyProc, hashcode);
-		return PROC_WAIT_STATUS_ERROR;
-	}
+	return PROC_WAIT_STATUS_WAITING;
+}
 
-	/*
-	 * Release the lock table's partition lock.
-	 *
-	 * NOTE: this may also cause us to exit critical-section state, possibly
-	 * allowing a cancel/die interrupt to be accepted. This is OK because we
-	 * have recorded the fact that we are waiting for a lock, and so
-	 * LockErrorCleanup will clean up if cancel/die happens.
-	 */
-	LWLockRelease(partitionLock);
+/*
+ * ProcSleep -- put process to sleep waiting on lock
+ *
+ * This must be called when JoinWaitQueue() returns PROC_WAIT_STATUS_WAITING.
+ * Returns after the lock has been granted, or if a deadlock is detected.  Can
+ * also bail out with ereport(ERROR), if some other error condition, or a
+ * timeout or cancellation is triggered.
+ *
+ * Result is one of the following:
+ *
+ *  PROC_WAIT_STATUS_OK      - lock was granted
+ *  PROC_WAIT_STATUS_ERROR   - a deadlock was detected
+ */
+ProcWaitStatus
+ProcSleep(LOCALLOCK *locallock)
+{
+	LOCKMODE	lockmode = locallock->tag.mode;
+	LOCK	   *lock = locallock->lock;
+	uint32		hashcode = locallock->hashcode;
+	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
+	TimestampTz standbyWaitStart = 0;
+	bool		allow_autovacuum_cancel = true;
+	bool		logged_recovery_conflict = false;
+	ProcWaitStatus myWaitStatus;
+
+	/* The caller must've armed the on-error cleanup mechanism */
+	Assert(GetAwaitedLock() == locallock);
+	Assert(!LWLockHeldByMe(partitionLock));
 
 	/*
-	 * Also, now that we will successfully clean up after an ereport, it's
-	 * safe to check to see if there's a buffer pin deadlock against the
-	 * Startup process.  Of course, that's only necessary if we're doing Hot
-	 * Standby and are not the Startup process ourselves.
+	 * Now that we will successfully clean up after an ereport, it's safe to
+	 * check to see if there's a buffer pin deadlock against the Startup
+	 * process.  Of course, that's only necessary if we're doing Hot Standby
+	 * and are not the Startup process ourselves.
 	 */
 	if (RecoveryInProgress() && !InRecovery)
 		CheckRecoveryConflictDeadlock();
@@ -1627,17 +1653,12 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 							standbyWaitStart, GetCurrentTimestamp(),
 							NULL, false);
 
-	/*
-	 * Re-acquire the lock table's partition lock, because the caller expects
-	 * us to still hold it.
-	 */
-	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
-
 	/*
 	 * We don't have to do anything else, because the awaker did all the
-	 * necessary update of the lock table and MyProc.
+	 * necessary updates of the lock table and MyProc. (The caller is
+	 * responsible for updating the local lock table.)
 	 */
-	return MyProc->waitStatus;
+	return myWaitStatus;
 }
 
 
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 7ddfc8f392..daae929593 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -475,9 +475,9 @@ extern int	GetStartupBufferPinWaitBufId(void);
 extern bool HaveNFreeProcs(int n, int *nfree);
 extern void ProcReleaseLocks(bool isCommit);
 
-extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock,
-								LockMethod lockMethodTable,
-								bool dontWait);
+extern ProcWaitStatus JoinWaitQueue(LOCALLOCK *locallock,
+									LockMethod lockMethodTable, bool dontWait);
+extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock);
 extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
 extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
 extern void CheckDeadLockAlert(void);
-- 
2.34.1

#10Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Maxim Orlov (#9)
Re: Latches vs lwlock contention

On 10/09/2024 19:53, Maxim Orlov wrote:

I looked at the patch set and found it quite useful.

The first 7 patches are just refactoring and may be committed separately
if needed.
There were a couple of minor problems: patch #5 no longer applies cleanly,
and #8 produces a compiler warning about partitionLock being unused when
building without asserts. So I added a PG_USED_FOR_ASSERTS_ONLY marker to
solve the latter issue.

Again, the patch set overall looks good and seems useful to me. Here is the
rebased v5 version, based on Heikki's patch set above.

Committed, thanks for the review!

In case you're wondering, I committed some of the smaller patches
separately, but squashed some of them into the main patch. On closer look,
the first patch, "Remove LOCK_PRINT() call that could point to garbage",
wasn't fixing any existing issue: the LOCK_PRINT() was fine, because we held
the partition lock. But removing it did become necessary with the main patch,
so I squashed it into that one. The others that I squashed were just not that
interesting on their own.
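
To spell the hazard out: LOCK_PRINT() dereferences the shared LOCK entry
through locallock->lock, which is only safe while the lock table's partition
lock pins that entry. With the main patch, the deadlock path runs after
ProcSleep() has returned and we no longer hold the partition lock, so by then
the backend that held the lock may have released it and removed the shared
entry altogether. Schematically (a sketch of the ordering, not a verbatim
hunk):

    /* in LockAcquireExtended(), after ProcSleep() has returned and the
     * partition lock is no longer held */
    if (waitResult == PROC_WAIT_STATUS_ERROR)
    {
        /*
         * Another backend may already have released the lock and removed
         * the shared LOCK entry, so a debugging call like
         *
         *     LOCK_PRINT("WaitOnLock: aborting on lock",
         *                locallock->lock, locallock->tag.mode);
         *
         * could now read freed shared memory.  Hence its removal.
         */
        DeadLockReport();       /* does not return */
    }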

The rest of Thomas's SetLatches work remains, so I left the commitfest
entry in "Needs review" state.

--
Heikki Linnakangas
Neon (https://neon.tech)

#11Alexander Lakhin
exclusion@gmail.com
In reply to: Heikki Linnakangas (#10)
Re: Latches vs lwlock contention

Hello Heikki,

04.11.2024 18:08, Heikki Linnakangas wrote:

Committed, thanks for the review!

I've discovered that the following script:
export PGOPTIONS='-c lock_timeout=1s'
createdb regression
for i in {1..100}; do
echo "ITERATION: $i"
psql -c "CREATE TABLE t(i int);"
cat << 'EOF' | psql &
DO $$
DECLARE
    i int;
BEGIN
   FOR i IN 1 .. 5000000 LOOP
    INSERT INTO t VALUES (1);
  END LOOP;
END;
$$;
EOF
sleep 1
psql -c "DROP TABLE t" &
cat << 'EOF' | psql &
COPY t FROM STDIN;
0
\.
EOF
wait

psql -c "DROP TABLE t" || break;
done

causes a segmentation fault on master (it fails on iterations 5, 4, 26 for me):
ITERATION: 26
CREATE TABLE
ERROR:  canceling statement due to lock timeout
ERROR:  canceling statement due to lock timeout
invalid command \.
ERROR:  syntax error at or near "0"
LINE 1: 0
        ^
server closed the connection unexpectedly

Core was generated by `postgres: law regression [local] idle                                         '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  GrantLockLocal (locallock=0x5a1d75c35ba8, owner=0x5a1d75c18630) at lock.c:1805
1805            lockOwners[i].owner = owner;
(gdb) bt
#0  GrantLockLocal (locallock=0x5a1d75c35ba8, owner=0x5a1d75c18630) at lock.c:1805
#1  0x00005a1d51e93ee7 in GrantAwaitedLock () at lock.c:1887
#2  0x00005a1d51ea1e58 in LockErrorCleanup () at proc.c:814
#3  0x00005a1d51b9a1a7 in AbortTransaction () at xact.c:2853
#4  0x00005a1d51b9abc6 in AbortCurrentTransactionInternal () at xact.c:3579
#5  AbortCurrentTransaction () at xact.c:3457
#6  0x00005a1d51eafeda in PostgresMain (dbname=<optimized out>, username=0x5a1d75c139b8 "law") at postgres.c:4440

(gdb) p lockOwners
$1 = (LOCALLOCKOWNER *) 0x0

git bisect led me to 3c0fd64fe.
Could you please take a look?

Best regards,
Alexander Lakhin

#12Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Alexander Lakhin (#11)
Re: Latches vs lwlock contention

On 27/03/2025 07:00, Alexander Lakhin wrote:

I've discovered that the following script:
export PGOPTIONS='-c lock_timeout=1s'
createdb regression
for i in {1..100}; do
echo "ITERATION: $i"
psql -c "CREATE TABLE t(i int);"
cat << 'EOF' | psql &
DO $$
DECLARE
    i int;
BEGIN
   FOR i IN 1 .. 5000000 LOOP
    INSERT INTO t VALUES (1);
  END LOOP;
END;
$$;
EOF
sleep 1
psql -c "DROP TABLE t" &
cat << 'EOF' | psql &
COPY t FROM STDIN;
0
\.
EOF
wait

psql -c "DROP TABLE t" || break;
done

causes a segmentation fault on master (it fails on iterations 5, 4, 26
for me):
ITERATION: 26
CREATE TABLE
ERROR:  canceling statement due to lock timeout
ERROR:  canceling statement due to lock timeout
invalid command \.
ERROR:  syntax error at or near "0"
LINE 1: 0
        ^
server closed the connection unexpectedly

Core was generated by `postgres: law regression [local]
idle                                         '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  GrantLockLocal (locallock=0x5a1d75c35ba8, owner=0x5a1d75c18630) at
lock.c:1805
1805            lockOwners[i].owner = owner;
(gdb) bt
#0  GrantLockLocal (locallock=0x5a1d75c35ba8, owner=0x5a1d75c18630) at
lock.c:1805
#1  0x00005a1d51e93ee7 in GrantAwaitedLock () at lock.c:1887
#2  0x00005a1d51ea1e58 in LockErrorCleanup () at proc.c:814
#3  0x00005a1d51b9a1a7 in AbortTransaction () at xact.c:2853
#4  0x00005a1d51b9abc6 in AbortCurrentTransactionInternal () at xact.c:3579
#5  AbortCurrentTransaction () at xact.c:3457
#6  0x00005a1d51eafeda in PostgresMain (dbname=<optimized out>,
username=0x5a1d75c139b8 "law") at postgres.c:4440

(gdb) p lockOwners
$1 = (LOCALLOCKOWNER *) 0x0

git bisect led me to 3c0fd64fe.
Could you please take a look?

Great, thanks for the repro! With that, I was able to capture the
failure with 'rr' and understand what happens: Commit 3c0fd64fe removed
"lockAwaited = NULL;" from LockErrorCleanup(). Because of that, if the
lock had been granted to us, and if LockErrorCleanup() was called twice,
the second call would call GrantAwaitedLock() even if the lock was
already released and cleaned up.

I've pushed a fix to put that back.
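
For readers following along, the essence of the fix is to clear the
awaited-lock pointer again once LockErrorCleanup() has dealt with it, so that
a repeated call during abort processing becomes a no-op. Very roughly (a
sketch of the shape of the fix, not the committed hunk; ResetAwaitedLock() is
an illustrative name for whatever clears the pointer that GetAwaitedLock()
returns):

    void
    LockErrorCleanup(void)
    {
        LOCALLOCK  *lockAwaited = GetAwaitedLock();

        if (lockAwaited == NULL)
            return;             /* not waiting, or already cleaned up */

        /* ... disable timeouts, take the partition lock, etc. ... */

        if (!dlist_node_is_detached(&MyProc->links))
        {
            /* still on the wait queue: remove ourselves */
            RemoveFromWaitQueue(MyProc, lockAwaited->hashcode);
        }
        else if (MyProc->waitStatus == PROC_WAIT_STATUS_OK)
        {
            /* woken with the lock already granted: record it locally */
            GrantAwaitedLock();
        }

        /*
         * Forget the awaited lock, so a second LockErrorCleanup() call does
         * nothing.  Leaving the pointer set is what made the second call
         * re-run GrantAwaitedLock() on a lock that was already released.
         */
        ResetAwaitedLock();     /* illustrative; equivalent to the old
                                 * "lockAwaited = NULL;" */
    }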

--
Heikki Linnakangas
Neon (https://neon.tech)