pgsql: Improve performance of subsystems on top of SLRU
More precisely, what we do here is make the SLRU cache sizes
configurable with new GUCs, so that sites with high concurrency and big
ranges of transactions in flight (resp. multixacts/subtransactions) can
benefit from bigger caches. In order for this to work with good
performance, two additional changes are made:
1. The cache is divided into "banks" (to borrow terminology from CPU
caches), and algorithms such as the search for a victim buffer to evict
only affect one specific bank. This forestalls the problem that a
linear search for a specific buffer across the whole cache takes too
long: we only have to search the specific bank, whose size is small.
This work is authored by Andrey Borodin.
2. Change the locking regime for the SLRU banks, so that each bank uses
a separate LWLock. This allows for increased scalability. This work
is authored by Dilip Kumar. (A part of this was previously committed as
d172b717c6f4.)
Special care is taken so that the algorithms that can potentially
traverse more than one bank release one bank's lock before acquiring the
next. This should happen rarely, but particularly clog.c's group commit
feature needed code adjustment to cope with this. I (Álvaro) also added
lots of comments to make sure the design is sound.
The new GUCs match the names introduced by bcdfa5f2e2f2 in the
pg_stat_slru view.
The default values for these parameters are similar to the previous
sizes of each SLRU. commit_ts, clog and subtrans accept value 0, which
means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB
of shared_buffers), with a cap of 8MB. (A new slru.c function
SimpleLruAutotuneBuffers() was added to support this.) The cap was
previously 1MB for clog, so for sites with more than 512MB of shared
memory the total memory used increases, which is likely a good tradeoff.
However, other SLRUs (notably multixact ones) retain smaller sizes and
don't support a configured value of 0. These values based on
shared_buffers may need to be revisited, but that's an easy change.
There was some resistance to adding these new GUCs: it would be better
to adjust to memory pressure automatically somehow, for example by
stealing memory from shared_buffers (where the caches can grow and
shrink naturally). However, doing that seems to be a much larger
project and one which has made virtually no progress in several years,
and because this is such a pain point for so many users, here we take
the pragmatic approach.
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova,
Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra,
Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev).
Discussion: /messages/by-id/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru
Discussion: /messages/by-id/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/53c2a97a92665be6bd7d70bd62ae6158fe4db96e
Modified Files
--------------
doc/src/sgml/config.sgml | 139 +++++++++
doc/src/sgml/monitoring.sgml | 9 +-
src/backend/access/transam/clog.c | 243 +++++++++++-----
src/backend/access/transam/commit_ts.c | 88 ++++--
src/backend/access/transam/multixact.c | 190 +++++++++----
src/backend/access/transam/slru.c | 357 +++++++++++++++++-------
src/backend/access/transam/subtrans.c | 110 ++++++--
src/backend/commands/async.c | 61 ++--
src/backend/storage/lmgr/lwlock.c | 9 +-
src/backend/storage/lmgr/lwlocknames.txt | 14 +-
src/backend/storage/lmgr/predicate.c | 34 ++-
src/backend/utils/activity/wait_event_names.txt | 15 +-
src/backend/utils/init/globals.c | 9 +
src/backend/utils/misc/guc_tables.c | 78 ++++++
src/backend/utils/misc/postgresql.conf.sample | 9 +
src/include/access/clog.h | 1 -
src/include/access/commit_ts.h | 1 -
src/include/access/multixact.h | 4 -
src/include/access/slru.h | 86 ++++--
src/include/access/subtrans.h | 3 -
src/include/commands/async.h | 5 -
src/include/miscadmin.h | 8 +
src/include/storage/lwlock.h | 7 +
src/include/storage/predicate.h | 4 -
src/include/utils/guc_hooks.h | 11 +
src/test/modules/test_slru/test_slru.c | 35 +--
26 files changed, 1177 insertions(+), 353 deletions(-)
On 2024-Feb-28, Alvaro Herrera wrote:
Improve performance of subsystems on top of SLRU
Coverity had the following complaint about this commit:
________________________________________________________________________________________________________
*** CID NNNNNNN: Control flow issues (DEADCODE)
/srv/coverity/git/pgsql-git/postgresql/src/backend/access/transam/multixact.c: 1375 in GetMultiXactIdMembers()
1369 * and acquire the lock of the new bank.
1370 */
1371 lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
1372 if (lock != prevlock)
1373 {
1374 if (prevlock != NULL)
CID 1592913: Control flow issues (DEADCODE)
Execution cannot reach this statement: "LWLockRelease(prevlock);".
1375 LWLockRelease(prevlock);
1376 LWLockAcquire(lock, LW_EXCLUSIVE);
1377 prevlock = lock;
1378 }
1379
1380 slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
And I think it's correct that this is somewhat bogus, or at least
confusing: the only way to have control back here on line 1371 after
having executed once is via the "goto retry" line below; and there we
release "prevlock" and set it to NULL beforehand, so it's impossible for
prevlock to be NULL. Looking closer I think this code is all confused,
so I suggest to rework it as shown in the attached patch.
I'll have a look at the other places where we use this "prevlock" coding
pattern tomorrow.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Attachment: 0001-rework-locking-code-in-GetMultiXactIdMembers.patch (text/x-diff; +20 -33)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
And I think it's correct that this is somewhat bogus, or at least
confusing: the only way to have control back here on line 1371 after
having executed once is via the "goto retry" line below; and there we
release "prevlock" and set it to NULL beforehand, so it's impossible for
prevlock to be NULL. Looking closer I think this code is all confused,
so I suggest to rework it as shown in the attached patch.
This is certainly simpler, but I notice that it holds the current
LWLock across the line
ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
where the old code did not. Could the palloc take long enough that
holding the lock is bad?
Also, with this coding the "lock = NULL;" assignment just before
"goto retry" is a dead store. Not sure if Coverity or other static
analyzers would whine about that.
regards, tom lane
On Mon, Mar 4, 2024 at 1:56 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
+ /* Acquire the bank lock for the page we need. */
lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
- if (lock != prevlock)
- {
- if (prevlock != NULL)
- LWLockRelease(prevlock);
- LWLockAcquire(lock, LW_EXCLUSIVE);
- prevlock = lock;
- }
+ LWLockAcquire(lock, LW_EXCLUSIVE);
This part is definitely an improvement.
I am not sure about the other changes. They make the code much
simpler, but now we are not releasing the 'MultiXactOffsetCtl' related
bank lock, and later, in the following loop, we compare that lock
against the 'MultiXactMemberCtl' related bank lock. The code is
simpler because in the loop we are now sure that we are always holding
a lock, but I do not like comparing the bank locks of two different
SLRUs, even though there is no actual problem since they can never
share a lock address. Anyway, I do not have any strong objection to
what you have done here.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On 2024-Mar-03, Tom Lane wrote:
This is certainly simpler, but I notice that it holds the current
LWLock across the line
ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
where the old code did not. Could the palloc take long enough that
holding the lock is bad?
Hmm, I guess most of the time it shouldn't be much of a problem (if the
length is small so we can palloc without malloc'ing); but it could be in
the worst case. But the fix is simple: just release the lock before.
There's no correctness argument for holding it all the way down. I was
just confused about how the original code worked.
Also, with this coding the "lock = NULL;" assignment just before
"goto retry" is a dead store. Not sure if Coverity or other static
analyzers would whine about that.
Oh, right. I removed that.
On 2024-Mar-04, Dilip Kumar wrote:
I am not sure about the other changes, I mean that makes the code much
simpler but now we are not releasing the 'MultiXactOffsetCtl' related
bank lock, and later in the following loop, we are comparing that lock
against 'MultiXactMemberCtl' related bank lock. This makes code
simpler because now in the loop we are sure that we are always holding
the lock but I do not like comparing the bank locks for 2 different
SLRUs, although there is no problem as there would not be a common
lock address,
True. This can be addressed in the same way as Tom's first point:
just release the lock before entering the second loop, and set lock
to NULL. This brings the code to a similar state to before, except
that the additional LWLock * variables are in a tighter scope. That's
in 0001.
Now, I had a look at the other users of slru.c and noticed in
subtrans.c that in StartupSUBTRANS we have some duplicate code that I
think could be rewritten by making the "while" block test the
condition at the end instead of at the start; changed that in 0002.
I'll leave this one for later, because I want to add some test code
for it -- right now it's pretty much test-uncovered code.
I also looked at slru.c for uses of shared->bank_locks and noticed a
couple that could be made simpler. That's 0003 here.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"If it is not right, do not do it.
If it is not true, do not say it." (Marcus Aurelius, Meditations)
FWIW there's a stupid bug in 0002, which is fixed here. I'm writing a
simple test for it.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Now I have my system running, not a byte was off the shelf;
It rarely breaks and when it does I fix the code myself.
It's stable, clean and elegant, and lightning fast as well,
And it doesn't cost a nickel, so Bill Gates can go to hell."