Refactor PROCLOCK hash table into partitioned list allocator
Hi hackers,
Following up on the Discord discussion about the PROCLOCK hash table being
a "weird allocator" that we never actually use for lookups - I took a stab at
replacing it with a simpler partitioned free list approach as was suggested.
I was doing this mostly to educate myself on Lock Manager internals.
The current implementation uses LockMethodProcLockHash purely as an allocator.
We never do hash lookups by key; we only allocate entries, link them to the lock's
procLocks list, and free them later. Using a full hash table for this adds
unnecessary complexity and maybe even overhead (I did not measure this).
The attached patch replaces this with:
- ProcLockArray: A fixed-size array of all PROCLOCK structs (allocated at startup)
- ProcLockFreeList: Partitioned free lists, one per lock partition to reduce contention
- ProcLockAlloc/Free: Simple push/pop operations on the free lists
- PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
and FastPathGetRelationLockEntry())
The last point bothers me most. It seems like these traversals are expected to
be short, but I'm not 100% sure.
This patch removes the proclock_hash() function and ProcLockHashCode() entirely, and
eliminates all hash_search() calls for PROCLOCKs. The allocation/deallocation
is now just dlist operations instead of hash table management.
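The allocator side is simple enough that a toy, single-process sketch may help review. Everything below is my illustration, not code from the patch: names like ProcLockEntry and proclock_alloc and the sizing constants are made up, and the real version would pop/push while holding the relevant lock partition's LWLock:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PARTITIONS 4
#define ENTRIES_PER_PARTITION 8

/* Toy stand-in for PROCLOCK; the real struct carries the tag,
 * hold/release masks, and links into the lock's procLocks list. */
typedef struct ProcLockEntry
{
    struct ProcLockEntry *next; /* free-list link, unused while allocated */
    int partition;              /* owning partition, fixed at startup */
} ProcLockEntry;

static ProcLockEntry pool[NUM_PARTITIONS * ENTRIES_PER_PARTITION];
static ProcLockEntry *freelist[NUM_PARTITIONS];

/* Carve the fixed array into per-partition free lists at startup. */
static void
proclock_init(void)
{
    for (int p = 0; p < NUM_PARTITIONS; p++)
    {
        freelist[p] = NULL;
        for (int i = 0; i < ENTRIES_PER_PARTITION; i++)
        {
            ProcLockEntry *e = &pool[p * ENTRIES_PER_PARTITION + i];

            e->partition = p;
            e->next = freelist[p];
            freelist[p] = e;
        }
    }
}

/* Pop from the partition's free list; returns NULL when that
 * partition is exhausted. */
static ProcLockEntry *
proclock_alloc(int partition)
{
    ProcLockEntry *e = freelist[partition];

    if (e != NULL)
        freelist[partition] = e->next;
    return e;
}

/* Push back onto the owning partition's free list. */
static void
proclock_free(ProcLockEntry *e)
{
    e->next = freelist[e->partition];
    freelist[e->partition] = e;
}
```

Partitioning the free lists means backends working on locks in different partitions never contend on the allocator, at the cost that one partition can run dry while another still has spare entries.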
Would appreciate your thoughts on this approach. Is this direction worth pursuing?
Best regards, Andrey Borodin.
Attachments:
v1-0001-Replace-PROCLOCK-hash-table-with-partitioned-free.patch
On 30/12/2025 14:37, Andrey Borodin wrote:
> - PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
>   and FastPathGetRelationLockEntry())
>
> The last point bothers me most. It seems like these traversals are expected
> to be short, but I'm not 100% sure.
Hmm, yeah the last point contradicts the premise that the hash table is
used purely as an allocator. It *is* used for lookups, and you're
replacing them with linear scans. That doesn't seem like an improvement.
- Heikki
On Tue, 6 Jan 2026 at 15:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 30/12/2025 14:37, Andrey Borodin wrote:
>> - PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
>>   and FastPathGetRelationLockEntry())
>>
>> The last point bothers me most. It seems like these traversals are expected
>> to be short, but I'm not 100% sure.
>
> Hmm, yeah the last point contradicts the premise that the hash table is
> used purely as an allocator. It *is* used for lookups, and you're
> replacing them with linear scans. That doesn't seem like an improvement.
There are 2 types of PROCLOCK lookup used in LockRefindAndRelease and
FastPathGetRelationLockEntry:
- An active backend's PROCLOCK entries (in both LRAR and FPGRLE).
- Prepared transaction's PROCLOCK entries (only in LRAR, called from
lock_twophase_postcommit).
For the backend's PROCLOCK entry lookup, we can use a backend-local
hash table, which only keeps track of where this backend's entries
are.
For prepared transactions, I don't see any code that would indicate
more than one lock being released through this code
(lock_twophase_postcommit only releases one lock), which to me
indicates there is no risk of O(N^2)-related performance sinks. In the
case that there are more locks in the 2PC's PROCLOCK list, we could
"just" make sure to put the lock we're releasing during transaction
wind-down at the head of the list, as that would also keep the lookup O(1).
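That move-to-head trick could be sketched like this; the Node layout and the move_to_head name are my stand-ins for the dlist node embedded in PROCLOCK and a dlist-style relocation, not code from the tree:

```c
#include <assert.h>

/* Toy stand-in for a dlist node in a circular, sentinel-headed
 * lock->procLocks list. */
typedef struct Node
{
    struct Node *prev;
    struct Node *next;
} Node;

/* Unlink n from wherever it sits and re-insert it right after the
 * sentinel head, so the next linear scan finds it immediately. */
static void
move_to_head(Node *head, Node *n)
{
    /* unlink n */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    /* insert n after head */
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}
```

Done at PREPARE time for the entry we know the wind-down code will look for, this keeps the later lookup O(1) without any hash table.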
Kind regards,
Matthias van de Meent
Databricks (https://www.databricks.com)
On Tue, 06 Jan 2026 at 16:23, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 30/12/2025 14:37, Andrey Borodin wrote:
>> - PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
>>   and FastPathGetRelationLockEntry())
>>
>> The last point bothers me most. It seems like these traversals are expected
>> to be short, but I'm not 100% sure.
>
> Hmm, yeah the last point contradicts the premise that the hash table
> is used purely as an allocator. It *is* used for lookups, and you're
> replacing them with linear scans. That doesn't seem like an improvement.
>
> - Heikki
I tested the patch on a Loongson 3C6000/D system with 128 vCPUs using
BenchmarkSQL 5.0 (100 warehouses, 100 clients).
Here are the results:
| | tpmC | tpmTotal |
|---------|-----------|-----------|
| master | 248199.09 | 551387.46 |
| | 243660.35 | 541902.31 |
| | 244418.30 | 542867.57 |
| patched | 247330.65 | 549949.25 |
| | 242953.79 | 539620.65 |
| | 237883.19 | 528491.66 |
Not sure if this is useful, but throwing it out there.
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
On 06/01/2026 16:58, Matthias van de Meent wrote:
On Tue, 6 Jan 2026 at 15:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 30/12/2025 14:37, Andrey Borodin wrote:
Hi hackers,
Following up on the Discord discussion about the PROCLOCK hash table being
a "weird allocator" that we never actually use for lookups - I took a stab at
replacing it with a simpler partitioned free list approach as was suggested.
I was doing this mostly to educate myself on Lock Manager internals.The current implementation uses LockMethodProcLockHash purely as an allocator.
We never do hash lookups by key; we only allocate entries, link them to the lock's
procLocks list, and free them later. Using a full hash table for this adds
unnecessary complexity and maybe even overhead (I did not measure this).The attached patch replaces this with:
- ProcLockArray: A fixed-size array of all PROCLOCK structs (allocated at startup)
- ProcLockFreeList: Partitioned free lists, one per lock partition to reduce contention
- ProcLockAlloc/Free: Simple push/pop operations on the free lists
- PROCLOCK lookup: Linear traversal of lock->procLocks (see LockRefindAndRelease()
and FastPathGetRelationLockEntry())The last point bothers me most. It seems like this traversals are expected to be short.
But I'm not 100% sure.Hmm, yeah the last point contradicts the premise that the hash table is
used purely as an allocator. It *is* used for lookups, and you're
replacing them with linear scans. That doesn't seem like an improvement.There are 2 types of PROCLOCK lookup used in LockRefindAndRelease and
FastPathGetRelationLockEntry:
- An active backend's PROCLOCK entries (in both LRAR and FPGRLE).
- Prepared transaction's PROCLOCK entries (only in LRAR, called from
lock_twophase_postcommit).
There are also lookups in SetupLockInTable and in LockRelease.
> For the backend's PROCLOCK entry lookup, we can use a backend-local
> hash table, which only keeps track of where this backend's entries
> are.
Hmm, good point. In fact we already have that: there's a pointer to the
current process's PROCLOCK entry in LOCALLOCK. Can we use that to avoid
the linear scans? There's this in LockAcquireExtended:
/*
* Find or create lock and proclock entries with this tag
*
* Note: if the locallock object already existed, it might have a pointer
* to the lock already ... but we should not assume that that pointer is
* valid, since a lock object with zero hold and request counts can go
* away anytime. So we have to use SetupLockInTable() to recompute the
* lock and proclock pointers, even if they're already set.
*/
proclock = SetupLockInTable(lockMethodTable, MyProc, locktag,
                            hashcode, lockmode);
So that comment suggests that the 'proclock' pointer cannot be trusted
here. I don't remember how all this works, so I'm not sure if that is a
show-stopper or if we could somehow leverage the 'proclock' pointer in
LOCALLOCK anyway.
- Heikki