Improve heavyweight locks instead of building new lock managers?

Started by Andres Freund · almost 6 years ago · 7 messages
#1 Andres Freund
andres@anarazel.de

Hi,

I started this as a reply to
/messages/by-id/CAA4eK1JMgUfiTdAgr9k3nA4cdKvvruousKBg7FWTDNzQgBpOZA@mail.gmail.com
but the email seemed to morph into a distinct topic that I thought
best to separate out.

We've had a number of cases where heavyweight locks turned out to be
too, well, heavy. And I don't mean cases where a fixed number of locks
can do the job. The latest case is the above thread, where a
separate lock manager just for relation extension was implemented.

My problem with that patch is that it just seems like the wrong
direction architecturally to me. There's two main aspects to this:

1) It basically builds another, more lightweight but less capable,
lock manager that can lock more objects than we can have distinct
locks for. It is faster because it uses *one* hashtable without
conflict handling, because it has fewer lock modes, and because it
doesn't support detecting deadlocks. And probably some other things.

2) A lot of the contention around file extension comes from us doing
multiple expensive things under one lock (determining current
relation size, searching victim buffer, extending file), and in tiny
increments (growing a 1TB table by 8kb). This patch doesn't address
that at all.

I'm only planning to address 1) in this thread, and will write a
separate email about 2) as part of the above thread. So more on that later.

With regard to 1):

To me this seems to go in the direction of having multiple bespoke lock
managers with slightly different feature sets, different debugging /
logging / monitoring support, with separate locking code each. That's
bad for maintainability.

I think one crucial piece of analysis that I've not seen fully done is
why a separate lock manager is actually faster. An early (quite
different) version of this patch yielded the following micro-benchmark:

/messages/by-id/CAD21AoD1NenjTmD=5ypOBo9=FRtAtWVxUcHqHxY3wNos_5Bb5w@mail.gmail.com
On 2017-11-21 07:19:30 +0900, Masahiko Sawada wrote:
> Also I've done a micro-benchmark; calling LockRelationForExtension and
> UnlockRelationForExtension tightly in order to measure the number of
> lock/unlock cycles per second. The result is,
> PATCHED = 3.95892e+06 (cycles/sec)
> HEAD = 1.15284e+06 (cycles/sec)
> The patched is 3 times faster than current HEAD.

but I've not actually seen an analysis as to *why* precisely there's a
threefold difference in cost, and whether the problem could instead be
addressed by making the difference much smaller.

My guess would be that the difference, to a very large degree, comes from
avoiding dynahash lookups. We currently have quite a few hashtable
lookups:

* LockRelationForExtension
** LockAcquireExtended does HASH_ENTERs in LockMethodLocalHash
** SetupLockInTable does HASH_ENTER_NULL in LockMethodLockHash
** SetupLockInTable does HASH_ENTER_NULL in LockMethodProcLockHash
* UnlockRelationForExtension
** LockRelease does HASH_FIND in LockMethodLocalHash
** CleanUpLock does HASH_REMOVE in LockMethodProcLockHash
** CleanUpLock does HASH_REMOVE in LockMethodLockHash
** RemoveLocalLock does HASH_REMOVE in LockMethodLocalHash

it's pretty easy to believe that a lock mapping like:

+ relextlock = &RelExtLockArray[RelExtLockTargetTagToIndex(&tag)];

with

+typedef struct RelExtLockTag
+{
+	Oid			dbid;			/* InvalidOid for a shared relation */
+	Oid			relid;
+} RelExtLockTag;
...
+static inline uint32
+RelExtLockTargetTagToIndex(RelExtLockTag *locktag)
+{
+	return tag_hash(locktag, sizeof(RelExtLockTag)) % N_RELEXTLOCK_ENTS;
+}

and then just waiting till that specific hash entry becomes free, is
cheaper than *7* dynahash lookups. Especially if doing so doesn't
require any mapping locks.
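
To make that concrete, here is a rough sketch of how acquiring such a
fixed-slot lock could look (illustrative only, not the actual patch:
RelExtLock, RELEXT_LOCK_BIT and RelExtLockAcquire are invented names,
while the atomics and condition variable calls are the existing
primitives):

#define RELEXT_LOCK_BIT ((uint32) 1 << 31)

typedef struct RelExtLock
{
	pg_atomic_uint32 state;		/* lock bit plus waiter count */
	ConditionVariable cv;		/* slept on while the slot is busy */
} RelExtLock;

static RelExtLock RelExtLockArray[N_RELEXTLOCK_ENTS];

static void
RelExtLockAcquire(RelExtLockTag *tag)
{
	RelExtLock *relextlock;
	bool		sleeping = false;

	relextlock = &RelExtLockArray[RelExtLockTargetTagToIndex(tag)];

	for (;;)
	{
		uint32		oldstate = pg_atomic_read_u32(&relextlock->state);

		/* uncontended case: a single CAS; no hashtable, no mapping lock */
		if ((oldstate & RELEXT_LOCK_BIT) == 0 &&
			pg_atomic_compare_exchange_u32(&relextlock->state, &oldstate,
										   oldstate | RELEXT_LOCK_BIT))
			break;

		/*
		 * The slot is held, by this relation or by a hash collision;
		 * either way, wait for the slot to become free and retry.
		 */
		if (!sleeping)
		{
			ConditionVariablePrepareToSleep(&relextlock->cv);
			sleeping = true;
		}
		ConditionVariableSleep(&relextlock->cv, 0 /* wait event elided */);
	}

	if (sleeping)
		ConditionVariableCancelSleep();
}

(The release side would clear the lock bit and broadcast on the
condition variable.)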

But it's not at all clear to me that we couldn't sometimes / always
avoid some of this overhead. Improving lock.c would be a much bigger
win, without building separate infrastructure.

E.g. we could:

- Have an extended LockAcquire API where the caller provides a stack
variable for caching information until the LockRelease call, avoiding
the separate LockMethodLocalHash lookup at release (a sketch of this
follows after this list).

- Instead of always implementing HASH_REMOVE as a completely fresh
lookup in dynahash, we should provide an API for removing an object we
*know* is in the hashtable at a specific pointer. We're obviously
already relying on that address to stay constant, so we'd not lose
flexibility. With this separate API we can avoid the bucket lookup,
walking through each element in that bucket, and having to compare the
hash key every time (which is quite expensive for a full LOCKTAG). This
too is sketched after the list.

For the biggest benefit we'd have to make the bucket list doubly
linked, but even if we just walked the bucket from the start and
compared entries by pointer, we'd win big.

- We should try to figure out whether we can avoid needing the separate
lock table for PROCLOCKs. As far as I can tell there's not a single
reason for it to be a hashtable, rather than something like
NUM_LOCK_PARTITIONS lists of free PROCLOCKs.

- Whenever we do a relation extension, we'd better already hold a normal
relation lock. We don't actually need an entirely distinct type of
lock for extensions, one that could theoretically support N
extension locks for each relation. The fact that these are associated
could be utilized in different ways:

- We could define relation extension as a separate lock level only
conflicting with itself (and perhaps treat AELs as including
it). That'd avoid needing separate shared memory states in many
cases.

This might be too problematic, because extension locks don't have
the same desired behaviour around group locking (and a small army of
other issues).

- We could keep a separate extension lock cached inside the relation
lock. The first time a transaction needs to extend, it does the
normal work, and after that stores the PROCLOCK in the LOCALLOCK (or
something like that). Once extension is done, don't release the lock
entirely, but instead just reduce the lock level to a new
non-conflicting lock level.

Alternatively we could implement something very similar outside of
lock.c, e.g. by caching the LOCALLOCK* (possibly identified by an
integer or such) in RelationData. That'd require enough support
from lock.c to be able to make that really cheap.
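
To make the first two ideas above concrete, here are two rough
sketches. For the caller-provided cache, the extended API could look
like this (LockAcquireCached/LockReleaseCached are invented names,
shown only to illustrate the shape):

	/* Returns the acquired lock's LOCALLOCK via *locallockp; the caller
	 * keeps that pointer and hands it back at release, saving the
	 * LockMethodLocalHash lookup in LockRelease(). */
	extern LockAcquireResult LockAcquireCached(const LOCKTAG *locktag,
											   LOCKMODE lockmode,
											   bool sessionLock,
											   bool dontWait,
											   LOCALLOCK **locallockp);
	extern void LockReleaseCached(LOCALLOCK *locallock);

And for remove-by-pointer (hash_delete_entry() is likewise invented,
dynahash has no such entry point today; but each entry's HASHELEMENT
header already stores its hashvalue, so the bucket can be located
without re-hashing the key):

	extern void hash_delete_entry(HTAB *hashp, void *entry);

	/* e.g. in CleanUpLock(), replacing the HASH_REMOVE searches: */
	hash_delete_entry(LockMethodProcLockHash, proclock);
	hash_delete_entry(LockMethodLockHash, lock);

With today's singly linked bucket chains the bucket would still be
walked from its head, but matching elements by pointer is much cheaper
than hashing and comparing a full LOCKTAG at each step; with doubly
linked chains the walk disappears entirely.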

The second big difference between lock.c and the proposed relation
extension lock is that the latter doesn't have a "mapping" lock. It
instead solves the mapping without a lock by having exactly one
potential location for each lock, using atomic ops to manipulate the
lock state, and dealing with conflicts by waiting for the bucket to
become free.

I don't think it's realistic to avoid locks for protecting the lock
mapping hashtable, nor does it seem likely we can make the manipulation
of individual lock states entirely atomic. But it does seem quite
possible to reduce the likelihood of contention:

We could e.g. split the maintenance of the hashtable(s) from protecting
the state of individual locks. The locks protecting the hashtable would
just be held long enough to change a "pin count" of each lock, and then
a per-LOCK lwlock would protect each heavyweight lock's state. We'd not
need to take the partition locks in quite a few cases, because in a
loaded system many locks always have lockers. There'd be a cost to
that, needing more atomic ops in some cases, but I think it'd reduce
contention tremendously, and it'd open a lot more optimization
potential. It seems plausible that we could even, as a followup, avoid
the partition locks for some lock releases (e.g. when there are other
holders), and we might even be able to avoid them for acquisitions, by
caching the LOCK inside the LOCALLOCK and re-verifying its identity.
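
To illustrate the shape of that split, here is a rough sketch of what
LOCK could look like (pincount and content_lock are invented fields;
the rest mirrors the existing struct):

typedef struct LOCK
{
	/* hash key */
	LOCKTAG		tag;			/* unique identifier of lockable object */

	/* maintained while holding the partition lock (or atomically) */
	pg_atomic_uint32 pincount;	/* holders + waiters; a pinned LOCK can't
								 * be removed from the hashtable */

	LWLock		content_lock;	/* protects all fields below */

	LOCKMASK	grantMask;		/* bitmask for lock types already granted */
	LOCKMASK	waitMask;		/* bitmask for lock types awaited */
	SHM_QUEUE	procLocks;		/* list of PROCLOCK objects assoc. with lock */
	PROC_QUEUE	waitProcs;		/* list of PGPROC objects waiting on lock */
	int			requested[MAX_LOCKMODES];	/* counts of requested locks */
	int			nRequested;		/* total of requested[] array */
	int			granted[MAX_LOCKMODES]; /* counts of granted locks */
	int			nGranted;		/* total of granted[] array */
} LOCK;

Acquisition would then hold the partition lock only long enough to find
or create the entry and bump its pin count, roughly:

	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
	lock = (LOCK *) hash_search_with_hash_value(LockMethodLockHash,
												&locktag, hashcode,
												HASH_ENTER_NULL, &found);
	pg_atomic_fetch_add_u32(&lock->pincount, 1);
	LWLockRelease(partitionLock);

	LWLockAcquire(&lock->content_lock, LW_EXCLUSIVE);
	/* grant the lock, or join the wait queue and sleep */
	LWLockRelease(&lock->content_lock);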

Thoughts?

Greetings,

Andres Freund

#2 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
6 attachment(s)
Re: Improve heavyweight locks instead of building new lock managers?

Hi,

Some of the discussions about improving the locking code, in particular
the group locking / deadlock detector discussion with Robert, again made
me look at lock.c. While looking at how much work it'd be to a) remove
the PROCLOCK hashtable and b) move more of the group locking logic into
lock.c, rather than keeping the smarts in deadlock.c, I got sidetracked
by all the verbose and hard-to-read SHM_QUEUE code.

Here's a *draft* patch series for removing all use of SHM_QUEUE, and
subsequently removing SHM_QUEUE. There's a fair bit of polish needed,
but I do think it makes the code considerably easier to read
(particularly for predicate.c). The diffstat is nice too:

src/include/lib/ilist.h | 132 +++++++++++++++++----
src/include/replication/walsender_private.h | 3 +-
src/include/storage/lock.h | 10 +-
src/include/storage/predicate_internals.h | 49 +++-----
src/include/storage/proc.h | 18 +--
src/include/storage/shmem.h | 22 ----
src/backend/access/transam/twophase.c | 4 +-
src/backend/lib/ilist.c | 8 +-
src/backend/replication/syncrep.c | 89 ++++++--------
src/backend/replication/walsender.c | 2 +-
src/backend/storage/ipc/Makefile | 1 -
src/backend/storage/ipc/shmqueue.c | 190 ------------------------------
src/backend/storage/lmgr/deadlock.c | 76 +++++-------
src/backend/storage/lmgr/lock.c | 129 ++++++++------------
src/backend/storage/lmgr/predicate.c | 692 +++++++++++++++++++++++++++++++++++------------------------------------------------------------------------
src/backend/storage/lmgr/proc.c | 197 +++++++++++++------------------
16 files changed, 569 insertions(+), 1053 deletions(-)

I don't want to invest a lot of time into this if there's not some
agreement that this is a good direction to go into. So I'd appreciate a
few cursory looks before spending more time.

Overview:
0001: Add additional prev/next & detached node functions to ilist.
I think the additional prev/next accessors are nice. I am less
convinced by the 'detached' stuff, but it's used by some SHM_QUEUE
users. I don't want to make the plain dlist_delete reset the node's
prev/next pointers; that's not needed in the vast majority of cases...

0002: Use dlists instead of SHM_QUEUE for heavyweight locks.
I mostly removed the reliance on PGPROC.links needing to be the
first struct member, which seemed odd.

I think we should rename PROC_QUEUE.links; elsewhere that name is
used for list membership nodes, so it's imo confusing/odd.

0003: Use dlist for syncrep queue.
This seems fairly simple to me.

0004: Use dlists for predicate locking.
Unfortunately pretty large. I think it's a huge improvement, but it's
also subtle code. Wonder if there's something better to do here wrt
OnConflict_CheckForSerializationFailure?

0005: Remove now unused SHMQueue*.
0006: Remove PROC_QUEUE.size.
I'm not sure whether this is a good idea. I was looking at it primarily
because I thought it'd allow us to remove PROC_QUEUE as a whole
if we wanted to. But as PROC_QUEUE.size doesn't really seem to buy us
much, we should perhaps just do something roughly like in the patch?

Greetings,

Andres Freund

Attachments:

v1-0001-Add-additional-prev-next-detached-node-functions-.patch (text/x-diff; charset=us-ascii)
From 777808a9f51063dc31abcef54f49e6a327efcfb8 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 19 Feb 2020 12:06:08 -0800
Subject: [PATCH v1 1/6] Add additional prev/next & detached node functions to
 ilist.

---
 src/include/lib/ilist.h | 132 +++++++++++++++++++++++++++++++++-------
 src/backend/lib/ilist.c |   8 +--
 2 files changed, 113 insertions(+), 27 deletions(-)

diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index 98db885f6ff..1ad9ceb033f 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -255,8 +255,8 @@ typedef struct slist_mutable_iter
 extern void slist_delete(slist_head *head, slist_node *node);
 
 #ifdef ILIST_DEBUG
-extern void dlist_check(dlist_head *head);
-extern void slist_check(slist_head *head);
+extern void dlist_check(const dlist_head *head);
+extern void slist_check(const slist_head *head);
 #else
 /*
  * These seemingly useless casts to void are here to keep the compiler quiet
@@ -270,6 +270,11 @@ extern void slist_check(slist_head *head);
 
 /* doubly linked list implementation */
 
+/* prototypes dlist helpers */
+static inline void *dlist_element_off(dlist_node *node, size_t off);
+static inline void *dlist_head_element_off(dlist_head *head, size_t off);
+static inline void *dlist_tail_element_off(dlist_head *head, size_t off);
+
 /*
  * Initialize a doubly linked list.
  * Previous state will be thrown away without any cleanup.
@@ -280,13 +285,24 @@ dlist_init(dlist_head *head)
 	head->head.next = head->head.prev = &head->head;
 }
 
+/*
+ * Initialize a doubly linked element.
+ *
+ * This is only needed when dlist_node_is_detached() may be needed.
+ */
+static inline void
+dlist_node_init(dlist_node *node)
+{
+	node->next = node->prev = NULL;
+}
+
 /*
  * Is the list empty?
  *
  * An empty list has either its first 'next' pointer set to NULL, or to itself.
  */
 static inline bool
-dlist_is_empty(dlist_head *head)
+dlist_is_empty(const dlist_head *head)
 {
 	dlist_check(head);
 
@@ -361,6 +377,20 @@ dlist_delete(dlist_node *node)
 	node->next->prev = node->prev;
 }
 
+
+/*
+ * Like dlist_delete(), but also sets next/prev to NULL to signal not being in
+ * list.
+ */
+static inline void
+dlist_delete_thoroughly(dlist_node *node)
+{
+	node->prev->next = node->next;
+	node->next->prev = node->prev;
+	node->next = NULL;
+	node->prev = NULL;
+}
+
 /*
  * Remove and return the first node from a list (there must be one).
  */
@@ -399,7 +429,7 @@ dlist_move_head(dlist_head *head, dlist_node *node)
  * Caution: unreliable if 'node' is not in the list.
  */
 static inline bool
-dlist_has_next(dlist_head *head, dlist_node *node)
+dlist_has_next(const dlist_head *head, const dlist_node *node)
 {
 	return node->next != &head->head;
 }
@@ -409,11 +439,25 @@ dlist_has_next(dlist_head *head, dlist_node *node)
  * Caution: unreliable if 'node' is not in the list.
  */
 static inline bool
-dlist_has_prev(dlist_head *head, dlist_node *node)
+dlist_has_prev(const dlist_head *head, const dlist_node *node)
 {
 	return node->prev != &head->head;
 }
 
+/*
+ * Check if node is detached. A node is only detached if it either has been
+ * initialized with dlist_init_node(), or deleted with
+ * dlist_delete_thoroughly().
+ */
+static inline bool
+dlist_node_is_detached(const dlist_node *node)
+{
+	Assert((node->next == NULL && node->prev == NULL) ||
+		   (node->next != NULL && node->prev != NULL));
+
+	return node->next == NULL;
+}
+
 /*
  * Return the next node in the list (there must be one).
  */
@@ -434,14 +478,6 @@ dlist_prev_node(dlist_head *head, dlist_node *node)
 	return node->prev;
 }
 
-/* internal support function to get address of head element's struct */
-static inline void *
-dlist_head_element_off(dlist_head *head, size_t off)
-{
-	Assert(!dlist_is_empty(head));
-	return (char *) head->head.next - off;
-}
-
 /*
  * Return the first node in the list (there must be one).
  */
@@ -451,14 +487,6 @@ dlist_head_node(dlist_head *head)
 	return (dlist_node *) dlist_head_element_off(head, 0);
 }
 
-/* internal support function to get address of tail element's struct */
-static inline void *
-dlist_tail_element_off(dlist_head *head, size_t off)
-{
-	Assert(!dlist_is_empty(head));
-	return (char *) head->head.prev - off;
-}
-
 /*
  * Return the last node in the list (there must be one).
  */
@@ -497,6 +525,24 @@ dlist_tail_node(dlist_head *head)
 	(AssertVariableIsOfTypeMacro(((type *) NULL)->membername, dlist_node),	\
 	 ((type *) dlist_tail_element_off(lhead, offsetof(type, membername))))
 
+/*
+ * Return the address of the previous element in the list.
+ *
+ * The node must have a previous node.
+ */
+#define dlist_prev_element(type, membername, node)							\
+	(AssertVariableIsOfTypeMacro(((type *) NULL)->membername, dlist_node),	\
+	 ((type *) dlist_prev_element_off(node, offsetof(type, membername))))
+
+/*
+ * Return the address of the next element in the list.
+ *
+ * The node must have a next node.
+ */
+#define dlist_next_element(type, membername, node)							\
+	(AssertVariableIsOfTypeMacro(((type *) NULL)->membername, dlist_node),	\
+	 ((type *) dlist_next_element_off(node, offsetof(type, membername))))
+
 /*
  * Iterate through the list pointed at by 'lhead' storing the state in 'iter'.
  *
@@ -543,6 +589,46 @@ dlist_tail_node(dlist_head *head)
 		 (iter).cur != (iter).end;											\
 		 (iter).cur = (iter).cur->prev)
 
+/* internal support function to get address of element's struct */
+static inline void *
+dlist_element_off(dlist_node *node, size_t off)
+{
+	return (char *) node - off;
+}
+
+/* internal support function to get address of head element's struct */
+static inline void *
+dlist_head_element_off(dlist_head *head, size_t off)
+{
+	Assert(!dlist_is_empty(head));
+	return dlist_element_off(head->head.next, off);
+}
+
+/* internal support function to get address of tail element's struct */
+static inline void *
+dlist_tail_element_off(dlist_head *head, size_t off)
+{
+	Assert(!dlist_is_empty(head));
+	return dlist_element_off(head->head.prev, off);
+}
+
+/* internal support function to get address of prev element's struct */
+static inline void *
+dlist_prev_element_off(dlist_node *node, size_t off)
+{
+	Assert(!dlist_node_is_detached(node) &&
+		   !dlist_node_is_detached(node->prev));
+	return dlist_element_off(node->prev, off);
+}
+
+/* internal support function to get address of next element's struct */
+static inline void *
+dlist_next_element_off(dlist_node *node, size_t off)
+{
+	Assert(!dlist_node_is_detached(node) &&
+		   !dlist_node_is_detached(node->next));
+	return dlist_element_off(node->next, off);
+}
 
 /* singly linked list implementation */
 
@@ -560,7 +646,7 @@ slist_init(slist_head *head)
  * Is the list empty?
  */
 static inline bool
-slist_is_empty(slist_head *head)
+slist_is_empty(const slist_head *head)
 {
 	slist_check(head);
 
@@ -608,7 +694,7 @@ slist_pop_head_node(slist_head *head)
  * Check whether 'node' has a following node.
  */
 static inline bool
-slist_has_next(slist_head *head, slist_node *node)
+slist_has_next(const slist_head *head, const slist_node *node)
 {
 	slist_check(head);
 
diff --git a/src/backend/lib/ilist.c b/src/backend/lib/ilist.c
index 9b02d546076..82752ab1a39 100644
--- a/src/backend/lib/ilist.c
+++ b/src/backend/lib/ilist.c
@@ -56,9 +56,9 @@ slist_delete(slist_head *head, slist_node *node)
  * Verify integrity of a doubly linked list
  */
 void
-dlist_check(dlist_head *head)
+dlist_check(const dlist_head *head)
 {
-	dlist_node *cur;
+	const dlist_node *cur;
 
 	if (head == NULL)
 		elog(ERROR, "doubly linked list head address is NULL");
@@ -93,9 +93,9 @@ dlist_check(dlist_head *head)
  * Verify integrity of a singly linked list
  */
 void
-slist_check(slist_head *head)
+slist_check(const slist_head *head)
 {
-	slist_node *cur;
+	const slist_node *cur;
 
 	if (head == NULL)
 		elog(ERROR, "singly linked list head address is NULL");
-- 
2.25.0.114.g5b0ca878e0

v1-0002-Use-dlists-instead-of-SHM_QUEUE-for-heavyweight-l.patch (text/x-diff; charset=us-ascii)
From 53f0dc9b79aa072e2b49d41be1dc63f3e838aa2b Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 19 Feb 2020 12:23:28 -0800
Subject: [PATCH v1 2/6] Use dlists instead of SHM_QUEUE for heavyweight locks.

Todo:
- Consider using singly linked list?
- Consider removing PROC_QUEUE - the size doesn't seem important.
---
 src/include/storage/lock.h            |   9 +-
 src/include/storage/proc.h            |  16 +--
 src/backend/access/transam/twophase.c |   4 +-
 src/backend/storage/lmgr/deadlock.c   |  51 +++----
 src/backend/storage/lmgr/lock.c       | 121 ++++++----------
 src/backend/storage/lmgr/proc.c       | 192 +++++++++++---------------
 6 files changed, 161 insertions(+), 232 deletions(-)

diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index bb8e4e6e5b7..3569f145092 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -18,6 +18,7 @@
 #error "lock.h may not be included from frontend code"
 #endif
 
+#include "lib/ilist.h"
 #include "storage/backendid.h"
 #include "storage/lockdefs.h"
 #include "storage/lwlock.h"
@@ -28,7 +29,7 @@ typedef struct PGPROC PGPROC;
 
 typedef struct PROC_QUEUE
 {
-	SHM_QUEUE	links;			/* head of list of PGPROC objects */
+	dlist_head	links;			/* list of PGPROC objects */
 	int			size;			/* number of entries in list */
 } PROC_QUEUE;
 
@@ -292,7 +293,7 @@ typedef struct LOCK
 	/* data */
 	LOCKMASK	grantMask;		/* bitmask for lock types already granted */
 	LOCKMASK	waitMask;		/* bitmask for lock types awaited */
-	SHM_QUEUE	procLocks;		/* list of PROCLOCK objects assoc. with lock */
+	dlist_head	procLocks;		/* list of PROCLOCK objects assoc. with lock */
 	PROC_QUEUE	waitProcs;		/* list of PGPROC objects waiting on lock */
 	int			requested[MAX_LOCKMODES];	/* counts of requested locks */
 	int			nRequested;		/* total of requested[] array */
@@ -353,8 +354,8 @@ typedef struct PROCLOCK
 	PGPROC	   *groupLeader;	/* proc's lock group leader, or proc itself */
 	LOCKMASK	holdMask;		/* bitmask for lock types currently held */
 	LOCKMASK	releaseMask;	/* bitmask for lock types to be released */
-	SHM_QUEUE	lockLink;		/* list link in LOCK's list of proclocks */
-	SHM_QUEUE	procLink;		/* list link in PGPROC's list of proclocks */
+	dlist_node	lockLink;		/* list link in LOCK's list of proclocks */
+	dlist_node	procLink;		/* list link in PGPROC's list of proclocks */
 } PROCLOCK;
 
 #define PROCLOCK_LOCKMETHOD(proclock) \
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index d21780108bb..2ba37f250de 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -95,8 +95,8 @@ struct XidCache
 struct PGPROC
 {
 	/* proc->links MUST BE FIRST IN STRUCT (see ProcSleep,ProcWakeup,etc) */
-	SHM_QUEUE	links;			/* list link if process is in a list */
-	PGPROC	  **procgloballist; /* procglobal list that owns this PGPROC */
+	dlist_node	links;			/* list link if process is in a list */
+	dlist_head *procgloballist; /* procglobal list that owns this PGPROC */
 
 	PGSemaphore sem;			/* ONE semaphore to sleep on */
 	int			waitStatus;		/* STATUS_WAITING, STATUS_OK or STATUS_ERROR */
@@ -157,7 +157,7 @@ struct PGPROC
 	 * linked into one of these lists, according to the partition number of
 	 * their lock.
 	 */
-	SHM_QUEUE	myProcLocks[NUM_LOCK_PARTITIONS];
+	dlist_head	myProcLocks[NUM_LOCK_PARTITIONS];
 
 	struct XidCache subxids;	/* cache for subtransaction XIDs */
 
@@ -250,13 +250,13 @@ typedef struct PROC_HDR
 	/* Length of allProcs array */
 	uint32		allProcCount;
 	/* Head of list of free PGPROC structures */
-	PGPROC	   *freeProcs;
+	dlist_head	freeProcs;
 	/* Head of list of autovacuum's free PGPROC structures */
-	PGPROC	   *autovacFreeProcs;
+	dlist_head autovacFreeProcs;
 	/* Head of list of bgworker free PGPROC structures */
-	PGPROC	   *bgworkerFreeProcs;
+	dlist_head bgworkerFreeProcs;
 	/* Head of list of walsender free PGPROC structures */
-	PGPROC	   *walsenderFreeProcs;
+	dlist_head walsenderFreeProcs;
 	/* First pgproc waiting for group XID clear */
 	pg_atomic_uint32 procArrayGroupFirst;
 	/* First pgproc waiting for group transaction status update */
@@ -318,7 +318,7 @@ extern void ProcReleaseLocks(bool isCommit);
 
 extern void ProcQueueInit(PROC_QUEUE *queue);
 extern int	ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable);
-extern PGPROC *ProcWakeup(PGPROC *proc, int waitStatus);
+extern void ProcWakeup(PGPROC *proc, int waitStatus);
 extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
 extern void CheckDeadLockAlert(void);
 extern bool IsWaitingForLock(void);
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 5adf956f413..39d7e5463c1 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -459,7 +459,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
 	/* Initialize the PGPROC entry */
 	MemSet(proc, 0, sizeof(PGPROC));
 	proc->pgprocno = gxact->pgprocno;
-	SHMQueueElemInit(&(proc->links));
+	dlist_node_init(&proc->links);
 	proc->waitStatus = STATUS_OK;
 	/* We set up the gxact's VXID as InvalidBackendId/XID */
 	proc->lxid = (LocalTransactionId) xid;
@@ -478,7 +478,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
 	proc->waitLock = NULL;
 	proc->waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
-		SHMQueueInit(&(proc->myProcLocks[i]));
+		dlist_init(&proc->myProcLocks[i]);
 	/* subxid data must be filled later by GXactLoadSubxactData */
 	pgxact->overflowed = false;
 	pgxact->nxids = 0;
diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index f8c5df08e69..ca2abea07f1 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -216,9 +216,6 @@ InitDeadLockChecking(void)
 DeadLockState
 DeadLockCheck(PGPROC *proc)
 {
-	int			i,
-				j;
-
 	/* Initialize to "no constraints" */
 	nCurConstraints = 0;
 	nPossibleConstraints = 0;
@@ -246,7 +243,7 @@ DeadLockCheck(PGPROC *proc)
 	}
 
 	/* Apply any needed rearrangements of wait queues */
-	for (i = 0; i < nWaitOrders; i++)
+	for (int i = 0; i < nWaitOrders; i++)
 	{
 		LOCK	   *lock = waitOrders[i].lock;
 		PGPROC	  **procs = waitOrders[i].procs;
@@ -261,9 +258,9 @@ DeadLockCheck(PGPROC *proc)
 
 		/* Reset the queue and re-add procs in the desired order */
 		ProcQueueInit(waitQueue);
-		for (j = 0; j < nProcs; j++)
+		for (int j = 0; j < nProcs; j++)
 		{
-			SHMQueueInsertBefore(&(waitQueue->links), &(procs[j]->links));
+			dlist_push_tail(&waitQueue->links, &(procs[j]->links));
 			waitQueue->size++;
 		}
 
@@ -545,8 +542,7 @@ FindLockCycleRecurseMember(PGPROC *checkProc,
 	PGPROC	   *proc;
 	LOCK	   *lock = checkProc->waitLock;
 	PGXACT	   *pgxact;
-	PROCLOCK   *proclock;
-	SHM_QUEUE  *procLocks;
+	dlist_iter	proclock_iter;
 	LockMethod	lockMethodTable;
 	PROC_QUEUE *waitQueue;
 	int			queue_size;
@@ -563,13 +559,9 @@ FindLockCycleRecurseMember(PGPROC *checkProc,
 	 * Scan for procs that already hold conflicting locks.  These are "hard"
 	 * edges in the waits-for graph.
 	 */
-	procLocks = &(lock->procLocks);
-
-	proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
-										 offsetof(PROCLOCK, lockLink));
-
-	while (proclock)
+	dlist_foreach(proclock_iter, &lock->procLocks)
 	{
+		PROCLOCK   *proclock = dlist_container(PROCLOCK, lockLink, proclock_iter.cur);
 		PGPROC	   *leader;
 
 		proc = proclock->tag.myProc;
@@ -629,9 +621,6 @@ FindLockCycleRecurseMember(PGPROC *checkProc,
 				}
 			}
 		}
-
-		proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
-											 offsetof(PROCLOCK, lockLink));
 	}
 
 	/*
@@ -704,6 +693,7 @@ FindLockCycleRecurseMember(PGPROC *checkProc,
 	else
 	{
 		PGPROC	   *lastGroupMember = NULL;
+		dlist_iter	proc_iter;
 
 		/* Use the true lock wait queue order */
 		waitQueue = &(lock->waitProcs);
@@ -719,13 +709,14 @@ FindLockCycleRecurseMember(PGPROC *checkProc,
 			lastGroupMember = checkProc;
 		else
 		{
-			proc = (PGPROC *) waitQueue->links.next;
-			queue_size = waitQueue->size;
-			while (queue_size-- > 0)
+			dlist_iter iter;
+
+			dlist_foreach(iter, &waitQueue->links)
 			{
+				proc = dlist_container(PGPROC, links, iter.cur);
+
 				if (proc->lockGroupLeader == checkProcLeader)
 					lastGroupMember = proc;
-				proc = (PGPROC *) proc->links.next;
 			}
 			Assert(lastGroupMember != NULL);
 		}
@@ -733,12 +724,12 @@ FindLockCycleRecurseMember(PGPROC *checkProc,
 		/*
 		 * OK, now rescan (or scan) the queue to identify the soft conflicts.
 		 */
-		queue_size = waitQueue->size;
-		proc = (PGPROC *) waitQueue->links.next;
-		while (queue_size-- > 0)
+		dlist_foreach(proc_iter, &waitQueue->links)
 		{
 			PGPROC	   *leader;
 
+			proc = dlist_container(PGPROC, links, proc_iter.cur);
+
 			leader = proc->lockGroupLeader == NULL ? proc :
 				proc->lockGroupLeader;
 
@@ -772,8 +763,6 @@ FindLockCycleRecurseMember(PGPROC *checkProc,
 					return true;
 				}
 			}
-
-			proc = (PGPROC *) proc->links.next;
 		}
 	}
 
@@ -882,14 +871,16 @@ TopoSort(LOCK *lock,
 				k,
 				kk,
 				last;
+	dlist_iter	proc_iter;
 
 	/* First, fill topoProcs[] array with the procs in their current order */
-	proc = (PGPROC *) waitQueue->links.next;
-	for (i = 0; i < queue_size; i++)
+	i = 0;
+	dlist_foreach(proc_iter, &waitQueue->links)
 	{
-		topoProcs[i] = proc;
-		proc = (PGPROC *) proc->links.next;
+		proc = dlist_container(PGPROC, links, proc_iter.cur);
+		topoProcs[i++] = proc;
 	}
+	Assert(i == queue_size);
 
 	/*
 	 * Scan the constraints, and for each proc in the array, generate a count
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 56dba09299d..01ac3c06c5e 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1005,8 +1005,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 				uint32		proclock_hashcode;
 
 				proclock_hashcode = ProcLockHashCode(&proclock->tag, hashcode);
-				SHMQueueDelete(&proclock->lockLink);
-				SHMQueueDelete(&proclock->procLink);
+				dlist_delete(&proclock->lockLink);
+				dlist_delete(&proclock->procLink);
 				if (!hash_search_with_hash_value(LockMethodProcLockHash,
 												 (void *) &(proclock->tag),
 												 proclock_hashcode,
@@ -1141,7 +1141,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 	{
 		lock->grantMask = 0;
 		lock->waitMask = 0;
-		SHMQueueInit(&(lock->procLocks));
+		dlist_init(&lock->procLocks);
 		ProcQueueInit(&(lock->waitProcs));
 		lock->nRequested = 0;
 		lock->nGranted = 0;
@@ -1184,7 +1184,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 			 * of shared memory, because there won't be anything to cause
 			 * anyone to release the lock object later.
 			 */
-			Assert(SHMQueueEmpty(&(lock->procLocks)));
+			Assert(dlist_is_empty(&(lock->procLocks)));
 			if (!hash_search_with_hash_value(LockMethodLockHash,
 											 (void *) &(lock->tag),
 											 hashcode,
@@ -1217,9 +1217,8 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 		proclock->holdMask = 0;
 		proclock->releaseMask = 0;
 		/* Add proclock to appropriate lists */
-		SHMQueueInsertBefore(&lock->procLocks, &proclock->lockLink);
-		SHMQueueInsertBefore(&(proc->myProcLocks[partition]),
-							 &proclock->procLink);
+		dlist_push_tail(&lock->procLocks, &proclock->lockLink);
+		dlist_push_tail(&proc->myProcLocks[partition], &proclock->procLink);
 		PROCLOCK_PRINT("LockAcquire: new", proclock);
 	}
 	else
@@ -1349,9 +1348,8 @@ LockCheckConflicts(LockMethod lockMethodTable,
 	int			conflictMask = lockMethodTable->conflictTab[lockmode];
 	int			conflictsRemaining[MAX_LOCKMODES];
 	int			totalConflictsRemaining = 0;
+	dlist_iter	proclock_iter;
 	int			i;
-	SHM_QUEUE  *procLocks;
-	PROCLOCK   *otherproclock;
 
 	/*
 	 * first check for global conflicts: If no locks conflict with my request,
@@ -1411,11 +1409,11 @@ LockCheckConflicts(LockMethod lockMethodTable,
 	 * shared memory state more complex (and larger) but it doesn't seem worth
 	 * it.
 	 */
-	procLocks = &(lock->procLocks);
-	otherproclock = (PROCLOCK *)
-		SHMQueueNext(procLocks, procLocks, offsetof(PROCLOCK, lockLink));
-	while (otherproclock != NULL)
+	dlist_foreach(proclock_iter, &lock->procLocks)
 	{
+		PROCLOCK   *otherproclock =
+			dlist_container(PROCLOCK, lockLink, proclock_iter.cur);
+
 		if (proclock != otherproclock &&
 			proclock->groupLeader == otherproclock->groupLeader &&
 			(otherproclock->holdMask & conflictMask) != 0)
@@ -1440,9 +1438,6 @@ LockCheckConflicts(LockMethod lockMethodTable,
 				return false;
 			}
 		}
-		otherproclock = (PROCLOCK *)
-			SHMQueueNext(procLocks, &otherproclock->lockLink,
-						 offsetof(PROCLOCK, lockLink));
 	}
 
 	/* Nope, it's a real conflict. */
@@ -1555,8 +1550,8 @@ CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 		uint32		proclock_hashcode;
 
 		PROCLOCK_PRINT("CleanUpLock: deleting", proclock);
-		SHMQueueDelete(&proclock->lockLink);
-		SHMQueueDelete(&proclock->procLink);
+		dlist_delete(&proclock->lockLink);
+		dlist_delete(&proclock->procLink);
 		proclock_hashcode = ProcLockHashCode(&proclock->tag, hashcode);
 		if (!hash_search_with_hash_value(LockMethodProcLockHash,
 										 (void *) &(proclock->tag),
@@ -1573,7 +1568,7 @@ CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 		 * object.
 		 */
 		LOCK_PRINT("CleanUpLock: deleting", lock, 0);
-		Assert(SHMQueueEmpty(&(lock->procLocks)));
+		Assert(dlist_is_empty(&lock->procLocks));
 		if (!hash_search_with_hash_value(LockMethodLockHash,
 										 (void *) &(lock->tag),
 										 hashcode,
@@ -1837,7 +1832,7 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
 	Assert(0 < lockmethodid && lockmethodid < lengthof(LockMethods));
 
 	/* Remove proc from lock's wait queue */
-	SHMQueueDelete(&(proc->links));
+	dlist_delete(&proc->links);
 	waitLock->waitProcs.size--;
 
 	/* Undo increments of request counts by waiting process */
@@ -2092,7 +2087,6 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 				numLockModes;
 	LOCALLOCK  *locallock;
 	LOCK	   *lock;
-	PROCLOCK   *proclock;
 	int			partition;
 	bool		have_fast_path_lwlock = false;
 
@@ -2249,8 +2243,8 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
 	{
 		LWLock	   *partitionLock;
-		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
-		PROCLOCK   *nextplock;
+		dlist_head *procLocks = &MyProc->myProcLocks[partition];
+		dlist_mutable_iter proclock_iter;
 
 		partitionLock = LockHashPartitionLockByIndex(partition);
 
@@ -2273,24 +2267,16 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 		 * locallock situation, we lose that guarantee for fast-path locks.
 		 * This is not ideal.
 		 */
-		if (SHMQueueNext(procLocks, procLocks,
-						 offsetof(PROCLOCK, procLink)) == NULL)
+		if (dlist_is_empty(procLocks))
 			continue;			/* needn't examine this partition */
 
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-		for (proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
-												  offsetof(PROCLOCK, procLink));
-			 proclock;
-			 proclock = nextplock)
+		dlist_foreach_modify(proclock_iter, procLocks)
 		{
+			PROCLOCK   *proclock = dlist_container(PROCLOCK, procLink, proclock_iter.cur);
 			bool		wakeupNeeded = false;
 
-			/* Get link first, since we may unlink/delete this proclock */
-			nextplock = (PROCLOCK *)
-				SHMQueueNext(procLocks, &proclock->procLink,
-							 offsetof(PROCLOCK, procLink));
-
 			Assert(proclock->tag.myProc == MyProc);
 
 			lock = proclock->tag.myLock;
@@ -2823,7 +2809,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	LockMethod	lockMethodTable;
 	LOCK	   *lock;
 	LOCKMASK	conflictMask;
-	SHM_QUEUE  *procLocks;
+	dlist_iter	proclock_iter;
 	PROCLOCK   *proclock;
 	uint32		hashcode;
 	LWLock	   *partitionLock;
@@ -2971,14 +2957,10 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	/*
 	 * Examine each existing holder (or awaiter) of the lock.
 	 */
-
-	procLocks = &(lock->procLocks);
-
-	proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
-										 offsetof(PROCLOCK, lockLink));
-
-	while (proclock)
+	dlist_foreach(proclock_iter, &lock->procLocks)
 	{
+		proclock = dlist_container(PROCLOCK, lockLink, proclock_iter.cur);
+
 		if (conflictMask & proclock->holdMask)
 		{
 			PGPROC	   *proc = proclock->tag.myProc;
@@ -3008,9 +2990,6 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 				}
 			}
 		}
-
-		proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
-											 offsetof(PROCLOCK, lockLink));
 	}
 
 	LWLockRelease(partitionLock);
@@ -3328,8 +3307,8 @@ PostPrepare_Locks(TransactionId xid)
 	for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
 	{
 		LWLock	   *partitionLock;
-		SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
-		PROCLOCK   *nextplock;
+		dlist_head *procLocks = &(MyProc->myProcLocks[partition]);
+		dlist_mutable_iter proclock_iter;
 
 		partitionLock = LockHashPartitionLockByIndex(partition);
 
@@ -3341,21 +3320,14 @@ PostPrepare_Locks(TransactionId xid)
 		 * another backend is adding something to our lists now.  For safety,
 		 * though, we code this the same way as in LockReleaseAll.
 		 */
-		if (SHMQueueNext(procLocks, procLocks,
-						 offsetof(PROCLOCK, procLink)) == NULL)
+		if (dlist_is_empty(procLocks))
 			continue;			/* needn't examine this partition */
 
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-		for (proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
-												  offsetof(PROCLOCK, procLink));
-			 proclock;
-			 proclock = nextplock)
+		dlist_foreach_modify(proclock_iter, procLocks)
 		{
-			/* Get link first, since we may unlink/relink this proclock */
-			nextplock = (PROCLOCK *)
-				SHMQueueNext(procLocks, &proclock->procLink,
-							 offsetof(PROCLOCK, procLink));
+			proclock = dlist_container(PROCLOCK, procLink, proclock_iter.cur);
 
 			Assert(proclock->tag.myProc == MyProc);
 
@@ -3393,7 +3365,7 @@ PostPrepare_Locks(TransactionId xid)
 			 * same hash partition, cf proclock_hash().  So the partition lock
 			 * we already hold is sufficient for this.
 			 */
-			SHMQueueDelete(&proclock->procLink);
+			dlist_delete(&proclock->procLink);
 
 			/*
 			 * Create the new hash key for the proclock.
@@ -3419,8 +3391,7 @@ PostPrepare_Locks(TransactionId xid)
 				elog(PANIC, "duplicate entry found while reassigning a prepared transaction's locks");
 
 			/* Re-link into the new proc's proclock list */
-			SHMQueueInsertBefore(&(newproc->myProcLocks[partition]),
-								 &proclock->procLink);
+			dlist_push_tail(&newproc->myProcLocks[partition], &proclock->procLink);
 
 			PROCLOCK_PRINT("PostPrepare_Locks: updated", proclock);
 		}						/* loop over PROCLOCKs within this partition */
@@ -3741,12 +3712,10 @@ GetSingleProcBlockerStatusData(PGPROC *blocked_proc, BlockedProcsData *data)
 {
 	LOCK	   *theLock = blocked_proc->waitLock;
 	BlockedProcData *bproc;
-	SHM_QUEUE  *procLocks;
-	PROCLOCK   *proclock;
+	dlist_iter	proclock_iter;
+	dlist_iter	proc_iter;
 	PROC_QUEUE *waitQueue;
-	PGPROC	   *proc;
 	int			queue_size;
-	int			i;
 
 	/* Nothing to do if this proc is not blocked */
 	if (theLock == NULL)
@@ -3764,11 +3733,10 @@ GetSingleProcBlockerStatusData(PGPROC *blocked_proc, BlockedProcsData *data)
 	 */
 
 	/* Collect all PROCLOCKs associated with theLock */
-	procLocks = &(theLock->procLocks);
-	proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
-										 offsetof(PROCLOCK, lockLink));
-	while (proclock)
+	dlist_foreach(proclock_iter, &theLock->procLocks)
 	{
+		PROCLOCK   *proclock =
+			dlist_container(PROCLOCK, lockLink, proclock_iter.cur);
 		PGPROC	   *proc = proclock->tag.myProc;
 		LOCK	   *lock = proclock->tag.myLock;
 		LockInstanceData *instance;
@@ -3793,9 +3761,6 @@ GetSingleProcBlockerStatusData(PGPROC *blocked_proc, BlockedProcsData *data)
 		instance->leaderPid = proclock->groupLeader->pid;
 		instance->fastpath = false;
 		data->nlocks++;
-
-		proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
-											 offsetof(PROCLOCK, lockLink));
 	}
 
 	/* Enlarge waiter_pids[] if it's too small to hold all wait queue PIDs */
@@ -3811,9 +3776,9 @@ GetSingleProcBlockerStatusData(PGPROC *blocked_proc, BlockedProcsData *data)
 	}
 
 	/* Collect PIDs from the lock's wait queue, stopping at blocked_proc */
-	proc = (PGPROC *) waitQueue->links.next;
-	for (i = 0; i < queue_size; i++)
+	dlist_foreach(proc_iter, &waitQueue->links)
 	{
+		PGPROC	   *proc = dlist_container(PGPROC, links, proc_iter.cur);
 		if (proc == blocked_proc)
 			break;
 		data->waiter_pids[data->npids++] = proc->pid;
@@ -4090,7 +4055,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 	{
 		lock->grantMask = 0;
 		lock->waitMask = 0;
-		SHMQueueInit(&(lock->procLocks));
+		dlist_init(&lock->procLocks);
 		ProcQueueInit(&(lock->waitProcs));
 		lock->nRequested = 0;
 		lock->nGranted = 0;
@@ -4133,7 +4098,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 			 * of shared memory, because there won't be anything to cause
 			 * anyone to release the lock object later.
 			 */
-			Assert(SHMQueueEmpty(&(lock->procLocks)));
+			Assert(dlist_is_empty(&lock->procLocks));
 			if (!hash_search_with_hash_value(LockMethodLockHash,
 											 (void *) &(lock->tag),
 											 hashcode,
@@ -4158,9 +4123,9 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 		proclock->holdMask = 0;
 		proclock->releaseMask = 0;
 		/* Add proclock to appropriate lists */
-		SHMQueueInsertBefore(&lock->procLocks, &proclock->lockLink);
-		SHMQueueInsertBefore(&(proc->myProcLocks[partition]),
-							 &proclock->procLink);
+		dlist_push_tail(&lock->procLocks, &proclock->lockLink);
+		dlist_push_tail(&proc->myProcLocks[partition],
+						&proclock->procLink);
 		PROCLOCK_PRINT("lock_twophase_recover: new", proclock);
 	}
 	else
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index eb321f72ea4..5a157fc07d8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -178,10 +178,10 @@ InitProcGlobal(void)
 	 * Initialize the data structures.
 	 */
 	ProcGlobal->spins_per_delay = DEFAULT_SPINS_PER_DELAY;
-	ProcGlobal->freeProcs = NULL;
-	ProcGlobal->autovacFreeProcs = NULL;
-	ProcGlobal->bgworkerFreeProcs = NULL;
-	ProcGlobal->walsenderFreeProcs = NULL;
+	dlist_init(&ProcGlobal->freeProcs);
+	dlist_init(&ProcGlobal->autovacFreeProcs);
+	dlist_init(&ProcGlobal->bgworkerFreeProcs);
+	dlist_init(&ProcGlobal->walsenderFreeProcs);
 	ProcGlobal->startupProc = NULL;
 	ProcGlobal->startupProcPid = 0;
 	ProcGlobal->startupBufferPinWaitBufId = -1;
@@ -218,6 +218,8 @@ InitProcGlobal(void)
 
 	for (i = 0; i < TotalProcs; i++)
 	{
+		PGPROC	   *proc = &procs[i];
+
 		/* Common initialization for all PGPROCs, regardless of type. */
 
 		/*
@@ -227,11 +229,11 @@ InitProcGlobal(void)
 		 */
 		if (i < MaxBackends + NUM_AUXILIARY_PROCS)
 		{
-			procs[i].sem = PGSemaphoreCreate();
-			InitSharedLatch(&(procs[i].procLatch));
-			LWLockInitialize(&(procs[i].backendLock), LWTRANCHE_PROC);
+			proc->sem = PGSemaphoreCreate();
+			InitSharedLatch(&(proc->procLatch));
+			LWLockInitialize(&(proc->backendLock), LWTRANCHE_PROC);
 		}
-		procs[i].pgprocno = i;
+		proc->pgprocno = i;
 
 		/*
 		 * Newly created PGPROCs for normal backends, autovacuum and bgworkers
@@ -244,45 +246,41 @@ InitProcGlobal(void)
 		if (i < MaxConnections)
 		{
 			/* PGPROC for normal backend, add to freeProcs list */
-			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->freeProcs;
-			ProcGlobal->freeProcs = &procs[i];
-			procs[i].procgloballist = &ProcGlobal->freeProcs;
+			dlist_push_head(&ProcGlobal->freeProcs, &proc->links);
+			proc->procgloballist = &ProcGlobal->freeProcs;
 		}
 		else if (i < MaxConnections + autovacuum_max_workers + 1)
 		{
 			/* PGPROC for AV launcher/worker, add to autovacFreeProcs list */
-			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->autovacFreeProcs;
-			ProcGlobal->autovacFreeProcs = &procs[i];
-			procs[i].procgloballist = &ProcGlobal->autovacFreeProcs;
+			dlist_push_head(&ProcGlobal->autovacFreeProcs, &proc->links);
+			proc->procgloballist = &ProcGlobal->autovacFreeProcs;
 		}
 		else if (i < MaxConnections + autovacuum_max_workers + 1 + max_worker_processes)
 		{
 			/* PGPROC for bgworker, add to bgworkerFreeProcs list */
-			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->bgworkerFreeProcs;
-			ProcGlobal->bgworkerFreeProcs = &procs[i];
-			procs[i].procgloballist = &ProcGlobal->bgworkerFreeProcs;
+			dlist_push_head(&ProcGlobal->bgworkerFreeProcs, &proc->links);
+			proc->procgloballist = &ProcGlobal->bgworkerFreeProcs;
 		}
 		else if (i < MaxBackends)
 		{
 			/* PGPROC for walsender, add to walsenderFreeProcs list */
-			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->walsenderFreeProcs;
-			ProcGlobal->walsenderFreeProcs = &procs[i];
-			procs[i].procgloballist = &ProcGlobal->walsenderFreeProcs;
+			dlist_push_head(&ProcGlobal->walsenderFreeProcs, &proc->links);
+			proc->procgloballist = &ProcGlobal->walsenderFreeProcs;
 		}
 
 		/* Initialize myProcLocks[] shared memory queues. */
 		for (j = 0; j < NUM_LOCK_PARTITIONS; j++)
-			SHMQueueInit(&(procs[i].myProcLocks[j]));
+			dlist_init(&(proc->myProcLocks[j]));
 
 		/* Initialize lockGroupMembers list. */
-		dlist_init(&procs[i].lockGroupMembers);
+		dlist_init(&proc->lockGroupMembers);
 
 		/*
 		 * Initialize the atomic variables, otherwise, it won't be safe to
 		 * access them for backends that aren't currently in use.
 		 */
-		pg_atomic_init_u32(&(procs[i].procArrayGroupNext), INVALID_PGPROCNO);
-		pg_atomic_init_u32(&(procs[i].clogGroupNext), INVALID_PGPROCNO);
+		pg_atomic_init_u32(&(proc->procArrayGroupNext), INVALID_PGPROCNO);
+		pg_atomic_init_u32(&(proc->clogGroupNext), INVALID_PGPROCNO);
 	}
 
 	/*
@@ -303,7 +301,7 @@ InitProcGlobal(void)
 void
 InitProcess(void)
 {
-	PGPROC	   *volatile *procgloballist;
+	dlist_head *procgloballist;
 
 	/*
 	 * ProcGlobal should be set up already (if we are a backend, we inherit
@@ -336,11 +334,9 @@ InitProcess(void)
 
 	set_spins_per_delay(ProcGlobal->spins_per_delay);
 
-	MyProc = *procgloballist;
-
-	if (MyProc != NULL)
+	if (!dlist_is_empty(procgloballist))
 	{
-		*procgloballist = (PGPROC *) MyProc->links.next;
+		MyProc = (PGPROC*) dlist_pop_head_node(procgloballist);
 		SpinLockRelease(ProcStructLock);
 	}
 	else
@@ -382,7 +378,7 @@ InitProcess(void)
 	 * Initialize all fields of MyProc, except for those previously
 	 * initialized by InitProcGlobal.
 	 */
-	SHMQueueElemInit(&(MyProc->links));
+	dlist_node_init(&MyProc->links);
 	MyProc->waitStatus = STATUS_OK;
 	MyProc->lxid = InvalidLocalTransactionId;
 	MyProc->fpVXIDLock = false;
@@ -411,7 +407,7 @@ InitProcess(void)
 
 		/* Last process should have released all locks. */
 		for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
-			Assert(SHMQueueEmpty(&(MyProc->myProcLocks[i])));
+			Assert(dlist_is_empty(&(MyProc->myProcLocks[i])));
 	}
 #endif
 	MyProc->recoveryConflictPending = false;
@@ -566,7 +562,7 @@ InitAuxiliaryProcess(void)
 	 * Initialize all fields of MyProc, except for those previously
 	 * initialized by InitProcGlobal.
 	 */
-	SHMQueueElemInit(&(MyProc->links));
+	dlist_node_init(&MyProc->links);
 	MyProc->waitStatus = STATUS_OK;
 	MyProc->lxid = InvalidLocalTransactionId;
 	MyProc->fpVXIDLock = false;
@@ -590,7 +586,7 @@ InitAuxiliaryProcess(void)
 
 		/* Last process should have released all locks. */
 		for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
-			Assert(SHMQueueEmpty(&(MyProc->myProcLocks[i])));
+			Assert(dlist_is_empty(&(MyProc->myProcLocks[i])));
 	}
 #endif
 
@@ -670,16 +666,15 @@ GetStartupBufferPinWaitBufId(void)
 bool
 HaveNFreeProcs(int n)
 {
-	PGPROC	   *proc;
+	dlist_iter	iter;
 
 	SpinLockAcquire(ProcStructLock);
 
-	proc = ProcGlobal->freeProcs;
-
-	while (n > 0 && proc != NULL)
+	dlist_foreach(iter, &ProcGlobal->freeProcs)
 	{
-		proc = (PGPROC *) proc->links.next;
 		n--;
+		if (n == 0)
+			break;
 	}
 
 	SpinLockRelease(ProcStructLock);
@@ -742,7 +737,7 @@ LockErrorCleanup(void)
 	partitionLock = LockHashPartitionLock(lockAwaited->hashcode);
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-	if (MyProc->links.next != NULL)
+	if (!dlist_node_is_detached(&MyProc->links))
 	{
 		/* We could not have been granted the lock yet */
 		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode);
@@ -815,7 +810,7 @@ static void
 ProcKill(int code, Datum arg)
 {
 	PGPROC	   *proc;
-	PGPROC	   *volatile *procgloballist;
+	dlist_head *procgloballist;
 
 	Assert(MyProc != NULL);
 
@@ -828,7 +823,7 @@ ProcKill(int code, Datum arg)
 
 		/* Last process should have released all locks. */
 		for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
-			Assert(SHMQueueEmpty(&(MyProc->myProcLocks[i])));
+			Assert(dlist_is_empty(&(MyProc->myProcLocks[i])));
 	}
 #endif
 
@@ -851,7 +846,7 @@ ProcKill(int code, Datum arg)
 
 	/*
 	 * Detach from any lock group of which we are a member.  If the leader
-	 * exist before all other group members, its PGPROC will remain allocated
+	 * exits before all other group members, its PGPROC will remain allocated
 	 * until the last group process exits; that process must return the
 	 * leader's PGPROC to the appropriate list.
 	 */
@@ -872,8 +867,7 @@ ProcKill(int code, Datum arg)
 
 				/* Leader exited first; return its PGPROC. */
 				SpinLockAcquire(ProcStructLock);
-				leader->links.next = (SHM_QUEUE *) *procgloballist;
-				*procgloballist = leader;
+				dlist_push_head(procgloballist, &leader->links);
 				SpinLockRelease(ProcStructLock);
 			}
 		}
@@ -907,8 +901,7 @@ ProcKill(int code, Datum arg)
 		Assert(dlist_is_empty(&proc->lockGroupMembers));
 
 		/* Return PGPROC structure (and semaphore) to appropriate freelist */
-		proc->links.next = (SHM_QUEUE *) *procgloballist;
-		*procgloballist = proc;
+		dlist_push_tail(procgloballist, &proc->links);
 	}
 
 	/* Update shared estimate of spins_per_delay */
@@ -1037,7 +1030,7 @@ ProcQueueAlloc(const char *name)
 void
 ProcQueueInit(PROC_QUEUE *queue)
 {
-	SHMQueueInit(&(queue->links));
+	dlist_init(&queue->links);
 	queue->size = 0;
 }
 
@@ -1068,12 +1061,11 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
 	PROC_QUEUE *waitQueue = &(lock->waitProcs);
 	LOCKMASK	myHeldLocks = MyProc->heldLocks;
+	PGPROC	   *insert_before = NULL;
 	bool		early_deadlock = false;
 	bool		allow_autovacuum_cancel = true;
 	int			myWaitStatus;
-	PGPROC	   *proc;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
-	int			i;
 
 	/*
 	 * If group locking is in use, locks held by members of my locking group
@@ -1081,18 +1073,16 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 */
 	if (leader != NULL)
 	{
-		SHM_QUEUE  *procLocks = &(lock->procLocks);
-		PROCLOCK   *otherproclock;
+		dlist_iter	iter;
 
-		otherproclock = (PROCLOCK *)
-			SHMQueueNext(procLocks, procLocks, offsetof(PROCLOCK, lockLink));
-		while (otherproclock != NULL)
+		dlist_foreach(iter, &lock->procLocks)
 		{
+			PROCLOCK   *otherproclock;
+
+			otherproclock = dlist_container(PROCLOCK, lockLink, iter.cur);
+
 			if (otherproclock->groupLeader == leader)
 				myHeldLocks |= otherproclock->holdMask;
-			otherproclock = (PROCLOCK *)
-				SHMQueueNext(procLocks, &otherproclock->lockLink,
-							 offsetof(PROCLOCK, lockLink));
 		}
 	}
 
@@ -1116,20 +1106,23 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	if (myHeldLocks != 0)
 	{
 		LOCKMASK	aheadRequests = 0;
+		dlist_iter	iter;
 
-		proc = (PGPROC *) waitQueue->links.next;
-		for (i = 0; i < waitQueue->size; i++)
+		// FIXME: Shouldn't we just use the correct offset math?
+		StaticAssertStmt(offsetof(PGPROC, links) == 0, "odd casting");
+
+		dlist_foreach(iter, &waitQueue->links)
 		{
+			PGPROC	   *proc = dlist_container(PGPROC, links, iter.cur);
+
 			/*
 			 * If we're part of the same locking group as this waiter, its
 			 * locks neither conflict with ours nor contribute to
 			 * aheadRequests.
 			 */
 			if (leader != NULL && leader == proc->lockGroupLeader)
-			{
-				proc = (PGPROC *) proc->links.next;
 				continue;
-			}
+
 			/* Must he wait for me? */
 			if (lockMethodTable->conflictTab[proc->waitLockMode] & myHeldLocks)
 			{
@@ -1157,29 +1150,25 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 					GrantAwaitedLock();
 					return STATUS_OK;
 				}
-				/* Break out of loop to put myself before him */
+
+				/* Put myself into wait queue before conflicting process */
+				insert_before = proc;
 				break;
 			}
 			/* Nope, so advance to next waiter */
 			aheadRequests |= LOCKBIT_ON(proc->waitLockMode);
-			proc = (PGPROC *) proc->links.next;
 		}
-
-		/*
-		 * If we fall out of loop normally, proc points to waitQueue head, so
-		 * we will insert at tail of queue as desired.
-		 */
-	}
-	else
-	{
-		/* I hold no locks, so I can't push in front of anyone. */
-		proc = (PGPROC *) &(waitQueue->links);
 	}
 
 	/*
-	 * Insert self into queue, ahead of the given proc (or at tail of queue).
+	 * Insert self into queue, ahead of the determined proc or at the tail.
 	 */
-	SHMQueueInsertBefore(&(proc->links), &(MyProc->links));
+	if (insert_before)
+		dlist_insert_before(&insert_before->links, &MyProc->links);
+	else
+		dlist_push_tail(&waitQueue->links, &MyProc->links);
+
+
 	waitQueue->size++;
 
 	lock->waitMask |= LOCKBIT_ON(lockmode);
@@ -1384,7 +1373,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			long		secs;
 			int			usecs;
 			long		msecs;
-			SHM_QUEUE  *procLocks;
+			dlist_iter	proc_iter;
 			PROCLOCK   *proclock;
 			bool		first_holder = true,
 						first_waiter = true;
@@ -1414,12 +1403,11 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 
 			LWLockAcquire(partitionLock, LW_SHARED);
 
-			procLocks = &(lock->procLocks);
-			proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
-												 offsetof(PROCLOCK, lockLink));
-
-			while (proclock)
+			dlist_foreach(proc_iter, &lock->procLocks)
 			{
+				proclock =
+					dlist_container(PROCLOCK, lockLink, proc_iter.cur);
+
 				/*
 				 * we are a waiter if myProc->waitProcLock == proclock; we are
 				 * a holder if it is NULL or something different
@@ -1450,9 +1438,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 
 					lockHoldersNum++;
 				}
-
-				proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
-													 offsetof(PROCLOCK, lockLink));
 			}
 
 			LWLockRelease(partitionLock);
@@ -1577,7 +1562,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
  * ProcWakeup -- wake up a process by setting its latch.
  *
  *	 Also remove the process from the wait queue and set its links invalid.
- *	 RETURN: the next process in the wait queue.
  *
  * The appropriate lock partition lock must be held by caller.
  *
@@ -1586,22 +1570,17 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
  * to twiddle the lock's request counts too --- see RemoveFromWaitQueue.
  * Hence, in practice the waitStatus parameter must be STATUS_OK.
  */
-PGPROC *
+void
 ProcWakeup(PGPROC *proc, int waitStatus)
 {
-	PGPROC	   *retProc;
-
 	/* Proc should be sleeping ... */
-	if (proc->links.prev == NULL ||
-		proc->links.next == NULL)
-		return NULL;
+	if (dlist_node_is_detached(&proc->links))
+		return;
+
 	Assert(proc->waitStatus == STATUS_WAITING);
 
-	/* Save next process before we zap the list link */
-	retProc = (PGPROC *) proc->links.next;
-
 	/* Remove process from wait queue */
-	SHMQueueDelete(&(proc->links));
+	dlist_delete_thoroughly(&proc->links);
 	(proc->waitLock->waitProcs.size)--;
 
 	/* Clean up process' state and pass it the ok/fail signal */
@@ -1611,8 +1590,6 @@ ProcWakeup(PGPROC *proc, int waitStatus)
 
 	/* And awaken it */
 	SetLatch(&proc->procLatch);
-
-	return retProc;
 }
 
 /*
@@ -1626,19 +1603,17 @@ void
 ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 {
 	PROC_QUEUE *waitQueue = &(lock->waitProcs);
-	int			queue_size = waitQueue->size;
-	PGPROC	   *proc;
 	LOCKMASK	aheadRequests = 0;
+	dlist_mutable_iter miter;
 
-	Assert(queue_size >= 0);
+	Assert(waitQueue->size >= 0);
 
-	if (queue_size == 0)
+	if (waitQueue->size == 0)
 		return;
 
-	proc = (PGPROC *) waitQueue->links.next;
-
-	while (queue_size-- > 0)
+	dlist_foreach_modify(miter, &waitQueue->links)
 	{
+		PGPROC	   *proc = dlist_container(PGPROC, links, miter.cur);
 		LOCKMODE	lockmode = proc->waitLockMode;
 
 		/*
@@ -1651,7 +1626,7 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 		{
 			/* OK to waken */
 			GrantLock(lock, proc->waitProcLock, lockmode);
-			proc = ProcWakeup(proc, STATUS_OK);
+			ProcWakeup(proc, STATUS_OK);
 
 			/*
 			 * ProcWakeup removes proc from the lock's waiting process queue
@@ -1661,11 +1636,8 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 		}
 		else
 		{
-			/*
-			 * Cannot wake this guy. Remember his request for later checks.
-			 */
+			/* Lock conflicts: don't wake, but remember for later checks. */
 			aheadRequests |= LOCKBIT_ON(lockmode);
-			proc = (PGPROC *) proc->links.next;
 		}
 	}
 
-- 
2.25.0.114.g5b0ca878e0

v1-0003-Use-dlist-for-syncrep-queue.patch (text/x-diff; charset=us-ascii)
From 08048c5fc255c70d8306bd573ccb827ea320f481 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 19 Feb 2020 12:33:09 -0800
Subject: [PATCH v1 3/6] Use dlist for syncrep queue.

---
 src/include/replication/walsender_private.h |  3 +-
 src/include/storage/proc.h                  |  2 +-
 src/backend/replication/syncrep.c           | 89 +++++++++------------
 src/backend/replication/walsender.c         |  2 +-
 src/backend/storage/lmgr/proc.c             |  2 +-
 5 files changed, 41 insertions(+), 57 deletions(-)

diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 366828f0a47..c2f946c5c48 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -13,6 +13,7 @@
 #define _WALSENDER_PRIVATE_H
 
 #include "access/xlog.h"
+#include "lib/ilist.h"
 #include "nodes/nodes.h"
 #include "replication/syncrep.h"
 #include "storage/latch.h"
@@ -96,7 +97,7 @@ typedef struct
 	 * Synchronous replication queue with one queue per request type.
 	 * Protected by SyncRepLock.
 	 */
-	SHM_QUEUE	SyncRepQueue[NUM_SYNC_REP_WAIT_MODE];
+	dlist_head	SyncRepQueue[NUM_SYNC_REP_WAIT_MODE];
 
 	/*
 	 * Current location of the head of the queue. All waiters should have a
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 2ba37f250de..de222f96681 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -150,7 +150,7 @@ struct PGPROC
 	 */
 	XLogRecPtr	waitLSN;		/* waiting for this LSN or higher */
 	int			syncRepState;	/* wait state for sync rep */
-	SHM_QUEUE	syncRepLinks;	/* list link if process is in syncrep queue */
+	dlist_node	syncRepLinks;	/* list link if process is in syncrep queue */
 
 	/*
 	 * All PROCLOCK objects for locks held or awaited by this backend are
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index c284103b548..402d3e41fc6 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -168,7 +168,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	if (!SyncRepRequested())
 		return;
 
-	Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
+	Assert(dlist_node_is_detached(&MyProc->syncRepLinks));
 	Assert(WalSndCtl != NULL);
 
 	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
@@ -304,7 +304,7 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	 * assertions, but better safe than sorry).
 	 */
 	pg_read_barrier();
-	Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
+	Assert(dlist_node_is_detached(&MyProc->syncRepLinks));
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	MyProc->waitLSN = 0;
 
@@ -325,31 +325,32 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 static void
 SyncRepQueueInsert(int mode)
 {
-	PGPROC	   *proc;
+	dlist_head *queue;
+	dlist_iter iter;
 
 	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
-	proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue[mode]),
-								   &(WalSndCtl->SyncRepQueue[mode]),
-								   offsetof(PGPROC, syncRepLinks));
+	queue = &WalSndCtl->SyncRepQueue[mode];
 
-	while (proc)
+	dlist_reverse_foreach(iter, queue)
 	{
+		PGPROC	   *proc = dlist_container(PGPROC, syncRepLinks, iter.cur);
+
 		/*
-		 * Stop at the queue element that we should after to ensure the queue
-		 * is ordered by LSN.
+		 * Stop at the queue element that we should insert after to ensure the
+		 * queue is ordered by LSN.
 		 */
 		if (proc->waitLSN < MyProc->waitLSN)
-			break;
-
-		proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue[mode]),
-									   &(proc->syncRepLinks),
-									   offsetof(PGPROC, syncRepLinks));
+		{
+			dlist_insert_after(&proc->syncRepLinks, &MyProc->syncRepLinks);
+			return;
+		}
 	}
 
-	if (proc)
-		SHMQueueInsertAfter(&(proc->syncRepLinks), &(MyProc->syncRepLinks));
-	else
-		SHMQueueInsertAfter(&(WalSndCtl->SyncRepQueue[mode]), &(MyProc->syncRepLinks));
+	/*
+	 * If we get here, the list was either empty, or this process needs to be
+	 * at the head.
+	 */
+	dlist_push_head(queue, &MyProc->syncRepLinks);
 }
 
 /*
@@ -359,8 +360,8 @@ static void
 SyncRepCancelWait(void)
 {
 	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
-	if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
-		SHMQueueDelete(&(MyProc->syncRepLinks));
+	if (!dlist_node_is_detached(&MyProc->syncRepLinks))
+		dlist_delete_thoroughly(&MyProc->syncRepLinks);
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
 	LWLockRelease(SyncRepLock);
 }
@@ -372,13 +373,13 @@ SyncRepCleanupAtProcExit(void)
 	 * First check if we are removed from the queue without the lock to not
 	 * slow down backend exit.
 	 */
-	if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
+	if (!dlist_node_is_detached(&MyProc->syncRepLinks))
 	{
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 
 		/* maybe we have just been removed, so recheck */
-		if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
-			SHMQueueDelete(&(MyProc->syncRepLinks));
+		if (!dlist_node_is_detached(&MyProc->syncRepLinks))
+			dlist_delete_thoroughly(&MyProc->syncRepLinks);
 
 		LWLockRelease(SyncRepLock);
 	}
@@ -1011,20 +1012,17 @@ static int
 SyncRepWakeQueue(bool all, int mode)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
-	PGPROC	   *proc = NULL;
-	PGPROC	   *thisproc = NULL;
 	int			numprocs = 0;
+	dlist_mutable_iter iter;
 
 	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
 	Assert(LWLockHeldByMeInMode(SyncRepLock, LW_EXCLUSIVE));
 	Assert(SyncRepQueueIsOrderedByLSN(mode));
 
-	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
-								   &(WalSndCtl->SyncRepQueue[mode]),
-								   offsetof(PGPROC, syncRepLinks));
-
-	while (proc)
+	dlist_foreach_modify(iter, &WalSndCtl->SyncRepQueue[mode])
 	{
+		PGPROC *proc = dlist_container(PGPROC, syncRepLinks, iter.cur);
+
 		/*
 		 * Assume the queue is ordered by LSN
 		 */
@@ -1032,18 +1030,9 @@ SyncRepWakeQueue(bool all, int mode)
 			return numprocs;
 
 		/*
-		 * Move to next proc, so we can delete thisproc from the queue.
-		 * thisproc is valid, proc may be NULL after this.
+		 * Remove from queue.
 		 */
-		thisproc = proc;
-		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
-									   &(proc->syncRepLinks),
-									   offsetof(PGPROC, syncRepLinks));
-
-		/*
-		 * Remove thisproc from queue.
-		 */
-		SHMQueueDelete(&(thisproc->syncRepLinks));
+		dlist_delete_thoroughly(&proc->syncRepLinks);
 
 		/*
 		 * SyncRepWaitForLSN() reads syncRepState without holding the lock, so
@@ -1056,12 +1045,12 @@ SyncRepWakeQueue(bool all, int mode)
 		 * Set state to complete; see SyncRepWaitForLSN() for discussion of
 		 * the various states.
 		 */
-		thisproc->syncRepState = SYNC_REP_WAIT_COMPLETE;
+		proc->syncRepState = SYNC_REP_WAIT_COMPLETE;
 
 		/*
 		 * Wake only when we have set state and removed from queue.
 		 */
-		SetLatch(&(thisproc->procLatch));
+		SetLatch(&(proc->procLatch));
 
 		numprocs++;
 	}
@@ -1115,19 +1104,17 @@ SyncRepUpdateSyncStandbysDefined(void)
 static bool
 SyncRepQueueIsOrderedByLSN(int mode)
 {
-	PGPROC	   *proc = NULL;
 	XLogRecPtr	lastLSN;
+	dlist_iter	iter;
 
 	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
 
 	lastLSN = 0;
 
-	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
-								   &(WalSndCtl->SyncRepQueue[mode]),
-								   offsetof(PGPROC, syncRepLinks));
-
-	while (proc)
+	dlist_foreach(iter, &WalSndCtl->SyncRepQueue[mode])
 	{
+		PGPROC	   *proc = dlist_container(PGPROC, syncRepLinks, iter.cur);
+
 		/*
 		 * Check the queue is ordered by LSN and that multiple procs don't
 		 * have matching LSNs
@@ -1136,10 +1123,6 @@ SyncRepQueueIsOrderedByLSN(int mode)
 			return false;
 
 		lastLSN = proc->waitLSN;
-
-		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
-									   &(proc->syncRepLinks),
-									   offsetof(PGPROC, syncRepLinks));
 	}
 
 	return true;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index abb533b9d03..2d2a0ceb503 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -3021,7 +3021,7 @@ WalSndShmemInit(void)
 		MemSet(WalSndCtl, 0, WalSndShmemSize());
 
 		for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
-			SHMQueueInit(&(WalSndCtl->SyncRepQueue[i]));
+			dlist_init(&(WalSndCtl->SyncRepQueue[i]));
 
 		for (i = 0; i < max_wal_senders; i++)
 		{
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 5a157fc07d8..a4c338d7aff 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -415,7 +415,7 @@ InitProcess(void)
 	/* Initialize fields for sync rep */
 	MyProc->waitLSN = 0;
 	MyProc->syncRepState = SYNC_REP_NOT_WAITING;
-	SHMQueueElemInit(&(MyProc->syncRepLinks));
+	dlist_node_init(&MyProc->syncRepLinks);
 
 	/* Initialize fields for group XID clearing. */
 	MyProc->procArrayGroupMember = false;
-- 
2.25.0.114.g5b0ca878e0
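
The deleting loops (SyncRepWakeQueue above, and the predicate-lock loops below) instead rely on dlist_foreach_modify(), which caches the successor before the loop body runs, so unlinking the current node is safe; that is what lets the "save the next element, then delete this one" dance from the SHM_QUEUE code go away. Another minimal sketch, reusing the hypothetical MyElem from the example above:

static void
remove_negative(dlist_head *head)
{
	dlist_mutable_iter miter;

	/*
	 * dlist_foreach_modify() stashes the next node in miter before each
	 * iteration, so dlist_delete() on the current element cannot break
	 * the traversal.
	 */
	dlist_foreach_modify(miter, head)
	{
		MyElem	   *cur = dlist_container(MyElem, node, miter.cur);

		if (cur->value < 0)
			dlist_delete(&cur->node);
	}
}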

Attachment: v1-0004-Use-dlists-for-predicate-locking.patch (text/x-diff; charset=us-ascii)
From df78ed14d4f56576d3ce67134b8a5c22b8b9c21f Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 19 Feb 2020 15:19:37 -0800
Subject: [PATCH v1 4/6] Use dlists for predicate locking.

---
 src/include/storage/predicate_internals.h |  49 +-
 src/backend/storage/lmgr/predicate.c      | 692 +++++++---------------
 2 files changed, 241 insertions(+), 500 deletions(-)

diff --git a/src/include/storage/predicate_internals.h b/src/include/storage/predicate_internals.h
index 591ac6f42aa..4d1b0e0ffa0 100644
--- a/src/include/storage/predicate_internals.h
+++ b/src/include/storage/predicate_internals.h
@@ -14,6 +14,7 @@
 #ifndef PREDICATE_INTERNALS_H
 #define PREDICATE_INTERNALS_H
 
+#include "lib/ilist.h"
 #include "storage/lock.h"
 #include "storage/lwlock.h"
 
@@ -84,14 +85,16 @@ typedef struct SERIALIZABLEXACT
 		SerCommitSeqNo lastCommitBeforeSnapshot;	/* when not committed or
 													 * no conflict out */
 	}			SeqNo;
-	SHM_QUEUE	outConflicts;	/* list of write transactions whose data we
+	dlist_head	outConflicts;	/* list of write transactions whose data we
 								 * couldn't read. */
-	SHM_QUEUE	inConflicts;	/* list of read transactions which couldn't
+	dlist_head	inConflicts;	/* list of read transactions which couldn't
 								 * see our write. */
-	SHM_QUEUE	predicateLocks; /* list of associated PREDICATELOCK objects */
-	SHM_QUEUE	finishedLink;	/* list link in
+	dlist_head	predicateLocks; /* list of associated PREDICATELOCK objects */
+	dlist_node	finishedLink;	/* list link in
 								 * FinishedSerializableTransactions */
 
+	dlist_node	xactLink;		/* PredXact->activeList/availableList */
+
 	LWLock		predicateLockListLock;	/* protects predicateLocks in parallel
 										 * mode */
 
@@ -99,7 +102,7 @@ typedef struct SERIALIZABLEXACT
 	 * for r/o transactions: list of concurrent r/w transactions that we could
 	 * potentially have conflicts with, and vice versa for r/w transactions
 	 */
-	SHM_QUEUE	possibleUnsafeConflicts;
+	dlist_head	possibleUnsafeConflicts;
 
 	TransactionId topXid;		/* top level xid for the transaction, if one
 								 * exists; else invalid */
@@ -134,28 +137,10 @@ typedef struct SERIALIZABLEXACT
  */
 #define SXACT_FLAG_PARTIALLY_RELEASED	0x00000800
 
-/*
- * The following types are used to provide an ad hoc list for holding
- * SERIALIZABLEXACT objects.  An HTAB is overkill, since there is no need to
- * access these by key -- there are direct pointers to these objects where
- * needed.  If a shared memory list is created, these types can probably be
- * eliminated in favor of using the general solution.
- */
-typedef struct PredXactListElementData
-{
-	SHM_QUEUE	link;
-	SERIALIZABLEXACT sxact;
-}			PredXactListElementData;
-
-typedef struct PredXactListElementData *PredXactListElement;
-
-#define PredXactListElementDataSize \
-		((Size)MAXALIGN(sizeof(PredXactListElementData)))
-
 typedef struct PredXactListData
 {
-	SHM_QUEUE	availableList;
-	SHM_QUEUE	activeList;
+	dlist_head	availableList;
+	dlist_head	activeList;
 
 	/*
 	 * These global variables are maintained when registering and cleaning up
@@ -182,7 +167,7 @@ typedef struct PredXactListData
 												 * seq no */
 	SERIALIZABLEXACT *OldCommittedSxact;	/* shared copy of dummy sxact */
 
-	PredXactListElement element;
+	SERIALIZABLEXACT *element;
 }			PredXactListData;
 
 typedef struct PredXactListData *PredXactList;
@@ -203,8 +188,8 @@ typedef struct PredXactListData *PredXactList;
  */
 typedef struct RWConflictData
 {
-	SHM_QUEUE	outLink;		/* link for list of conflicts out from a sxact */
-	SHM_QUEUE	inLink;			/* link for list of conflicts in to a sxact */
+	dlist_node	outLink;		/* link for list of conflicts out from a sxact */
+	dlist_node	inLink;			/* link for list of conflicts in to a sxact */
 	SERIALIZABLEXACT *sxactOut;
 	SERIALIZABLEXACT *sxactIn;
 }			RWConflictData;
@@ -216,7 +201,7 @@ typedef struct RWConflictData *RWConflict;
 
 typedef struct RWConflictPoolHeaderData
 {
-	SHM_QUEUE	availableList;
+	dlist_head	availableList;
 	RWConflict	element;
 }			RWConflictPoolHeaderData;
 
@@ -298,7 +283,7 @@ typedef struct PREDICATELOCKTARGET
 	PREDICATELOCKTARGETTAG tag; /* unique identifier of lockable object */
 
 	/* data */
-	SHM_QUEUE	predicateLocks; /* list of PREDICATELOCK objects assoc. with
+	dlist_head	predicateLocks; /* list of PREDICATELOCK objects assoc. with
 								 * predicate lock target */
 } PREDICATELOCKTARGET;
 
@@ -331,9 +316,9 @@ typedef struct PREDICATELOCK
 	PREDICATELOCKTAG tag;		/* unique identifier of lock */
 
 	/* data */
-	SHM_QUEUE	targetLink;		/* list link in PREDICATELOCKTARGET's list of
+	dlist_node	targetLink;		/* list link in PREDICATELOCKTARGET's list of
 								 * predicate locks */
-	SHM_QUEUE	xactLink;		/* list link in SERIALIZABLEXACT's list of
+	dlist_node	xactLink;		/* list link in SERIALIZABLEXACT's list of
 								 * predicate locks */
 	SerCommitSeqNo commitSeqNo; /* only used for summarized predicate locks */
 } PREDICATELOCK;
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 654584b77af..695aeb1425a 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -259,7 +259,7 @@
 #define NPREDICATELOCKTARGETENTS() \
 	mul_size(max_predicate_locks_per_xact, add_size(MaxBackends, max_prepared_xacts))
 
-#define SxactIsOnFinishedList(sxact) (!SHMQueueIsDetached(&((sxact)->finishedLink)))
+#define SxactIsOnFinishedList(sxact) (!dlist_node_is_detached(&(sxact)->finishedLink))
 
 /*
  * Note that a sxact is marked "prepared" once it has passed
@@ -391,7 +391,7 @@ static RWConflictPoolHeader RWConflictPool;
 static HTAB *SerializableXidHash;
 static HTAB *PredicateLockTargetHash;
 static HTAB *PredicateLockHash;
-static SHM_QUEUE *FinishedSerializableTransactions;
+static dlist_head *FinishedSerializableTransactions;
 
 /*
  * Tag for a dummy entry in PredicateLockTargetHash. By temporarily removing
@@ -429,8 +429,6 @@ static SERIALIZABLEXACT *SavedSerializableXact = InvalidSerializableXact;
 
 static SERIALIZABLEXACT *CreatePredXact(void);
 static void ReleasePredXact(SERIALIZABLEXACT *sxact);
-static SERIALIZABLEXACT *FirstPredXact(void);
-static SERIALIZABLEXACT *NextPredXact(SERIALIZABLEXACT *sxact);
 
 static bool RWConflictExists(const SERIALIZABLEXACT *reader, const SERIALIZABLEXACT *writer);
 static void SetRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer);
@@ -579,69 +577,24 @@ SerializationNeededForWrite(Relation relation)
 static SERIALIZABLEXACT *
 CreatePredXact(void)
 {
-	PredXactListElement ptle;
+	SERIALIZABLEXACT *sxact;
 
-	ptle = (PredXactListElement)
-		SHMQueueNext(&PredXact->availableList,
-					 &PredXact->availableList,
-					 offsetof(PredXactListElementData, link));
-	if (!ptle)
+	if (dlist_is_empty(&PredXact->availableList))
 		return NULL;
 
-	SHMQueueDelete(&ptle->link);
-	SHMQueueInsertBefore(&PredXact->activeList, &ptle->link);
-	return &ptle->sxact;
+	sxact = dlist_container(SERIALIZABLEXACT, xactLink,
+							dlist_pop_head_node(&PredXact->availableList));
+	dlist_push_tail(&PredXact->activeList, &sxact->xactLink);
+	return sxact;
 }
 
 static void
 ReleasePredXact(SERIALIZABLEXACT *sxact)
 {
-	PredXactListElement ptle;
-
 	Assert(ShmemAddrIsValid(sxact));
 
-	ptle = (PredXactListElement)
-		(((char *) sxact)
-		 - offsetof(PredXactListElementData, sxact)
-		 + offsetof(PredXactListElementData, link));
-	SHMQueueDelete(&ptle->link);
-	SHMQueueInsertBefore(&PredXact->availableList, &ptle->link);
-}
-
-static SERIALIZABLEXACT *
-FirstPredXact(void)
-{
-	PredXactListElement ptle;
-
-	ptle = (PredXactListElement)
-		SHMQueueNext(&PredXact->activeList,
-					 &PredXact->activeList,
-					 offsetof(PredXactListElementData, link));
-	if (!ptle)
-		return NULL;
-
-	return &ptle->sxact;
-}
-
-static SERIALIZABLEXACT *
-NextPredXact(SERIALIZABLEXACT *sxact)
-{
-	PredXactListElement ptle;
-
-	Assert(ShmemAddrIsValid(sxact));
-
-	ptle = (PredXactListElement)
-		(((char *) sxact)
-		 - offsetof(PredXactListElementData, sxact)
-		 + offsetof(PredXactListElementData, link));
-	ptle = (PredXactListElement)
-		SHMQueueNext(&PredXact->activeList,
-					 &ptle->link,
-					 offsetof(PredXactListElementData, link));
-	if (!ptle)
-		return NULL;
-
-	return &ptle->sxact;
+	dlist_delete(&sxact->xactLink);
+	dlist_push_tail(&PredXact->availableList, &sxact->xactLink);
 }
 
 /*------------------------------------------------------------------------*/
@@ -652,30 +605,25 @@ NextPredXact(SERIALIZABLEXACT *sxact)
 static bool
 RWConflictExists(const SERIALIZABLEXACT *reader, const SERIALIZABLEXACT *writer)
 {
-	RWConflict	conflict;
+	dlist_iter	iter;
 
 	Assert(reader != writer);
 
 	/* Check the ends of the purported conflict first. */
 	if (SxactIsDoomed(reader)
 		|| SxactIsDoomed(writer)
-		|| SHMQueueEmpty(&reader->outConflicts)
-		|| SHMQueueEmpty(&writer->inConflicts))
+		|| dlist_is_empty(&reader->outConflicts)
+		|| dlist_is_empty(&writer->inConflicts))
 		return false;
 
 	/* A conflict is possible; walk the list to find out. */
-	conflict = (RWConflict)
-		SHMQueueNext(&reader->outConflicts,
-					 &reader->outConflicts,
-					 offsetof(RWConflictData, outLink));
-	while (conflict)
+	dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->outConflicts)
 	{
+		RWConflict	conflict =
+			dlist_container(RWConflictData, outLink, iter.cur);
+
 		if (conflict->sxactIn == writer)
 			return true;
-		conflict = (RWConflict)
-			SHMQueueNext(&reader->outConflicts,
-						 &conflict->outLink,
-						 offsetof(RWConflictData, outLink));
 	}
 
 	/* No conflict found. */
@@ -690,22 +638,19 @@ SetRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer)
 	Assert(reader != writer);
 	Assert(!RWConflictExists(reader, writer));
 
-	conflict = (RWConflict)
-		SHMQueueNext(&RWConflictPool->availableList,
-					 &RWConflictPool->availableList,
-					 offsetof(RWConflictData, outLink));
-	if (!conflict)
+	if (dlist_is_empty(&RWConflictPool->availableList))
 		ereport(ERROR,
 				(errcode(ERRCODE_OUT_OF_MEMORY),
 				 errmsg("not enough elements in RWConflictPool to record a read/write conflict"),
 				 errhint("You might need to run fewer transactions at a time or increase max_connections.")));
 
-	SHMQueueDelete(&conflict->outLink);
+	conflict = dlist_head_element(RWConflictData, outLink, &RWConflictPool->availableList);
+	dlist_delete(&conflict->outLink);
 
 	conflict->sxactOut = reader;
 	conflict->sxactIn = writer;
-	SHMQueueInsertBefore(&reader->outConflicts, &conflict->outLink);
-	SHMQueueInsertBefore(&writer->inConflicts, &conflict->inLink);
+	dlist_push_tail(&reader->outConflicts, &conflict->outLink);
+	dlist_push_tail(&writer->inConflicts, &conflict->inLink);
 }
 
 static void
@@ -718,39 +663,33 @@ SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact,
 	Assert(SxactIsReadOnly(roXact));
 	Assert(!SxactIsReadOnly(activeXact));
 
-	conflict = (RWConflict)
-		SHMQueueNext(&RWConflictPool->availableList,
-					 &RWConflictPool->availableList,
-					 offsetof(RWConflictData, outLink));
-	if (!conflict)
+	if (dlist_is_empty(&RWConflictPool->availableList))
 		ereport(ERROR,
 				(errcode(ERRCODE_OUT_OF_MEMORY),
 				 errmsg("not enough elements in RWConflictPool to record a potential read/write conflict"),
 				 errhint("You might need to run fewer transactions at a time or increase max_connections.")));
 
-	SHMQueueDelete(&conflict->outLink);
+	conflict = dlist_head_element(RWConflictData, outLink, &RWConflictPool->availableList);
+	dlist_delete(&conflict->outLink);
 
 	conflict->sxactOut = activeXact;
 	conflict->sxactIn = roXact;
-	SHMQueueInsertBefore(&activeXact->possibleUnsafeConflicts,
-						 &conflict->outLink);
-	SHMQueueInsertBefore(&roXact->possibleUnsafeConflicts,
-						 &conflict->inLink);
+	dlist_push_tail(&activeXact->possibleUnsafeConflicts, &conflict->outLink);
+	dlist_push_tail(&roXact->possibleUnsafeConflicts, &conflict->inLink);
 }
 
 static void
 ReleaseRWConflict(RWConflict conflict)
 {
-	SHMQueueDelete(&conflict->inLink);
-	SHMQueueDelete(&conflict->outLink);
-	SHMQueueInsertBefore(&RWConflictPool->availableList, &conflict->outLink);
+	dlist_delete(&conflict->inLink);
+	dlist_delete(&conflict->outLink);
+	dlist_push_tail(&RWConflictPool->availableList, &conflict->outLink);
 }
 
 static void
 FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
 {
-	RWConflict	conflict,
-				nextConflict;
+	dlist_mutable_iter iter;
 
 	Assert(SxactIsReadOnly(sxact));
 	Assert(!SxactIsROSafe(sxact));
@@ -761,23 +700,15 @@ FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
 	 * We know this isn't a safe snapshot, so we can stop looking for other
 	 * potential conflicts.
 	 */
-	conflict = (RWConflict)
-		SHMQueueNext(&sxact->possibleUnsafeConflicts,
-					 &sxact->possibleUnsafeConflicts,
-					 offsetof(RWConflictData, inLink));
-	while (conflict)
+	dlist_foreach_modify(iter, &sxact->possibleUnsafeConflicts)
 	{
-		nextConflict = (RWConflict)
-			SHMQueueNext(&sxact->possibleUnsafeConflicts,
-						 &conflict->inLink,
-						 offsetof(RWConflictData, inLink));
+		RWConflict	conflict =
+			dlist_container(RWConflictData, inLink, iter.cur);
 
 		Assert(!SxactIsReadOnly(conflict->sxactOut));
 		Assert(sxact == conflict->sxactIn);
 
 		ReleaseRWConflict(conflict);
-
-		conflict = nextConflict;
 	}
 }
 
@@ -1171,8 +1102,8 @@ InitPredicateLocks(void)
 	{
 		int			i;
 
-		SHMQueueInit(&PredXact->availableList);
-		SHMQueueInit(&PredXact->activeList);
+		dlist_init(&PredXact->availableList);
+		dlist_init(&PredXact->activeList);
 		PredXact->SxactGlobalXmin = InvalidTransactionId;
 		PredXact->SxactGlobalXminCount = 0;
 		PredXact->WritableSxactCount = 0;
@@ -1180,27 +1111,26 @@ InitPredicateLocks(void)
 		PredXact->CanPartialClearThrough = 0;
 		PredXact->HavePartialClearedThrough = 0;
 		requestSize = mul_size((Size) max_table_size,
-							   PredXactListElementDataSize);
+							   sizeof(SERIALIZABLEXACT));
 		PredXact->element = ShmemAlloc(requestSize);
 		/* Add all elements to available list, clean. */
 		memset(PredXact->element, 0, requestSize);
 		for (i = 0; i < max_table_size; i++)
 		{
-			LWLockInitialize(&PredXact->element[i].sxact.predicateLockListLock,
+			LWLockInitialize(&PredXact->element[i].predicateLockListLock,
 							 LWTRANCHE_SXACT);
-			SHMQueueInsertBefore(&(PredXact->availableList),
-								 &(PredXact->element[i].link));
+			dlist_push_tail(&PredXact->availableList, &PredXact->element[i].xactLink);
 		}
 		PredXact->OldCommittedSxact = CreatePredXact();
 		SetInvalidVirtualTransactionId(PredXact->OldCommittedSxact->vxid);
 		PredXact->OldCommittedSxact->prepareSeqNo = 0;
 		PredXact->OldCommittedSxact->commitSeqNo = 0;
 		PredXact->OldCommittedSxact->SeqNo.lastCommitBeforeSnapshot = 0;
-		SHMQueueInit(&PredXact->OldCommittedSxact->outConflicts);
-		SHMQueueInit(&PredXact->OldCommittedSxact->inConflicts);
-		SHMQueueInit(&PredXact->OldCommittedSxact->predicateLocks);
-		SHMQueueInit(&PredXact->OldCommittedSxact->finishedLink);
-		SHMQueueInit(&PredXact->OldCommittedSxact->possibleUnsafeConflicts);
+		dlist_init(&PredXact->OldCommittedSxact->outConflicts);
+		dlist_init(&PredXact->OldCommittedSxact->inConflicts);
+		dlist_init(&PredXact->OldCommittedSxact->predicateLocks);
+		dlist_node_init(&PredXact->OldCommittedSxact->finishedLink);
+		dlist_init(&PredXact->OldCommittedSxact->possibleUnsafeConflicts);
 		PredXact->OldCommittedSxact->topXid = InvalidTransactionId;
 		PredXact->OldCommittedSxact->finishedBefore = InvalidTransactionId;
 		PredXact->OldCommittedSxact->xmin = InvalidTransactionId;
@@ -1246,7 +1176,7 @@ InitPredicateLocks(void)
 	{
 		int			i;
 
-		SHMQueueInit(&RWConflictPool->availableList);
+		dlist_init(&RWConflictPool->availableList);
 		requestSize = mul_size((Size) max_table_size,
 							   RWConflictDataSize);
 		RWConflictPool->element = ShmemAlloc(requestSize);
@@ -1254,8 +1184,8 @@ InitPredicateLocks(void)
 		memset(RWConflictPool->element, 0, requestSize);
 		for (i = 0; i < max_table_size; i++)
 		{
-			SHMQueueInsertBefore(&(RWConflictPool->availableList),
-								 &(RWConflictPool->element[i].outLink));
+			dlist_push_tail(&RWConflictPool->availableList,
+							&RWConflictPool->element[i].outLink);
 		}
 	}
 
@@ -1263,13 +1193,13 @@ InitPredicateLocks(void)
 	 * Create or attach to the header for the list of finished serializable
 	 * transactions.
 	 */
-	FinishedSerializableTransactions = (SHM_QUEUE *)
+	FinishedSerializableTransactions = (dlist_head *)
 		ShmemInitStruct("FinishedSerializableTransactions",
-						sizeof(SHM_QUEUE),
+						sizeof(dlist_head),
 						&found);
 	Assert(found == IsUnderPostmaster);
 	if (!found)
-		SHMQueueInit(FinishedSerializableTransactions);
+		dlist_init(FinishedSerializableTransactions);
 
 	/*
 	 * Initialize the SLRU storage for old committed serializable
@@ -1308,7 +1238,7 @@ PredicateLockShmemSize(void)
 	max_table_size *= 10;
 	size = add_size(size, PredXactListDataSize);
 	size = add_size(size, mul_size((Size) max_table_size,
-								   PredXactListElementDataSize));
+								   sizeof(SERIALIZABLEXACT)));
 
 	/* transaction xid table */
 	size = add_size(size, hash_estimate_size(max_table_size,
@@ -1321,7 +1251,7 @@ PredicateLockShmemSize(void)
 								   RWConflictDataSize));
 
 	/* Head for list of finished serializable transactions. */
-	size = add_size(size, sizeof(SHM_QUEUE));
+	size = add_size(size, sizeof(dlist_head));
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(OldSerXidControlData));
@@ -1444,7 +1374,7 @@ SummarizeOldestCommittedSxact(void)
 	 * that case, we have nothing to do here. The caller will find one of the
 	 * slots released by the other backend when it retries.
 	 */
-	if (SHMQueueEmpty(FinishedSerializableTransactions))
+	if (dlist_is_empty(FinishedSerializableTransactions))
 	{
 		LWLockRelease(SerializableFinishedListLock);
 		return;
@@ -1454,11 +1384,9 @@ SummarizeOldestCommittedSxact(void)
 	 * Grab the first sxact off the finished list -- this will be the earliest
 	 * commit.  Remove it from the list.
 	 */
-	sxact = (SERIALIZABLEXACT *)
-		SHMQueueNext(FinishedSerializableTransactions,
-					 FinishedSerializableTransactions,
-					 offsetof(SERIALIZABLEXACT, finishedLink));
-	SHMQueueDelete(&(sxact->finishedLink));
+	sxact = dlist_head_element(SERIALIZABLEXACT, finishedLink,
+							   FinishedSerializableTransactions);
+	dlist_delete_thoroughly(&sxact->finishedLink);
 
 	/* Add to SLRU summary information. */
 	if (TransactionIdIsValid(sxact->topXid) && !SxactIsReadOnly(sxact))
@@ -1512,7 +1440,7 @@ GetSafeSnapshot(Snapshot origSnapshot)
 		 * them marked us as conflicted.
 		 */
 		MySerializableXact->flags |= SXACT_FLAG_DEFERRABLE_WAITING;
-		while (!(SHMQueueEmpty(&MySerializableXact->possibleUnsafeConflicts) ||
+		while (!(dlist_is_empty(&MySerializableXact->possibleUnsafeConflicts) ||
 				 SxactIsROUnsafe(MySerializableXact)))
 		{
 			LWLockRelease(SerializableXactHashLock);
@@ -1558,13 +1486,17 @@ int
 GetSafeSnapshotBlockingPids(int blocked_pid, int *output, int output_size)
 {
 	int			num_written = 0;
-	SERIALIZABLEXACT *sxact;
+	dlist_iter	iter;
+	SERIALIZABLEXACT *sxact = NULL;
 
 	LWLockAcquire(SerializableXactHashLock, LW_SHARED);
 
 	/* Find blocked_pid's SERIALIZABLEXACT by linear search. */
-	for (sxact = FirstPredXact(); sxact != NULL; sxact = NextPredXact(sxact))
+	dlist_foreach(iter, &PredXact->activeList)
 	{
+		sxact = dlist_container(SERIALIZABLEXACT, xactLink, iter.cur);
+
 		if (sxact->pid == blocked_pid)
 			break;
+		sxact = NULL;	/* reset so a non-match leaves sxact NULL */
 	}
@@ -1572,21 +1503,13 @@ GetSafeSnapshotBlockingPids(int blocked_pid, int *output, int output_size)
 	/* Did we find it, and is it currently waiting in GetSafeSnapshot? */
 	if (sxact != NULL && SxactIsDeferrableWaiting(sxact))
 	{
-		RWConflict	possibleUnsafeConflict;
-
 		/* Traverse the list of possible unsafe conflicts collecting PIDs. */
-		possibleUnsafeConflict = (RWConflict)
-			SHMQueueNext(&sxact->possibleUnsafeConflicts,
-						 &sxact->possibleUnsafeConflicts,
-						 offsetof(RWConflictData, inLink));
-
-		while (possibleUnsafeConflict != NULL && num_written < output_size)
+		dlist_foreach(iter, &sxact->possibleUnsafeConflicts)
 		{
+			RWConflict	possibleUnsafeConflict =
+				dlist_container(RWConflictData, inLink, iter.cur);
+
 			output[num_written++] = possibleUnsafeConflict->sxactOut->pid;
-			possibleUnsafeConflict = (RWConflict)
-				SHMQueueNext(&sxact->possibleUnsafeConflicts,
-							 &possibleUnsafeConflict->inLink,
-							 offsetof(RWConflictData, inLink));
 		}
 	}
 
@@ -1799,18 +1722,20 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	sxact->SeqNo.lastCommitBeforeSnapshot = PredXact->LastSxactCommitSeqNo;
 	sxact->prepareSeqNo = InvalidSerCommitSeqNo;
 	sxact->commitSeqNo = InvalidSerCommitSeqNo;
-	SHMQueueInit(&(sxact->outConflicts));
-	SHMQueueInit(&(sxact->inConflicts));
-	SHMQueueInit(&(sxact->possibleUnsafeConflicts));
+	dlist_init(&(sxact->outConflicts));
+	dlist_init(&(sxact->inConflicts));
+	dlist_init(&(sxact->possibleUnsafeConflicts));
 	sxact->topXid = GetTopTransactionIdIfAny();
 	sxact->finishedBefore = InvalidTransactionId;
 	sxact->xmin = snapshot->xmin;
 	sxact->pid = MyProcPid;
-	SHMQueueInit(&(sxact->predicateLocks));
-	SHMQueueElemInit(&(sxact->finishedLink));
+	dlist_init(&sxact->predicateLocks);
+	dlist_node_init(&sxact->finishedLink);
 	sxact->flags = 0;
 	if (XactReadOnly)
 	{
+		dlist_iter	iter;
+
 		sxact->flags |= SXACT_FLAG_READ_ONLY;
 
 		/*
@@ -1819,10 +1744,10 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 		 * transactions then this snapshot can be deemed safe (and we can run
 		 * without tracking predicate locks).
 		 */
-		for (othersxact = FirstPredXact();
-			 othersxact != NULL;
-			 othersxact = NextPredXact(othersxact))
+		dlist_foreach(iter, &PredXact->activeList)
 		{
+			othersxact = dlist_container(SERIALIZABLEXACT, xactLink, iter.cur);
+
 			if (!SxactIsCommitted(othersxact)
 				&& !SxactIsDoomed(othersxact)
 				&& !SxactIsReadOnly(othersxact))
@@ -2100,7 +2025,7 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
 	Assert(LWLockHeldByMe(SerializablePredicateLockListLock));
 
 	/* Can't remove it until no locks at this target. */
-	if (!SHMQueueEmpty(&target->predicateLocks))
+	if (!dlist_is_empty(&target->predicateLocks))
 		return;
 
 	/* Actually remove the target. */
@@ -2128,28 +2053,20 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 {
 	SERIALIZABLEXACT *sxact;
 	PREDICATELOCK *predlock;
+	dlist_mutable_iter iter;
 
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	sxact = MySerializableXact;
 	if (IsInParallelMode())
 		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
-	predlock = (PREDICATELOCK *)
-		SHMQueueNext(&(sxact->predicateLocks),
-					 &(sxact->predicateLocks),
-					 offsetof(PREDICATELOCK, xactLink));
-	while (predlock)
+
+	dlist_foreach_modify(iter, &sxact->predicateLocks)
 	{
-		SHM_QUEUE  *predlocksxactlink;
-		PREDICATELOCK *nextpredlock;
 		PREDICATELOCKTAG oldlocktag;
 		PREDICATELOCKTARGET *oldtarget;
 		PREDICATELOCKTARGETTAG oldtargettag;
 
-		predlocksxactlink = &(predlock->xactLink);
-		nextpredlock = (PREDICATELOCK *)
-			SHMQueueNext(&(sxact->predicateLocks),
-						 predlocksxactlink,
-						 offsetof(PREDICATELOCK, xactLink));
+		predlock = dlist_container(PREDICATELOCK, xactLink, iter.cur);
 
 		oldlocktag = predlock->tag;
 		Assert(oldlocktag.myXact == sxact);
@@ -2167,8 +2084,8 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 			LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-			SHMQueueDelete(predlocksxactlink);
-			SHMQueueDelete(&(predlock->targetLink));
+			dlist_delete(&predlock->xactLink);
+			dlist_delete(&predlock->targetLink);
 			rmpredlock = hash_search_with_hash_value
 				(PredicateLockHash,
 				 &oldlocktag,
@@ -2183,8 +2100,6 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
 
 			DecrementParentLocks(&oldtargettag);
 		}
-
-		predlock = nextpredlock;
 	}
 	if (IsInParallelMode())
 		LWLockRelease(&sxact->predicateLockListLock);
@@ -2401,7 +2316,7 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 				 errmsg("out of shared memory"),
 				 errhint("You might need to increase max_pred_locks_per_transaction.")));
 	if (!found)
-		SHMQueueInit(&(target->predicateLocks));
+		dlist_init(&target->predicateLocks);
 
 	/* We've got the sxact and target, make sure they're joined. */
 	locktag.myTarget = target;
@@ -2418,9 +2333,8 @@ CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
 
 	if (!found)
 	{
-		SHMQueueInsertBefore(&(target->predicateLocks), &(lock->targetLink));
-		SHMQueueInsertBefore(&(sxact->predicateLocks),
-							 &(lock->xactLink));
+		dlist_push_tail(&target->predicateLocks, &lock->targetLink);
+		dlist_push_tail(&sxact->predicateLocks, &lock->xactLink);
 		lock->commitSeqNo = InvalidSerCommitSeqNo;
 	}
 
@@ -2592,30 +2506,22 @@ PredicateLockTID(Relation relation, ItemPointer tid, Snapshot snapshot,
 static void
 DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 {
-	PREDICATELOCK *predlock;
-	SHM_QUEUE  *predlocktargetlink;
-	PREDICATELOCK *nextpredlock;
-	bool		found;
+	dlist_mutable_iter iter;
 
 	Assert(LWLockHeldByMeInMode(SerializablePredicateLockListLock,
 								LW_EXCLUSIVE));
 	Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
 
-	predlock = (PREDICATELOCK *)
-		SHMQueueNext(&(target->predicateLocks),
-					 &(target->predicateLocks),
-					 offsetof(PREDICATELOCK, targetLink));
 	LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
-	while (predlock)
-	{
-		predlocktargetlink = &(predlock->targetLink);
-		nextpredlock = (PREDICATELOCK *)
-			SHMQueueNext(&(target->predicateLocks),
-						 predlocktargetlink,
-						 offsetof(PREDICATELOCK, targetLink));
 
-		SHMQueueDelete(&(predlock->xactLink));
-		SHMQueueDelete(&(predlock->targetLink));
+	dlist_foreach_modify(iter, &target->predicateLocks)
+	{
+		PREDICATELOCK *predlock =
+			dlist_container(PREDICATELOCK, targetLink, iter.cur);
+		bool		found;
+
+		dlist_delete(&(predlock->xactLink));
+		dlist_delete(&(predlock->targetLink));
 
 		hash_search_with_hash_value
 			(PredicateLockHash,
@@ -2624,8 +2530,6 @@ DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
 													 targettaghash),
 			 HASH_REMOVE, &found);
 		Assert(found);
-
-		predlock = nextpredlock;
 	}
 	LWLockRelease(SerializableXactHashLock);
 
@@ -2724,8 +2628,8 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 	if (oldtarget)
 	{
 		PREDICATELOCKTARGET *newtarget;
-		PREDICATELOCK *oldpredlock;
 		PREDICATELOCKTAG newpredlocktag;
+		dlist_mutable_iter iter;
 
 		newtarget = hash_search_with_hash_value(PredicateLockTargetHash,
 												&newtargettag,
@@ -2741,7 +2645,7 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 
 		/* If we created a new entry, initialize it */
 		if (!found)
-			SHMQueueInit(&(newtarget->predicateLocks));
+			dlist_init(&newtarget->predicateLocks);
 
 		newpredlocktag.myTarget = newtarget;
 
@@ -2749,29 +2653,21 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 		 * Loop through all the locks on the old target, replacing them with
 		 * locks on the new target.
 		 */
-		oldpredlock = (PREDICATELOCK *)
-			SHMQueueNext(&(oldtarget->predicateLocks),
-						 &(oldtarget->predicateLocks),
-						 offsetof(PREDICATELOCK, targetLink));
 		LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
-		while (oldpredlock)
+
+		dlist_foreach_modify(iter, &oldtarget->predicateLocks)
 		{
-			SHM_QUEUE  *predlocktargetlink;
-			PREDICATELOCK *nextpredlock;
+			PREDICATELOCK *oldpredlock =
+				dlist_container(PREDICATELOCK, targetLink, iter.cur);
 			PREDICATELOCK *newpredlock;
 			SerCommitSeqNo oldCommitSeqNo = oldpredlock->commitSeqNo;
 
-			predlocktargetlink = &(oldpredlock->targetLink);
-			nextpredlock = (PREDICATELOCK *)
-				SHMQueueNext(&(oldtarget->predicateLocks),
-							 predlocktargetlink,
-							 offsetof(PREDICATELOCK, targetLink));
 			newpredlocktag.myXact = oldpredlock->tag.myXact;
 
 			if (removeOld)
 			{
-				SHMQueueDelete(&(oldpredlock->xactLink));
-				SHMQueueDelete(&(oldpredlock->targetLink));
+				dlist_delete(&(oldpredlock->xactLink));
+				dlist_delete(&(oldpredlock->targetLink));
 
 				hash_search_with_hash_value
 					(PredicateLockHash,
@@ -2799,10 +2695,10 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 			}
 			if (!found)
 			{
-				SHMQueueInsertBefore(&(newtarget->predicateLocks),
-									 &(newpredlock->targetLink));
-				SHMQueueInsertBefore(&(newpredlocktag.myXact->predicateLocks),
-									 &(newpredlock->xactLink));
+				dlist_push_tail(&(newtarget->predicateLocks),
+								&(newpredlock->targetLink));
+				dlist_push_tail(&(newpredlocktag.myXact->predicateLocks),
+								&(newpredlock->xactLink));
 				newpredlock->commitSeqNo = oldCommitSeqNo;
 			}
 			else
@@ -2814,14 +2710,12 @@ TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
 			Assert(newpredlock->commitSeqNo != 0);
 			Assert((newpredlock->commitSeqNo == InvalidSerCommitSeqNo)
 				   || (newpredlock->tag.myXact == OldCommittedSxact));
-
-			oldpredlock = nextpredlock;
 		}
 		LWLockRelease(SerializableXactHashLock);
 
 		if (removeOld)
 		{
-			Assert(SHMQueueEmpty(&oldtarget->predicateLocks));
+			Assert(dlist_is_empty(&oldtarget->predicateLocks));
 			RemoveTargetIfNoLongerUsed(oldtarget, oldtargettaghash);
 		}
 	}
@@ -2941,7 +2835,7 @@ DropAllPredicateLocksFromTable(Relation relation, bool transfer)
 
 	while ((oldtarget = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
 	{
-		PREDICATELOCK *oldpredlock;
+		dlist_mutable_iter iter;
 
 		/*
 		 * Check whether this is a target which needs attention.
@@ -2976,29 +2870,21 @@ DropAllPredicateLocksFromTable(Relation relation, bool transfer)
 													 heaptargettaghash,
 													 HASH_ENTER, &found);
 			if (!found)
-				SHMQueueInit(&heaptarget->predicateLocks);
+				dlist_init(&heaptarget->predicateLocks);
 		}
 
 		/*
 		 * Loop through all the locks on the old target, replacing them with
 		 * locks on the new target.
 		 */
-		oldpredlock = (PREDICATELOCK *)
-			SHMQueueNext(&(oldtarget->predicateLocks),
-						 &(oldtarget->predicateLocks),
-						 offsetof(PREDICATELOCK, targetLink));
-		while (oldpredlock)
+		dlist_foreach_modify(iter, &oldtarget->predicateLocks)
 		{
-			PREDICATELOCK *nextpredlock;
+			PREDICATELOCK *oldpredlock =
+				dlist_container(PREDICATELOCK, targetLink, iter.cur);
 			PREDICATELOCK *newpredlock;
 			SerCommitSeqNo oldCommitSeqNo;
 			SERIALIZABLEXACT *oldXact;
 
-			nextpredlock = (PREDICATELOCK *)
-				SHMQueueNext(&(oldtarget->predicateLocks),
-							 &(oldpredlock->targetLink),
-							 offsetof(PREDICATELOCK, targetLink));
-
 			/*
 			 * Remove the old lock first. This avoids the chance of running
 			 * out of lock structure entries for the hash table.
@@ -3006,7 +2892,7 @@ DropAllPredicateLocksFromTable(Relation relation, bool transfer)
 			oldCommitSeqNo = oldpredlock->commitSeqNo;
 			oldXact = oldpredlock->tag.myXact;
 
-			SHMQueueDelete(&(oldpredlock->xactLink));
+			dlist_delete(&(oldpredlock->xactLink));
 
 			/*
 			 * No need for retail delete from oldtarget list, we're removing
@@ -3032,10 +2918,10 @@ DropAllPredicateLocksFromTable(Relation relation, bool transfer)
 												&found);
 				if (!found)
 				{
-					SHMQueueInsertBefore(&(heaptarget->predicateLocks),
-										 &(newpredlock->targetLink));
-					SHMQueueInsertBefore(&(newpredlocktag.myXact->predicateLocks),
-										 &(newpredlock->xactLink));
+					dlist_push_tail(&(heaptarget->predicateLocks),
+									&(newpredlock->targetLink));
+					dlist_push_tail(&(newpredlocktag.myXact->predicateLocks),
+									&(newpredlock->xactLink));
 					newpredlock->commitSeqNo = oldCommitSeqNo;
 				}
 				else
@@ -3048,8 +2934,6 @@ DropAllPredicateLocksFromTable(Relation relation, bool transfer)
 				Assert((newpredlock->commitSeqNo == InvalidSerCommitSeqNo)
 					   || (newpredlock->tag.myXact == OldCommittedSxact));
 			}
-
-			oldpredlock = nextpredlock;
 		}
 
 		hash_search(PredicateLockTargetHash, &oldtarget->tag, HASH_REMOVE,
@@ -3204,15 +3088,18 @@ PredicateLockPageCombine(Relation relation, BlockNumber oldblkno,
 static void
 SetNewSxactGlobalXmin(void)
 {
-	SERIALIZABLEXACT *sxact;
+	dlist_iter iter;
 
 	Assert(LWLockHeldByMe(SerializableXactHashLock));
 
 	PredXact->SxactGlobalXmin = InvalidTransactionId;
 	PredXact->SxactGlobalXminCount = 0;
 
-	for (sxact = FirstPredXact(); sxact != NULL; sxact = NextPredXact(sxact))
+	dlist_foreach(iter, &PredXact->activeList)
 	{
+		SERIALIZABLEXACT *sxact =
+			dlist_container(SERIALIZABLEXACT, xactLink, iter.cur);
+
 		if (!SxactIsRolledBack(sxact)
 			&& !SxactIsCommitted(sxact)
 			&& sxact != OldCommittedSxact)
@@ -3263,10 +3150,8 @@ void
 ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 {
 	bool		needToClear;
-	RWConflict	conflict,
-				nextConflict,
-				possibleUnsafeConflict;
 	SERIALIZABLEXACT *roXact;
+	dlist_mutable_iter iter;
 
 	/*
 	 * We can't trust XactReadOnly here, because a transaction which started
@@ -3454,23 +3339,15 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 		 * make us unsafe. Note that we use 'inLink' for the iteration as
 		 * opposed to 'outLink' for the r/w xacts.
 		 */
-		possibleUnsafeConflict = (RWConflict)
-			SHMQueueNext(&MySerializableXact->possibleUnsafeConflicts,
-						 &MySerializableXact->possibleUnsafeConflicts,
-						 offsetof(RWConflictData, inLink));
-		while (possibleUnsafeConflict)
+		dlist_foreach_modify(iter, &MySerializableXact->possibleUnsafeConflicts)
 		{
-			nextConflict = (RWConflict)
-				SHMQueueNext(&MySerializableXact->possibleUnsafeConflicts,
-							 &possibleUnsafeConflict->inLink,
-							 offsetof(RWConflictData, inLink));
+			RWConflict	possibleUnsafeConflict =
+				dlist_container(RWConflictData, inLink, iter.cur);
 
 			Assert(!SxactIsReadOnly(possibleUnsafeConflict->sxactOut));
 			Assert(MySerializableXact == possibleUnsafeConflict->sxactIn);
 
 			ReleaseRWConflict(possibleUnsafeConflict);
-
-			possibleUnsafeConflict = nextConflict;
 		}
 	}
 
@@ -3493,16 +3370,10 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 	 * back clear them all.  Set SXACT_FLAG_CONFLICT_OUT if any point to
 	 * previously committed transactions.
 	 */
-	conflict = (RWConflict)
-		SHMQueueNext(&MySerializableXact->outConflicts,
-					 &MySerializableXact->outConflicts,
-					 offsetof(RWConflictData, outLink));
-	while (conflict)
+	dlist_foreach_modify(iter, &MySerializableXact->outConflicts)
 	{
-		nextConflict = (RWConflict)
-			SHMQueueNext(&MySerializableXact->outConflicts,
-						 &conflict->outLink,
-						 offsetof(RWConflictData, outLink));
+		RWConflict	conflict =
+			dlist_container(RWConflictData, outLink, iter.cur);
 
 		if (isCommit
 			&& !SxactIsReadOnly(MySerializableXact)
@@ -3518,31 +3389,21 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 			|| SxactIsCommitted(conflict->sxactIn)
 			|| (conflict->sxactIn->SeqNo.lastCommitBeforeSnapshot >= PredXact->LastSxactCommitSeqNo))
 			ReleaseRWConflict(conflict);
-
-		conflict = nextConflict;
 	}
 
 	/*
 	 * Release all inConflicts from committed and read-only transactions. If
 	 * we're rolling back, clear them all.
 	 */
-	conflict = (RWConflict)
-		SHMQueueNext(&MySerializableXact->inConflicts,
-					 &MySerializableXact->inConflicts,
-					 offsetof(RWConflictData, inLink));
-	while (conflict)
+	dlist_foreach_modify(iter, &MySerializableXact->inConflicts)
 	{
-		nextConflict = (RWConflict)
-			SHMQueueNext(&MySerializableXact->inConflicts,
-						 &conflict->inLink,
-						 offsetof(RWConflictData, inLink));
+		RWConflict	conflict =
+			dlist_container(RWConflictData, inLink, iter.cur);
 
 		if (!isCommit
 			|| SxactIsCommitted(conflict->sxactOut)
 			|| SxactIsReadOnly(conflict->sxactOut))
 			ReleaseRWConflict(conflict);
-
-		conflict = nextConflict;
 	}
 
 	if (!topLevelIsDeclaredReadOnly)
@@ -3553,16 +3414,10 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 		 * conflict out. If any are waiting DEFERRABLE transactions, wake them
 		 * up if they are known safe or known unsafe.
 		 */
-		possibleUnsafeConflict = (RWConflict)
-			SHMQueueNext(&MySerializableXact->possibleUnsafeConflicts,
-						 &MySerializableXact->possibleUnsafeConflicts,
-						 offsetof(RWConflictData, outLink));
-		while (possibleUnsafeConflict)
+		dlist_foreach_modify(iter, &MySerializableXact->possibleUnsafeConflicts)
 		{
-			nextConflict = (RWConflict)
-				SHMQueueNext(&MySerializableXact->possibleUnsafeConflicts,
-							 &possibleUnsafeConflict->outLink,
-							 offsetof(RWConflictData, outLink));
+			RWConflict	possibleUnsafeConflict =
+				dlist_container(RWConflictData, outLink, iter.cur);
 
 			roXact = possibleUnsafeConflict->sxactIn;
 			Assert(MySerializableXact == possibleUnsafeConflict->sxactOut);
@@ -3590,7 +3445,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 				 * transaction can now safely release its predicate locks (but
 				 * that transaction's backend has to do that itself).
 				 */
-				if (SHMQueueEmpty(&roXact->possibleUnsafeConflicts))
+				if (dlist_is_empty(&roXact->possibleUnsafeConflicts))
 					roXact->flags |= SXACT_FLAG_RO_SAFE;
 			}
 
@@ -3601,8 +3456,6 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 			if (SxactIsDeferrableWaiting(roXact) &&
 				(SxactIsROUnsafe(roXact) || SxactIsROSafe(roXact)))
 				ProcSendSignal(roXact->pid);
-
-			possibleUnsafeConflict = nextConflict;
 		}
 	}
 
@@ -3630,8 +3483,8 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 
 	/* Add this to the list of transactions to check for later cleanup. */
 	if (isCommit)
-		SHMQueueInsertBefore(FinishedSerializableTransactions,
-							 &MySerializableXact->finishedLink);
+		dlist_push_tail(FinishedSerializableTransactions,
+						&MySerializableXact->finishedLink);
 
 	/*
 	 * If we're releasing a RO_SAFE transaction in parallel mode, we'll only
@@ -3673,27 +3526,19 @@ ReleasePredicateLocksLocal(void)
 static void
 ClearOldPredicateLocks(void)
 {
-	SERIALIZABLEXACT *finishedSxact;
-	PREDICATELOCK *predlock;
+	dlist_mutable_iter iter;
 
 	/*
 	 * Loop through finished transactions. They are in commit order, so we can
 	 * stop as soon as we find one that's still interesting.
 	 */
 	LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
-	finishedSxact = (SERIALIZABLEXACT *)
-		SHMQueueNext(FinishedSerializableTransactions,
-					 FinishedSerializableTransactions,
-					 offsetof(SERIALIZABLEXACT, finishedLink));
 	LWLockAcquire(SerializableXactHashLock, LW_SHARED);
-	while (finishedSxact)
+	dlist_foreach_modify(iter, FinishedSerializableTransactions)
 	{
-		SERIALIZABLEXACT *nextSxact;
+		SERIALIZABLEXACT *finishedSxact =
+			dlist_container(SERIALIZABLEXACT, finishedLink, iter.cur);
 
-		nextSxact = (SERIALIZABLEXACT *)
-			SHMQueueNext(FinishedSerializableTransactions,
-						 &(finishedSxact->finishedLink),
-						 offsetof(SERIALIZABLEXACT, finishedLink));
 		if (!TransactionIdIsValid(PredXact->SxactGlobalXmin)
 			|| TransactionIdPrecedesOrEquals(finishedSxact->finishedBefore,
 											 PredXact->SxactGlobalXmin))
@@ -3703,7 +3548,7 @@ ClearOldPredicateLocks(void)
 			 * took its snapshot. It's no longer interesting.
 			 */
 			LWLockRelease(SerializableXactHashLock);
-			SHMQueueDelete(&(finishedSxact->finishedLink));
+			dlist_delete_thoroughly(&finishedSxact->finishedLink);
 			ReleaseOneSerializableXact(finishedSxact, false, false);
 			LWLockAcquire(SerializableXactHashLock, LW_SHARED);
 		}
@@ -3720,7 +3565,7 @@ ClearOldPredicateLocks(void)
 			if (SxactIsReadOnly(finishedSxact))
 			{
 				/* A read-only transaction can be removed entirely */
-				SHMQueueDelete(&(finishedSxact->finishedLink));
+				dlist_delete_thoroughly(&(finishedSxact->finishedLink));
 				ReleaseOneSerializableXact(finishedSxact, false, false);
 			}
 			else
@@ -3741,7 +3586,6 @@ ClearOldPredicateLocks(void)
 			/* Still interesting. */
 			break;
 		}
-		finishedSxact = nextSxact;
 	}
 	LWLockRelease(SerializableXactHashLock);
 
@@ -3749,20 +3593,12 @@ ClearOldPredicateLocks(void)
 	 * Loop through predicate locks on dummy transaction for summarized data.
 	 */
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
-	predlock = (PREDICATELOCK *)
-		SHMQueueNext(&OldCommittedSxact->predicateLocks,
-					 &OldCommittedSxact->predicateLocks,
-					 offsetof(PREDICATELOCK, xactLink));
-	while (predlock)
+	dlist_foreach_modify(iter, &OldCommittedSxact->predicateLocks)
 	{
-		PREDICATELOCK *nextpredlock;
+		PREDICATELOCK *predlock =
+			dlist_container(PREDICATELOCK, xactLink, iter.cur);
 		bool		canDoPartialCleanup;
 
-		nextpredlock = (PREDICATELOCK *)
-			SHMQueueNext(&OldCommittedSxact->predicateLocks,
-						 &predlock->xactLink,
-						 offsetof(PREDICATELOCK, xactLink));
-
 		LWLockAcquire(SerializableXactHashLock, LW_SHARED);
 		Assert(predlock->commitSeqNo != 0);
 		Assert(predlock->commitSeqNo != InvalidSerCommitSeqNo);
@@ -3789,8 +3625,8 @@ ClearOldPredicateLocks(void)
 
 			LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-			SHMQueueDelete(&(predlock->targetLink));
-			SHMQueueDelete(&(predlock->xactLink));
+			dlist_delete(&(predlock->targetLink));
+			dlist_delete(&(predlock->xactLink));
 
 			hash_search_with_hash_value(PredicateLockHash, &tag,
 										PredicateLockHashCodeFromTargetHashCode(&tag,
@@ -3800,8 +3636,6 @@ ClearOldPredicateLocks(void)
 
 			LWLockRelease(partitionLock);
 		}
-
-		predlock = nextpredlock;
 	}
 
 	LWLockRelease(SerializablePredicateLockListLock);
@@ -3831,10 +3665,8 @@ static void
 ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 						   bool summarize)
 {
-	PREDICATELOCK *predlock;
 	SERIALIZABLEXIDTAG sxidtag;
-	RWConflict	conflict,
-				nextConflict;
+	dlist_mutable_iter iter;
 
 	Assert(sxact != NULL);
 	Assert(SxactIsRolledBack(sxact) || SxactIsCommitted(sxact));
@@ -3848,27 +3680,18 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
 	if (IsInParallelMode())
 		LWLockAcquire(&sxact->predicateLockListLock, LW_EXCLUSIVE);
-	predlock = (PREDICATELOCK *)
-		SHMQueueNext(&(sxact->predicateLocks),
-					 &(sxact->predicateLocks),
-					 offsetof(PREDICATELOCK, xactLink));
-	while (predlock)
+
+	dlist_foreach_modify(iter, &sxact->predicateLocks)
 	{
-		PREDICATELOCK *nextpredlock;
+		PREDICATELOCK *predlock =
+			dlist_container(PREDICATELOCK, xactLink, iter.cur);
 		PREDICATELOCKTAG tag;
-		SHM_QUEUE  *targetLink;
 		PREDICATELOCKTARGET *target;
 		PREDICATELOCKTARGETTAG targettag;
 		uint32		targettaghash;
 		LWLock	   *partitionLock;
 
-		nextpredlock = (PREDICATELOCK *)
-			SHMQueueNext(&(sxact->predicateLocks),
-						 &(predlock->xactLink),
-						 offsetof(PREDICATELOCK, xactLink));
-
 		tag = predlock->tag;
-		targetLink = &(predlock->targetLink);
 		target = tag.myTarget;
 		targettag = target->tag;
 		targettaghash = PredicateLockTargetTagHashCode(&targettag);
@@ -3876,7 +3699,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 
 		LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-		SHMQueueDelete(targetLink);
+		dlist_delete(&predlock->targetLink);
 
 		hash_search_with_hash_value(PredicateLockHash, &tag,
 									PredicateLockHashCodeFromTargetHashCode(&tag,
@@ -3906,10 +3729,10 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 			}
 			else
 			{
-				SHMQueueInsertBefore(&(target->predicateLocks),
-									 &(predlock->targetLink));
-				SHMQueueInsertBefore(&(OldCommittedSxact->predicateLocks),
-									 &(predlock->xactLink));
+				dlist_push_tail(&target->predicateLocks,
+								&predlock->targetLink);
+				dlist_push_tail(&OldCommittedSxact->predicateLocks,
+								&predlock->xactLink);
 				predlock->commitSeqNo = sxact->commitSeqNo;
 			}
 		}
@@ -3917,15 +3740,13 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 			RemoveTargetIfNoLongerUsed(target, targettaghash);
 
 		LWLockRelease(partitionLock);
-
-		predlock = nextpredlock;
 	}
 
 	/*
 	 * Rather than retail removal, just re-init the head after we've run
 	 * through the list.
 	 */
-	SHMQueueInit(&sxact->predicateLocks);
+	dlist_init(&sxact->predicateLocks);
 
 	if (IsInParallelMode())
 		LWLockRelease(&sxact->predicateLockListLock);
@@ -3937,38 +3758,25 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	/* Release all outConflicts (unless 'partial' is true) */
 	if (!partial)
 	{
-		conflict = (RWConflict)
-			SHMQueueNext(&sxact->outConflicts,
-						 &sxact->outConflicts,
-						 offsetof(RWConflictData, outLink));
-		while (conflict)
+		dlist_foreach_modify(iter, &sxact->outConflicts)
 		{
-			nextConflict = (RWConflict)
-				SHMQueueNext(&sxact->outConflicts,
-							 &conflict->outLink,
-							 offsetof(RWConflictData, outLink));
+			RWConflict	conflict =
+				dlist_container(RWConflictData, outLink, iter.cur);
 			if (summarize)
 				conflict->sxactIn->flags |= SXACT_FLAG_SUMMARY_CONFLICT_IN;
 			ReleaseRWConflict(conflict);
-			conflict = nextConflict;
 		}
 	}
 
 	/* Release all inConflicts. */
-	conflict = (RWConflict)
-		SHMQueueNext(&sxact->inConflicts,
-					 &sxact->inConflicts,
-					 offsetof(RWConflictData, inLink));
-	while (conflict)
+	dlist_foreach_modify(iter, &sxact->inConflicts)
 	{
-		nextConflict = (RWConflict)
-			SHMQueueNext(&sxact->inConflicts,
-						 &conflict->inLink,
-						 offsetof(RWConflictData, inLink));
+		RWConflict	conflict =
+			dlist_container(RWConflictData, inLink, iter.cur);
+
 		if (summarize)
 			conflict->sxactOut->flags |= SXACT_FLAG_SUMMARY_CONFLICT_OUT;
 		ReleaseRWConflict(conflict);
-		conflict = nextConflict;
 	}
 
 	/* Finally, get rid of the xid and the record of the transaction itself. */
@@ -4098,7 +3906,7 @@ CheckForSerializableConflictOut(Relation relation, TransactionId xid, Snapshot s
 						 errhint("The transaction might succeed if retried.")));
 
 			if (SxactHasSummaryConflictIn(MySerializableXact)
-				|| !SHMQueueEmpty(&MySerializableXact->inConflicts))
+				|| !dlist_is_empty(&MySerializableXact->inConflicts))
 				ereport(ERROR,
 						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 						 errmsg("could not serialize access due to read/write dependencies among transactions"),
@@ -4194,9 +4002,9 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 	uint32		targettaghash;
 	LWLock	   *partitionLock;
 	PREDICATELOCKTARGET *target;
-	PREDICATELOCK *predlock;
 	PREDICATELOCK *mypredlock = NULL;
 	PREDICATELOCKTAG mypredlocktag;
+	dlist_mutable_iter iter;
 
 	Assert(MySerializableXact != InvalidSerializableXact);
 
@@ -4221,24 +4029,14 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 	 * Each lock for an overlapping transaction represents a conflict: a
 	 * rw-dependency in to this transaction.
 	 */
-	predlock = (PREDICATELOCK *)
-		SHMQueueNext(&(target->predicateLocks),
-					 &(target->predicateLocks),
-					 offsetof(PREDICATELOCK, targetLink));
 	LWLockAcquire(SerializableXactHashLock, LW_SHARED);
-	while (predlock)
+
+	dlist_foreach_modify(iter, &target->predicateLocks)
 	{
-		SHM_QUEUE  *predlocktargetlink;
-		PREDICATELOCK *nextpredlock;
-		SERIALIZABLEXACT *sxact;
+		PREDICATELOCK *predlock =
+			dlist_container(PREDICATELOCK, targetLink, iter.cur);
+		SERIALIZABLEXACT *sxact = predlock->tag.myXact;
 
-		predlocktargetlink = &(predlock->targetLink);
-		nextpredlock = (PREDICATELOCK *)
-			SHMQueueNext(&(target->predicateLocks),
-						 predlocktargetlink,
-						 offsetof(PREDICATELOCK, targetLink));
-
-		sxact = predlock->tag.myXact;
 		if (sxact == MySerializableXact)
 		{
 			/*
@@ -4283,8 +4081,6 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 			LWLockRelease(SerializableXactHashLock);
 			LWLockAcquire(SerializableXactHashLock, LW_SHARED);
 		}
-
-		predlock = nextpredlock;
 	}
 	LWLockRelease(SerializableXactHashLock);
 	LWLockRelease(partitionLock);
@@ -4324,8 +4120,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		{
 			Assert(rmpredlock == mypredlock);
 
-			SHMQueueDelete(&(mypredlock->targetLink));
-			SHMQueueDelete(&(mypredlock->xactLink));
+			dlist_delete(&(mypredlock->targetLink));
+			dlist_delete(&(mypredlock->xactLink));
 
 			rmpredlock = (PREDICATELOCK *)
 				hash_search_with_hash_value(PredicateLockHash,
@@ -4495,7 +4291,7 @@ CheckTableForSerializableConflictIn(Relation relation)
 
 	while ((target = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
 	{
-		PREDICATELOCK *predlock;
+		dlist_mutable_iter iter;
 
 		/*
 		 * Check whether this is a target which needs attention.
@@ -4508,26 +4304,16 @@ CheckTableForSerializableConflictIn(Relation relation)
 		/*
 		 * Loop through locks for this target and flag conflicts.
 		 */
-		predlock = (PREDICATELOCK *)
-			SHMQueueNext(&(target->predicateLocks),
-						 &(target->predicateLocks),
-						 offsetof(PREDICATELOCK, targetLink));
-		while (predlock)
+		dlist_foreach_modify(iter, &target->predicateLocks)
 		{
-			PREDICATELOCK *nextpredlock;
-
-			nextpredlock = (PREDICATELOCK *)
-				SHMQueueNext(&(target->predicateLocks),
-							 &(predlock->targetLink),
-							 offsetof(PREDICATELOCK, targetLink));
+			PREDICATELOCK *predlock =
+				dlist_container(PREDICATELOCK, targetLink, iter.cur);
 
 			if (predlock->tag.myXact != MySerializableXact
 				&& !RWConflictExists(predlock->tag.myXact, MySerializableXact))
 			{
 				FlagRWConflict(predlock->tag.myXact, MySerializableXact);
 			}
-
-			predlock = nextpredlock;
 		}
 	}
 
@@ -4585,7 +4371,6 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 										SERIALIZABLEXACT *writer)
 {
 	bool		failure;
-	RWConflict	conflict;
 
 	Assert(LWLockHeldByMe(SerializableXactHashLock));
 
@@ -4625,20 +4410,16 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 	 * to abort.
 	 *------------------------------------------------------------------------
 	 */
-	if (!failure)
+	if (!failure && SxactHasSummaryConflictOut(writer))
+		failure = true;
+	else if (!failure)
 	{
-		if (SxactHasSummaryConflictOut(writer))
-		{
-			failure = true;
-			conflict = NULL;
-		}
-		else
-			conflict = (RWConflict)
-				SHMQueueNext(&writer->outConflicts,
-							 &writer->outConflicts,
-							 offsetof(RWConflictData, outLink));
-		while (conflict)
+		dlist_iter iter;
+
+		dlist_foreach(iter, &writer->outConflicts)
 		{
+			RWConflict	conflict =
+				dlist_container(RWConflictData, outLink, iter.cur);
 			SERIALIZABLEXACT *t2 = conflict->sxactIn;
 
 			if (SxactIsPrepared(t2)
@@ -4652,10 +4433,6 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 				failure = true;
 				break;
 			}
-			conflict = (RWConflict)
-				SHMQueueNext(&writer->outConflicts,
-							 &conflict->outLink,
-							 offsetof(RWConflictData, outLink));
 		}
 	}
 
@@ -4677,30 +4454,27 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 		if (SxactHasSummaryConflictIn(reader))
 		{
 			failure = true;
-			conflict = NULL;
 		}
 		else
-			conflict = (RWConflict)
-				SHMQueueNext(&reader->inConflicts,
-							 &reader->inConflicts,
-							 offsetof(RWConflictData, inLink));
-		while (conflict)
 		{
-			SERIALIZABLEXACT *t0 = conflict->sxactOut;
+			dlist_iter iter;
 
-			if (!SxactIsDoomed(t0)
-				&& (!SxactIsCommitted(t0)
-					|| t0->commitSeqNo >= writer->prepareSeqNo)
-				&& (!SxactIsReadOnly(t0)
-					|| t0->SeqNo.lastCommitBeforeSnapshot >= writer->prepareSeqNo))
+			dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->inConflicts)
 			{
-				failure = true;
-				break;
+				RWConflict	conflict =
+					dlist_container(RWConflictData, inLink, iter.cur);
+				SERIALIZABLEXACT *t0 = conflict->sxactOut;
+
+				if (!SxactIsDoomed(t0)
+					&& (!SxactIsCommitted(t0)
+						|| t0->commitSeqNo >= writer->prepareSeqNo)
+					&& (!SxactIsReadOnly(t0)
+						|| t0->SeqNo.lastCommitBeforeSnapshot >= writer->prepareSeqNo))
+				{
+					failure = true;
+					break;
+				}
 			}
-			conflict = (RWConflict)
-				SHMQueueNext(&reader->inConflicts,
-							 &conflict->inLink,
-							 offsetof(RWConflictData, inLink));
 		}
 	}
 
@@ -4758,7 +4532,7 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
 void
 PreCommit_CheckForSerializationFailure(void)
 {
-	RWConflict	nearConflict;
+	dlist_iter near_iter;
 
 	if (MySerializableXact == InvalidSerializableXact)
 		return;
@@ -4779,23 +4553,21 @@ PreCommit_CheckForSerializationFailure(void)
 				 errhint("The transaction might succeed if retried.")));
 	}
 
-	nearConflict = (RWConflict)
-		SHMQueueNext(&MySerializableXact->inConflicts,
-					 &MySerializableXact->inConflicts,
-					 offsetof(RWConflictData, inLink));
-	while (nearConflict)
+	dlist_foreach(near_iter, &MySerializableXact->inConflicts)
 	{
+		RWConflict	nearConflict =
+			dlist_container(RWConflictData, inLink, near_iter.cur);
+
 		if (!SxactIsCommitted(nearConflict->sxactOut)
 			&& !SxactIsDoomed(nearConflict->sxactOut))
 		{
-			RWConflict	farConflict;
+			dlist_iter far_iter;
 
-			farConflict = (RWConflict)
-				SHMQueueNext(&nearConflict->sxactOut->inConflicts,
-							 &nearConflict->sxactOut->inConflicts,
-							 offsetof(RWConflictData, inLink));
-			while (farConflict)
+			dlist_foreach(far_iter, &nearConflict->sxactOut->inConflicts)
 			{
+				RWConflict	farConflict =
+					dlist_container(RWConflictData, inLink, far_iter.cur);
+
 				if (farConflict->sxactOut == MySerializableXact
 					|| (!SxactIsCommitted(farConflict->sxactOut)
 						&& !SxactIsReadOnly(farConflict->sxactOut)
@@ -4819,17 +4591,8 @@ PreCommit_CheckForSerializationFailure(void)
 					nearConflict->sxactOut->flags |= SXACT_FLAG_DOOMED;
 					break;
 				}
-				farConflict = (RWConflict)
-					SHMQueueNext(&nearConflict->sxactOut->inConflicts,
-								 &farConflict->inLink,
-								 offsetof(RWConflictData, inLink));
 			}
 		}
-
-		nearConflict = (RWConflict)
-			SHMQueueNext(&MySerializableXact->inConflicts,
-						 &nearConflict->inLink,
-						 offsetof(RWConflictData, inLink));
 	}
 
 	MySerializableXact->prepareSeqNo = ++(PredXact->LastSxactCommitSeqNo);
@@ -4852,11 +4615,11 @@ PreCommit_CheckForSerializationFailure(void)
 void
 AtPrepare_PredicateLocks(void)
 {
-	PREDICATELOCK *predlock;
 	SERIALIZABLEXACT *sxact;
 	TwoPhasePredicateRecord record;
 	TwoPhasePredicateXactRecord *xactRecord;
 	TwoPhasePredicateLockRecord *lockRecord;
+	dlist_iter iter;
 
 	sxact = MySerializableXact;
 	xactRecord = &(record.data.xactRecord);
@@ -4896,23 +4659,16 @@ AtPrepare_PredicateLocks(void)
 	 */
 	Assert(!IsParallelWorker() && !ParallelContextActive());
 
-	predlock = (PREDICATELOCK *)
-		SHMQueueNext(&(sxact->predicateLocks),
-					 &(sxact->predicateLocks),
-					 offsetof(PREDICATELOCK, xactLink));
-
-	while (predlock != NULL)
+	dlist_foreach(iter, &sxact->predicateLocks)
 	{
+		PREDICATELOCK *predlock =
+			dlist_container(PREDICATELOCK, xactLink, iter.cur);
+
 		record.type = TWOPHASEPREDICATERECORD_LOCK;
 		lockRecord->target = predlock->tag.myTarget->tag;
 
 		RegisterTwoPhaseRecord(TWOPHASE_RM_PREDICATELOCK_ID, 0,
 							   &record, sizeof(record));
-
-		predlock = (PREDICATELOCK *)
-			SHMQueueNext(&(sxact->predicateLocks),
-						 &(predlock->xactLink),
-						 offsetof(PREDICATELOCK, xactLink));
 	}
 
 	LWLockRelease(SerializablePredicateLockListLock);
@@ -5022,10 +4778,10 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		 * recovered xact started are still active, except possibly other
 		 * prepared xacts and we don't care whether those are RO_SAFE or not.
 		 */
-		SHMQueueInit(&(sxact->possibleUnsafeConflicts));
+		dlist_init(&(sxact->possibleUnsafeConflicts));
 
-		SHMQueueInit(&(sxact->predicateLocks));
-		SHMQueueElemInit(&(sxact->finishedLink));
+		dlist_init(&(sxact->predicateLocks));
+		dlist_node_init(&sxact->finishedLink);
 
 		sxact->topXid = xid;
 		sxact->xmin = xactRecord->xmin;
@@ -5043,8 +4799,8 @@ predicatelock_twophase_recover(TransactionId xid, uint16 info,
 		 * we'll conservatively assume that it had both a conflict in and a
 		 * conflict out, and represent that with the summary conflict flags.
 		 */
-		SHMQueueInit(&(sxact->outConflicts));
-		SHMQueueInit(&(sxact->inConflicts));
+		dlist_init(&(sxact->outConflicts));
+		dlist_init(&(sxact->inConflicts));
 		sxact->flags |= SXACT_FLAG_SUMMARY_CONFLICT_IN;
 		sxact->flags |= SXACT_FLAG_SUMMARY_CONFLICT_OUT;
 
-- 
2.25.0.114.g5b0ca878e0

Attachment: v1-0005-Remove-now-unused-SHMQueue.patch (text/x-diff; charset=us-ascii)
From ebccc235763516c6d548326063840f7d0e0b0363 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 19 Feb 2020 15:19:58 -0800
Subject: [PATCH v1 5/6] Remove now unused SHMQueue*.

---
 src/include/storage/shmem.h        |  22 ----
 src/backend/storage/ipc/Makefile   |   1 -
 src/backend/storage/ipc/shmqueue.c | 190 -----------------------------
 3 files changed, 213 deletions(-)
 delete mode 100644 src/backend/storage/ipc/shmqueue.c

diff --git a/src/include/storage/shmem.h b/src/include/storage/shmem.h
index 0c1af892062..b6bdacadee2 100644
--- a/src/include/storage/shmem.h
+++ b/src/include/storage/shmem.h
@@ -24,13 +24,6 @@
 #include "utils/hsearch.h"
 
 
-/* shmqueue.c */
-typedef struct SHM_QUEUE
-{
-	struct SHM_QUEUE *prev;
-	struct SHM_QUEUE *next;
-} SHM_QUEUE;
-
 /* shmem.c */
 extern void InitShmemAccess(void *seghdr);
 extern void InitShmemAllocation(void);
@@ -63,19 +56,4 @@ typedef struct
 	Size		allocated_size;	/* # bytes actually allocated */
 } ShmemIndexEnt;
 
-/*
- * prototypes for functions in shmqueue.c
- */
-extern void SHMQueueInit(SHM_QUEUE *queue);
-extern void SHMQueueElemInit(SHM_QUEUE *queue);
-extern void SHMQueueDelete(SHM_QUEUE *queue);
-extern void SHMQueueInsertBefore(SHM_QUEUE *queue, SHM_QUEUE *elem);
-extern void SHMQueueInsertAfter(SHM_QUEUE *queue, SHM_QUEUE *elem);
-extern Pointer SHMQueueNext(const SHM_QUEUE *queue, const SHM_QUEUE *curElem,
-							Size linkOffset);
-extern Pointer SHMQueuePrev(const SHM_QUEUE *queue, const SHM_QUEUE *curElem,
-							Size linkOffset);
-extern bool SHMQueueEmpty(const SHM_QUEUE *queue);
-extern bool SHMQueueIsDetached(const SHM_QUEUE *queue);
-
 #endif							/* SHMEM_H */
diff --git a/src/backend/storage/ipc/Makefile b/src/backend/storage/ipc/Makefile
index df90c6b093f..6d5b921038c 100644
--- a/src/backend/storage/ipc/Makefile
+++ b/src/backend/storage/ipc/Makefile
@@ -21,7 +21,6 @@ OBJS = \
 	shm_mq.o \
 	shm_toc.o \
 	shmem.o \
-	shmqueue.o \
 	signalfuncs.o \
 	sinval.o \
 	sinvaladt.o \
diff --git a/src/backend/storage/ipc/shmqueue.c b/src/backend/storage/ipc/shmqueue.c
deleted file mode 100644
index d52b28f0fa7..00000000000
--- a/src/backend/storage/ipc/shmqueue.c
+++ /dev/null
@@ -1,190 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * shmqueue.c
- *	  shared memory linked lists
- *
- * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- *
- * IDENTIFICATION
- *	  src/backend/storage/ipc/shmqueue.c
- *
- * NOTES
- *
- * Package for managing doubly-linked lists in shared memory.
- * The only tricky thing is that SHM_QUEUE will usually be a field
- * in a larger record.  SHMQueueNext has to return a pointer
- * to the record itself instead of a pointer to the SHMQueue field
- * of the record.  It takes an extra parameter and does some extra
- * pointer arithmetic to do this correctly.
- *
- * NOTE: These are set up so they can be turned into macros some day.
- *
- *-------------------------------------------------------------------------
- */
-#include "postgres.h"
-
-#include "storage/shmem.h"
-
-
-/*
- * ShmemQueueInit -- make the head of a new queue point
- *		to itself
- */
-void
-SHMQueueInit(SHM_QUEUE *queue)
-{
-	Assert(ShmemAddrIsValid(queue));
-	queue->prev = queue->next = queue;
-}
-
-/*
- * SHMQueueIsDetached -- true if element is not currently
- *		in a queue.
- */
-bool
-SHMQueueIsDetached(const SHM_QUEUE *queue)
-{
-	Assert(ShmemAddrIsValid(queue));
-	return (queue->prev == NULL);
-}
-
-/*
- * SHMQueueElemInit -- clear an element's links
- */
-void
-SHMQueueElemInit(SHM_QUEUE *queue)
-{
-	Assert(ShmemAddrIsValid(queue));
-	queue->prev = queue->next = NULL;
-}
-
-/*
- * SHMQueueDelete -- remove an element from the queue and
- *		close the links
- */
-void
-SHMQueueDelete(SHM_QUEUE *queue)
-{
-	SHM_QUEUE  *nextElem = queue->next;
-	SHM_QUEUE  *prevElem = queue->prev;
-
-	Assert(ShmemAddrIsValid(queue));
-	Assert(ShmemAddrIsValid(nextElem));
-	Assert(ShmemAddrIsValid(prevElem));
-
-	prevElem->next = queue->next;
-	nextElem->prev = queue->prev;
-
-	queue->prev = queue->next = NULL;
-}
-
-/*
- * SHMQueueInsertBefore -- put elem in queue before the given queue
- *		element.  Inserting "before" the queue head puts the elem
- *		at the tail of the queue.
- */
-void
-SHMQueueInsertBefore(SHM_QUEUE *queue, SHM_QUEUE *elem)
-{
-	SHM_QUEUE  *prevPtr = queue->prev;
-
-	Assert(ShmemAddrIsValid(queue));
-	Assert(ShmemAddrIsValid(elem));
-
-	elem->next = prevPtr->next;
-	elem->prev = queue->prev;
-	queue->prev = elem;
-	prevPtr->next = elem;
-}
-
-/*
- * SHMQueueInsertAfter -- put elem in queue after the given queue
- *		element.  Inserting "after" the queue head puts the elem
- *		at the head of the queue.
- */
-void
-SHMQueueInsertAfter(SHM_QUEUE *queue, SHM_QUEUE *elem)
-{
-	SHM_QUEUE  *nextPtr = queue->next;
-
-	Assert(ShmemAddrIsValid(queue));
-	Assert(ShmemAddrIsValid(elem));
-
-	elem->prev = nextPtr->prev;
-	elem->next = queue->next;
-	queue->next = elem;
-	nextPtr->prev = elem;
-}
-
-/*--------------------
- * SHMQueueNext -- Get the next element from a queue
- *
- * To start the iteration, pass the queue head as both queue and curElem.
- * Returns NULL if no more elements.
- *
- * Next element is at curElem->next.  If SHMQueue is part of
- * a larger structure, we want to return a pointer to the
- * whole structure rather than a pointer to its SHMQueue field.
- * For example,
- * struct {
- *		int				stuff;
- *		SHMQueue		elem;
- * } ELEMType;
- * When this element is in a queue, prevElem->next points at struct.elem.
- * We subtract linkOffset to get the correct start address of the structure.
- *
- * calls to SHMQueueNext should take these parameters:
- *	 &(queueHead), &(queueHead), offsetof(ELEMType, elem)
- * or
- *	 &(queueHead), &(curElem->elem), offsetof(ELEMType, elem)
- *--------------------
- */
-Pointer
-SHMQueueNext(const SHM_QUEUE *queue, const SHM_QUEUE *curElem, Size linkOffset)
-{
-	SHM_QUEUE  *elemPtr = curElem->next;
-
-	Assert(ShmemAddrIsValid(curElem));
-
-	if (elemPtr == queue)		/* back to the queue head? */
-		return NULL;
-
-	return (Pointer) (((char *) elemPtr) - linkOffset);
-}
-
-/*--------------------
- * SHMQueuePrev -- Get the previous element from a queue
- *
- * Same as SHMQueueNext, just starting at tail and moving towards head.
- * All other comments and usage applies.
- */
-Pointer
-SHMQueuePrev(const SHM_QUEUE *queue, const SHM_QUEUE *curElem, Size linkOffset)
-{
-	SHM_QUEUE  *elemPtr = curElem->prev;
-
-	Assert(ShmemAddrIsValid(curElem));
-
-	if (elemPtr == queue)		/* back to the queue head? */
-		return NULL;
-
-	return (Pointer) (((char *) elemPtr) - linkOffset);
-}
-
-/*
- * SHMQueueEmpty -- true if queue head is only element, false otherwise
- */
-bool
-SHMQueueEmpty(const SHM_QUEUE *queue)
-{
-	Assert(ShmemAddrIsValid(queue));
-
-	if (queue->prev == queue)
-	{
-		Assert(queue->next == queue);
-		return true;
-	}
-	return false;
-}
-- 
2.25.0.114.g5b0ca878e0

Attachment: v1-0006-Remove-PROC_QUEUE.size.patch (text/x-diff; charset=us-ascii)
From 7fad990773a868600c42786f7b6c8156adfc4a9c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 19 Feb 2020 18:04:56 -0800
Subject: [PATCH v1 6/6] Remove PROC_QUEUE.size.

It's not really needed, and it adds a bit of overhead to much more
common paths than where it reduces costs.

Perhaps we should just remove the PROC_QUEUE type now?
---
 src/include/storage/lock.h          |  1 -
 src/backend/storage/lmgr/deadlock.c | 37 +++++++++++++----------------
 src/backend/storage/lmgr/lock.c     |  8 ++++---
 src/backend/storage/lmgr/proc.c     | 11 +--------
 4 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 3569f145092..a1356b50287 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -30,7 +30,6 @@ typedef struct PGPROC PGPROC;
 typedef struct PROC_QUEUE
 {
 	dlist_head	links;			/* list of PGPROC objects */
-	int			size;			/* number of entries in list */
 } PROC_QUEUE;
 
 /* GUC variables */
diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index ca2abea07f1..0fde8a95e40 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -87,7 +87,7 @@ static bool FindLockCycleRecurseMember(PGPROC *checkProc,
 									   int depth, EDGE *softEdges, int *nSoftEdges);
 static bool ExpandConstraints(EDGE *constraints, int nConstraints);
 static bool TopoSort(LOCK *lock, EDGE *constraints, int nConstraints,
-					 PGPROC **ordering);
+					 int queue_size, PGPROC **ordering);
 
 #ifdef DEBUG_DEADLOCK
 static void PrintLockQueue(LOCK *lock, const char *info);
@@ -250,8 +250,6 @@ DeadLockCheck(PGPROC *proc)
 		int			nProcs = waitOrders[i].nProcs;
 		PROC_QUEUE *waitQueue = &(lock->waitProcs);
 
-		Assert(nProcs == waitQueue->size);
-
 #ifdef DEBUG_DEADLOCK
 		PrintLockQueue(lock, "DeadLockCheck:");
 #endif
@@ -261,7 +259,6 @@ DeadLockCheck(PGPROC *proc)
 		for (int j = 0; j < nProcs; j++)
 		{
 			dlist_push_tail(&waitQueue->links, &(procs[j]->links));
-			waitQueue->size++;
 		}
 
 #ifdef DEBUG_DEADLOCK
@@ -802,7 +799,8 @@ ExpandConstraints(EDGE *constraints,
 	for (i = nConstraints; --i >= 0;)
 	{
 		LOCK	   *lock = constraints[i].lock;
-
+		dlist_iter	proc_iter;
+		int			queue_size = 0;
 		/* Did we already make a list for this lock? */
 		for (j = nWaitOrders; --j >= 0;)
 		{
@@ -814,8 +812,17 @@ ExpandConstraints(EDGE *constraints,
 		/* No, so allocate a new list */
 		waitOrders[nWaitOrders].lock = lock;
 		waitOrders[nWaitOrders].procs = waitOrderProcs + nWaitOrderProcs;
-		waitOrders[nWaitOrders].nProcs = lock->waitProcs.size;
-		nWaitOrderProcs += lock->waitProcs.size;
+
+		/* Fill topoProcs[] array with the procs in their current order */
+		dlist_foreach(proc_iter, &lock->waitProcs.links)
+		{
+			PGPROC	   *proc;
+
+			proc = dlist_container(PGPROC, links, proc_iter.cur);
+			topoProcs[queue_size++] = proc;
+		}
+		waitOrders[nWaitOrders].nProcs = queue_size;
+		nWaitOrderProcs += waitOrders[nWaitOrders].nProcs;
 		Assert(nWaitOrderProcs <= MaxBackends);
 
 		/*
@@ -823,8 +830,9 @@ ExpandConstraints(EDGE *constraints,
 		 * one, since they must be for different locks.
 		 */
 		if (!TopoSort(lock, constraints, i + 1,
-					  waitOrders[nWaitOrders].procs))
+					  queue_size, waitOrders[nWaitOrders].procs))
 			return false;
+
 		nWaitOrders++;
 	}
 	return true;
@@ -860,10 +868,9 @@ static bool
 TopoSort(LOCK *lock,
 		 EDGE *constraints,
 		 int nConstraints,
+		 int queue_size,
 		 PGPROC **ordering)		/* output argument */
 {
-	PROC_QUEUE *waitQueue = &(lock->waitProcs);
-	int			queue_size = waitQueue->size;
 	PGPROC	   *proc;
 	int			i,
 				j,
@@ -871,16 +878,6 @@ TopoSort(LOCK *lock,
 				k,
 				kk,
 				last;
-	dlist_iter	proc_iter;
-
-	/* First, fill topoProcs[] array with the procs in their current order */
-	i = 0;
-	dlist_foreach(proc_iter, &waitQueue->links)
-	{
-		proc = dlist_container(PGPROC, links, proc_iter.cur);
-		topoProcs[i++] = proc;
-	}
-	Assert(i == queue_size);
 
 	/*
 	 * Scan the constraints, and for each proc in the array, generate a count
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 01ac3c06c5e..ad8f7bca40a 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1828,12 +1828,11 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
 	Assert(proc->waitStatus == STATUS_WAITING);
 	Assert(proc->links.next != NULL);
 	Assert(waitLock);
-	Assert(waitLock->waitProcs.size > 0);
+	Assert(!dlist_is_empty(&waitLock->waitProcs.links));
 	Assert(0 < lockmethodid && lockmethodid < lengthof(LockMethods));
 
 	/* Remove proc from lock's wait queue */
 	dlist_delete(&proc->links);
-	waitLock->waitProcs.size--;
 
 	/* Undo increments of request counts by waiting process */
 	Assert(waitLock->nRequested > 0);
@@ -3765,7 +3764,10 @@ GetSingleProcBlockerStatusData(PGPROC *blocked_proc, BlockedProcsData *data)
 
 	/* Enlarge waiter_pids[] if it's too small to hold all wait queue PIDs */
 	waitQueue = &(theLock->waitProcs);
-	queue_size = waitQueue->size;
+
+	queue_size = 0;
+	dlist_foreach(proc_iter, &waitQueue->links)
+		queue_size++;
 
 	if (queue_size > data->maxpids - data->npids)
 	{
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index a4c338d7aff..33d2d358567 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1031,7 +1031,6 @@ void
 ProcQueueInit(PROC_QUEUE *queue)
 {
 	dlist_init(&queue->links);
-	queue->size = 0;
 }
 
 
@@ -1168,9 +1167,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	else
 		dlist_push_tail(&waitQueue->links, &MyProc->links);
 
-
-	waitQueue->size++;
-
 	lock->waitMask |= LOCKBIT_ON(lockmode);
 
 	/* Set up wait information in PGPROC object, too */
@@ -1581,7 +1577,6 @@ ProcWakeup(PGPROC *proc, int waitStatus)
 
 	/* Remove process from wait queue */
 	dlist_delete_thoroughly(&proc->links);
-	(proc->waitLock->waitProcs.size)--;
 
 	/* Clean up process' state and pass it the ok/fail signal */
 	proc->waitLock = NULL;
@@ -1606,9 +1601,7 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 	LOCKMASK	aheadRequests = 0;
 	dlist_mutable_iter miter;
 
-	Assert(waitQueue->size >= 0);
-
-	if (waitQueue->size == 0)
+	if (dlist_is_empty(&waitQueue->links))
 		return;
 
 	dlist_foreach_modify(miter, &waitQueue->links)
@@ -1640,8 +1633,6 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 			aheadRequests |= LOCKBIT_ON(lockmode);
 		}
 	}
-
-	Assert(waitQueue->size >= 0);
 }
 
 /*
-- 
2.25.0.114.g5b0ca878e0

#3Amit Langote
amitlangote09@gmail.com
In reply to: Andres Freund (#2)
Re: Improve heavyweight locks instead of building new lock managers?

Hi Andres,

On Thu, Feb 20, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:

Here's a *draft* patch series for removing all use of SHM_QUEUE, and
subsequently removing SHM_QUEUE. There's a fair bit of polish needed,
but I do think it makes the code considerably easier to read
(particularly for predicate.c). The diffstat is nice too:

0005: Remove now unused SHMQueue*.
0006: Remove PROC_QUEUE.size.

Maybe you're aware, but there still seem to be places using it. In
LOCK_DEBUG build:

lock.c: In function ‘LOCK_PRINT’:
lock.c:320:20: error: ‘PROC_QUEUE’ {aka ‘const struct PROC_QUEUE’} has
no member named ‘size’
lock->waitProcs.size,

lock.c: In function ‘DumpLocks’:
lock.c:3906:2: error: unknown type name ‘SHM_QUEUE’; did you mean ‘SI_QUEUE’?

FWIW, I for one am all for removing specialized data structures when
more widely used data structures will do, especially if that
specialization is no longer relevant.

Thanks,
Amit

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#2)
Re: Improve heavyweight locks instead of building new lock managers?

On Thu, Feb 20, 2020 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:

16 files changed, 569 insertions(+), 1053 deletions(-)

Nice!

Some comments on 0001, 0003, 0004:

Subject: [PATCH v1 1/6] Add additional prev/next & detached node functions to

+extern void dlist_check(const dlist_head *head);
+extern void slist_check(const slist_head *head);

I approve of the incidental constification in this patch.

+/*
+ * Like dlist_delete(), but also sets next/prev to NULL to signal not being in
+ * list.
+ */
+static inline void
+dlist_delete_thoroughly(dlist_node *node)
+{
+       node->prev->next = node->next;
+       node->next->prev = node->prev;
+       node->next = NULL;
+       node->prev = NULL;
+}

Instead of introducing this strange terminology, why not just have the
callers do ...

dlist_delete(node);
dlist_node_init(node);

..., or perhaps supply dlist_delete_and_reinit(node) that does exactly
that? That is, reuse the code and terminology.
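A minimal sketch of that helper, reusing the pieces already in the
series (the combined name is just my suggestion, not existing ilist.h
API):

/*
 * Delete 'node' from its list, then mark it as not being in a list,
 * reusing dlist_node_init() from earlier in this series.
 */
static inline void
dlist_delete_and_reinit(dlist_node *node)
{
	dlist_delete(node);		/* unlink from neighbors */
	dlist_node_init(node);	/* next = prev = NULL */
}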

+/*
+ * Check if node is detached. A node is only detached if it either has been
+ * initialized with dlist_node_init(), or deleted with
+ * dlist_delete_thoroughly().
+ */
+static inline bool
+dlist_node_is_detached(const dlist_node *node)
+{
+    Assert((node->next == NULL && node->prev == NULL) ||
+           (node->next != NULL && node->prev != NULL));
+
+    return node->next == NULL;
+}

How about dlist_node_is_linked()? I don't like introducing random new
verbs when we already have 'linked' in various places, and these
things are, y'know, linked lists.

Subject: [PATCH v1 3/6] Use dlist for syncrep queue.

LGTM.

Subject: [PATCH v1 4/6] Use dlists for predicate locking.

+ dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->outConflicts)

Yuck... I suppose you could do this:

-    dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->outConflicts)
+    dlist_foreach_const(iter, &reader->outConflicts)

... given:

+/* Variant for when you have a pointer to const dlist_head. */
+#define dlist_foreach_const(iter, lhead)                    \
+    for (AssertVariableIsOfTypeMacro(iter, dlist_iter),            \
+         AssertVariableIsOfTypeMacro(lhead, const dlist_head *),    \
+         (iter).end = (dlist_node *) &(lhead)->head,            \
+         (iter).cur = (iter).end->next ? (iter).end->next : (iter).end;    \
+         (iter).cur != (iter).end;                    \
+         (iter).cur = (iter).cur->next)
+

... or find a way to make dlist_foreach() handle that itself, which
seems pretty reasonable given its remit to traverse lists without
modifying them, though perhaps that would require a different iterator
type...

Otherwise looks OK to me and passes various tests I threw at it.

#5Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#4)
Re: Improve heavyweight locks instead of building new lock managers?

Hi,

On 2020-02-21 12:40:06 +1300, Thomas Munro wrote:

On Thu, Feb 20, 2020 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:

16 files changed, 569 insertions(+), 1053 deletions(-)

Nice!

Thanks!

Some comments on 0001, 0003, 0004:

Subject: [PATCH v1 1/6] Add additional prev/next & detached node functions to

+extern void dlist_check(const dlist_head *head);
+extern void slist_check(const slist_head *head);

I approve of the incidental constification in this patch.

It was just necessary fallout :)

+/*
+ * Like dlist_delete(), but also sets next/prev to NULL to signal not being in
+ * list.
+ */
+static inline void
+dlist_delete_thoroughly(dlist_node *node)
+{
+       node->prev->next = node->next;
+       node->next->prev = node->prev;
+       node->next = NULL;
+       node->prev = NULL;
+}

Instead of introducing this strange terminology, why not just have the
callers do ...

dlist_delete(node);
dlist_node_init(node);

There's quite a few callers in predicate.c - I actually did that first.

..., or perhaps supply dlist_delete_and_reinit(node) that does exactly
that? That is, reuse the code and terminology.

Yea, that might be better, but see the paragraph below. I quite dislike
adding any "empty node" state.

+/*
+ * Check if node is detached. A node is only detached if it either has been
+ * initialized with dlist_init_node(), or deleted with
+ * dlist_delete_thoroughly().
+ */
+static inline bool
+dlist_node_is_detached(const dlist_node *node)
+{
+    Assert((node->next == NULL && node->prev == NULL) ||
+           (node->next != NULL && node->prev != NULL));
+
+    return node->next == NULL;
+}

How about dlist_node_is_linked()? I don't like introducing random new
verbs when we already have 'linked' in various places, and these
things are, y'know, linked lists.

Well, but that name doesn't signal that you can't just dlist_delete() a
node and then expect dlist_node_is_linked() to work. I *want* it to
sound "different". We could of course make delete always do this, but I
don't want to add that overhead unnecessarily.

Subject: [PATCH v1 4/6] Use dlists for predicate locking.

+ dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->outConflicts)

Yuck...

It doesn't seem *that* bad to me; at least it signals properly what
we're doing, and only does so in one place.

I suppose you could do this:

-    dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->outConflicts)
+    dlist_foreach_const(iter, &reader->outConflicts)

We'd need a different iterator type too, I think? Because the iterator
itself can't be constant, but we'd want the elements themselves to be
pointers to const.

const just isn't a very granular thing in C :(.
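For concreteness, such an iterator might look like this (type name
invented, purely to illustrate):

typedef struct dlist_const_iter
{
	const dlist_node *cur;	/* current element */
	const dlist_node *end;	/* last node we'll iterate through */
} dlist_const_iter;

i.e. a mirror of dlist_iter whose element pointers are const-qualified,
which is what would force the separate foreach variant.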

... given:

+/* Variant for when you have a pointer to const dlist_head. */
+#define dlist_foreach_const(iter, lhead)                    \
+    for (AssertVariableIsOfTypeMacro(iter, dlist_iter),            \
+         AssertVariableIsOfTypeMacro(lhead, const dlist_head *),    \
+         (iter).end = (dlist_node *) &(lhead)->head,            \
+         (iter).cur = (iter).end->next ? (iter).end->next : (iter).end;    \
+         (iter).cur != (iter).end;                    \
+         (iter).cur = (iter).cur->next)
+

... or find a way to make dlist_foreach() handle that itself, which
seems pretty reasonable given its remit to traverse lists without
modifying them, though perhaps that would require a different iterator
type...

I was trying that first, but I don't easily see how we can do
so. Iterating over a non-constant list with dlist_foreach obviously
still allows one to manipulate the list members. Thus dlist_iter.cur
can't be a 'pointer to const'. Whereas that's arguably what'd be needed
for a correct dlist_foreach() of a constant list?

We could just accept const pointers for dlist_foreach(), but then we'd
*always* accept them, and we'd thus unconditionally have iter.cur as
non-const. Would that be better?

Otherwise looks OK to me and passes various tests I threw at it.

Thanks!

Greetings,

Andres Freund

#6Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#1)
Re: Improve heavyweight locks instead of building new lock managers?

On Mon, Feb 10, 2020 at 11:22 PM Andres Freund <andres@anarazel.de> wrote:

To me this seems to go in the direction of having multiple bespoke lock
managers with slightly different feature sets, different debugging /
logging / monitoring support, with separate locking code each. That's
bad for maintainability.

I think it would help a lot if the lock manager were not such a
monolithic thing. For instance, suppose that instead of having the
deadlock detector be part of the lock manager, it's a separate thing
that integrates with the lock manager. Instead of only knowing about
waits for heavyweight locks, it knows about whatever you want to tell
it about. Then, for instance, you could possibly tell it that process
A is waiting for a cleanup lock while process B holds a pin, possibly
detecting a deadlock we can't notice today. There are likely other
cases as well.

E.g. we could:

- Have an extended LockAcquire API where the caller provides a stack
variable for caching information until the LockRelease call, avoiding
separate LockMethodLocalHash lookup for release.

Maybe. Seems like it might make things a little clunky for the caller,
though. If you just kept a stack of locks acquired and checked the
lock being released against the top of the stack, you'd probably catch
a large percentage of cases. Or, the caller could pass a flag
indicating whether they intend to release the lock prior to the end of
the transaction (which could be cross-checked by assertions). Any that
the caller intends to release early go on a stack.
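To sketch what I mean (everything here is invented and untested):

#define MAX_EARLY_RELEASE_LOCKS 64

static LOCALLOCK *earlyReleaseStack[MAX_EARLY_RELEASE_LOCKS];
static int	earlyReleaseDepth = 0;

static void
PushEarlyReleaseLock(LOCALLOCK *locallock)
{
	Assert(earlyReleaseDepth < MAX_EARLY_RELEASE_LOCKS);
	earlyReleaseStack[earlyReleaseDepth++] = locallock;
}

/* Returns NULL on a miss; caller falls back to LockMethodLocalHash. */
static LOCALLOCK *
PopEarlyReleaseLock(const LOCKTAG *locktag, LOCKMODE lockmode)
{
	LOCALLOCK  *top;

	if (earlyReleaseDepth == 0)
		return NULL;
	top = earlyReleaseStack[earlyReleaseDepth - 1];
	if (top->tag.mode == lockmode &&
		memcmp(&top->tag.lock, locktag, sizeof(LOCKTAG)) == 0)
		return earlyReleaseStack[--earlyReleaseDepth];
	return NULL;				/* out-of-order release */
}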

- Instead of always implementing HASH_REMOVE as a completely fresh
lookup in dynahash, we should provide an API for removing an object we
*know* is in the hashtable at a specific pointer. We're obviously
already relying on that address to stay constant, so we'd not lose
flexibility. With this separate API we can avoid the bucket lookup,
walking through each element in that bucket, and having to compare the
hash key every time (which is quite expensive for a full LOCKTAG).

That makes a lot of sense, although I wonder if we should go further
and replace dynahash entirely.
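As a point of reference, the narrower API might be no more than this
(hypothetical name and signature, sketched from the description above):

/*
 * Remove an entry the caller already holds a pointer to, skipping the
 * bucket walk and key comparison that HASH_REMOVE performs today.
 */
extern void hash_delete_direct(HTAB *hashp, void *entry, uint32 hashvalue);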

- We should try to figure out whether we can avoid needing the separate
lock table for PROCLOCKs. As far as I can tell there's not a single
reason for it to be a hashtable, rather than something like
NUM_LOCK_PARTITIONS lists of free PROCLOCKs.

I agree that the PROCLOCK thing seems ugly and inefficient. I'm not
sure that a bunch of lists is the best answer, though it might be. One
case to consider is when a lock is initially acquired via the
fast-path and then, because of a conflicting lock acquisition,
transferred by *some other process* to the main lock table. If this
occurs, the original locker must be able to find its PROCLOCK. It
doesn't have to be crazy efficient because it shouldn't happen very
often, but it shouldn't suck too much.
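For what it's worth, the allocation half of the free-list idea might be
as simple as this sketch (assumes the dlist conversion from this
thread; reusing lockLink as free-list linkage is invented here):

static dlist_head freeProcLocks[NUM_LOCK_PARTITIONS];	/* in shmem */

static PROCLOCK *
AllocProcLock(uint32 hashcode)
{
	int			partition = LockHashPartition(hashcode);

	if (dlist_is_empty(&freeProcLocks[partition]))
		return NULL;			/* caller reports out of shared memory */
	return dlist_container(PROCLOCK, lockLink,
						   dlist_pop_head_node(&freeProcLocks[partition]));
}

The fast-path transfer case would then presumably find the PROCLOCK by
walking the per-LOCK procLocks list rather than via a hash lookup.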

- Whenever we do a relation extension, we better already hold a normal
relation lock. We don't actually need to have an entirely distinct
type of lock for extensions, that theoretically could support N
extension locks for each relation. The fact that these are associated
could be utilized in different ways:

- We could define relation extension as a separate lock level only
conflicting with itself (and perhaps treat AELs as including
it). That'd avoid needing separate shared memory states in many
cases.

This might be too problematic, because extension locks don't have
the same desired behaviour around group locking (and a small army of
other issues).

Yeah, I don't think that's likely to work out very nicely.

- We could keep a separate extension lock cached inside the relation
lock. The first time a transaction needs to extend, it does the
normal work, and after that stores the PROCLOCK in the LOCALLOCK (or
something like that). Once extension is done, don't release the lock
entirely, but instead just reduce the lock level to a new
non-conflicting lock level

Alternatively we could implement something very similar outside of
lock.c, e.g. by caching the LOCALLOCK* (possibly identified by an
integer or such) in RelationData. That'd require enough support
from lock.c to be able to make that really cheap.

Not sure I quite see what you mean here.

We could e.g. split the maintenance of the hashtable(s) from protecting
the state of individual locks. The locks protecting the hashtable would
just be held long enough to change a "pin count" of each lock, and then
a per LOCK lwlock would protect each heavyweight lock's state. We'd not
need to lock the partition locks for quite a few cases, because there's
many locks in a loaded system that always have lockers. There'd be cost
to that, needing more atomic ops in some cases, but I think it'd reduce
contention tremendously, and it'd open a lot more optimization
potential. It seems plausible that, as a followup, we could even get
to not needing the partition locks for some lock releases (e.g. when
there are other holders), and we might even be able to avoid it for
acquisitions, by caching the LOCK inside LOCALLOCK, and re-identifying
the identity.

I agree. This seems worth exploring. The idea of caching the probable
location of the lock and re-pinning it to check whether it's the one
you expected seems like it would avoid false sharing in a lot of
practical cases.
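To make that concrete, LOCK might grow something like this (pincount
and content_lock are invented names; the other fields mirror the
existing struct):

typedef struct LOCK
{
	LOCKTAG		tag;			/* unique identifier of lockable object */
	pg_atomic_uint32 pincount;	/* references from lookups; the only field
								 * touched under the partition lock */
	LWLock		content_lock;	/* protects all fields below */
	LOCKMASK	grantMask;		/* bitmask for lock types already granted */
	LOCKMASK	waitMask;		/* bitmask for lock types awaited */
	dlist_head	procLocks;		/* list of PROCLOCK objects for this lock */
	dlist_head	waitProcs;		/* queue of PGPROC objects waiting on lock */
	int			requested[MAX_LOCKMODES];	/* counts of requested locks */
	int			nRequested;		/* total of requested[] array */
	int			granted[MAX_LOCKMODES]; /* counts of granted locks */
	int			nGranted;		/* total of granted[] array */
} LOCK;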

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#2)
Re: Improve heavyweight locks instead of building new lock managers?

On Wed, Feb 19, 2020 at 11:14 PM Andres Freund <andres@anarazel.de> wrote:

Here's a *draft* patch series for removing all use of SHM_QUEUE, and
subsequently removing SHM_QUEUE.

+1 for that idea.

But color me skeptical of what Thomas described as the "incidental
constification".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company