scalability bottlenecks with (many) partitions (and more)

Started by Tomas Vondra, almost 2 years ago - 61 messages

#1 Tomas Vondra
tomas.vondra@enterprisedb.com
5 attachment(s)

Hi,

I happened to investigate a query involving a partitioned table, which
led me to a couple of bottlenecks severely affecting queries dealing
with multiple partitions (or relations in general). After a while I came
up with three WIP patches that improve the behavior by an order of
magnitude, and not just in some extreme cases.

Consider a partitioned pgbench with 100 partitions, say:

pgbench -i -s 100 --partitions 100 testdb

but let's modify the pgbench_accounts a little bit:

ALTER TABLE pgbench_accounts ADD COLUMN aid_parent INT;
UPDATE pgbench_accounts SET aid_parent = aid;
CREATE INDEX ON pgbench_accounts(aid_parent);
VACUUM FULL pgbench_accounts;

which simply adds an "aid_parent" column that is not part of the
partition key. And now let's run this query:

SELECT * FROM pgbench_accounts pa JOIN pgbench_branches pb
ON (pa.bid = pb.bid) WHERE pa.aid_parent = :aid

so pretty much the regular "pgbench -S", except that it filters on a
column that does not allow partition elimination. Now, the plan looks
like this:

                              QUERY PLAN
----------------------------------------------------------------------
 Hash Join  (cost=1.52..34.41 rows=10 width=465)
   Hash Cond: (pa.bid = pb.bid)
   ->  Append  (cost=0.29..33.15 rows=10 width=101)
         ->  Index Scan using pgbench_accounts_1_aid_parent_idx on
             pgbench_accounts_1 pa_1  (cost=0.29..3.31 rows=1 width=101)
               Index Cond: (aid_parent = 3489734)
         ->  Index Scan using pgbench_accounts_2_aid_parent_idx on
             pgbench_accounts_2 pa_2  (cost=0.29..3.31 rows=1 width=101)
               Index Cond: (aid_parent = 3489734)
         ->  Index Scan using pgbench_accounts_3_aid_parent_idx on
             pgbench_accounts_3 pa_3  (cost=0.29..3.31 rows=1 width=101)
               Index Cond: (aid_parent = 3489734)
         ->  Index Scan using pgbench_accounts_4_aid_parent_idx on
             pgbench_accounts_4 pa_4  (cost=0.29..3.31 rows=1 width=101)
               Index Cond: (aid_parent = 3489734)
         ->  ...
   ->  Hash  (cost=1.10..1.10 rows=10 width=364)
         ->  Seq Scan on pgbench_branches pb  (cost=0.00..1.10 rows=10
             width=364)

So yeah, scanning all 100 partitions. Not great, but no partitioning
scheme is perfect for all queries. Anyway, let's see how this works on a
big AMD EPYC machine with 96/192 cores - with "-M simple" we get:

parts        1       8      16      32      64      96     160     224
-----------------------------------------------------------------------
0        13877  105732  210890  410452  709509  844683 1050658 1163026
100        653    3957    7120   12022   12707   11813   10349    9633
1000        20     142     270     474     757     808     567     427

These are transactions per second, for different numbers of clients
(numbers in the header). With -M prepared the story doesn't change - the
numbers are higher, but the overall behavior is pretty much the same.

Firstly, with no partitions (first row), the throughput initially grows
by ~13k tps per client, then it gradually levels off. But it keeps
growing the whole time.

But with 100 or 1000 partitions, it peaks and then starts dropping
again. And moreover, the throughput with 100 or 1000 partitions is just
a tiny fraction of the non-partitioned value. The difference is roughly
equal to the number of partitions - for example with 96 clients, the
difference between 0 and 1000 partitions is 844683/808 = 1045.

I could demonstrate the same behavior with fewer partitions - e.g. with
10 partitions you get ~10x difference, and so on.

Another thing I'd mention is that this is not just about partitioning.
Imagine a star schema with a fact table and dimensions - you'll get the
same behavior depending on the number of dimensions you need to join
with. With "-M simple" you may get this, for example:

dims         1       8      16      32      64      96     160     224
----------------------------------------------------------------------
1        11737   92925  183678  361497  636598  768956  958679 1042799
10         462    3558    7086   13889   25367   29503   25353   24030
100          4      31      61     122     231     292     292     288

So, similar story - significant slowdown as we're adding dimensions.

Now, what could be causing this? Clearly, there's a bottleneck of some
kind, and we're hitting it. Some of this may be simply due to execution
doing more stuff (more index scans, more initialization, ...) but maybe
not - one of the reasons I started looking into this was that the CPU
was not fully utilized even at small scales - maybe 60% busy.

So I started poking at things. The first thing that I thought about was
locking, obviously. That's consistent with the limited CPU utilization
(waiting on a lock = not running), and it's somewhat expected when using
many partitions - we need to lock all of them, and if we have 100 or
1000 of them, that's potentially a lot of locks.

From past experiments I know about two places where such a bottleneck
could be - NUM_LOCK_PARTITIONS and fast-path locking. So I decided to
give it a try, increase these values and see what happens.

For NUM_LOCK_PARTITIONS this is pretty simple (see 0001 patch). The
LWLock table has 16 partitions by default - it's quite possible that on
a machine with many cores and/or many partitions we can easily hit this
limit. So I bumped this 4x, to 64 partitions.
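
For context, each heavyweight lock maps to one of these partitions by
its hashcode, and the partition's LWLock serializes access to that part
of the shared lock table - this is just the existing macros in lock.h:

#define LockHashPartition(hashcode) \
    ((hashcode) % NUM_LOCK_PARTITIONS)
#define LockHashPartitionLock(hashcode) \
    (&MainLWLockArray[LOCK_MANAGER_LWLOCK_OFFSET + \
                      LockHashPartition(hashcode)].lock)

So with only 16 partitions and many backends each locking 100+
relations per query, collisions on the same partition LWLock get quite
likely - going to 64 partitions simply spreads that contention 4x wider.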

For fast-path locking the changes are more complicated (see 0002). We
allow keeping 16 relation locks right in PGPROC, and only when this gets
full do we promote them to the actual lock table. But with enough
partitions we're guaranteed to fill these 16 slots, of course.
Increasing the number of slots is not simple, though - firstly, the
information is split between an array of 16 OIDs and a uint64 serving
as a bitmap. Increasing the size of the OID array is simple, but it's
harder for the auxiliary bitmap. And there are more problems - with
more OIDs a simple linear search won't do, but a simple hash table is
not a good idea either, because of poor locality and the need to delete
entries ...

What I ended up doing is having a hash table of 16-element arrays. There
are 64 "pieces", each essentially the (16 x OID + UINT64 bitmap) that we
have now. Each OID is mapped to exactly one of these parts as if in a
hash table, and in each of those 16-element parts we do exactly the same
thing we do now (linear search, removal, etc.). This works great, the
locality is great, etc. The one disadvantage is this makes PGPROC
larger, but I did a lot of benchmarks and I haven't seen any regression
that I could attribute to this. (More about this later.)
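
In pseudo-code, the lookup is roughly this (simplified from the 0002
patch - the multiplier/offset are just the hash constants I picked):

    /* map the relation OID to one of the 64 groups */
    group = ((uint64) relid * 7883 + 4481) % FP_LOCK_GROUPS_PER_BACKEND;

    /* then the usual linear search, but only over that group's slots */
    for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
    {
        f = group * FP_LOCK_SLOTS_PER_GROUP + i;
        if (MyProc->fpRelId[f] == relid)
            break;              /* found the slot for this relation */
    }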

Unfortunately, for the pgbench join this does not make much difference.
But for the "star join" (with -M prepared) it does this:

             1       8      16      32      64      96     160     224
------------------------------------------------------------------------
master   21610  137450  247541  300902  270932  229692  191454  189233
patched  21664  151695  301451  594615 1036424 1211716 1480953 1656203
speedup    1.0     1.1     1.2     2.0     3.8     5.3     7.7     8.8

That's a pretty nice speedup, I think.

However, why doesn't the partitioned join improve (at least not very
much)? Well, the perf profile says stuff like this:

9.16% 0.77% postgres [kernel.kallsyms] [k] asm_exc_page_fault
|
--8.39%--asm_exc_page_fault
|
--7.52%--exc_page_fault
|
--7.13%--do_user_addr_fault
|
--6.64%--handle_mm_fault
|
--6.29%--__handle_mm_fault
|
|--2.17%--__mem_cgroup_charge
| |
| |--1.25%--charge_memcg
| | |
| | --0.57%-- ...
| |
| --0.67%-- ...
|
|--2.04%--vma_alloc_folio

After investigating this for a bit, I came to the conclusion this may be
some sort of scalability problem in glibc malloc. I decided to check
whether the "memory pool" patch (which I've mentioned in the memory
limit thread as an alternative way to introduce backend-level
accounting/limits) could serve as a backend-level malloc cache, and how
well that would work. So I cleaned up the PoC patch I already had (see
0003), and gave it a try.

And with both patches applied, the results for the partitioned join with
100 partitions look like this:

-M simple

                 1      8     16     32     64     96    160    224
------------------------------------------------------------------------
master         653   3957   7120  12022  12707  11813  10349   9633
both patches   954   7356  14580  28259  51552  65278  70607  69598
speedup        1.5    1.9    2.0    2.4    4.1    5.5    6.8    7.2

-M prepared

                 1      8     16     32      64      96     160     224
------------------------------------------------------------------------
master        1639   8273  14138  14746   13446   14001   11129   10136
both patches  4792  30102  62208 122157  220984  267763  315632  323567
speedup        2.9    3.6    4.4    8.3    16.4    19.1    28.4    31.9

That's pretty nice, I think. And I've seen many such improvements, it's
not a cherry-picked example. For the star join, the improvements are
very similar.

I'm attaching PDF files with a table visualizing results for these two
benchmarks - there's results for different number of partitions/scales,
and different builds (master, one or both of the patches). There's also
a comparison to master, with color scale "red = slower, green = faster"
(but there's no red anywhere, not even for low client counts).

It's also interesting that with just the 0003 patch applied, the change
is much smaller. It's as if the two bottlenecks (locking and malloc) are
in balance - if you only address one, you don't get much. But if you
address both, it flies.

FWIW where does the malloc overhead come from? For one, while we do have
some caching of malloc-ed memory in memory contexts, that doesn't quite
work cross-query, because we destroy the contexts at the end of the
query. We attempt to cache the memory contexts too, but in this case
that can't help because the allocations come from btbeginscan() where we
do this:

so = (BTScanOpaque) palloc(sizeof(BTScanOpaqueData));

and BTScanOpaqueData is ~27kB, which means it's an oversized chunk and
thus always allocated using a separate malloc() call. Maybe we could
break it into smaller/cacheable parts, but I haven't tried, and I doubt
it's the only such allocation.
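
That is, in aset.c such a request takes the "oversized chunk" path,
roughly (simplified from AllocSetAlloc; allocChunkLimit is at most 8kB):

    if (size > set->allocChunkLimit)
    {
        /* one dedicated block per chunk, malloc-ed separately */
        blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
        block = (AllocBlock) malloc(blksize);
        ...
    }

and AllocSetFree() hands such single-chunk blocks straight back to
free(), so the context freelists never get a chance to cache them.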

I don't want to get into too much detail about the memory pool, but I
think it's something we should consider doing - I'm sure there's stuff
to improve, but caching the malloc-ed blocks clearly can be very beneficial. The
basic idea is to have a cache that is "adaptive" (i.e. adjusts to
caching blocks of sizes needed by the workload) but also cheap. The
patch is PoC/WIP and needs more work, but I think it works quite well.
If anyone wants to take a look or have a chat at FOSDEM, for example,
I'm available.
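
To illustrate the core idea, the allocation side just rounds each
request to a size class and checks the cache for that class (simplified
from MemoryPoolEntrySize in the 0003 patch):

    static Size
    MemoryPoolEntrySize(Size size)
    {
        Size    result = MEMPOOL_MIN_BLOCK;     /* 1kB */

        if (size < MEMPOOL_MIN_BLOCK)
            return MEMPOOL_MIN_BLOCK;
        if (size > MEMPOOL_MAX_BLOCK)           /* 8MB - don't cache */
            return size;

        /* round up to the next power-of-2 size class */
        while (size > result)
            result *= 2;

        return result;
    }

The "adaptive" part is a rebalance every ~25k allocations, growing or
shrinking the per-class cache capacity based on the observed cache hits
and misses.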

FWIW I was wondering if this is a glibc-specific malloc bottleneck, so I
tried running the benchmarks with LD_PRELOAD=jemalloc, and that improves
the behavior a lot - it gets us maybe ~80% of the mempool benefits.
Which is nice, it confirms it's glibc-specific (I wonder if there's a
way to tweak glibc to address this), and it also means systems using
jemalloc (e.g. FreeBSD, right?) don't have this problem. But it also
says the mempool has ~20% benefit on top of jemalloc.
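
(For reference, I just preloaded the library when starting the server,
something like the following - the exact library path is distro
specific:

    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 pg_ctl -D $PGDATA start

so the postmaster and all backends inherit the allocator.)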

FWIW there's another bottleneck people may not realize, and that's the
number of file descriptors. Once you get to >1000 relations, you can
easily get into a situation like this:

54.18% 0.48% postgres [kernel.kallsyms] [k]
entry_SYSCALL_64_after_hwframe
|
--53.70%--entry_SYSCALL_64_after_hwframe
|
--53.03%--do_syscall_64
|
|--28.29%--__x64_sys_openat
| |
| --28.14%--do_sys_openat2
| |
| |--23.14%--do_filp_open
| | |
| | --22.72%--path_openat

That's pretty bad, it means we're closing/opening file descriptors like
crazy, because every query needs the files. If I increase the number of
file descriptors (both in ulimit and max_files_per_process) to prevent
this thrashing, I can increase the throughput ~5x. Of course, this is not
a bottleneck that we can "fix" in code, it's simply a consequence of not
having enough file descriptors etc. But I wonder if we might make it
easier to monitor this, e.g. by tracking the fd cache hit ratio, or
something like that ...
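
(For the record, the two knobs I raised were the OS limit and the GUC,
with values like these - just an example, not a recommendation:

    ulimit -n 65536                   # shell, before starting postgres
    max_files_per_process = 10000     # postgresql.conf

so that each backend can keep the partition files open.)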

There's a more complete set of benchmarking scripts and results for
these and other tests, in various formats (PDF, ODS, ...) at

https://github.com/tvondra/scalability-patches

There's results from multiple machines - not just the big epyc machine,
but also smaller intel machines (4C and 16C), and even two rpi5 (yes, it
helps even on rpi5, quite a bit).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

join - epyc _ builds.pdf (application/pdf)
star - epyc _ builds.pdf (application/pdf)
v240118-0001-Increase-NUM_LOCK_PARTITIONS-to-64.patch (text/x-patch)
From 98a361f95c2c4969488c2286f8aa560b45f8c0a8 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Mon, 8 Jan 2024 00:32:22 +0100
Subject: [PATCH v240118 2/4] Increase NUM_LOCK_PARTITIONS to 64

The LWLock table has 16 partitions by default, which may be a bottleneck
on systems with many cores, which are becoming more and more common. This
increases the number of partitions to 64, to reduce the contention.

This may affect cases that need to process the whole table and lock all
the partitions. But there are not too many of those cases, especially
in performance-sensitive paths, and the increase from 16 to 64 is not
large enough to really matter.
---
 src/include/storage/lwlock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 167ae342088..2a10efce224 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -95,7 +95,7 @@ extern PGDLLIMPORT int NamedLWLockTrancheRequests;
 #define NUM_BUFFER_PARTITIONS  128
 
 /* Number of partitions the shared lock tables are divided into */
-#define LOG2_NUM_LOCK_PARTITIONS  4
+#define LOG2_NUM_LOCK_PARTITIONS  6
 #define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
 
 /* Number of partitions the shared predicate lock tables are divided into */
-- 
2.43.0

v240118-0002-Increase-the-number-of-fastpath-locks.patch (text/x-patch)
From ef103606034b6bc883414a73b12f3134ebfe460b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Sun, 7 Jan 2024 22:48:22 +0100
Subject: [PATCH v240118 1/4] Increase the number of fastpath locks

The 16 fastpath locks as defined by FP_LOCK_SLOTS_PER_BACKEND may be a
bottleneck with many partitions (or relations in general - e.g. large
joins). This applies especially to many-core systems, but not only.

This increases the number of fastpath slots per backend to 1024 (from
16). This is implemented as a hash table of 64 "entries", where an
entry is essentially the current array of 16 fastpath slots. A relation
is mapped to one of the 64 entries by hash(relid), and within the entry
we follow the existing lookup / eviction logic.

This provides better locality than open-addressing hash tables.

It's not clear if 1024 is the right trade-off. It does make the PGPROC
entry larger, but it's already quite large and no regressions were
observed during benchmarking.
---
 src/backend/storage/lmgr/lock.c | 134 ++++++++++++++++++++++++++------
 src/include/storage/proc.h      |   9 ++-
 2 files changed, 116 insertions(+), 27 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index c70a1adb9ad..3461626eaff 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -169,7 +169,8 @@ typedef struct TwoPhaseLockRecord
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
  */
-static int	FastPathLocalUseCount = 0;
+static bool FastPathLocalUseInitialized = false;
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -189,20 +190,23 @@ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 /* Macros for manipulating proc->fpLockBits */
 #define FAST_PATH_BITS_PER_SLOT			3
 #define FAST_PATH_LOCKNUMBER_OFFSET		1
+#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 7883 + 4481) % FP_LOCK_GROUPS_PER_BACKEND)
+#define FAST_PATH_LOCK_INDEX(n)			((n) % FP_LOCK_SLOTS_PER_GROUP)
+#define FAST_PATH_LOCK_GROUP(n)			((n) / FP_LOCK_SLOTS_PER_GROUP)
 #define FAST_PATH_MASK					((1 << FAST_PATH_BITS_PER_SLOT) - 1)
 #define FAST_PATH_GET_BITS(proc, n) \
-	(((proc)->fpLockBits >> (FAST_PATH_BITS_PER_SLOT * n)) & FAST_PATH_MASK)
+	(((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] >> (FAST_PATH_BITS_PER_SLOT * FAST_PATH_LOCK_INDEX(n))) & FAST_PATH_MASK)
 #define FAST_PATH_BIT_POSITION(n, l) \
 	(AssertMacro((l) >= FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((l) < FAST_PATH_BITS_PER_SLOT+FAST_PATH_LOCKNUMBER_OFFSET), \
-	 AssertMacro((n) < FP_LOCK_SLOTS_PER_BACKEND), \
-	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (n)))
+	 AssertMacro((n) < FP_LOCKS_PER_BACKEND), \
+	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (FAST_PATH_LOCK_INDEX(n))))
 #define FAST_PATH_SET_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
 #define FAST_PATH_CLEAR_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
 #define FAST_PATH_CHECK_LOCKMODE(proc, n, l) \
-	 ((proc)->fpLockBits & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
+	 ((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
 
 /*
  * The fast-path lock mechanism is concerned only with relation locks on
@@ -895,6 +899,12 @@ LockAcquireExtended(const LOCKTAG *locktag,
 		log_lock = true;
 	}
 
+	if (!FastPathLocalUseInitialized)
+	{
+		FastPathLocalUseInitialized = true;
+		memset(FastPathLocalUseCounts, 0, sizeof(FastPathLocalUseCounts));
+	}
+
 	/*
 	 * Attempt to take lock via fast path, if eligible.  But if we remember
 	 * having filled up the fast path array, we don't attempt to make any
@@ -906,7 +916,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 * for now we don't worry about that case either.
 	 */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND)
+		FastPathLocalUseCounts[FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2)] < FP_LOCK_SLOTS_PER_GROUP)
 	{
 		uint32		fasthashcode = FastPathStrongLockHashPartition(hashcode);
 		bool		acquired;
@@ -1932,6 +1942,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	int			group;
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2025,9 +2036,19 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	 */
 	locallock->lockCleared = false;
 
+	if (!FastPathLocalUseInitialized)
+	{
+		FastPathLocalUseInitialized = true;
+		memset(FastPathLocalUseCounts, 0, sizeof(FastPathLocalUseCounts));
+	}
+
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
 	/* Attempt fast release of any lock eligible for the fast path. */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount > 0)
+		FastPathLocalUseCounts[group] > 0)
 	{
 		bool		released;
 
@@ -2595,12 +2616,27 @@ LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent)
 static bool
 FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
+	uint32		i;
 	uint32		f;
-	uint32		unused_slot = FP_LOCK_SLOTS_PER_BACKEND;
+	uint32		unused_slot = FP_LOCKS_PER_BACKEND;
+
+	int			group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
+	if (!FastPathLocalUseInitialized)
+	{
+		FastPathLocalUseInitialized = true;
+		memset(FastPathLocalUseCounts, 0, sizeof(FastPathLocalUseCounts));
+	}
 
 	/* Scan for existing entry for this relid, remembering empty slot. */
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		f = group * FP_LOCK_SLOTS_PER_GROUP + i;
+
+		Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+
 		if (FAST_PATH_GET_BITS(MyProc, f) == 0)
 			unused_slot = f;
 		else if (MyProc->fpRelId[f] == relid)
@@ -2612,11 +2648,11 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	}
 
 	/* If no existing entry, use any empty slot. */
-	if (unused_slot < FP_LOCK_SLOTS_PER_BACKEND)
+	if (unused_slot < FP_LOCKS_PER_BACKEND)
 	{
 		MyProc->fpRelId[unused_slot] = relid;
 		FAST_PATH_SET_LOCKMODE(MyProc, unused_slot, lockmode);
-		++FastPathLocalUseCount;
+		++FastPathLocalUseCounts[group];
 		return true;
 	}
 
@@ -2632,12 +2668,27 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 static bool
 FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
+	uint32		i;
 	uint32		f;
 	bool		result = false;
 
-	FastPathLocalUseCount = 0;
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	int			group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
+	if (!FastPathLocalUseInitialized)
 	{
+		FastPathLocalUseInitialized = true;
+		memset(FastPathLocalUseCounts, 0, sizeof(FastPathLocalUseCounts));
+	}
+
+	FastPathLocalUseCounts[group] = 0;
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
+	{
+		f = group * FP_LOCK_SLOTS_PER_GROUP + i;
+
+		Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+
 		if (MyProc->fpRelId[f] == relid
 			&& FAST_PATH_CHECK_LOCKMODE(MyProc, f, lockmode))
 		{
@@ -2647,7 +2698,7 @@ FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 			/* we continue iterating so as to update FastPathLocalUseCount */
 		}
 		if (FAST_PATH_GET_BITS(MyProc, f) != 0)
-			++FastPathLocalUseCount;
+			++FastPathLocalUseCounts[group];
 	}
 	return result;
 }
@@ -2665,7 +2716,7 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 {
 	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
 	Oid			relid = locktag->locktag_field2;
-	uint32		i;
+	uint32		i, j, group;
 
 	/*
 	 * Every PGPROC that can potentially hold a fast-path lock is present in
@@ -2701,10 +2752,18 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 			continue;
 		}
 
-		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+		group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+		Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
+		for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 		{
 			uint32		lockmode;
 
+			f = group * FP_LOCK_SLOTS_PER_GROUP + j;
+
+			Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+
 			/* Look for an allocated slot matching the given relid. */
 			if (relid != proc->fpRelId[f] || FAST_PATH_GET_BITS(proc, f) == 0)
 				continue;
@@ -2735,6 +2794,7 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 			/* No need to examine remaining slots. */
 			break;
 		}
+
 		LWLockRelease(&proc->fpInfoLock);
 	}
 	return true;
@@ -2755,14 +2815,28 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	PROCLOCK   *proclock = NULL;
 	LWLock	   *partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid			relid = locktag->locktag_field2;
-	uint32		f;
+	uint32		f, i;
+
+	int			group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
+	if (!FastPathLocalUseInitialized)
+	{
+		FastPathLocalUseInitialized = true;
+		memset(FastPathLocalUseCounts, 0, sizeof(FastPathLocalUseCounts));
+	}
 
 	LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
 
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
 		uint32		lockmode;
 
+		f = group * FP_LOCK_SLOTS_PER_GROUP + i;
+
+		Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+
 		/* Look for an allocated slot matching the given relid. */
 		if (relid != MyProc->fpRelId[f] || FAST_PATH_GET_BITS(MyProc, f) == 0)
 			continue;
@@ -2866,6 +2940,16 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	int			count = 0;
 	int			fast_count = 0;
 
+	int			group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
+	if (!FastPathLocalUseInitialized)
+	{
+		FastPathLocalUseInitialized = true;
+		memset(FastPathLocalUseCounts, 0, sizeof(FastPathLocalUseCounts));
+	}
+
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
 	lockMethodTable = LockMethods[lockmethodid];
@@ -2902,7 +2986,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	 */
 	if (ConflictsWithRelationFastPath(locktag, lockmode))
 	{
-		int			i;
+		int			i, j;
 		Oid			relid = locktag->locktag_field2;
 		VirtualTransactionId vxid;
 
@@ -2941,10 +3025,14 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 				continue;
 			}
 
-			for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+			for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 			{
 				uint32		lockmask;
 
+				f = group * FP_LOCK_SLOTS_PER_GROUP + j;
+
+				Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+
 				/* Look for an allocated slot matching the given relid. */
 				if (relid != proc->fpRelId[f])
 					continue;
@@ -3604,7 +3692,7 @@ GetLockStatusData(void)
 
 		LWLockAcquire(&proc->fpInfoLock, LW_SHARED);
 
-		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; ++f)
+		for (f = 0; f < FP_LOCKS_PER_BACKEND; ++f)
 		{
 			LockInstanceData *instance;
 			uint32		lockbits = FAST_PATH_GET_BITS(proc, f);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 4bc226e36cd..e5752db1faf 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -82,8 +82,9 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_SLOTS_PER_BACKEND 16
-
+#define		FP_LOCK_GROUPS_PER_BACKEND	64
+#define		FP_LOCK_SLOTS_PER_GROUP		16		/* don't change */
+#define		FP_LOCKS_PER_BACKEND		(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
 /*
  * An invalid pgprocno.  Must be larger than the maximum number of PGPROC
  * structures we could possibly have.  See comments for MAX_BACKENDS.
@@ -288,8 +289,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits;		/* lock modes held for each fast-path slot */
-	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
+	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND];		/* lock modes held for each fast-path slot */
+	Oid			fpRelId[FP_LOCKS_PER_BACKEND]; /* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
 												 * lock */
-- 
2.43.0

v240118-0003-Add-a-memory-pool-with-adaptive-rebalancing.patch (text/x-patch)
From 4a494f739e33fc170f3f4c35e82401389d3ae1ec Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Sun, 7 Jan 2024 20:40:44 +0100
Subject: [PATCH v240118 3/4] Add a memory pool with adaptive rebalancing

A memory pool handles memory allocations of blocks requested by memory
contexts, and serves as a global cache to reduce malloc overhead. The
pool uses similar doubling logic as memory contexts, allocating blocks
with size 1kB, 2kB, 4kB, ..., 8MB. This covers most requests, because
memory contexts use these block sizes.

Oversized chunks are matched to these sizes too - memory contexts handle
them as a special case and allocate them separately (and do not cache
them), as it's not clear whether a chunk of the same size will be needed
again, and chunks cached by a context can't be returned to the OS, so
caching them there would waste memory.

But if we treat them as regular blocks, we can still reuse them for some
other block. And the memory pool can actually free the memory, unlike memory
contexts.

The memory pool is adaptive - we track the number of allocations needed
for different sizes, and adjust the capacity of the buckets accordingly.
This happens every ~25k allocations, which seems like a good trade-off.

The total amount of memory for the memory pool is not limited - it might
be (to some extent that was the initial motivation of the memory pool),
but by default there's only a "soft" limit of 128MB to restrict the size
of the cached blocks. If a backend needs less than 128MB of memory, the
difference (128MB - allocated) will be available for cached blocks. If
the backend allocates more than 128MB of memory, it won't fail, but the
cached blocks will be evicted/freed.

The 128MB limit is hardcoded, but it might be made a GUC if needed.
---
 src/backend/access/nbtree/nbtree.c |    3 +
 src/backend/utils/mmgr/aset.c      |   26 +-
 src/backend/utils/mmgr/mcxt.c      | 1071 ++++++++++++++++++++++++++++
 src/include/utils/memutils.h       |    9 +
 4 files changed, 1099 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 696d79c0852..6069d23cbfb 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -361,7 +361,10 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 	BTScanPosInvalidate(so->currPos);
 	BTScanPosInvalidate(so->markPos);
 	if (scan->numberOfKeys > 0)
+	{
+		// elog(LOG, "btbeginscan alloc B %ld", scan->numberOfKeys * sizeof(ScanKeyData));
 		so->keyData = (ScanKey) palloc(scan->numberOfKeys * sizeof(ScanKeyData));
+	}
 	else
 		so->keyData = NULL;
 
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 2f99fa9a2f6..ac7bf6dcadb 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -441,7 +441,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
 	 * Allocate the initial block.  Unlike other aset.c blocks, it starts with
 	 * the context header and its block header follows that.
 	 */
-	set = (AllocSet) malloc(firstBlockSize);
+	set = (AllocSet) MemoryPoolAlloc(firstBlockSize);
 	if (set == NULL)
 	{
 		if (TopMemoryContext)
@@ -579,13 +579,15 @@ AllocSetReset(MemoryContext context)
 		}
 		else
 		{
+			Size size = block->endptr - ((char *) block);
+
 			/* Normal case, release the block */
 			context->mem_allocated -= block->endptr - ((char *) block);
 
 #ifdef CLOBBER_FREED_MEMORY
 			wipe_mem(block, block->freeptr - ((char *) block));
 #endif
-			free(block);
+			MemoryPoolFree(block, size);
 		}
 		block = next;
 	}
@@ -649,7 +651,7 @@ AllocSetDelete(MemoryContext context)
 				freelist->num_free--;
 
 				/* All that remains is to free the header/initial block */
-				free(oldset);
+				MemoryPoolFree(oldset, keepersize);
 			}
 			Assert(freelist->num_free == 0);
 		}
@@ -666,6 +668,7 @@ AllocSetDelete(MemoryContext context)
 	while (block != NULL)
 	{
 		AllocBlock	next = block->next;
+		Size size = block->endptr - ((char *) block);
 
 		if (!IsKeeperBlock(set, block))
 			context->mem_allocated -= block->endptr - ((char *) block);
@@ -675,7 +678,9 @@ AllocSetDelete(MemoryContext context)
 #endif
 
 		if (!IsKeeperBlock(set, block))
-			free(block);
+		{
+			MemoryPoolFree(block, size);
+		}
 
 		block = next;
 	}
@@ -683,7 +688,7 @@ AllocSetDelete(MemoryContext context)
 	Assert(context->mem_allocated == keepersize);
 
 	/* Finally, free the context header, including the keeper block */
-	free(set);
+	MemoryPoolFree(set, keepersize);
 }
 
 /*
@@ -725,7 +730,7 @@ AllocSetAlloc(MemoryContext context, Size size)
 #endif
 
 		blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
-		block = (AllocBlock) malloc(blksize);
+		block = (AllocBlock) MemoryPoolAlloc(blksize);
 		if (block == NULL)
 			return NULL;
 
@@ -925,7 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size)
 			blksize <<= 1;
 
 		/* Try to allocate it */
-		block = (AllocBlock) malloc(blksize);
+		block = (AllocBlock) MemoryPoolAlloc(blksize);
 
 		/*
 		 * We could be asking for pretty big blocks here, so cope if malloc
@@ -936,7 +941,7 @@ AllocSetAlloc(MemoryContext context, Size size)
 			blksize >>= 1;
 			if (blksize < required_size)
 				break;
-			block = (AllocBlock) malloc(blksize);
+			block = (AllocBlock) MemoryPoolAlloc(blksize);
 		}
 
 		if (block == NULL)
@@ -1011,6 +1016,7 @@ AllocSetFree(void *pointer)
 	{
 		/* Release single-chunk block. */
 		AllocBlock	block = ExternalChunkGetBlock(chunk);
+		Size size = block->endptr - ((char *) block);
 
 		/*
 		 * Try to verify that we have a sane block pointer: the block header
@@ -1044,7 +1050,7 @@ AllocSetFree(void *pointer)
 #ifdef CLOBBER_FREED_MEMORY
 		wipe_mem(block, block->freeptr - ((char *) block));
 #endif
-		free(block);
+		MemoryPoolFree(block, size);
 	}
 	else
 	{
@@ -1160,7 +1166,7 @@ AllocSetRealloc(void *pointer, Size size)
 		blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
 		oldblksize = block->endptr - ((char *) block);
 
-		block = (AllocBlock) realloc(block, blksize);
+		block = (AllocBlock) MemoryPoolRealloc(block, oldblksize, blksize);
 		if (block == NULL)
 		{
 			/* Disallow access to the chunk header. */
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 1336944084d..8364b46c875 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1640,3 +1640,1074 @@ pchomp(const char *in)
 		n--;
 	return pnstrdup(in, n);
 }
+
+/*
+ * Memory Pools
+ *
+ * Contexts get memory directly from the OS (libc) through malloc calls,
+ * but that has non-trivial overhead, depending on the allocation size
+ * and so on. And we tend to allocate fairly large amounts of memory, because
+ * contexts allocate blocks (starting with 1kB, quickly growing by doubling).
+ * A lot of hot paths also allocate pieces of memory exceeding the size limit
+ * and being allocated as a separate block.
+ *
+ * The contexts may cache the memory by keeping chunks, but it's limited to a
+ * single memory context (as AllocSet freelist), and only for the lifetime of
+ * a particular context instance. When the memory is reset/deleted, all the
+ * blocks are freed and returned to the OS (libc).
+ *
+ * There's a rudimentary cache of memory contexts blocks, but this only keeps
+ * the keeper blocks, not any other blocks that may be needed.
+ *
+ * Memory pools are an attempt to improve this by establishing a cache of blocks
+ * shared by all the memory contexts. A memory pool allocates blocks larger
+ * than 1kB, with doubling (1kB, 2kB, 4kB, ...). All the allocations come
+ * from memory contexts, and are either regular blocks (also starting at 1kB)
+ * or oversized chunks (a couple kB or larger). This means the lower limit
+ * is reasonable - there should be no smaller allocations.
+ *
+ * There's no explicit upper size limit - whatever could be used by palloc()
+ * can be requested from the pool. However, only blocks up to 8MB may be
+ * cached by the pool - larger allocations are not kept after pfree().
+ *
+ * To make the reuse possible, the blocks are grouped into size classes the
+ * same way AllocSet uses for chunks. There are 14 size classes, starting
+ * at 1kB and ending at 8MB.
+ *
+ * This "rouding" applies even to oversized chunks. So e.g. allocating 27kB
+ * will allocate a 32kB block. This wastes memory, but it means the block
+ * may be reused by "regular" allocations. The amount of wasted memory could
+ * be reduced by using size classes with smaller steps, but that reduces the
+ * likelihood of reusing the block.
+ */
+
+
+#define MEMPOOL_MIN_BLOCK	1024L				/* smallest cached block */
+#define MEMPOOL_MAX_BLOCK	(8*1024L*1024L)		/* largest cached block */
+#define MEMPOOL_SIZES		14					/* 1kB -> 8MB */
+
+/*
+ * Maximum amount of memory to keep in cache for all size buckets. Sets a
+ * safety limit limit set on the blocks kept in the *cached* part of the
+ * pool. Each bucket starts with the same amount of memory (1/14 of this)
+ * and then we adapt the cache depending on cache hits/misses.
+ */
+#define MEMPOOL_SIZE_MAX	(128*1024L*1024L)
+
+/*
+ * Maximum number of blocks kept for the whole memory pool. This is used
+ * only to allocate the entries, so we assume all are in the smallest size
+ * bucket.
+ */
+#define MEMPOOL_MAX_BLOCKS	(MEMPOOL_SIZE_MAX / MEMPOOL_MIN_BLOCK)
+
+/*
+ * How often to rebalance the memory pool buckets (number of allocations).
+ * This is a tradeoff between the pool being adaptive and more overhead.
+ */
+#define	MEMPOOL_REBALANCE_DISTANCE		25000
+
+/*
+ * To enable debug logging for the memory pool code, build with -DMEMPOOL_DEBUG.
+ */
+#ifdef MEMPOOL_DEBUG
+
+#undef MEMPOOL_DEBUG
+#define	MEMPOOL_RANDOMIZE(ptr, size)	memset((ptr), 0x7f, (size))
+#define MEMPOOL_DEBUG(...)	fprintf (stderr, __VA_ARGS__)
+
+#else
+
+#define MEMPOOL_DEBUG(...)
+#define MEMPOOL_RANDOMIZE(ptr, size)
+
+#endif	/* MEMPOOL_DEBUG */
+
+
+/*
+ * Entries for a simple linked list of blocks to reuse.
+ */
+typedef struct MemPoolEntry
+{
+	void   *ptr;	/* allocated block (NULL in empty entries) */
+	struct	MemPoolEntry *next;
+} MemPoolEntry;
+
+/*
+ * Information about allocations of blocks of a certain size. We track the number
+ * of currently cached blocks, and also the number of allocated blocks (still
+ * used by the memory context).
+ *
+ * maxcached is the maximum number of free blocks to keep in the cache
+ *
+ * maxallocated is the maximum number of concurrently allocated blocks (from the
+ * point of the memory context)
+ */
+typedef struct MemPoolBucket
+{
+	int				nhits;			/* allocation cache hits */
+	int				nmisses;		/* allocation cache misses */
+	int				nallocated;		/* number of currently allocated blocks */
+	int				maxallocated;	/* max number of allocated blocks */
+	int				ncached;		/* number of free blocks (entry list) */
+	int				maxcached;		/* max number of free blocks to cache */
+	MemPoolEntry   *entry;
+} MemPoolBucket;
+
+/*
+ * MemPool - memory pool, caching allocations between memory contexts
+ *
+ * cache - stores free-d blocks that may be reused for future allocations,
+ * each slot is a list of MemPoolEntry elements using the "entries"
+ *
+ * entries - pre-allocated entries for the freelists, used by cache lists
+ *
+ * freelist - list of free cache entries (not used by the cache lists)
+ *
+ * The meaning of the freelist is somewhat inverse - when a block is freed
+ * by the memory context above, we need to add it to the cache. To do that
+ * we get an entry from the freelist, and add it to the cache. So free-ing
+ * a block removes an entry from the mempool freelist.
+ */
+typedef struct MemPool
+{
+	/* LIFO cache of free-d blocks of eligible sizes (1kB - 8MB, doubled) */
+	MemPoolBucket	cache[MEMPOOL_SIZES];
+
+	/* pre-allocated entries for cache of free-d blocks */
+	MemPoolEntry	entries[MEMPOOL_SIZES * MEMPOOL_MAX_BLOCKS];
+
+	/* head of freelist (entries from the array) */
+	MemPoolEntry   *freelist;
+
+	/* memory limit / accounting */
+	int64 mem_allowed;
+	int64 mem_allocated;
+	int64 mem_cached;
+	int64 num_requests;
+} MemPool;
+
+static MemPool *pool = NULL;
+
+static void
+AssertCheckMemPool(MemPool *p)
+{
+#ifdef USE_ASSERT_CHECKING
+	int	nused = 0;
+	int	nfree = 0;
+	int64	mem_cached = 0;
+	Size	block_size = MEMPOOL_MIN_BLOCK;
+	MemPoolEntry *entry;
+
+	Assert(p->mem_allocated >= 0);
+	Assert(p->mem_cached >= 0);
+
+	/* count the elements in the various cache buckets */
+	for (int i = 0; i < MEMPOOL_SIZES; i++)
+	{
+		int	count = 0;
+
+		Assert(p->cache[i].ncached >= 0);
+		Assert(p->cache[i].ncached <= p->cache[i].maxcached);
+
+		entry = p->cache[i].entry;
+
+		while (entry)
+		{
+			Assert(entry->ptr);
+
+			entry = entry->next;
+			count++;
+		}
+
+		Assert(count == p->cache[i].ncached);
+
+		nused += count;
+		mem_cached += (count * block_size);
+
+		block_size *= 2;
+	}
+
+	/* now count the elements in the freelist */
+	entry = p->freelist;
+	while (entry)
+	{
+		nfree++;
+		entry = entry->next;
+	}
+
+	Assert(nfree + nused == MEMPOOL_SIZES * MEMPOOL_MAX_BLOCKS);
+	Assert(mem_cached == p->mem_cached);
+#endif
+}
+
+static void MemoryPoolRebalanceBuckets(void);
+static void MemoryPoolEnforceSizeLimit(Size request_size, int index);
+
+/*
+ * MemoryPoolInit
+ *		initialize the global memory pool
+ *
+ * Initialize the overall memory pool structure, and also link all entries
+ * into a freelist.
+ */
+static void
+MemoryPoolInit(void)
+{
+	Size	size = MEMPOOL_MIN_BLOCK;
+
+	/* bail out if already initialized */
+	if (pool)
+		return;
+
+	/* allocate the basic structure */
+	pool = malloc(sizeof(MemPool));
+	memset(pool, 0, sizeof(MemPool));
+
+	/* initialize the freelist - put all entries on the list */
+	pool->freelist = &pool->entries[0];
+
+	for (int i = 0; i < MEMPOOL_SIZES * MEMPOOL_MAX_BLOCKS; i++)
+	{
+		if (i < (MEMPOOL_SIZES * MEMPOOL_MAX_BLOCKS - 1))
+			pool->entries[i].next = &pool->entries[i+1];
+		else
+			pool->entries[i].next = NULL;
+	}
+
+	/* set default maximum counts of entries for each size class */
+	for (int i = 0; i < MEMPOOL_SIZES; i++)
+	{
+		pool->cache[i].maxcached = (MEMPOOL_SIZE_MAX / MEMPOOL_SIZES / size);
+		size *= 2;
+	}
+
+	AssertCheckMemPool(pool);
+}
+
+/*
+ * MemoryPoolEntrySize
+ *		calculate the size of the block to allocate for a given request size
+ *
+ * The request sizes are grouped into pow(2,n) classes, starting at 1kB and
+ * ending at 8MB. Which means there are 14 size classes.
+ */
+static Size
+MemoryPoolEntrySize(Size size)
+{
+	Size	result;
+
+	/*
+	 * We shouldn't really get many malloc() for such small elements through
+	 * memory contexts, so just use the smallest block.
+	 */
+	if (size < MEMPOOL_MIN_BLOCK)
+		return MEMPOOL_MIN_BLOCK;
+
+	/*
+	 * We can get various large allocations - we don't want to cache those,
+	 * not waste space on doubling them, so just allocate them directly.
+	 * Maybe the limit should be separate/lower, like 1MB.
+	 */
+	if (size > MEMPOOL_MAX_BLOCK)
+		return size;
+
+	/*
+	 * Otherwise just calculate the first block larger than the request.
+	 *
+	 * XXX Maybe there's a better way to calculate this? The number of loops
+	 * should be very low, though (less than MEMPOOL_SIZES, i.e. 14).
+	 */
+	result = MEMPOOL_MIN_BLOCK;
+	while (size > result)
+		result *= 2;
+
+	MEMPOOL_DEBUG("%d MempoolEntrySize %lu => %lu\n", getpid(), size, result);
+
+	/* the block size has to be sufficient for the requested size */
+	Assert(size <= result);
+
+	return result;
+}
+
+/*
+ * MemoryPoolEntryIndex
+ *		Calculate the cache index for a given entry size.
+ *
+ * XXX Always called right after MemoryPoolEntrySize, so maybe it should be
+ * merged into a single function, so that the loop happens only once.
+ */
+static int
+MemoryPoolEntryIndex(Size size)
+{
+	int		blockIndex = 0;
+	Size	blockSize = MEMPOOL_MIN_BLOCK;
+
+	/* is size possibly in cache? */
+	if (size < MEMPOOL_MIN_BLOCK || size > MEMPOOL_MAX_BLOCK)
+		return -1;
+
+	/* calculate where to maybe cache the entry */
+	while (blockSize <= MEMPOOL_MAX_BLOCK)
+	{
+		Assert(size >= blockSize);
+
+		if (size == blockSize)
+		{
+			Assert(blockIndex < MEMPOOL_SIZES);
+			return blockIndex;
+		}
+
+		blockIndex++;
+		blockSize *= 2;
+	}
+
+	/* not eligible for caching after all */
+	return -1;
+}
+
+/*
+ * Check that the entry size is valid and matches the class index - if smaller
+ * than 8MB, it needs to be in one of the valid classes.
+ */
+static void
+AssertCheckEntrySize(Size size, int cacheIndex)
+{
+#ifdef USE_ASSERT_CHECKING
+	int	blockSize = MEMPOOL_MIN_BLOCK;
+	int	blockIndex = 0;
+
+	Assert(cacheIndex >= -1 && cacheIndex < MEMPOOL_SIZES);
+
+	/* all sizes in the valid range should be in one of the slots */
+	if (cacheIndex == -1)
+		Assert(size < MEMPOOL_MIN_BLOCK || size > MEMPOOL_MAX_BLOCK);
+	else
+	{
+		/* calculate the block size / index for the given size */
+		while (size > blockSize)
+		{
+			blockSize *= 2;
+			blockIndex++;
+		}
+
+		Assert(size == blockSize);
+		Assert(cacheIndex == blockIndex);
+	}
+#endif
+}
+
+/*
+ * MemoryPoolAlloc
+ *		Allocate a block from the memory pool.
+ *
+ * The block may come either from cache - if available - or from malloc().
+ */
+void *
+MemoryPoolAlloc(Size size)
+{
+	int	index;
+	void *ptr;
+
+	MemoryPoolInit();
+
+	pool->num_requests++;
+
+	MemoryPoolRebalanceBuckets();
+
+	/* maybe override the requested size */
+	size = MemoryPoolEntrySize(size);
+	index = MemoryPoolEntryIndex(size);
+
+	/* cross-check the size and index */
+	AssertCheckEntrySize(size, index);
+
+	/* try to enforce the memory limit */
+	MemoryPoolEnforceSizeLimit(size, index);
+
+	/* Is the block eligible to be in the cache? Or is it too large/small? */
+	if (index >= 0)
+	{
+		MemPoolEntry *entry = pool->cache[index].entry;
+
+		/*
+		 * update the number of allocated chunks, and the high watermark
+		 *
+		 * We do this even if there's no entry in the cache.
+		 */
+		pool->cache[index].nallocated++;
+		pool->cache[index].maxallocated = Max(pool->cache[index].nallocated,
+											  pool->cache[index].maxallocated);
+
+		/*
+		 * If we have a cached block for this size, we're done. Remove it
+		 * from the cache and return the entry to the freelist.
+		 */
+		if (entry != NULL)
+		{
+			/* remember the pointer (we'll reset the entry) */
+			ptr = entry->ptr;
+			entry->ptr = NULL;
+
+			/* remove the entry from the cache */
+			pool->cache[index].entry = entry->next;
+			pool->cache[index].ncached--;
+
+			/* return the entry to the freelist */
+			entry->next = pool->freelist;
+			pool->freelist = entry;
+
+			MEMPOOL_RANDOMIZE(ptr, size);
+			MEMPOOL_DEBUG("%d MemoryPoolAlloc %lu => %d %p HIT\n", getpid(), size, index, ptr);
+
+			/* update memory accounting */
+			Assert(pool->mem_cached >= size);
+
+			pool->mem_cached -= size;
+			pool->mem_allocated += size;
+
+			pool->cache[index].nhits++;
+
+			AssertCheckMemPool(pool);
+
+			return ptr;
+		}
+
+		pool->cache[index].nmisses++;
+	}
+
+	/*
+	 * Either too small/large for the cache, or there's no available block of
+	 * the right size.
+	 */
+	ptr = malloc(size);
+
+	MEMPOOL_RANDOMIZE(ptr, size);
+	MEMPOOL_DEBUG("%d MemoryPoolAlloc %lu => %d %p MISS\n", getpid(), size, index, ptr);
+
+	/* update memory accounting */
+	pool->mem_allocated += size;
+
+	/* maybe we should track the number of over-sized allocations too? */
+	// pool->cache_misses++;
+
+	AssertCheckMemPool(pool);
+
+	return ptr;
+}
+
+/*
+ * MemoryPoolShouldCache
+ *		Should we put the entry into cache at the given index?
+ */
+static bool
+MemoryPoolShouldCache(Size size, int index)
+{
+	MemPoolBucket  *bucket;
+
+	/* not in any pool bucket */
+	if (index == -1)
+		return false;
+
+	bucket = &pool->cache[index];
+
+	/*
+	 * Bail out if no freelist entries.
+	 *
+	 * XXX This shouldn't be possible, as we size the freelist as if all classes
+	 * could have the maximum number of entries (but the actual number drops to
+	 * 1/2 with each size class).
+	 */
+	if (!pool->freelist)
+		return false;
+
+	/* Memory limit is set, and we'd exceed it? Don't cache. */
+	if ((pool->mem_allowed > 0) &&
+		(pool->mem_allocated + pool->mem_cached + size > pool->mem_allowed))
+		return false;
+
+	/* Did we already reach the maximum size of the size class? */
+	return (bucket->ncached < bucket->maxcached);
+}
+
+/*
+ * MemoryPoolFree
+ *		Free a block, maybe add it to the memory pool cache.
+ */
+void
+MemoryPoolFree(void *pointer, Size size)
+{
+	int	index = 0;
+
+	MemoryPoolInit();
+
+	/*
+	 * Override the requested size (provided by the memory context), calculate
+	 * the appropriate size class index.
+	 */
+	size = MemoryPoolEntrySize(size);
+	index = MemoryPoolEntryIndex(size);
+
+	AssertCheckEntrySize(size, index);
+
+	/* check that we've correctly accounted for this block during allocation */
+	Assert(pool->mem_allocated >= size);
+
+	/*
+	 * update the number of allocated blocks (if eligible for cache)
+	 *
+	 * XXX Needs to happen even if we don't add the block to the cache.
+	 */
+	if (index != -1)
+		pool->cache[index].nallocated--;
+
+	/*
+	 * Should we cache this entry? Do we have entries for the freelist, and
+	 * do we have free space in the size class / memory pool as a whole?
+	 */
+	if (MemoryPoolShouldCache(size, index))
+	{
+		MemPoolEntry *entry;
+
+		entry = pool->freelist;
+		pool->freelist = entry->next;
+
+		/* add the entry to the cache, update number of entries in this bucket */
+		entry->next = pool->cache[index].entry;
+		pool->cache[index].entry = entry;
+		pool->cache[index].ncached++;
+
+		entry->ptr = pointer;
+
+		MEMPOOL_RANDOMIZE(pointer, size);
+		MEMPOOL_DEBUG("%d MemoryPoolFree %lu => %d %p ADD\n", getpid(), size, index, pointer);
+
+		/* update accounting */
+		pool->mem_cached += size;
+		pool->mem_allocated -= size;
+
+		AssertCheckMemPool(pool);
+
+		return;
+	}
+
+	MEMPOOL_RANDOMIZE(pointer, size);
+	MEMPOOL_DEBUG("%d MemoryPoolFree %lu => %d FULL\n", getpid(), size, index);
+
+	/* update accounting */
+	pool->mem_allocated -= size;
+
+	AssertCheckMemPool(pool);
+
+	free(pointer);
+}
+
+/*
+ * MemoryPoolRealloc
+ *		reallocate a previously allocated block
+ *
+ * XXX Maybe this should use the cache too. Right now we just call realloc()
+ * after updating the cache counters. And maybe it should enforce the memory
+ * limit, just like we do in MemoryPoolAlloc().
+ */
+void *
+MemoryPoolRealloc(void *pointer, Size oldsize, Size newsize)
+{
+	void *ptr;
+
+	int		oldindex,
+			newindex;
+
+	MemoryPoolInit();
+
+	oldsize = MemoryPoolEntrySize(oldsize);
+	newsize = MemoryPoolEntrySize(newsize);
+
+	/* XXX Maybe if (oldsize >= newsize) we don't need to do anything? */
+
+	oldindex = MemoryPoolEntryIndex(oldsize);
+	newindex = MemoryPoolEntryIndex(newsize);
+
+	if (oldindex != -1)
+		pool->cache[oldindex].nallocated--;
+
+	if (newindex != -1)
+	{
+		pool->cache[newindex].nallocated++;
+		pool->cache[newindex].maxallocated = Max(pool->cache[newindex].nallocated,
+												 pool->cache[newindex].maxallocated);
+	}
+
+	MEMPOOL_DEBUG("%d MemoryPoolRealloc old %lu => %p\n", getpid(), oldsize, pointer);
+
+	ptr = realloc(pointer, newsize);
+
+	MEMPOOL_DEBUG("%d MemoryPoolRealloc new %lu => %p\n", getpid(), newsize, ptr);
+
+	/* update accounting */
+	Assert(pool->mem_allocated >= oldsize);
+
+	pool->mem_allocated -= oldsize;
+	pool->mem_allocated += newsize;
+
+	AssertCheckMemPool(pool);
+
+	return ptr;
+}
+
+/*
+ * MemoryPoolRebalanceBuckets
+ *		Rebalance the cache capacity for difference size classes.
+ *
+ * The goal of the rebalance is to adapt the cache capacity to changes in the
+ * workload - release blocks of sizes that are no longer needed, allow caching
+ * for new block sizes etc.
+ *
+ * The rebalance happens every MEMPOOL_REBALANCE_DISTANCE allocations - it needs
+ * to happen often enough to adapt to the workload changes, but not too often
+ * to cause significant overhead. The distance also needs to be sufficient to
+ * have a reasonable representation of the allocations.
+ *
+ * The rebalance happens in three phases:
+ *
+ * 1) shrink oversized buckets (maxallocated < maxcached)
+ *
+ * 2) enlarge undersized buckets (maxcached < maxallocated)
+ *
+ * 3) distribute remaining capacity (if any) uniformly
+ *
+ * The reduction in (1) is gradual, i.e. instead of setting maxcached to the
+ * maxallocated value (which may be seen as the minimum capacity needed), we
+ * only go halfway there. The intent is to dampen the transition in case the
+ * current counter is not entirely representative.
+ *
+ * The bucket enlarging in step (2) is proportional to the number of misses
+ * for each bucket (with respect to the total number of misses in the buckets
+ * that are too small). We however don't oversize the bucket - we assign at
+ * most (maxallocated - maxcached) entries, not more in this step.
+ *
+ * Finally, we simply take the remaining unallocated/unassigned memory (up to
+ * MEMPOOL_SIZE_MAX), and distribute it to all the buckets uniformly. That is,
+ * each bucket gets the same amount (rounded to entries of appropriate size).
+ *
+ * XXX Maybe we should have a parameter for the dampening factor in (1), and
+ * not just use 0.5. For example, maybe 0.75 would be better?
+ *
+ * XXX This assumes misses for different buckets are equally expensive, but
+ * that may not be the case. It's likely a miss is proportional to the size
+ * of the block, so maybe we should consider that and use the size as weight
+ * for the cache miss.
+ */
+static void
+MemoryPoolRebalanceBuckets(void)
+{
+	Size	block_size;
+	int64	redistribute_bytes;
+	int64	assigned_bytes = 0;
+	int64	num_total_misses = 0;
+
+	/* only do this once every MEMPOOL_REBALANCE_DISTANCE allocations */
+	if (pool->num_requests < MEMPOOL_REBALANCE_DISTANCE)
+		return;
+
+#ifdef MEMPOOL_DEBUG
+	/* print info about the cache and individual size buckets before the rebalance */
+	MEMPOOL_DEBUG("%d mempool rebalance requests %ld allowed %ld allocated %ld cached %ld\n",
+				  getpid(), pool->num_requests,
+				  pool->mem_allowed, pool->mem_allocated, pool->mem_cached);
+
+	for (int i = 0; i < MEMPOOL_SIZES; i++)
+	{
+		MEMPOOL_DEBUG("%d mempool rebalance bucket %d hit %d miss %d (%.1f%%) maxcached %d cached %d maxallocated %d allocated %d\n",
+					  getpid(), i, pool->cache[i].nhits, pool->cache[i].nmisses,
+					  pool->cache[i].nhits * 100.0 / Max(1, pool->cache[i].nhits + pool->cache[i].nmisses),
+					  pool->cache[i].maxcached, pool->cache[i].ncached,
+					  pool->cache[i].maxallocated, pool->cache[i].nallocated);
+	}
+#endif
+
+	/*
+	 * Are there buckets with cache that is unnecessarily large? That is, with
+	 * (ncached + nallocated > maxallocated). If yes, we release half of that
+	 * and put that into a budget that we can redistribute.
+	 *
+	 * XXX We release half to somewhat dampen the changes over time.
+	 */
+	block_size = MEMPOOL_MIN_BLOCK;
+	for (int i = 0; i < MEMPOOL_SIZES; i++)
+	{
+		/*
+		 * If the cache is large enough to serve all allocations, try making it
+		 * a bit smaller and cut half the extra space (and maybe also free the
+		 * unnecessary blocks).
+		 */
+		if (pool->cache[i].maxcached > pool->cache[i].maxallocated)
+		{
+			int	nentries;
+
+			pool->cache[i].maxcached
+				= (pool->cache[i].maxcached + pool->cache[i].maxallocated) / 2;
+
+			nentries = (pool->cache[i].ncached + pool->cache[i].nallocated);
+			nentries -= pool->cache[i].maxcached;
+
+			/* release enough entries from the cache */
+			while (nentries > 0 && pool->cache[i].entry != NULL)
+			{
+				MemPoolEntry *entry = pool->cache[i].entry;
+
+				pool->cache[i].entry = entry->next;
+				pool->cache[i].ncached--;
+
+				free(entry->ptr);
+				entry->ptr = NULL;
+
+				/* add the entry to the freelist */
+				entry->next = pool->freelist;
+				pool->freelist = entry;
+
+				Assert(pool->mem_cached >= block_size);
+
+				/* update accounting */
+				pool->mem_cached -= block_size;
+
+				nentries--;
+			}
+		}
+
+		/* remember how many misses we saw in the undersized buckets */
+		num_total_misses += pool->cache[i].nmisses;
+
+		/* remember how much space we already allocated to this bucket */
+		assigned_bytes += (pool->cache[i].maxcached * block_size);
+
+		/* double the block size */
+		block_size = (block_size << 1);
+	}
+
+	/*
+	 * How much memory we can redistribute? Start with the memory limit,
+	 * and subtract the space currently allocated and assigned to cache.
+	 */
+	redistribute_bytes = Max(pool->mem_allowed, MEMPOOL_SIZE_MAX);
+	redistribute_bytes -= (pool->mem_allocated);
+	redistribute_bytes -= assigned_bytes;
+
+	/*
+	 * Make sure it's not negative (might happen if there's a lot of
+	 * allocated memory).
+	 */
+	redistribute_bytes = Max(0, redistribute_bytes);
+
+	MEMPOOL_DEBUG("%d mempool rebalance can redistribute %ld bytes, allocated %ld bytes, assigned %ld bytes, total misses %ld\n",
+				  getpid(), redistribute_bytes, pool->mem_allocated, assigned_bytes, num_total_misses);
+
+	/*
+	 * Redistribute the memory based on the number of misses, and reset the
+	 * various counters, so that the next round begins afresh.
+	 */
+	if (redistribute_bytes > 0)
+	{
+		block_size = MEMPOOL_MIN_BLOCK;
+		for (int i = 0; i < MEMPOOL_SIZES; i++)
+		{
+			int64	nbytes;
+			int		nentries;
+
+			/* Are we missing entries in cache for this slot? */
+			if (pool->cache[i].maxcached < pool->cache[i].maxallocated)
+			{
+				int nmissing = (pool->cache[i].maxallocated - pool->cache[i].maxcached);
+
+				/*
+				 * How many entries we can add to this size bucket, based on the number
+				 * of cache misses?
+				 */
+				nbytes = redistribute_bytes * pool->cache[i].nmisses / Max(1, num_total_misses);
+				nentries = (nbytes / block_size);
+
+				/* But don't add more than we need. */
+				nentries = Min(nentries, nmissing);
+
+				pool->cache[i].maxcached += nentries;
+				assigned_bytes += nentries * block_size;
+			}
+
+			/* double the block size */
+			block_size = (block_size << 1);
+		}
+	}
+
+	MEMPOOL_DEBUG("%d mempool rebalance done allocated %ld bytes, assigned %ld bytes\n",
+				  getpid(), pool->mem_allocated, assigned_bytes);
+
+	/*
+	 * If we still have some memory, redistribute it uniformly.
+	 */
+	redistribute_bytes = Max(pool->mem_allowed, MEMPOOL_SIZE_MAX);
+	redistribute_bytes -= (pool->mem_allocated);
+	redistribute_bytes -= assigned_bytes;
+
+	/*
+	 * Make sure it's not negative (might happen if there's a lot of
+	 * allocated memory).
+	 */
+	redistribute_bytes = Max(0, redistribute_bytes);
+
+	MEMPOOL_DEBUG("%d mempool rebalance remaining bytes %ld, allocated %ld bytes, assigned %ld bytes\n",
+				  getpid(), redistribute_bytes, pool->mem_allocated, assigned_bytes);
+
+	block_size = MEMPOOL_MIN_BLOCK;
+	for (int i = 0; i < MEMPOOL_SIZES; i++)
+	{
+		int	nentries = (redistribute_bytes / MEMPOOL_SIZES / block_size);
+
+		pool->cache[i].maxcached += nentries;
+
+		/* also reset the various counters */
+		pool->cache[i].maxallocated = pool->cache[i].nallocated;
+		pool->cache[i].nhits = 0;
+		pool->cache[i].nmisses = 0;
+
+		/* double the block size */
+		block_size = (block_size << 1);
+	}
+
+	MEMPOOL_DEBUG("%d mempool rebalance done\n", getpid());
+
+#ifdef MEMPOOL_DEBUG
+	/* print some info about cache hit ratio, but only once in a while */
+	block_size = MEMPOOL_MIN_BLOCK;
+	assigned_bytes = 0;
+	for (int i = 0; i < MEMPOOL_SIZES; i++)
+	{
+		MEMPOOL_DEBUG("%d mempool rebalance bucket %d maxcached %d cached %d maxallocated %d allocated %d\n",
+					  getpid(), i,
+					  pool->cache[i].maxcached, pool->cache[i].ncached,
+					  pool->cache[i].maxallocated, pool->cache[i].nallocated);
+
+		assigned_bytes += (pool->cache[i].maxcached * block_size);
+
+		/* double the block size */
+		block_size = (block_size << 1);
+	}
+	MEMPOOL_DEBUG("%d mempool rebalance allocated %ld assigned %ld (total %ld kB)\n",
+				  getpid(), pool->mem_allocated, assigned_bytes,
+				  (pool->mem_allocated + assigned_bytes) / 1024L);
+#endif
+
+	/* start new rebalance period */
+	pool->num_requests = 0;
+}
+
+/*
+ * MemoryPoolEnforceMaxCounts
+ *		release cached blocks exceeding the maxcached for a given bucket
+ *
+ * XXX This gets called only from MemoryPoolSetSizeLimit, which updates the
+ * maxcount based on the memory limit. Maybe it should be integrated into
+ * that directly?
+ *
+ * XXX Or maybe we should simply do the rebalancing for the new limit?
+ */
+static void
+MemoryPoolEnforceMaxCounts(void)
+{
+	Size	block_size = MEMPOOL_MAX_BLOCK;
+
+	/* nothing cached, so can't release anything */
+	if (pool->mem_cached == 0)
+		return;
+
+	/*
+	 * Walk through the buckets, make sure that no bucket has too many cached
+	 * entries.
+	 */
+	for (int i = MEMPOOL_SIZES - 1; i >= 0; i--)
+	{
+		while (pool->cache[i].entry)
+		{
+			MemPoolEntry *entry = pool->cache[i].entry;
+
+			/* we're within the limit, bail out */
+			if (pool->cache[i].ncached <= pool->cache[i].maxcached)
+				break;
+
+			pool->cache[i].entry = entry->next;
+			pool->cache[i].ncached--;
+
+			free(entry->ptr);
+			entry->ptr = NULL;
+
+			/* add the entry to the freelist */
+			entry->next = pool->freelist;
+			pool->freelist = entry;
+
+			Assert(pool->mem_cached >= block_size);
+
+			/* update accounting */
+			pool->mem_cached -= block_size;
+		}
+
+		/* double the block size */
+		block_size = (block_size << 1);
+	}
+
+	MEMPOOL_DEBUG("%d MemoryPoolEnforceMaxCounts allocated %ld cached %ld\n",
+				  getpid(), pool->mem_allocated, pool->mem_cached);
+
+	AssertCheckMemPool(pool);
+}
+
+/*
+ * MemoryPoolEnforceSizeLimit
+ *		Release cached blocks to allow allocating a block of a given size.
+ *
+ * If actually freeing blocks is needed, we free more of them, so that we don't
+ * need to do that too often. We free at least 2x the amount of space we need,
+ * or 25% of the limit, whichever is larger.
+ *
+ * We free memory from the largest blocks, because that's likely to free memory
+ * the fastest. And we don't allocate those very often.
+ *
+ * XXX Maybe we should free memory in the smaller classes too, so that we don't
+ * end up keeping many unnecessary old blocks, while thrashing the large class.
+ */
+static void
+MemoryPoolEnforceSizeLimit(Size request_size, int index)
+{
+	int64	threshold,
+			needtofree;
+
+	Size	block_size = MEMPOOL_MAX_BLOCK;
+
+	/* no memory limit set */
+	if (pool->mem_allowed == 0)
+		return;
+
+	/* nothing cached, so can't release anything */
+	if (pool->mem_cached == 0)
+		return;
+
+	/*
+	 * With the new request, would we exceed the memory limit? We need
+	 * to count both the allocated and cached memory.
+	 *
+	 * XXX In principle the block may be already available in cache, in which
+	 * case we don't need to add it to the allocated + cached figure.
+	 */
+	if (pool->mem_allocated + pool->mem_cached + request_size <= pool->mem_allowed)
+		return;
+
+	/*
+	 * How much do we need to release? We don't want to free just enough
+	 * for this one request, but a bit more, to prevent thrashing.
+	 */
+	threshold = Min(Max(0, pool->mem_allowed - 2 * request_size),
+					pool->mem_allowed * 0.75);
+
+	Assert((threshold >= 0) && (threshold < pool->mem_allowed));
+
+	/*
+	 * How much do we need to free to get under the threshold? Can't free
+	 * more than we have in the cache, though.
+	 *
+	 * XXX Once we free at least this amount of memory, we're done.
+	 */
+	needtofree = (pool->mem_allocated + pool->mem_cached + request_size) - threshold;
+	needtofree = Min(needtofree, pool->mem_cached);
+
+	MEMPOOL_DEBUG("%d MemoryPoolEnforceSizeLimit total %ld cached %ld threshold %ld needtofree %ld\n",
+				  getpid(), pool->mem_allocated + pool->mem_cached, pool->mem_cached, threshold, needtofree);
+
+	/* Free cached entries, starting from the buckets with the largest blocks. */
+	for (int i = MEMPOOL_SIZES - 1; i >= 0; i--)
+	{
+		/* did we free enough memory? */
+		if (needtofree <= 0)
+			break;
+
+		while (pool->cache[i].entry)
+		{
+			MemPoolEntry *entry = pool->cache[i].entry;
+
+			pool->cache[i].entry = entry->next;
+			pool->cache[i].ncached--;
+
+			free(entry->ptr);
+			entry->ptr = NULL;
+
+			/* add the entry to the freelist */
+			entry->next = pool->freelist;
+			pool->freelist = entry;
+
+			Assert(pool->mem_cached >= block_size);
+
+			/* update accounting */
+			pool->mem_cached -= block_size;
+
+			needtofree -= block_size;
+
+			/* did we free enough memory? */
+			if (needtofree <= 0)
+				break;
+		}
+
+		block_size = (block_size >> 1);
+	}
+
+	MEMPOOL_DEBUG("%d MemoryPoolEnforceSizeLimit allocated %ld cached %ld needtofree %ld\n",
+				  getpid(), pool->mem_allocated, pool->mem_cached, needtofree);
+
+	AssertCheckMemPool(pool);
+}
+
+/*
+ * MemoryPoolSetSizeLimit
+ *		Set size limit for the memory pool.
+ */
+void
+MemoryPoolSetSizeLimit(int64 size)
+{
+	Size	blksize = MEMPOOL_MIN_BLOCK;
+	Size	maxsize;
+
+	Assert(pool);
+	Assert(size >= 0);
+
+	pool->mem_allowed = size;
+
+	/* also update the max number of entries for each class size */
+
+	if (size > 0)
+		maxsize = size / MEMPOOL_SIZES;
+	else
+		maxsize = MEMPOOL_SIZE_MAX;
+
+	for (int i = 0; i < MEMPOOL_SIZES; i++)
+	{
+		pool->cache[i].maxcached = (maxsize / blksize);
+		blksize *= 2;
+	}
+
+	/* enforce the updated maxcached limit */
+	MemoryPoolEnforceMaxCounts();
+
+	/* also enforce the general memory limit  */
+	MemoryPoolEnforceSizeLimit(0, -1);
+}
+
+/*
+ * MemoryPoolGetSizeAndCounts
+ */
+void
+MemoryPoolGetSizeAndCounts(int64 *mem_allowed, int64 *mem_allocated, int64 *mem_cached,
+						   int64 *cache_hits, int64 *cache_misses)
+{
+	Assert(pool);
+
+	*mem_allowed = pool->mem_allowed;
+	*mem_allocated = pool->mem_allocated;
+	*mem_cached = pool->mem_cached;
+
+	*cache_hits = 0;
+	*cache_misses = 0;
+
+	for (int i = 0; i < MEMPOOL_SIZES; i++)
+	{
+		*cache_hits += pool->cache[i].nhits;
+		*cache_misses += pool->cache[i].nmisses;
+	}
+}
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index ca7858d6b66..db94e74ccd6 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -179,4 +179,13 @@ extern MemoryContext GenerationContextCreate(MemoryContext parent,
 #define SLAB_DEFAULT_BLOCK_SIZE		(8 * 1024)
 #define SLAB_LARGE_BLOCK_SIZE		(8 * 1024 * 1024)
 
+extern void *MemoryPoolAlloc(Size size);
+extern void *MemoryPoolRealloc(void *pointer, Size oldsize, Size size);
+extern void MemoryPoolFree(void *pointer, Size size);
+
+extern void MemoryPoolSetSizeLimit(int64 size);
+extern void MemoryPoolGetSizeAndCounts(int64 *mem_limit,
+									   int64 *mem_allocated, int64 *mem_cached,
+									   int64 *cache_hits, int64 *cache_misses);
+
 #endif							/* MEMUTILS_H */
-- 
2.43.0

#2Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Tomas Vondra (#1)
Re: scalability bottlenecks with (many) partitions (and more)

On Sunday, January 28, 2024 at 22:57:02 CET, Tomas Vondra wrote:

Hi Tomas!

I'll comment on glibc-malloc part as I studied that part last year, and
proposed some things here: https://www.postgresql.org/message-id/
3424675.QJadu78ljV%40aivenlaptop

FWIW where does the malloc overhead come from? For one, while we do have
some caching of malloc-ed memory in memory contexts, that doesn't quite
work cross-query, because we destroy the contexts at the end of the
query. We attempt to cache the memory contexts too, but in this case
that can't help because the allocations come from btbeginscan() where we
do this:

so = (BTScanOpaque) palloc(sizeof(BTScanOpaqueData));

and BTScanOpaqueData is ~27kB, which means it's an oversized chunk and
thus always allocated using a separate malloc() call. Maybe we could
break it into smaller/cacheable parts, but I haven't tried, and I doubt
it's the only such allocation.

Did you try running strace on the process? That may give you some
insights into what malloc is doing. A more sophisticated approach would be
using stap and plugging it into the malloc probes, for example
memory_sbrk_more and memory_sbrk_less.

An important part of glibc's malloc behaviour in that regard comes from the
adjustment of the mmap and trim thresholds. By default, malloc adjusts them
dynamically and you can poke into that using the
memory_mallopt_free_dyn_thresholds probe.

FWIW I was wondering if this is a glibc-specific malloc bottleneck, so I
tried running the benchmarks with LD_PRELOAD=jemalloc, and that improves
the behavior a lot - it gets us maybe ~80% of the mempool benefits.
Which is nice, it confirms it's glibc-specific (I wonder if there's a
way to tweak glibc to address this), and it also means systems using
jemalloc (e.g. FreeBSD, right?) don't have this problem. But it also
says the mempool has ~20% benefit on top of jemalloc.

GLIBC's malloc offers some tuning for this. In particular, setting either
M_MMAP_THRESHOLD or M_TRIM_THRESHOLD will disable the unpredictable "auto
adjustment" beheviour and allow you to control what it's doing.

By setting a bigger M_TRIM_THRESHOLD, one can make sure memory allocated using
sbrk isn't freed as easily, and you don't run into a pattern of moving the
sbrk pointer up and down repeatedly. The automatic trade-off between the mmap
and trim thresholds is supposed to prevent that, but the way it is incremented
means you can end up in a bad place depending on your particular allocation
pattern.
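
For illustration, here's a minimal C sketch of pinning those thresholds
via mallopt() (glibc-specific; the 1MB/8MB values are arbitrary examples,
not recommendations):

#include <malloc.h>	/* glibc: mallopt() and the M_* constants */

/* Setting the parameters explicitly disables glibc's dynamic
 * auto-adjustment of the thresholds. */
static void
tune_glibc_malloc(void)
{
	mallopt(M_MMAP_THRESHOLD, 1024 * 1024);		/* requests >= 1MB go to mmap */
	mallopt(M_TRIM_THRESHOLD, 8 * 1024 * 1024);	/* trim heap only past 8MB free */
}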

Best regards,

--
Ronan Dunklau

#3Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Ronan Dunklau (#2)
Re: scalability bottlenecks with (many) partitions (and more)

On 1/29/24 09:53, Ronan Dunklau wrote:

On Sunday, January 28, 2024 at 22:57:02 CET, Tomas Vondra wrote:

Hi Tomas!

I'll comment on glibc-malloc part as I studied that part last year, and
proposed some things here: https://www.postgresql.org/message-id/
3424675.QJadu78ljV%40aivenlaptop

Thanks for reminding me. I'll re-read that thread.

FWIW where does the malloc overhead come from? For one, while we do have
some caching of malloc-ed memory in memory contexts, that doesn't quite
work cross-query, because we destroy the contexts at the end of the
query. We attempt to cache the memory contexts too, but in this case
that can't help because the allocations come from btbeginscan() where we
do this:

so = (BTScanOpaque) palloc(sizeof(BTScanOpaqueData));

and BTScanOpaqueData is ~27kB, which means it's an oversized chunk and
thus always allocated using a separate malloc() call. Maybe we could
break it into smaller/cacheable parts, but I haven't tried, and I doubt
it's the only such allocation.

Did you try running strace on the process? That may give you some
insights into what malloc is doing. A more sophisticated approach would be
using stap and plugging it into the malloc probes, for example
memory_sbrk_more and memory_sbrk_less.

No, I haven't tried that. In my experience strace is pretty expensive,
and if the issue is in glibc itself (before it does the syscalls),
strace won't really tell us much. Not sure, ofc.

An important part of glibc's malloc behaviour in that regard comes from the
adjustment of the mmap and trim thresholds. By default, malloc adjusts them
dynamically and you can poke into that using the
memory_mallopt_free_dyn_thresholds probe.

Thanks, I'll take a look at that.

FWIW I was wondering if this is a glibc-specific malloc bottleneck, so I
tried running the benchmarks with LD_PRELOAD=jemalloc, and that improves
the behavior a lot - it gets us maybe ~80% of the mempool benefits.
Which is nice, it confirms it's glibc-specific (I wonder if there's a
way to tweak glibc to address this), and it also means systems using
jemalloc (e.g. FreeBSD, right?) don't have this problem. But it also
says the mempool has ~20% benefit on top of jemalloc.

GLIBC's malloc offers some tuning for this. In particular, setting either
M_MMAP_THRESHOLD or M_TRIM_THRESHOLD will disable the unpredictable "auto
adjustment" beheviour and allow you to control what it's doing.

By setting a bigger M_TRIM_THRESHOLD, one can make sure memory allocated using
sbrk isn't freed as easily, and you don't run into a pattern of moving the
sbrk pointer up and down repeatedly. The automatic trade-off between the mmap
and trim thresholds is supposed to prevent that, but the way it is incremented
means you can end up in a bad place depending on your particular allocation
pattern.

So, what values would you recommend for these parameters?

My concern is increasing those value would lead to (much) higher memory
usage, with little control over it. With the mempool we keep more
blocks, ofc, but we have control over freeing the memory.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Tomas Vondra (#3)
Re: scalability bottlenecks with (many) partitions (and more)

On Monday, January 29, 2024 at 13:17:07 CET, Tomas Vondra wrote:

Did you try running strace on the process? That may give you some
insights into what malloc is doing. A more sophisticated approach would
be using stap and plugging it into the malloc probes, for example
memory_sbrk_more and memory_sbrk_less.

No, I haven't tried that. In my experience strace is pretty expensive,
and if the issue is in glibc itself (before it does the syscalls),
strace won't really tell us much. Not sure, ofc.

It would tell you how malloc actually performs your allocations, and how often
they end up translated into syscalls. The main issue with glibc would be that
it releases the memory too aggressively to the OS, IMO.

An important part of glibc's malloc behaviour in that regard comes from
the adjustment of the mmap and trim thresholds. By default, malloc adjusts them
dynamically and you can poke into that using the
memory_mallopt_free_dyn_thresholds probe.

Thanks, I'll take a look at that.

FWIW I was wondering if this is a glibc-specific malloc bottleneck, so I
tried running the benchmarks with LD_PRELOAD=jemalloc, and that improves
the behavior a lot - it gets us maybe ~80% of the mempool benefits.
Which is nice, it confirms it's glibc-specific (I wonder if there's a
way to tweak glibc to address this), and it also means systems using
jemalloc (e.g. FreeBSD, right?) don't have this problem. But it also
says the mempool has ~20% benefit on top of jemalloc.

GLIBC's malloc offers some tuning for this. In particular, setting either
M_MMAP_THRESHOLD or M_TRIM_THRESHOLD will disable the unpredictable "auto
adjustment" beheviour and allow you to control what it's doing.

By setting a bigger M_TRIM_THRESHOLD, one can make sure memory allocated
using sbrk isn't freed as easily, and you don't run into a pattern of
moving the sbrk pointer up and down repeatedly. The automatic trade-off
between the mmap and trim thresholds is supposed to prevent that, but the
way it is incremented means you can end up in a bad place depending on your
particular allocation pattern.

So, what values would you recommend for these parameters?

My concern is increasing those value would lead to (much) higher memory
usage, with little control over it. With the mempool we keep more
blocks, ofc, but we have control over freeing the memory.

Right now depending on your workload (especially if you use connection
pooling) you can end up with something like 32 or 64MB of dynamically adjusted
trim-threshold which will never be released back.

The first heuristic I had in mind was to set it to work_mem, up to a
"reasonable" limit I guess. One can argue that it is expected for a backend to
use work_mem frequently, and as such it shouldn't be released back. By setting
work_mem to a lower value, we could ask glibc at the same time to trim the
excess kept memory. That could be useful when a long-lived connection is
pooled, and sees a spike in memory usage only once. Currently that could well
end up with 32MB "wasted" permanently but tuning it ourselves could allow us
to release it back.
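
To make that concrete, a hypothetical sketch of the heuristic (the
function name, hook shape and the 64MB cap are all made up for
illustration):

#include <malloc.h>

/* Hypothetical: whenever work_mem (in kB) changes, follow it with
 * M_TRIM_THRESHOLD, capped at an arbitrary 64MB. */
static void
work_mem_update_trim_threshold(int work_mem_kb)
{
	long	threshold = (long) work_mem_kb * 1024L;

	if (threshold > 64L * 1024L * 1024L)
		threshold = 64L * 1024L * 1024L;

	mallopt(M_TRIM_THRESHOLD, (int) threshold);
}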

Since it was last year I worked on this, I'm a bit fuzzy on the details but I
hope this helps.

#5Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Ronan Dunklau (#4)
Re: scalability bottlenecks with (many) partitions (and more)

On 1/29/24 15:15, Ronan Dunklau wrote:

On Monday, January 29, 2024 at 13:17:07 CET, Tomas Vondra wrote:

Did you try running strace on the process? That may give you some
insights into what malloc is doing. A more sophisticated approach would
be using stap and plugging it into the malloc probes, for example
memory_sbrk_more and memory_sbrk_less.

No, I haven't tried that. In my experience strace is pretty expensive,
and if the issue is in glibc itself (before it does the syscalls),
strace won't really tell us much. Not sure, ofc.

It would tell you how malloc actually performs your allocations, and how often
they end up translated into syscalls. The main issue with glibc would be that
it releases the memory too aggressively to the OS, IMO.

An important part of glibc's malloc behaviour in that regard comes from
the adjustment of the mmap and trim thresholds. By default, malloc adjusts them
dynamically and you can poke into that using the
memory_mallopt_free_dyn_thresholds probe.

Thanks, I'll take a look at that.

FWIW I was wondering if this is a glibc-specific malloc bottleneck, so I
tried running the benchmarks with LD_PRELOAD=jemalloc, and that improves
the behavior a lot - it gets us maybe ~80% of the mempool benefits.
Which is nice, it confirms it's glibc-specific (I wonder if there's a
way to tweak glibc to address this), and it also means systems using
jemalloc (e.g. FreeBSD, right?) don't have this problem. But it also
says the mempool has ~20% benefit on top of jemalloc.

GLIBC's malloc offers some tuning for this. In particular, setting either
M_MMAP_THRESHOLD or M_TRIM_THRESHOLD will disable the unpredictable "auto
adjustment" beheviour and allow you to control what it's doing.

By setting a bigger M_TRIM_THRESHOLD, one can make sure memory allocated
using sbrk isn't freed as easily, and you don't run into a pattern of
moving the sbrk pointer up and down repeatedly. The automatic trade-off
between the mmap and trim thresholds is supposed to prevent that, but the
way it is incremented means you can end up in a bad place depending on your
particular allocation pattern.

So, what values would you recommend for these parameters?

My concern is increasing those value would lead to (much) higher memory
usage, with little control over it. With the mempool we keep more
blocks, ofc, but we have control over freeing the memory.

Right now depending on your workload (especially if you use connection
pooling) you can end up with something like 32 or 64MB of dynamically adjusted
trim-threshold which will never be released back.

OK, so let's say I expect each backend to use ~90MB of memory (allocated
at once through memory contexts). How would you set the two limits? By
default it's set to 128kB, which means blocks larger than 128kB are
mmap-ed and released immediately.

But there are very few such allocations - the vast majority of blocks in the
benchmark workloads are <= 8kB or ~27kB (those from btbeginscan).

So I'm thinking about leaving M_MMAP_THRESHOLD as is, but increasing the
M_TRIM_THRESHOLD value to a couple MBs. But I doubt that'll really help,
because what I expect to happen is we execute a query and it allocates
all memory up to a high watermark of ~90MB. And then the query
completes, and we release almost all of it. And even with trim threshold
set to e.g. 8MB we'll free almost all of it, no?

What we want to do is say - hey, we needed 90MB, and now we need 8MB. We
could free 82MB, but maybe let's wait a bit and see if we need that
memory again. And that's pretty much what the mempool does, but I don't
see how to do that using the mmap options.

The first heuristic I had in mind was to set it to work_mem, up to a
"reasonable" limit I guess. One can argue that it is expected for a backend to
use work_mem frequently, and as such it shouldn't be released back. By setting
work_mem to a lower value, we could ask glibc at the same time to trim the
excess kept memory. That could be useful when a long-lived connection is
pooled, and sees a spike in memory usage only once. Currently that could well
end up with 32MB "wasted" permanently but tuning it ourselves could allow us
to release it back.

I'm not sure work_mem is a good parameter to drive this. It doesn't say
how much memory we expect the backend to use - it's a per-operation
limit, so it doesn't work particularly well with partitioning (e.g. with
100 partitions, we may get 100 nodes, which is completely unrelated to
what work_mem says). A backend running the join query with 1000
partitions uses ~90MB (judging by data reported by the mempool), even
with work_mem=4MB. So setting the trim limit to 4MB is pretty useless.

The mempool could tell us how much memory we need (but we could track
this in some other way too, probably). And we could even adjust the mmap
parameters regularly, based on current workload.

But then there's the problem that the mmap parameters don't tell us how
much memory to keep, but how large chunks to release.

Let's say we want to keep the 90MB (to allocate the memory once and then
reuse it). How would you do that? We could set M_TRIM_THRESHOLD to 100MB,
but then it takes just a little bit of extra free memory beyond that to
release all of it, or something.

Since it was last year I worked on this, I'm a bit fuzzy on the details but I
hope this helps.

Thanks for the feedback / insights!

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Ronan Dunklau
ronan.dunklau@aiven.io
In reply to: Tomas Vondra (#5)
Re: scalability bottlenecks with (many) partitions (and more)

On Monday, January 29, 2024 at 15:59:04 CET, Tomas Vondra wrote:

I'm not sure work_mem is a good parameter to drive this. It doesn't say
how much memory we expect the backend to use - it's a per-operation
limit, so it doesn't work particularly well with partitioning (e.g. with
100 partitions, we may get 100 nodes, which is completely unrelated to
what work_mem says). A backend running the join query with 1000
partitions uses ~90MB (judging by data reported by the mempool), even
with work_mem=4MB. So setting the trim limit to 4MB is pretty useless.

I understand your point, I was basing my previous observations on what a
backend typically does during the execution.

The mempool could tell us how much memory we need (but we could track
this in some other way too, probably). And we could even adjust the mmap
parameters regularly, based on current workload.

But then there's the problem that the mmap parameters don't tell us how
much memory to keep, but how large chunks to release.

Let's say we want to keep the 90MB (to allocate the memory once and then
reuse it). How would you do that? We could set M_TRIM_THRESHOLD to 100MB,
but then it takes just a little bit of extra free memory beyond that to
release all of it, or something.

For doing this you can set M_TOP_PAD using glibc malloc, which makes sure
a certain amount of memory is always kept.

But the way the dynamic adjustment works makes it sort-of work like this.
MMAP_THRESHOLD and TRIM_THRESHOLD start with low values, meaning we don't
expect to keep much memory around.

So even "small" memory allocations will be served using mmap at first. Once
mmaped memory is released, glibc considers it a benchmark for "normal"
allocations that can be routinely freed, and adjusts mmap_threshold to the
size of the released mmaped region, and the trim threshold to two times that.

It means over time the two values will converge either to the max value (32MB
for MMAP_THRESHOLD, 64MB for the trim threshold) or to something big enough to
accommodate your released memory, since anything bigger than half the trim
threshold will be allocated using mmap.

Setting either parameter disables that.

But I'm not arguing against the mempool, just chiming in with glibc's malloc
tuning possibilities :-)

#7Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Ronan Dunklau (#6)
1 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

On 1/29/24 16:42, Ronan Dunklau wrote:

On Monday, January 29, 2024 at 15:59:04 CET, Tomas Vondra wrote:

I'm not sure work_mem is a good parameter to drive this. It doesn't say
how much memory we expect the backend to use - it's a per-operation
limit, so it doesn't work particularly well with partitioning (e.g. with
100 partitions, we may get 100 nodes, which is completely unrelated to
what work_mem says). A backend running the join query with 1000
partitions uses ~90MB (judging by data reported by the mempool), even
with work_mem=4MB. So setting the trim limit to 4MB is pretty useless.

I understand your point, I was basing my previous observations on what a
backend typically does during the execution.

The mempool could tell us how much memory we need (but we could track
this in some other way too, probably). And we could even adjust the mmap
parameters regularly, based on current workload.

But then there's the problem that the mmap parameters don't tell us how
much memory to keep, but how large chunks to release.

Let's say we want to keep the 90MB (to allocate the memory once and then
reuse it). How would you do that? We could set M_TRIM_THRESHOLD to 100MB,
but then it takes just a little bit of extra free memory beyond that to
release all of it, or something.

For doing this you can set M_TOP_PAD using glibc malloc, which makes sure
a certain amount of memory is always kept.

But the way the dynamic adjustment works makes it sort-of work like this.
MMAP_THRESHOLD and TRIM_THRESHOLD start with low values, meaning we don't
expect to keep much memory around.

So even "small" memory allocations will be served using mmap at first. Once
mmaped memory is released, glibc considers it a benchmark for "normal"
allocations that can be routinely freed, and adjusts mmap_threshold to the
size of the released mmaped region, and the trim threshold to two times that.

It means over time the two values will converge either to the max value (32MB
for MMAP_THRESHOLD, 64MB for the trim threshold) or to something big enough to
accommodate your released memory, since anything bigger than half the trim
threshold will be allocated using mmap.

Setting either parameter disables that.

Thanks. I gave this a try, and I started the tests with this setting:

export MALLOC_TOP_PAD_=$((64*1024*1024))
export MALLOC_MMAP_THRESHOLD_=$((1024*1024))
export MALLOC_TRIM_THRESHOLD_=$((1024*1024))

which I believe means that:

1) we'll keep 64MB "extra" memory on top of heap, serving as a cache for
future allocations

2) everything below 1MB (so most of the blocks we allocate for contexts)
will be allocated on heap (hence from the cache)

3) we won't trim heap unless there's at least 1MB of free contiguous
space (I wonder if this should be the same as MALLOC_TOP_PAD)

Those are mostly arbitrary values / guesses, and I don't have complete
results yet. But from the results I have it seems this has almost the
same effect as the mempool thing - see the attached PDF, with results
for the "partitioned join" benchmark.

first column - "master" (17dev) with no patches, default glibc

second column - 17dev + locking + mempool, default glibc

third column - 17dev + locking, tuned glibc

The color scale on the right is throughput comparison (third/second), as
a percentage with e.g. 90% meaning tuned glibc is 10% slower than the
mempool results. Most of the time it's slower but very close to 100%,
sometimes it's a bit faster. So overall it's roughly the same.

The color scales below the results compare each branch to master (without
patches), i.e. to current performance. It's almost the same, although the
tuned glibc has a couple regressions that the mempool does not have.

But I'm not arguing against the mempool, just chiming in with glibc's malloc
tuning possibilities :-)

Yeah. I think the main problem with the glibc parameters is that they're
very implementation-specific and also static - the mempool is more
adaptive, I think. But it's an interesting experiment.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

glibc-malloc-tuning.pdf (application/pdf)
#8Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#1)
Re: scalability bottlenecks with (many) partitions (and more)

On Sun, Jan 28, 2024 at 4:57 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

For NUM_LOCK_PARTITIONS this is pretty simple (see 0001 patch). The
LWLock table has 16 partitions by default - it's quite possible that on
machine with many cores and/or many partitions, we can easily hit this.
So I bumped this 4x to 64 partitions.

I think this probably makes sense. I'm a little worried that we're
just kicking the can down the road here where maybe we should be
solving the problem in some more fundamental way, and I'm also a
little worried that we might be reducing single-core performance. But
it's probably fine.

What I ended up doing is having a hash table of 16-element arrays. There
are 64 "pieces", each essentially the (16 x OID + UINT64 bitmap) that we
have now. Each OID is mapped to exactly one of these parts as if in a
hash table, and in each of those 16-element parts we do exactly the same
thing we do now (linear search, removal, etc.). This works great, the
locality is great, etc. The one disadvantage is this makes PGPROC
larger, but I did a lot of benchmarks and I haven't seen any regression
that I could attribute to this. (More about this later.)

I think this is a reasonable approach. Some comments:

- FastPathLocalUseInitialized seems unnecessary to me; the contents of
an uninitialized local variable are undefined, but an uninitialized
global variable always starts out zeroed.

- You need comments in various places, including here, where someone
is certain to have questions about the algorithm and choice of
constants:

+#define FAST_PATH_LOCK_REL_GROUP(rel) (((uint64) (rel) * 7883 + 4481)
% FP_LOCK_GROUPS_PER_BACKEND)

When I originally coded up the fast-path locking stuff, I supposed
that we couldn't make the number of slots too big because the
algorithm requires a linear search of the whole array. But with this
one trick (a partially-associative cache), there's no real reason that
I can think of why you can't make the number of slots almost
arbitrarily large. At some point you're going to use too much memory,
and probably before that point you're going to make the cache big
enough that it doesn't fit in the CPU cache of an individual core, at
which point possibly it will stop working as well. But honestly ... I
don't quite see why this approach couldn't be scaled quite far.

Like, if we raised FP_LOCK_GROUPS_PER_BACKEND from your proposed value
of 64 to say 65536, would that still perform well? I'm not saying we
should do that, because that's probably a silly amount of memory to
use for this, but the point is that when you start to have enough
partitions that you run out of lock slots, performance is going to
degrade, so you can imagine wanting to try to have enough lock groups
to make that unlikely. Which leads me to wonder if there's any
particular number of lock groups that is clearly "too many" or whether
it's just about how much memory we want to use.

--
Robert Haas
EDB: http://www.enterprisedb.com

#9Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Robert Haas (#8)
Re: scalability bottlenecks with (many) partitions (and more)

On 6/24/24 17:05, Robert Haas wrote:

On Sun, Jan 28, 2024 at 4:57 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

For NUM_LOCK_PARTITIONS this is pretty simple (see 0001 patch). The
LWLock table has 16 partitions by default - it's quite possible that on
machine with many cores and/or many partitions, we can easily hit this.
So I bumped this 4x to 64 partitions.

I think this probably makes sense. I'm a little worried that we're
just kicking the can down the road here where maybe we should be
solving the problem in some more fundamental way, and I'm also a
little worried that we might be reducing single-core performance. But
it's probably fine.

Yeah, I haven't seen this causing any regressions - the sensitive paths
typically lock only one partition, so having more of them does not
affect that. Or if it does, it's likely a reasonable trade off as it
reduces the risk of lock contention.

That being said, I don't recall benchmarking this patch in isolation,
only with the other patches. Maybe I should do that ...

What I ended up doing is having a hash table of 16-element arrays. There
are 64 "pieces", each essentially the (16 x OID + UINT64 bitmap) that we
have now. Each OID is mapped to exactly one of these parts as if in a
hash table, and in each of those 16-element parts we do exactly the same
thing we do now (linear search, removal, etc.). This works great, the
locality is great, etc. The one disadvantage is this makes PGPROC
larger, but I did a lot of benchmarks and I haven't seen any regression
that I could attribute to this. (More about this later.)

I think this is a reasonable approach. Some comments:

- FastPathLocalUseInitialized seems unnecessary to me; the contents of
an uninitialized local variable are undefined, but an uninitialized
global variable always starts out zeroed.

OK. I didn't realize global variables start at zero.

- You need comments in various places, including here, where someone
is certain to have questions about the algorithm and choice of
constants:

+#define FAST_PATH_LOCK_REL_GROUP(rel) (((uint64) (rel) * 7883 + 4481)
% FP_LOCK_GROUPS_PER_BACKEND)

Yeah, definitely needs comment explaining this.

I admit those numbers are pretty arbitrary primes, to implement a
trivial hash function. That was good enough for a PoC patch, but maybe
for a "proper" version this should use a better hash function. It needs
to be fast, and maybe it doesn't matter that much if it's not perfect.

When I originally coded up the fast-path locking stuff, I supposed
that we couldn't make the number of slots too big because the
algorithm requires a linear search of the whole array. But with this
one trick (a partially-associative cache), there's no real reason that
I can think of why you can't make the number of slots almost
arbitrarily large. At some point you're going to use too much memory,
and probably before that point you're going to make the cache big
enough that it doesn't fit in the CPU cache of an individual core, at
which point possibly it will stop working as well. But honestly ... I
don't quite see why this approach couldn't be scaled quite far.

I don't think I've heard the term "partially-associative cache" before,
but now that I look at the approach again, it very much reminds me of how
set-associative caches work (e.g. with cachelines in CPU caches). It's a
16-way set-associative cache, assigning each entry to one of 16 slots
within its group.

I must have been reading some papers in this area shortly before the PoC
patch, and the idea came from there, probably. Which is good, because it
means it's a well-understood and widely-used approach.
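
To make that concrete, here's a rough sketch of the lookup, reusing the
hash macro quoted above (FP_SLOTS_PER_GROUP and the fp_rel_oids array are
stand-in names for the patched PGPROC fields, not the actual ones):

#define FP_SLOTS_PER_GROUP	16

/* trivial hash mapping a relation OID to one of the groups (from the patch) */
#define FAST_PATH_LOCK_REL_GROUP(rel) \
	(((uint64) (rel) * 7883 + 4481) % FP_LOCK_GROUPS_PER_BACKEND)

/* Search only the 16 slots of the group the OID hashes to - the same
 * linear search as over the original single 16-slot array. */
static int
fast_path_find_slot(Oid relid)
{
	uint32	group = FAST_PATH_LOCK_REL_GROUP(relid);

	for (int i = 0; i < FP_SLOTS_PER_GROUP; i++)
	{
		int		slot = group * FP_SLOTS_PER_GROUP + i;

		if (MyProc->fp_rel_oids[slot] == relid)
			return slot;
	}

	return -1;			/* not in the fast-path cache */
}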

Like, if we raised FP_LOCK_GROUPS_PER_BACKEND from your proposed value
of 64 to say 65536, would that still perform well? I'm not saying we
should do that, because that's probably a silly amount of memory to
use for this, but the point is that when you start to have enough
partitions that you run out of lock slots, performance is going to
degrade, so you can imagine wanting to try to have enough lock groups
to make that unlikely. Which leads me to wonder if there's any
particular number of lock groups that is clearly "too many" or whether
it's just about how much memory we want to use.

That's an excellent question. I don't know.

I agree 64 groups is pretty arbitrary, and having 1024 may not be enough
even with a modest number of partitions. When I was thinking about using
a higher value, my main concern was that it'd made the PGPROC entry
larger. Each "fast-path" group is ~72B, so 64 groups is ~4.5kB, and that
felt like quite a bit.

But maybe it's fine and we could make it much larger - L3 caches tend to
be many MBs these days, although AFAIK it's shared by threads running on
the CPU.

I'll see if I can do some more testing of this, and see if there's a
value where it stops improving / starts degrading, etc.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#9)
Re: scalability bottlenecks with (many) partitions (and more)

On Tue, Jun 25, 2024 at 6:04 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Yeah, definitely needs comment explaining this.

I admit those numbers are pretty arbitrary primes, to implement a
trivial hash function. That was good enough for a PoC patch, but maybe
for a "proper" version this should use a better hash function. It needs
to be fast, and maybe it doesn't matter that much if it's not perfect.

Right. My guess is that if we try too hard to make the hash function
good, there will be a performance hit. Unlike, say, strings that come
from the user, we have no reason to believe that relfilenumbers will
have any particular structure or pattern to them, so a low-quality,
fast function seems like a good trade-off to me. But I'm *far* from a
hashing expert, so I'm prepared for someone who is to tell me that I'm
full of garbage.

I don't think I've heard the term "partially-associative cache" before
That's an excellent question. I don't know.

I agree 64 groups is pretty arbitrary, and having 1024 may not be enough
even with a modest number of partitions. When I was thinking about using
a higher value, my main concern was that it'd made the PGPROC entry
larger. Each "fast-path" group is ~72B, so 64 groups is ~4.5kB, and that
felt like quite a bit.

But maybe it's fine and we could make it much larger - L3 caches tend to
be many MBs these days, although AFAIK it's shared by threads running on
the CPU.

I'll see if I can do some more testing of this, and see if there's a
value where it stops improving / starts degrading, etc.

Sounds good.

--
Robert Haas
EDB: http://www.enterprisedb.com

#11Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#9)
Re: scalability bottlenecks with (many) partitions (and more)

Hi,

On 6/25/24 12:04, Tomas Vondra wrote:

On 6/24/24 17:05, Robert Haas wrote:

On Sun, Jan 28, 2024 at 4:57 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

For NUM_LOCK_PARTITIONS this is pretty simple (see 0001 patch). The
LWLock table has 16 partitions by default - it's quite possible that on
machine with many cores and/or many partitions, we can easily hit this.
So I bumped this 4x to 64 partitions.

I think this probably makes sense. I'm a little worried that we're
just kicking the can down the road here where maybe we should be
solving the problem in some more fundamental way, and I'm also a
little worried that we might be reducing single-core performance. But
it's probably fine.

Yeah, I haven't seen this causing any regressions - the sensitive paths
typically lock only one partition, so having more of them does not
affect that. Or if it does, it's likely a reasonable trade off as it
reduces the risk of lock contention.

That being said, I don't recall benchmarking this patch in isolation,
only with the other patches. Maybe I should do that ...

What I ended up doing is having a hash table of 16-element arrays. There
are 64 "pieces", each essentially the (16 x OID + UINT64 bitmap) that we
have now. Each OID is mapped to exactly one of these parts as if in a
hash table, and in each of those 16-element parts we do exactly the same
thing we do now (linear search, removal, etc.). This works great, the
locality is great, etc. The one disadvantage is this makes PGPROC
larger, but I did a lot of benchmarks and I haven't seen any regression
that I could attribute to this. (More about this later.)

I think this is a reasonable approach. Some comments:

- FastPathLocalUseInitialized seems unnecessary to me; the contents of
an uninitialized local variable are undefined, but an uninitialized
global variable always starts out zeroed.

OK. I didn't realize global variables start at zero.

I haven't fixed this yet, but it's pretty clear the "init" is not really
needed, because it did the memset() wrong:

memset(FastPathLocalUseCounts, 0, sizeof(FastPathLocalUseInitialized));

This only resets one byte of the array, yet it still worked correctly.
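
Presumably the intended call, resetting the whole array, would be:

memset(FastPathLocalUseCounts, 0, sizeof(FastPathLocalUseCounts));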

- You need comments in various places, including here, where someone
is certain to have questions about the algorithm and choice of
constants:

+#define FAST_PATH_LOCK_REL_GROUP(rel) (((uint64) (rel) * 7883 + 4481)
% FP_LOCK_GROUPS_PER_BACKEND)

Yeah, definitely needs comment explaining this.

I admit those numbers are pretty arbitrary primes, to implement a
trivial hash function. That was good enough for a PoC patch, but maybe
for a "proper" version this should use a better hash function. It needs
to be fast, and maybe it doesn't matter that much if it's not perfect.

When I originally coded up the fast-path locking stuff, I supposed
that we couldn't make the number of slots too big because the
algorithm requires a linear search of the whole array. But with this
one trick (a partially-associative cache), there's no real reason that
I can think of why you can't make the number of slots almost
arbitrarily large. At some point you're going to use too much memory,
and probably before that point you're going to make the cache big
enough that it doesn't fit in the CPU cache of an individual core, at
which point possibly it will stop working as well. But honestly ... I
don't quite see why this approach couldn't be scaled quite far.

I don't think I've heard the term "partially-associative cache" before,
but now that I look at the approach again, it very much reminds me of how
set-associative caches work (e.g. with cachelines in CPU caches). It's a
16-way set-associative cache, assigning each entry to one of 16 slots
within its group.

I must have been reading some papers in this area shortly before the PoC
patch, and the idea came from there, probably. Which is good, because it
means it's a well-understood and widely-used approach.

Like, if we raised FP_LOCK_GROUPS_PER_BACKEND from your proposed value
of 64 to say 65536, would that still perform well? I'm not saying we
should do that, because that's probably a silly amount of memory to
use for this, but the point is that when you start to have enough
partitions that you run out of lock slots, performance is going to
degrade, so you can imagine wanting to try to have enough lock groups
to make that unlikely. Which leads me to wonder if there's any
particular number of lock groups that is clearly "too many" or whether
it's just about how much memory we want to use.

That's an excellent question. I don't know.

I agree 64 groups is pretty arbitrary, and having 1024 may not be enough
even with a modest number of partitions. When I was thinking about using
a higher value, my main concern was that it'd made the PGPROC entry
larger. Each "fast-path" group is ~72B, so 64 groups is ~4.5kB, and that
felt like quite a bit.

But maybe it's fine and we could make it much larger - L3 caches tend to
be many MBs these days, although AFAIK it's shared by threads running on
the CPU.

I'll see if I can do some more testing of this, and see if there's a
value where it stops improving / starts degrading, etc.

I finally got to do those experiments. The scripts and results (both raw
and summarized) are too big to attach here; they're available at

https://github.com/tvondra/scalability-tests

The initial patch used 64 groups (which means 1024 fast-path slots); I ran
the tests with 0, 1, 8, 32, 128, 512 and 1024 groups (so up to 16k locks).
I thought about testing with ~64k groups, but I didn't go with the extreme
value
because I don't quite see the point.

It would only matter for cases with a truly extreme number of partitions
(64k groups is ~1M fast-path slots), and just creating enough partitions
would take a lot of time. Moreover, with that many partitions we seem
to have various other bottlenecks, and improving this does not make it
really practical. And it's so slow the benchmark results are somewhat
bogus too.

Because if we achieve 50 tps with 1000 partitions, does it really matter
whether a patch changes that to 25 or 100 tps? I doubt that, especially if going
to 100 partitions gives you 2000 tps. Now imagine you have 10k or 100k
partitions - how fast is that going to be?

So I think stopping at 1024 groups is sensible, and if there are some
inefficiencies / costs, I'd expect those to gradually show up even at
those lower sizes.

But if you look at results, for example from the "join" test:

https://github.com/tvondra/scalability-tests/blob/main/join.pdf

there's no such negative effect. The table shows results for different
combinations of parameters, with the first group of columns being on
regular glibc, while the second one has glibc tuning (see [1] for details).
And the values are for different numbers of fast-path groups (0 means the
patch was not applied).

And the color scale shows the impact of increasing the number of
groups. So for example, when a column for "32 groups" says 150%, it means
going from 8 to 32 groups improved throughput to 1.5x. As usual, green
is "good" and red is "bad".

But if you look at the tables, there's very little change - most of the
values are close to 100%. This might seem a bit strange, considering the
promise of these patches is to improve throughput, and "no change" is an
absence of that. But that's because the charts illustrate the effect of
changing the group count with other parameters fixed. They never compare
runs with/without glibc tuning, and that's an important part of the
improvement. Doing the pivot table a bit differently would still show a
substantial 2-3x improvement.

There's a fair amount of noise - especially for the rpi5 machines (not
the right hw for sensitive benchmarks), but also on some i5/xeon runs. I
attribute this to only doing one short run (10s) for each combination
of parameters. I'll do more runs next time.

Anyway, I think these results show a couple things:

1) There's no systemic negative effect of increasing the number of
groups. We could go with 32k or 64k groups, and it doesn't seem like
there would be a problem.

2) But there's not much point in doing that, because we run into various
other bottlenecks well before having that many locks. By the results, it
doesn't seem going beyond 32 or 64 groups would give us much.

3) The memory allocation caching (be it the mempool patch, or the glibc
tuning like in this test round) is a crucial piece for this. Not doing
that means some tests get no improvement at all, or a much smaller one.

4) The increase of NUM_LOCK_PARTITIONS has very limited effect, or
perhaps even no effect at all.

Based on this, my plan is to polish the patch adding fast-path groups,
with either 32 or 64 groups, which seems to be reasonable values. Then
in the future, if/when the other bottlenecks get addressed, we can
rethink and increase this.

This however reminds me that all those machines are pretty small. Which
is good for showing it doesn't break existing/smaller systems, but the
initial goal of the patch was to improve behavior on big boxes. I don't
have access to the original box at the moment, so if someone could
provide access to one of those big epyc/xeon machines with 100+ cores
for a couple days, that would be helpful.

That being said, I think it's pretty clear how serious the issue with
memory allocation overhead can be, especially in cases when the existing
memory context caching is ineffective (like for the btree palloc). I'm
not sure what to do about that. The mempool patch shared in this thread
does the trick, but it's fairly complex/invasive. I still like it, but maybe
doing something with the glibc tuning would be enough - it's not as
effective, but 80% of the improvement is better than no improvement.

regards

[1] https://www.postgresql.org/message-id/0da51f67-c88b-497e-bb89-d5139309eb9c@enterprisedb.com

--
Tomas Vondra

#12Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#11)
2 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

Hi,

While discussing this patch with Robert off-list, one of the questions
he asked was whether there's some size threshold after which it starts to
have a negative impact. I didn't have a good answer to that - I did have
some intuition (that making it too large would not hurt), but I haven't
done any tests with "extreme" sizes of the fast-path structs.

So I ran some more tests, with up to 4096 "groups" (which means 64k
fast-path slots). And no matter how I slice the results, there's no
clear regression point beyond which the performance would start to
decline (even just slowly). It's the same for all benchmarks, client
counts, query modes, and so on.

I'm attaching two PDFs with results for the "join" benchmark I described
earlier (query with a join on many partitions) from EPYC 7763 (64/128c).
The first one is with "raw" data (throughput = tps), the second one is
throughput relative to the first column (which is pretty much current
master, with no optimizations applied).

The complete results including some nice .odp spreadsheets and scripts
are available here:

https://github.com/tvondra/pg-lock-scalability-results

There's often a very clear point where the performance significantly
improves - this is usually when all the relation locks start to fit into
the fast-path array. With 1000 relations that's ~64 groups, and so on.
But there's no point where it would start declining.

My explanation is that the PGPROC (where the fast-path array is) is so
large already (close to 1kB), that making it larger does not really cause
any additional cache misses, etc. And if it does, it's far outweighed
by the cost of accessing (or not having to access) the shared lock table.

So I don't think there's any point at which we'd start to regress,
at least not because of cache misses, CPU etc. It stops improving, but
that's just a sign that we've hit some other bottleneck - that's not a
fault of this patch.

But that's not the whole story, of course. Because if there were no
issues, why not just make the fast-path array insanely large? In
another off-list discussion Andres asked me about the memory this would
need, and after looking at the numbers I think that's a strong argument
to keep the numbers reasonable.

I did a quick experiment to see the per-connection memory requirements,
and how they would be affected by this patch. I simply logged the amount
of shared memory from CalculateShmemSize(), started the server with 100 and
1000 connections, and did a bit of math to calculate how much memory we
need "per connection".

For master and different numbers of fast-path groups I got this:

   master       64     1024     32765
  ------------------------------------
    47668    52201   121324   2406892

So by default we need ~48kB / connection, with 64 groups we need ~52kB
(which makes sense, because that's 1024 extra 4B slots), and then with
1024 groups we get to ~120kB, and with 32k groups to ~2.5MB.
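
As a quick cross-check, that lines up with the ~72B-per-group estimate
from earlier in the thread:

    64 groups x ~72B/group  = ~4.6kB extra
    measured: 52201 - 47668 =  4533B per connection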

I guess those higher values seem a bit insane - we don't want to just
increase the per-connection memory requirements 50x for everyone, right?

But what about the people who actually want this many locks? Let's bump
the max_locks_per_transactions from 64 to 1024, and we get this:

   master       64     1024     32765
  ------------------------------------
   419367   423909   493022   2778590

Suddenly, the differences are much smaller, especially for 64 groups,
which is roughly the same number of fast-path slots as the max locks
per transaction. That shrunk to a ~1% difference. And even for 1024
groups it's now just ~20%, which I think is well worth the benefits.

And likely something the system should have available - with 1000
connections that's ~80MB. And if you run with 1000 connections, 80MB
should be a rounding error, IMO.

Of course, it does not seem great to force everyone to pay this price,
even if they don't need that many locks (and so there's no benefit). So
how would we improve that?

I don't think that's possible with a hard-coded array size - that
allocates the memory for everyone. We'd need to make it variable-length,
and while doing those benchmarks I think we actually already have a GUC
for that - max_locks_per_transaction tells us exactly what we need to
know, right? I mean, if I know I'll need ~1000 locks, why not make
the fast-path array large enough for that?

Of course, the consequence of this would be making PGPROC variable
length, or having it point to memory allocated separately (I prefer
the latter option, I think). I haven't done any experiments, but it
seems fairly doable - of course, not sure if it might be more expensive
compared to compile-time constants.

At this point I think it's fairly clear we have significant bottlenecks
when having to lock many relations - and that won't go away, thanks to
partitioning etc. We're already fixing various other bottlenecks for
these workloads, which will just increase pressure on locking.

Fundamentally, I think we'll need to either evolve the fast-path system
to handle more relations (the limit of 16 was always rather low),
or invent some entirely new thing that does something radical (say,
locking a "group" of relations instead of locking them one by one).

This patch is doing the first thing, and IMHO the increased memory
consumption is a sensible / acceptable trade-off. I'm not aware of any
proposal for the second approach, and I don't have any concrete idea how
it might work.

regards

--
Tomas Vondra

Attachments:

join-epyc-relative.pdf (application/pdf)
join-epyc-data.pdf (application/pdf)
�e-�����c<f}Es3*����q`	;��g!~�<��aVfn��9�L�([`<fZ�;�D��cZ�Q���Y��p3)�$�cZ����cF���<fZ�X��c�1�^�c��Hxx��y/��qi3�#u����D� ��k���E[��{�����^�T�z�v�t�@\[�~�4*��s�3����s���~�p�2���9s9�af��Vk��`���w3�� �7g.�=��<�4�^��b��\�Q�gc.c������%p�r��n�2* �k�e�F��8s�[�ac.G]�1�yh��3�q3���h�aS>e�������J��|��d�e�a���^k�e�a����h�e�a�.��\�6�D!�}��e����Ho�d.c�������4c<@��b.��������JE^#�s��eP���P�v3�Qg@6�2��q��������x�s3����GDrts'�����#�9B�Rc.��E���S���3�����������B�2M��$����0��%�����8���)q�f.��.��D���L*�Jx.}���L*R*������
g.sG2�Aq����E�tU�\�u��=�������C���dM3��u�2�S�\F����\�VNg.�6��}�?m�e��������t�pAg.���R�H?�f.S+S,d.�YnT������Ba��\�Y.e����L*�\����f.�5[~5�`7sM���"����l�+0���7�����u3����p�2O*W9s�'�R�;s��T�����N��������;����S��������pm�����)&��B�Zv�[�z��j
m�_��t#[�a_;1|7�
��U��,@6��^,v���l�N�����P+^<
����+��U*V��
)��
+8z�b����-��e����-���M���T���0d3�+�Jw�![�b_u���![��E���0d�i���R+&�;�0���r^;�E+#k�liR��,B�0��n�LQ��pV�s���Td�L����s�{��u#[����V��d`,9����.P�l���
��l�� ������`�+��R�xG�|+�5d��X�z��32�L����v��!��l� ������U7�0���p�:��V�6Rtm��l���$�S���\���-����cr�����(�rd���Cz.C���J�}�RG�td3��@~��li��L_q�!9����a
��-]d�l���l����<���PL���[P����uG�JV"�+�yF}-������I���=���
7�0�����(�w����P,h��?�z-57����Jd=��X��f�:I|�-e��;eZq/[u�T����N�������)���h(>`��g�V/+���9�B�X���j��c�GkGjw0t��<F�W (�_�<��K�3�{�r�Kz�T:���g������+{����TA�(w:+�c���n�-FB�x�H%��1BG����uI�q��:�D���)�$���~��Q������HPkl������Y\�n607^B���U`����c��Y��y�r�8��;��D��9Cy�rM�dq*Z�t�@\�F��,���� >-+�)�5���1Z���������.�0�3�9�h�)?q�$��KM�
�5q��G:N�(���4�SN��JW�QR>P|�>[�
�3��^=5��t��g%�;�Q_K5�J�U	GI��@����p@+�[%!�
fQ%�]	�gD�]�p��NL��GW%�>`}��C�(��"H�Lo�U�RR@y�<�TuaXN����Z�)��H�z>Q�|=�Z�L���=����y$)����.��;����X�I
�9���<�<
��|��"���y�tRq!�yj�QS��-�f�y�X�"Ji=P�C�(���\�a��NL��z;yE�uo�44������������E�
9l�t����
�V!�`q���R3Y�T1�BN`�4_�*�]����L���9�$��q���Jx�����kz6y1<w�&n����g�w%�M2�&������l�zA����M����Dc�o��������(�u�&]�������s���^�&v��Xx�)m��+q�� ���S�;����73�����z3q:;d$��8���x�����7���y��z$4Mw�S%��G!����Z<S�=��J|'3�����>LI|<�KY\���R-��["T*g�����J�U�F�A��_��|d���!������J6D�5l"���R���b#<�5��.!�������:�I�jWYO��3,���2�����j�Du��@cY��0��q(����`(�#�5����i��J�Y�#
c83&��[j��0G�����J
��F�?A��?I��}�]gh���%�5�Xe�����ufy�}�S����Dy(ok�S��M��7�Icu�yZ�:��Nj��!�D�H�>9�<`�G�J~��/��y�����x���3��d�CN�6�.�Us�����(��z�LH�eP�#=t;������h�M�`N�����3�6�`|G��a��+
F�-#�>�:�)��CT%-��&�JAVw��G����`a���TkX{���@X"��N�����\sIg�����t�ZxZt����<g�����+���)(��t���:j���U�#]C��r#ab5�w�5m������0��iid�*-�z:o}��f���g�/`������e*�z��Y��:��gv�t,T�+��S�ZR�<`@�m���X�2�!{�)6$":��I,�`���5O7��5f��O]�O�����U��X���(��
(�y�OE�\�fk���S���$��LG&a����I�Kl?YH�����@�����������<����a��F�S�y��y�
�YM��"�Fa�|�#�0����(�	�g.3��GG�Y���@X�J>��A�����ko����S���< a�{>�@x��3��*]����eHV�/Gc!�~�|2����EW���IaH��x�k�U]����z�����^V`��d;�Q�c�a$q���2!�f8.S����E9�� n��Z��]��E������x�`*_���U{�w�s��&��Iam><pQ��Ts\�SV���������9R���#�!5)����������E�4(�iA�r�r�b����*�Z��G�2�}�*^���G��L�,�=��E�8xs��E�j�h��Z���z���Te�Bu�)d�C��W	��v�"�k���.*����<t,��<t�,�~l��E�l�1���d��)f��4��u]$q"���E=�?�b���z�2��Y�"�7�sa����XZ�Ca����CAB�=t��=h��O����E�qP��Ba�P���.��FM���|�������Iap�9��H��5[����������r�l�����1��=�����`����=��N����i�d^8�kz�w��g�H����x��D5R|qi��?�t>�O%����7 �y ���y���P2��Wx����<�T�7By �,7��-���\�>�??�������y���!��	�g�w��"���E�R���Z��<�O%�o���uyN�'�fSr�#�N&�u��/�u f�=sz���#��ia���9%LOO���;������92@@�4<��s��\�D�4���0�Vr����~?p���)%B�'�S���80A�Zt��M�a�K��\�����J����1�y����O���u9���h�A�!�oR|�x�;,�y�?�����e����d����JC��r��<Syb���L%���:�c�]Ow*�W���W���}w���l5�����r�_f�NA�b-�)�<8:�_�������3�(R�� qG��x��(������;����t��u]����~6t�5�5Q�/�����O��C��OJ�O�����+�NI��yI\G��P�Wx��%��/ ���g�{��E��e$ 7�!�w�u�!�8(wi�C6�"�������E�0q�a�����&5^��<�����yY<���
��~v\�H���8�)��8�)�:	��������s�V&Lz���kj�C�j���w\�9d-�_�8�����'=�u�9����A�~v����<|��'�w9t�<�n��d���S���GHM]������v�G�`��������V�F>,�H]���d������
h�
?����]�{���Y���[��>��NN+�H�0����6P�nO3V���43�
��43��g8�&��z�RC�����r�{�^���W{�m���Z��aB�yO�
xo�l����R)
�l�4<�#��]��d%���yD
�h�V�548!=���U�Z� K9C�zf�S�������)g@�X^/	%w����@�+������;�l��Hb�]����W��c`��)������k;R&7T�>���
�C^��v�$���#��a������)@������44�&�������$_��t��=w�v������u���sWt�^4�y2���O������������zX	�$���h`.2���b�|�~��r�S.C�K9v�����U���44�h`�S��l��{����}��
O�����?�'��:�����1��X�4�J�%/��5��,� ] ���)�Pj��C��
SQ���K���SAG���T�H��bU�
�*�X\��b?*��
�Fk
n��Vy4qB
���������p�u�+/,N��+�s��ku`Gf��v�m��*���h
ZA��Pk������VTQ0Q���`u~��i��[1�:��j��\��K����]I��_�Ke+�e�^�������+[��`�#1~e3�a��
��ES�e��)����|�"c�@l�(�Q,���`,����M�����A���d���o0#�dyiM���?n(L*�^7,�V��]zF�Fg'������d)���y��#��M�%@�g�o+�hqFf��&�nH��
���N�������
�E�@�
k�#�\��������m0#<%��{�O����]#D�'�����U��M��LHzvI�5#��8�A#�����a��������_�&��8��M���kb�v���%@a����&�b�A|v��5�	�(\�M����uvq����]:�a�]��bhvy��4?8t�%hvI������m�.��������&���iB�v�x��NF�Eqme���/P���l��A��|\��o�.�	Mg2�n�h��&������������]���1�.��Oxv����o�.}���l�.-Z8������l����#�~������:�k��]����n�.ZMN1�n�w��M�
,�����7�0D��y�8#���R{�6a7��I��&�r�����Z(G~vy�?.�������a#��q�T�*�2v����/*�
;�����4y��Rr[��P�EK��P���.5l(5�YCX�?4HM�q(i���:��CA�w^(NF�*����4xTz�������C�����p�{���*��Z�N��j�K�g���Jc1.������g	��0�O��?U�#�?�z������v��Cq8����I]��u�"������^��e����?UP��
���$�?�q��E���P�����@Q��F����C��?����x�������TX�����jK�?�U������������p��w�?��8�?�U�z��t�r�E���o�����h�����C��.���F�^�^(������CAE����C���N�?4��J��PD"�P}���:P�)����`F�J���Cq]4�m�?���;���i��JD�L_!�^2�Cb��4��S����-hX-���R2C�8��2d��0y(���Q���ly*������G[�5���T�U� B����9��*6m�}���,C��-��A9��b+�v���[A�-�la8�|h�lu+��o���#[�6����-��v��-���l!U��i����[1�������.����v�1b�:�o���F��b�(L5P1�C�li8�//�{m�V���O�
�r+(���-��T�Es�u��r+��
��Q����lA���i�F�����#[���G�<�����l�L�|
��p��O���-��>G��E[3G����y����-�(���wN,���mp��'=G�xi\y"dKf�V�rd���B���-^&UW�kC�8#3�C��!����s}:�����l�8#�l��F
c�^#��|�l���ZR������+�����8UG1����3w�/}]8�������x����K����\_%�����Z<u��� N�m���'o�q}A.�������y��\l\_yH����8��Oi\_:ps��W]������n\_��U������n�/��]����&��J�e�$r�\_�����4tz�;���J����C����e#%����%q���\_�2��������~���
�r��%q-�\_�2���v��5�/}���l�/)
����Ge��7��un�/�w-n\_�De=Q\�w����	�z��07��+��5�/^���h�������8q�is}��j�q}����yy�5�3��g[���h�c��R+��q}{Z��3�G��3�+����6��6�
-��(Z�zso���\�	��7��0]_�R���Rf+�/6����}�6<7��Nb�����_�z$;1�f����PP���u�P��j�d�/�BQ�{}����V��,����c��+�B
����N�Z���Pe���b+qk��*:�;x���b��������5
�����`+�t����ge�m�	�h�>Pj�*?�@qB+1���cAy��*U�������TlNT�
��~A}�`~:�}z}��!���l�<�>P4�Z��gd�a�^�����C(��y3����G�&K��T,�F��KW�t(�"S���
32Nu���@A�J�%���@�#�����@��$�j>P:;3�bx}�x�i�����3��x}��S3`��
���u�������w*q����$���>PlE'O��@��W5C�����}mR5�waY!;�,��aY)���$��XV��H��=�=�XVkXL_�
������R3���5,,����FXV��<] �����`Pa���+5%2,*j�%�eIE�Cf���L:��
����R�FX����9��	����e����{ �����b/�����(���XU ���,�x������+�*��<����,%�`
��8��,R�a�`(��X�U`Q���K�:���wF����aYP�J�X���M���E+����`�V�?�5�X�;2�.�aY��3������,����F���b,�cY4��b�����8C,��/�eIE]��#,�����
�����aY��}EFX�;B�F�������X����Dt�e�V�G�e�b�B�#,M�v�8�%+�%�T�����Q=�`8�b�cYN����#����F;U�N����-��	��q����J}�Q_��v:45���0��^�}|*<z*��
��gj5����u�^�/K\�K:���������3@�(�Hy�9J��]��$)j���e(�:j=��>�t���N=SnZ�u��N���1?oIGaa���-���.���y�)B�K�8�}��c%�����e����������z�,]��H��}������������Ra��}�[�0��GO��I�����i�����{��aJap\5���!���!�Q~R@�0��>o�,������������������C��q�#�=>YH����$x`a
.����ly^a�I�������/	���?o�,�=����p�0�n��
$���z^wQ���y�%aX$gy^tYr�>o��g�q����>�''���O6m���y�������?��������R9k���k~�����Q��",��b>HL���'����^6�����OF�o���4�I��C�.���U{�+���"	i�)[�}%=���c]�����*s��� ��	��tvV�<�z�t�r���H����������f!i��?j����������Pb���[+������)_�o�������O;�tR���k0�^G��`?^�#(��Eh���i�p�+��c��F[NG�PV	Z�������U@^#�~��fVx������U�]8�T�@��.�%5�����inrK���B<p/=�G�b�z=��"�L-8r*�.+�H����
)x�8}���d��5�}�r�V�u
d���J�E
�R�����|��E��8��x�
�!�p��t�@�VG;�8�Y�\���5_V��LW�1j�U+�E4;h�G���t:r����-h������
��|
�JN��]�����t��,hl�z*��`�=<S�<��m A����a��FM�?���F
C�8[�k����=R����M�H3�\�����W��_W�Y{�&��T�����$��0�
5e�3�;8���H�w���+��:��S�R��~��;����H���a�j������b�wF|�4��l�$q��LP@>�f�.���$���}�0B��>zp	���RZ�SO�BH!��
*%�y!&(���BLXv����]����7�F�/��'��%.aH4���v!)pxI-���/u&�1��jW����:��
V_/a'
�sxI��|�H�����>^�A�b//�*D�?����_x)pA�^�f*T���%)N�/A�g�/�`�^x�ga�Ip��R*��C�%
^�i���/������d� ��/AA���K��T�����S�%M#�%�F��_x����,
^�: R�/��A���Km\1��(�cL��W��H�RZ"��2�A�����%����9v>��#�>�<��\?��K�4/@Am��O��S�q-��)g����;�N���h/����#����9R;>������w���(����t��DkO%�j�R�g�7.DM4����Fu�4��D������5R����W��G�G��h����F�L��*tB��3���h�S�?=Kj�G���������=�l4$;2���}������W[M�}���H�~dZ{�,$
��+��3���|&:��}$:rO���h9����;]�3����W&�O��j�{�$��+�N�s�Q>�s�2�jQ���?-=��3�z�v~���������K:�g�����K�=G*�3��J�~$z��:�)����v~yI���'D�]����0!��_^���ia�2��zJ��/x��&"�������[��m�����@X��,\O�64����0�7Z��\�T�������>�G�2����}�<=\�#a�����g���J3�=\O�m�:�p=m�l�����X�x�v�CI����!\OO>�p=�U�<i�zz����'�!����`�t�-\����p=)��]�a������.;\�M�'�������>���#a���p=)V����������
/,\@���p���-\�s�G�]���z	[�������]w��G[�g�a���p=���p=�P%q����fEZ��aPc��}b�,\��;�������TyV�+�����rw���D�t8��az����L�pL4����pL|�Sp��1-����>�����&Jc0Q��h`LO�v0&]����1!������O���������'���0y���T�J���Ti����h���crO�C�c0`����R{J#�bzOi@�P�0s
��_�bj�Y�P�����,N��2(&�\`7�PL	+����,"����3��m���_(��1J��������PL
�����/��v(&��~~���U:u������v����0�bzmka�b0Ua��*�'�S� �iK��
�����S@�e��gNa��s(�,��PLSp���TAjO�bp���	7��Jd�T�yz���x��l��J�3�~|k�N�y~+c�F��a��o��_��������o���2�����L)�5��[�Q�������o���c#������|>�[��|��n�ZX�u��sU��r3�RC���XD�������mA���A�3v �7peX ���fx`���;�(�a�2�����B�A$����8������e��{ `���;�O��f�|3�J�i���5��`@�`�
�����?<��@GMx0�������.P@����A��
����4�!��B�������-v@�nA�$�H��7*�h�x����H�F�; ��4p'Z@ *���5�< 0������lX�y�������4������t���@:�, �Pp��^�K����m�F
�|3��< �<�X@`p�j�����+v@ ����h�`��4"���|���D�i�*����U*v@ �:��#����EP%QW��`�\�=���98���������,#�\�
+G*`�Uq����b�^PV�EP0�OA����GM��Z�c�]�IW��5q.�J��\�&�6Z�ER�T���5q,���t�\d�`��������yYT�U�X�����R�JLD�W��"�X%����.RG:W���tZ��[^w�[����]��s�
*����
�T�C��G��"�C���"/�A�W���|����t�^��+TbC����J�z��HG�C4���V{�O��p�{�E6����k/����#^{12�������X@Q8��H*2����h�g���f�^d�X�x�E�r�y�Ej��ua��,��L5Z�EV��r�X�E���^{12�����"�"cV{��Xvpz�E�f�3u�^.T$�j/��L>{��D���k/�X��r�ew�E��w�zx���Q���^�u��q�XX�EV��T��^�����"^QVQ������+`M������NaD���t��F)�o����g*)y���>��;��JV�n���,+�[2�����������
j��}%��,'�v��k��1�F^���J��d�����u�	@%�(�h��Hc�?8���4v���dF�����N��Z�zL���	�bk��n+����(�![�+����N�$�s%%;	��m������y
k`���c�8�+������g�>��^	�f�f9;���(YY�4���y%*���nN�����<��X���OF��Fk����z<�
h�v��
���'KW�v.|}8zI��K���8R����I$��BF�����<N���J��l��vA��F����{��:X�IZI�)7V����O��)Y�f5:|��PI����d����L���-�w�N���J����0��d��JH�|�r��u�b�Qrj�����u6�Qk�=l�B�:	���#P�*�<��X�.Q�c8��Y����T���i<9�J4&������aL2��{���c��3���"%�x�]���������V2�}�*LG::)Y�V�V��G+�i�5��W�Z��rU���^V����+�p�TK4�+�P�����1��QG����������(@I��f��&M������,���0�b3��$�163�_?:�:0�F!���_��I
:���
���D����-�����z�k��&o63�64�f�}������(����IA�����y)C@���y �����"#�9�r���c)g3��j663���563���?�lf� �f�3�r8�����f�VbX��f��:�����l���j����q)SN'c3��V`l��P���<:G����H��"�f�Y����f����f�������,HHhlfT@I����G�Rc3:��f��-���8T����l�����Y$�263+��)g3G'��o63���6����K�7�[@
����4`t��qu63�^H�f���[�H���O��]d�����\\f�`���Q�����5�\Lf� �;������`3�%�\<f�`�]�i�R�����k�,����)zv���$�d�=�9�R�����,5<��(��`F
D�s����
���T�u;���4�h���L���������A�m�����\�m5�2�b�yn��>��3d���Z��2J�h�e8�2f9q����Q+��f-��P�ivV����Y��l���)���*&Rr�2h��J�`#,�q��=}�s�2�CY/������&3�
:[lv�<�NV�Y�]�����2�f�I�Te�����I$�����f�8Qv�r�����S����R�i�)��G�^7�2����I��7'��������� �(���,��P&��)���v���K�l~2�$�n�'�3j�c���t�<{K"#'�	S�Z���M�#�R`��d��3�G�L�����nb2�@�g_��4^2��y�*����s��3F���J�A�,��9�<����@J���uG�I��d������dT2!dJ2R2wg,��D����t�~���`#%sK:�9)Z�tdqE%|��d�N��KFJ����"a�&%sK�u�I��dU
=���I�t�NJ���K���������rv���m�����;�'%SK�2�)9����"�&%���-�C3�����RE���yL:�&���c�fa9&FJ����j:��&%cwH��I��i_(����YI���NJf%eEJ��I�8�;�#%���H�p#%SK�
��\�����J�
���JV@7$���d���I�AK�D"�MJ,��<��j���dm�I�8���RB�MJ�����NJ��L1NJV��!��&%Gw6rU9)9��d��NJ�Z2 R�I�8&�e�NJ����;FJ�jMwsR2*�G��68)�-`^�}$v��d�� ������I���MJ��X�����tg�/]�u���+(e�2Rrt?)@BrRrt?����d���b���F:��}"rKI���T$.W��wj]�/O
�X.�?�[��!��-���1�rOw%��t$��R),P���Z��+#nLG�3>aD�,�������,������-��NG�1h�����
(Q���qCo:2�� ���#c���tdV�K��M�Nh���8��5:2�D�t:r� N��+�����-�$kNGQl�#��j8���
����c�xx��y ���qi3:��:��q�4,��u�U�wM����JX�{���1Y���A�+DZ��;-�C���Feq��N@�NA�/5rt
�p�M@�H�n���i�e��J�ed>�l���A�N����x�)q2)��N@�3H'�prd
C2u���L)C���]���F@��V`���J�kdn�~rr4�:Tv�a0����r����)[:�a�1YH>h|=��\d�m#�k'�q�U��MJ4�MB&
#s��L�
{�e�6Y+�$fpiYN�$P\�1�I��K����J�j���r�E�G��IE����FZA*���cQ	�f�$���d��k#I�f�d���
{nJ2���+�9����FJ�u�(����y]�_I�L�������A����I�FL�����Ht3�������65T���>���L��>p99�8%
pv2������MOf?(����l�x/��L**|�PF+�P��MQF���0���c��n�$e2�3A��,e��
��;M����@�����J�a#*����'�n�2o3*��TeV1��q����u��*��*V�
����
4�*���`_��_�G�����r�2�T�m��e����s�Q�lIHZ�I-��w�2�E�PF[�I��0�2ud�%���e�H�R��ks*���\b.od����#9��!��F�������m�N9�
���6h�l�
����l�
�.Gr#[��[�����\�[!�����`���~���z,ud�T��`���Tl "[P����<�^d4(���V�XY��K�����b�P�)G��bE�KV�#[�\/��VT����&�g�,�"[���z�E�4��P�"[=A%���VL�����I]Y���j�&�b����P2�E�0#��	��V��Gi��#��i���l������U����l���>E�Je�b��WG���18vd,�_��E�����B+8���lA�{^dK�h���������"[�b3$����>;�-G��#[9�m�h�l�����V�b�$sd�:�_&D�����^dK;uU�	�-.�A1��l��]�SR�����"[�.6]]C����b���#]_T�������W��	���z��86��X ��1�uW�/k�a���������i�I;k�Z��b�����/�����H�BiJG�F�N�^$��S�u60{%����okd��q���f�z
�a��g_V�FBd+9���#M�$kM5��!��#��M��c�~c����J��-�\[gN��g��7����J��i\�����������ISX�Q�tV��p*��[[s�ILz���&�m���o�7�;]��vH���_2���m����K��o�/Jk����hA%@3��G�������k0Ux��������
�+hFB����
g"�\S?I����r�s�4E�����-'jr�S9���!JNn/�5��~7��+H#u�=��G
*�\g�3K���3����S���QVj�I ���H�	�+�T"����oC������Q���QR>H:S�����p��r{�tT�6���L���uF	pm����L%��*M�$7] ,M#�3'���4B���%���i���:��Ay�iV���3%���4�Hl��r�����XPP0�n��XP����-����"\��0
���s����0A����K��0�#��9,QC��^;�	�mX����	��$���KHL���K^���KRQ�D�{/Q����t�;6��s��bCq=P��
��l�/�oI��6�e�/ g��-yQU��d~�@�N�d
���.��r��Ap'��5�OJ/Y���d
�\H�c%k���J����E��=��Di�#FA�
��PrG��[�w�}�����l>�h?h6��(�Cr���2���9J�IO19J��%rT)���u�$$ePs�d��J��wU����w2�NJd�`��gP�nw)���A
���:�_20x�����|�8V�K�t�f��%�+��c���PB"t�e�p����j:U��n�Fw�NI���S��J8�Ny�:����y�����j��T+XR!:�Z�������Nev��F��N��M6��[G��b:��+��)td���0����
,���S��LUd���,��*���S�������E��RG��"�z��S\�TZ�E�0��^tJ��_t��3����T�`l
�T��T��lk��^t
*s{�R+2���T��E��T+X��!:�d0�/:%����&W�22t�-�]pt�-��I�������NAA!��S� ����N�x[�1:��
����T,���jd�;B5C^t�/ �<��S�`���TO�p�
���j��NI�/�S��:��D��NIf�z�)�'�o��)�"����S�N��M�w�/:�*r�B�/:%��S�������X��5��_���qVJK\���R��F��H��t,�Y�d'��%?��*
k�R������5�t�O�KK���'P����s�Ha���J�da�2k����5e���Q��������L��0�F�}�Z����������pmi�O�[O��O�UyQb�3�
k�D9���5�����O�|�w�3����#0�y��}8�G-�T&��Zj`-�gr=�����-�4�O��!KCg��)��
�O�$��Z�,Ty�,�0PT��`�S������p�*\|���q�#�0 �'�B�e��})�0D��C���Fy�vY�=���
�{n0`����������p������G a8=��dF@a��y�����5\)��-�?�y�
��}��+m ���O��}O��L;�����\�k�3d��4���-��w���K���I,X�c_������0��^<&��<j����BC���(�w�mXE�w���hF)�.��	q����2%�8f���2K��f��?����U�32�<�Q�W|���F)�
&�MN����"4%��N��/�Xg���	������+Y@#�9��=�QS�:R���a��Q��4�>��*�h�}�e,�Q[���4�}Fi�=����4��	H����Z<S�-������F�m���F�jz�zp#X5x���F�j�xo��3F�{-���l��������J| u�C��!u�8�e�t;���gZ��^�
�H���u�9���@G-�b�=�Q�����:�}������B-��F*uX�#�����;�~�_�xG}X5J\c�r�0��G<���*)�(�g�����T�����WjxO	C����@Xf>u����������w�����������
=	��g��hO�/x�d��5�^8�EC�'z��T~���m�}���6V����:�D{��;�EzZ���������[��yj�/�t����S�W�yr�������`�,
�}�=_���a��<rz�4Lr�'��e�_��w�:�S���y�r������rz���i~9i��
����p�������tl�m,���
�����=���^����_���N�P���/�����
�������1�04���R}}t��k����a���G�&���
���rd�4s,�����z�`�$sO��m��z�������ub�������p�i��8}���H�9���O�p�2����R�Zr��T"���������>�cVI��v��qM����B�.��.��=��u}:8���������R��*�C���x�qaP��������Ui^C:`O8���Cz����4�scIf��A��_i�$���Pa�P:��
C��p��P������E�y���
S��SXK�J�-��C�CG,)��~������u��������-i� ��aK�2�����t������u=��?������3+��89���7�I��~�r�!�C���?��r��C�{�?�5���a��V�cM��PkccM��q�T���?���l�������H��6�9���n(��xX7tcN�2����C�h�����]��w�4����u�I�9O
y���������D���71�]u ���"����@��J�	�$k�0R�	VaM�u Y���:��8h`��@���q"V����eu q�)��@��g�E����H`�r�R��d����R���&R�`����r�(�au ���+u yRl�U�ULw9-W
�b���/�s�@+���j"����Vc����D+E����$t�YV���@;��FX�����N���U`�.�X���k��@��I�:��:���C4������;x��dt��{��@F��XH�:�u kO^2�T�nju ��a�=���Ie�=�M��5{�����,��YlxR)f������9/�
�]��[��������u��#����HG����u���\6�u
����������^2�f�hu q�/K"1�����=���&:~�smm��3���3�W`ayF�_P�<��9�"��5/v#[!;�2B��Z����s�Ih���01����lA�@
�#[���L![�a'�����:�RDJ$e�;1���F��
+����������-���l��&�#[\T���E�Z�*�A���X����Xtri:��V���TGT�U*v��/������td3�2<�:�lq�c��#[n��^d�9��G9���t�h�����\��pd�;2�.����h�/��)��^dK�}���-9x�y�-9�G^d�o�,�/�
�N�^d��I3�����`����'(���-Y��!��l��h���lq�v�N�"[��oP/�
.I��l�Q6���Ag��l�uU��-�Z3�y���[�!���li8���"[��!�rd���L�����l�����/�
�Z���E�x����E����!����-���S'���-.-�����Z�tD^�6��	6���Bm�i9c�*���+JB������J���^!z��� Ni���K�T�d3}i�(�q3}��`X��K_�;�����]L_:��.�/�n]L_:�)o�/���L_ymH����1���ecL_o�b���i�cL_:.�eL_n���f�r���7�7:9���8�Pq3}��.�������d;����9oL_>m��7���k���;N��n�/V��jL_�w
�6���C�/�k��1}y����>\���
�y)�����������Y�s�.�/�vDn�/�C����
�6������e4P�L�`���7�7Xu��
��f�������r��6�����1}�k�@�����~,���&����lB�op1�bL3���.�/�:��s��R?��~��Jr�z�U�����aU��Pj�CIC�HZ����X�c�CQCY�g�������K8=���	������!sE��
&�F0h��Fy}�?4P1 ��CQE]e�(�X|���(��`��x{i������G�?4XY�����*:��?�T<���!�?4�L�����we'���
&�r�?484�T���X����?4l�qnh�:)����Ff����X�3s�CC#�	���p*�����
��e����n�P���%�?'�{&_��C��E9���*�+�����	G��C�Nu@���h��7hxW�����{Tlh8�0�?40�&e����i�C�I��h���W=�m�_�Z�c���A������hJ��6;9��?�g#���*���A���GN�
���~F��C�{'����������G�f��U���p�1������89��4d���Y$��^�����4��0��lq$)��![R���G�4������:Y��Z�![�����*&�����i��lq4����lyU��:��*v ��b�li����.�b��������v�|������
�cX�#����E�0�?�����-��
U	�-��L�h^dc1�a+ud���-M�$�I@���:2��1���I	^d*���M�l����8���:NBd�'x���ly,�5��U�#��l�#��#[<;��liF&�-b����/��GN��T�l��b�����/�E���/�e�
�	/���k����Tq�E�x�g�s��g!�#[�#���"[R�����6��@uPG��*B��UT�w���(�0�#[�f��i9��V`j����[�1}����En&M�6L,%%�\_!�s��\_�W
��J�~%a����85����������n�������s}�x�/:y���jq��8����k\_��������L�\_�:�����~)�������_����A@�s}����������RL�l$�5�/��#���}�#o\_���H4�/O����W����7�/��0�/6|���Eq����t��� �����l\_0�(4�/�]���h� �ts}����l\_���F��������K��R m��gf����u�s���g���n�/���������������l�q}?;����g]��+MPw�/��0�o`"%2�/N ����-#]������8���FR����u�w����K��q}�������~e
\�����N�����4�t_��T���d�P���+����75�|s�74,��f
oo(���gzyCy0IN���u���Z�~~�
V���*;������BI�tn��|IE@�s_(w��G�|�����=R�E��QW~|�	�R/�`U	3EC	����8���������BJ���~���!����2;p��U��Z��"���f��	�������>P>��������J���%D�n(�b%F�|������6��4�nEY�.���������@YE�>P�������}�����'��rG:���fd��yn(���C#o���)�0�|�h���>Pn���2���`��U���X��}����4:r����@��SgGs(��?����@#{�o��e{Za����e�th�.��l>��z����@q�N%����I���������4:/4��lN�_�C���X�k���n�*���Z�����;�&��r,�5U#�j
�v�cscYj��
@M��j&���/`Y�`U�������7���b���V8�3�-�05,KY��B,K*0���rk��e���d\����N�
E:������������O�����m/�����Lu,+���o���R�X�c!��N�U}$����aN����~�dB�eI��q,KG���XV�U���/�E������v�yn���,�*f�XV�bNi\�&�u,Kc��'^,�gdG�����bF��X�^z_�,o3p��XV��VD�H���b��E��,�Z�����0#+�X�T<W}��dXV��6��`,�����%_�j�b�W��eqRP�!��dG0���eaF�U���b�b������KR%?�cY:/&��2�����/��Vd�\��|���	�`8X�J|���QB��MZ����,�������%l=���:�Q9��*C1�;�|YB�\��QX�����uk��<������3G
Q��+���ac���1������GM��0d_:�u�0�G���[�l����T+�6qvK:
������T�z3=�7�"�H�q����X�����/�5������z'aB��H�����|���08EK*�L����Z T9�����aM��g7C�;]gxd<�������O�@X�����@am1zy��YB<���O���e��f��Q���V�q�#�0x6��dy@a�%��$x��sW�|��0	��<��,�1I}�}�/K/������Q������������p�H���m ax�����+E7��y�ea�4���|Y�>����}�_n�+.k��������q<����S�y�%a�w�����Ov�y��3F�Q����Q��FK�kw��]�� ��������,1YN��v�yW5,�����\{t�.���G~�_����������'�JWf�
wO�V�}e��<��`����zj')�����C��� >a$��(1[������,�>�2_G*��S��aV����H
���jNg��Pw�P��j�n�`M#VZ�)_��
O{������N�����
�%�:�n�T��u���m�H��4r��r�Z�1���G��(`FT`�x�.+`�j?�yG
 ��#���U*JC��N�d�v�%5��+�1���R9�����^�KO�)�
��Lwd��#�R�r��O�5����6=��'{���������s�qkar�~]��P�*m�1bI�c*�UYZ�#R@�DkOG�v�v�q�,,���w����^��B���)W�`e���<�#��
�T�HG�L����R�c
���+R@���t�4�+&[�z*�,
����u��U(B���
��4��r�%�
L�7��j���`��� s�Hqq�`%:�\�a�Bl�5���p]$	�k�����5U���D���=�a���$y8Dkj�eOB���S�9}���8�d�;����+���z�-.D�3��Y����8�+��3AA�1p����������j��g��)����1��mbL����A�1���������1&��qdy~�V��Z�����r|I��/����U:�_�x�x�%)���/�`I^|�M�C���L���"C_�4B ��K�F����%�C��i�2��[9�D{�J_J#�i��K��f��R)X0��4|���h��%g:y��/Aq|	

�Q_�4~��_j3�NWU1|I�H�M��8�������KR@�J��<��P5|	������w#T0q|	�8(T�����1�u|���^_����^|	
*��4N�4�_�_ �*a	,s� �!�Q��o��`?�s��y�1�c����y�������2LpO[������n�E,�r��I�;�'H����{|��q��?��9R;>�����'�-�t�O��JMw�H��T�\C$��T��$�;z���(i�I�-]�#�k��?���T�O$����>��5���d;�U����8���G��N��H�,�]��������H�����2�|��$Ys*�G����>�<���'�����"%)O����?��J��G�wK��Dr����#�r�1>��w��G�GMY_��+��T��}�?o�C��H���(I�9]��3�t�I�����d=S;?�lw��#�^��?�-������9Ri 	�h���'��rN]������F�"��X�;�N��<����;)L9g,��G�)�}<�N	Wx]��;=�jg��;�tH���}�$�u���u�G[?z��m���;%Lau'�������0$���;�����QwzWiz�G��<8�QwZX{�=�N�m}y��m}�x�P�Qw��e�-�N
e������w���U8gQwr����Qw����;�����E��T�QwZX�=z�L0.w���2�uG�J'b�Qwp/��<��#�0�N6�\�u�����;�eH�bQw���fQwz�t��G�����w��*(��������2u���v�}Y-�N����x���*p�{���G��;-�I�uG�J����]%}hu��~���Ov�G����l*+����E����+������(��4D�����"2-U

�A�%tD����|�f;"��6_D&�M��/"S�����0
0�Ia�sD��5:qD��
XV/"S��Ed_#G�Ed�����Edzc�B���0����"��<Gdz��ce�L�*�<��L�6�!�ia��3D�G[� ����BGd�%cD���C���1�7������X���}	�����������8Gd���V�"2���y��L��lGdZj"�}�r
���h����2;"�^D�*Fdjc�1�"����`c�yP�p�@��X6Cd0U�gh�L�*����TpD���f���]�"2=��8"�[R���*�7h�L��;�������~I��]$��
endstream
endobj

3 0 obj
28292
endobj

11 0 obj
<</Length 12 0 R/Filter/FlateDecode/Length1 14872>>
stream
x��{{|SU��Z�����I���4My�/
�D��X��Rh�>B����������b�qPG����J�q��:>pdft���u���;�(�������}���������������Z{���Z{��w����0��h����~���W�4��%x���B�v��:V�����N����
�k~|�@���vg8jX�^0�L���&�(&O��������������������0�O�_�)�A{)��������s���+cQ!�R�NH�X<�s��~�io�sAz��Im�fXN�Ted�5Z��`�2��-�q9����v���,p���{�=
�����{����H��Z�����o�P�N��p�!��[���)���@����NC{a?��O�n�����p�pl�����0=��1M}^�vQ�>�������}8	7����E�>������a��F`7�����aKZ@3�!z��^���F���b7��
����V�v�>���4����S��M�m8E��qx�"]6�U��7POS��=p7t����v��|�5��n3�	&�
)�R�Mn���.�g� ���v��@c���%��u�.��n~���U�������}���fV��>�bJYiIq��	��g��n5�:�&3C�Tp,CSE���U"��
�Ag�3XS\�WY;�U9�[E>����"S���!(gP�[y�0(��Q�V����+8}2�o���,�%
���7�9�~\�������9��)��)$
�<g��(.��V��|�X�����u^q�ef�u�
gA_F�\����"':c}8q6��X5���FV�]U���_�X5��p����Z�<B��D���D$�T��|_�@���zX��V������"��U�����8�9O���CkqQUX,r������%#��^E��w��[����	�1�K�7�@��+��F����������N����7���Y�����>��7V����o1��zv�M��3 �[;qf =��%�b����"���;�"�i����asFx����b��Iv8$3������"����Qn���v|���H�J��a��A��SF��:�E����"��rVED�����J��7H�q�E��mg���W�//����/��"'��Ad
�.�z���]����Wd

F���W�Jr��U����uZ��U|q�X��ai����W��`�cU}e�U��`����y��b�3&��sF�+�U�o$]��D�\Z�����*������y�
�,���g�������{`*�I��sE����1�.�[m!�om�m�1p6�R�9���A	����������+g��	�8�Uu�g�M#�.Q�R������K/�.�Zd\�9�D�%*\JQ������w�,�m0�-N'�U�yi>�=F(+����ai���un��p�Oq%2.>=�����Qk�I���R�\skJ��U
z��v�����7Js��C��6�y�WK��F��HG����dL��mm\�Z�i�\A�?L�{����^I�3-D�5_)�}36����:y=_-/��>�OZ��3%!���^g}�,�]����Fi,#�b��9�E}��s���}>�Q���=�ci�1
���s}�cq�3<��`)	+!�/5$IK�QJ�o{��C�A�v[?�)�qm�������|@A[?#S|����+e\���$��2X�����)
e�C	u��)�E�q5j���C�]B�������d�P�O�pG���V4W�m��HOqQ���Y+m+U|H
��:{[�b�l�r��Et��r��C�S���1�9G�{%�W�s^��#b6�="5�/�+N����f��*y*�..��T�	������}�ci�V)Y��o��i0be��c�L)�rY��M&|�������6��,��I�M�s���>(��|v5��:�9�����D�������#jZ�Fg���2+��������5NT3j'=n���a k���<T�y�Z�h�p	��8��,�z����`�zZ��Z$�`������&7i��(-+�&�������|Z��jB	]1��Q�mQ��3�3���e<�|�|+�����g�3��y����'BhF�B��3���:V���k6��X�������6��ts������Vn�{��{��p,zM�Un��=si��?cO�Z}W�
�RaT�����
3��h���A1/�w*spy0c9�$?k���q���+�#"3����l��i�*���4x&N�W�|����&�6l�����7���P)�d<:U�wh{:y��[2����s��YF�}�YN1@�Vh[�=��841���t�l@�W���*��
lx��{l�c��
[m��a�
�?���_�����4�
�e{��M�P�|��7zx�y��M��]�k�C��O.\��g|�Y��������g_o�)N��T�����?����$6���d�LP�+��L�:��a��ZV���:�i�O��������M��9o�47y��Hv6x<O����s�W������4H�N�&����m7����[0Oi����/njX��Jk����9f��!�>w��SgZ29�Y`�1��
�L�J�:�n5���U
�����/[,������(�d.��Y�B	$���ZH�\�(3��;�}��E���8~��)��_|tr�m�>��[���7�&��
�`7�H�o?�|?yae������{6?r�(��O��������u,�e�,���X�B�P����B>����;���h51N��ip�3
�$48x�������S��O�H�'��V�������w��8�����R�
�"pB��[�O��P������s��)y����l�I�
��Z�i�Z*��j
�L������`9�)��r��ck9���� ���Y���]c0V��3���5���������XA���`tH+���s�NpjqB�l�Z�l���y����5v�����������|U�c����WurE����7m�n1���s����,x�/Y�_���#��u������*-nl��lYr9}��J��/���Tv��h0�v��4Wg�0iM.�����m6(�3����|e���[e(��C����@����2�)��2����2<Ce3�M$;5�x���<=��+��1�� �p���yC�S�Ny��g[<S�M�pf��N'3����*������7�PCy����s��)*���O��5[�{o]��a[Mr������x���T�
�h���UL;�����+�N�_�����H~x����v�L���2�,���D�D� �������)e�
���py����|���gf���G���%?0�v3(@+}�4j��X%��T�F��j	��d�5�F��������F,3b��.���!�GXn�x������)e�A;��B��t�f���n}���������qSNP��rs��dH������oIN�_W-'�W��`fB^���Se��e��j��������U�K�8O��Mjd�8�������q�X��C&����w	^����mc�;	���3�8�]5�1����)2��M���R5��jT��m���'>�475�%�0�^����p�'+���,~'�~�3�����bf�&��v�" ���b�E�N����P��:.C���^���U�yS'x��)�[�$���+w���U0+�����w�l�x,��C���W����
v�h�Ex.p�B���i�����eU8�+�s<s� P�-���c�
��7�
v�R�U����)��0��:�����Q�����c�h/�T!�@!�F�b��C�����A�����k�	���<C#-eU-�w��d��e,s�{�){h���<�q���;��v3�?����',��4fJ{V�)�y��^\qj�;��L�P�:G?���d�f)�|s�M`3s��H�O[�v ���3���/B(�E8X�E�Z�=E�-B}�$?�z�F=2��e
��K��R,�*�N�\Q�����}��w,{z����j�[��������nk��DY�}�o�?���}�}���K}������{�u��f�u�%�&�
��.(��g����u��4� ����,�eq�E��,��c������HY��b�)����0�
��K�^9OJ!�1��`O5U��R��<fd@6����z�X�Z@��M��}��{��c��[���b��ZGv��"rd�^�}��w���������?�����y��;�����
TY�|���7�������|�Y�:G��,�q����)�S(�4t�Mo���M����&�d��y.��pg�A�N����w�S�h�������l���#���S:�tr�������Y;�h��{�<x���/����g����*���tv�|�����O�������04{2a��C��U��
�a4j�����W�&�at��a+5��e5���A�����=�A��m4zM���&j.hT

�R�c�S��R��������&n0J�c�A���i:PA�"�E�%����	|�����+�lur{�� �I���z?az��
f��b�Ax6�P�6 �EW8�v�P2�e��;�j�
����
0�R��3B}!��A�U��
EPa�
c*��Txa��
������-�I�9	H��c�
�'N�`�#G�df^zE���g�"���o�Q�����[�1�Y [��e��L.�����;����8�<����+}��\9��v�:��Ti����t���[���#ES��G6{�z��uS�=4���~2[TYk�;=T*�vi]���N)��g�Z�4P�b
�Y���0S��]�u��\hwa���.p���.W|z�T�8�sZ�t@U�J����~�#JI=��`���6����7����;��� ��@�}S��C����5��+������3���k��=u����	9�����Tf���q�6#K���1�@��4?x�b&9�K��8K�	��-�0��oZ��9�����~����_�\N���l����R������E����T:`��}��[���b��K�8��S�X`E�)+~n���+�h�V|��������&�r������;;��G����������R+�#�������/Xq��o[�%�������"�?�<�9�6"s4��V�F�V,��� �����o,����7j��+��E��������`V���lO���5��L����:u]ya�OV����t��?O��M�J.������T�N���e���+O�'9?����f����d!�ef��%;C�d��dq:4�-��:K�e�e���E��x-�,G-�,g-�-��Z,�,�L�u�R�Q�g-�e��oBQ
o)��Zh�����Mk���t���Fd?.����S.�
�tGB*�6�C�#'~��-��N-vV��-}�������nQ����>���i�d�$����Np��Q��U���. Wo5������C��k��u���v7�������97���;�x��n��P3�x�'n|����&7�tc�mn�������^7��	��������Mn���S	I���K�����	"�vX�L2�<��D/�j#B��� =���U����en,u#��SvS�������H��L��!
���=��rAEbU>�:rN�bo�b��t���������}]7���gX�������m����M������b���[��!�@����?�5\�����~l�oJh�:��
�F� �V*yR&�He�����S8X�����7x�n>�t�%��z&��6V���|iCW������r�:��C�q���{{������hB+5m��
����C������Ko�������9����L��D������8�N����e����F���=�F�������O����]���{�K��<�����7�
��'��4&/&�H���s��@�����qP���+qsvMN� +[����)������t������z}���^A����K�o?�TL�6���S���)�����������;��19_����C7���������}��_m�~�-��;�u��8��=x�/��w/�����Nl���/=�����~����i��Mk����}�-��vI�R
�L'w6F���1�F�R"�Y&`�P��qh�2����3j����G�TV�Q�
���^sx�����+�=�TM��iz�������~��w������s��H��`��>�}|.huJs�Y��W�j��L!`T �B��R��<��1�O�lv��P^Z&-*��p�;<�<|��n��^M������v����������{���,�s���W_{f��n��Z��^B������
c��b2A���f���8&o|�N���&�%0q�Jt(0[��b��RHa0�;������2��X)�S���pf9�Z
	&/��'/_���<w����9�V�XJ;����>y�|�l
�<b�����-�N�}�����s���G���J��U
�iW���s������NG��6!`VH�[�V����+��~Dgc�$���.�g#cL~���~�>2������?����>����[���������{�{�w>`�������{C�G�~�����_K�'�_%���'z��w$�f�j
F�N��(l�!�-#�2�n#����x�4�FL���f����� o��F�x����R������[�c�{�3���:"Q&�"���)#�QF1����7����`�����I��A� ��	R+�q�������!��V��U��j:���sl�����	�}&%���R�Z�U"��1V]�K�(����7�������X�g�z(�~�9�|:�����R
�}? ,J�3�|�
��k�W����7i���7��5k�B�5��g
�f�[�����h�?���>�}��O��,7V\^)���$Nf����������x����U{��u�=��L_���'��yH�~w|�y�W�m6?�9�f�����_o?4�1#����ox�o[t��v���W}���;.)"9'{�|�G�����\sG>��+>�*��d�A#�]���J�3t���'R�`U	�� �@A��aX�����_B���,g@�S��/�F��G����9U	y�a�[��g0�2t0,ah�_�~��E�~p30��Tu=u��Fo�E�a^g�f�l{��������N�JqD�)����>�X�Y�y8���Y}P��f���v��{����'��A�EE�p=�(�C���{�F�G�����i���ar`uf@��a4�/
s��G��6B_V�	�iXZ������4�	���#_��P�I����u�B]4 ��'��4�0�J�h[�a*3)
30�iJ�,�07�ar�{��>g���&���a��o���7���p&�P>���p���iX7��u��T�3�"�Ddc8���� ��m�G::���I|y��2��h��+����c�x0�v��s��+���C|M0Q���n+YY����`��$���+�Fhw��q��������p\����)%�iWpF>�'��Pxu0~#m�wD�D8��n����������KG:���G����'��n>������#B(�&�&��h?�����0�0�H��h�����n��xdu��_�i���>"���j?��`7����&"��E|<����^v��G��"�Dg0!�|u8����6�m���`"��+���$:���]�Kd-�h{{8.����xtQ�Xh����|<WE�"�
|[g0lK��!i�1�a>�.�Z����n~��.3�BX6��Zww8$G����]��4pW4z�4��h�_	%:�G���N|"�C�xX�P�m�j�E�h<1�\�->�L�G�����D"6��t���%��W���XI[tu�w�b��+����]"m�n�kk�k�I��_�����|u�;�������R2%=����B��*��;J���<�@D �a!B��CD! N�:!��Q�$���`
��B��]�B�#� ��n(2����K�z���E��|��6(��U#��zB7�_��.B�� �"=x(�_H�n�2BF��PS�*���w���<�q�P$W�o����v�!N�*II�2g��n��'\~�S�A���M��~��u��!���FdK� K�B:����?
"���&@��������h��������@hs Bz^���!���(��z�$�v8H�"����N�\���8|�o0��n��(�Kk)�)J����2n7tB��<�T���
-xb� �����&>�tBt
���b�QW���z�2;G��.G>��e[�����M�`c's�l�b�I�0�J��d���t�qd=:IL�G�i'��������'K��H4H�;���r�|�D�Z�#R�D�W%��h"���e%���H���H��q�+�$�d�����b�vb�Dz�(�(������B�%^�W���oX.H�M���,�H�����Nw1�	�P
���}��J[z���u.�_�����^�]VC�N�����v���D=�'��#�$�V�-�_!AZ+W��)$O��������.(!s��R��jX��R���o��P�q�J���pv����	�x
��v�
<8px$:�P!}H�����!<:�0�u�����O�_��h�K�d��j��������|������=�f~��x�?���>@�������o
�<?H�=������>M�?�?5������rh�������5��_���w����H7��j��?��]�����|�[m�o�������/������g�������S�T����~�{��d����N=��>��c���h�1���O��)T��{��?N��{DJ�3"]z�{�:���5���'��#�#������3���C�Q����NJb|�������>�W�g�w����k��i�����l�������1��g7�g7�>����������m�)����m�{B���=�=��]]a�Ak�8��A��8:eo��[�������W������b�*76�H70�tC�jz�����G���|���oq��j�b�����8����TW�������������j�.77P��/�5P
h�����&�����tQ�n�Y]J���6����(``O6���{����������ZQ�_)��U/�}�W��V�l�C�+�}�.��W+��7��y�Z1T�(�$���Q���e����k���2	�[$��[���$~!!���X�[H� $@H$������� ������?")�v7np7	yAhAHgm�����,
endstream
endobj

12 0 obj
9457
endobj

13 0 obj
<</Type/FontDescriptor/FontName/BAAAAA+LiberationSans
/Flags 4
/FontBBox[-543 -303 1301 980]/ItalicAngle 0
/Ascent 905
/Descent -211
/CapHeight 979
/StemV 80
/FontFile2 11 0 R
>>
endobj

14 0 obj
<</Length 395/Filter/FlateDecode>>
stream
x�]��n�0E��
/�E��I$���Db��J�&)R1�8��}53�����h<IQJ���-MQ^;���{h@^��y��l�&�+z7}=��(�c����:d�H���M1<���.��kh!t�&��E�Iu�/��G�D���")�����!��e���]|,�E��x� 
�5�4C�X7j�)���t����?�����������e����"S��m�-�yE������!;�U�)�5��9E�����2{�q����f�|������>Z1��fW ��'���2���������-��D���A��}���R��:k�w����a�4�[rf��5�o1c���Y��-�e���Yf��!�������.z�Q�r���Q��=����'���;��0b=�����
endstream
endobj

15 0 obj
<</Type/Font/Subtype/TrueType/BaseFont/BAAAAA+LiberationSans
/FirstChar 0
/LastChar 38
/Widths[0 666 556 556 277 556 666 556 556 277 556 666 500 333 333 556
500 222 556 500 500 556 222 556 556 556 500 277 833 556 556 556
556 556 556 556 222 556 556 ]
/FontDescriptor 13 0 R
/ToUnicode 14 0 R
>>
endobj

16 0 obj
<</F1 15 0 R
>>
endobj

17 0 obj
<<
/Font 16 0 R
/ProcSet[/PDF/Text]
>>
endobj

1 0 obj
<</Type/Page/Parent 10 0 R/Resources 17 0 R/MediaBox[0 0 841.691338582677 595.445669291339]/StructParents 0
/Contents 2 0 R>>
endobj

18 0 obj
<</Count 1/First 19 0 R/Last 19 0 R
>>
endobj

19 0 obj
<</Count 0/Title<FEFF005300680065006500740036>
/Dest[1 0 R/XYZ 0 595.445 0]/Parent 18 0 R>>
endobj

4 0 obj
<</Type/StructElem
/S/P
/P 20 0 R
/Pg 1 0 R
/K[0 ]
>>
endobj

5 0 obj
<</Type/StructElem
/S/P
/P 20 0 R
/Pg 1 0 R
/K[1 ]
>>
endobj

6 0 obj
<</Type/StructElem
/S/P
/P 20 0 R
/Pg 1 0 R
/K[2 ]
>>
endobj

7 0 obj
<</Type/StructElem
/S/P
/P 20 0 R
/Pg 1 0 R
/K[3 ]
>>
endobj

8 0 obj
<</Type/StructElem
/S/P
/P 20 0 R
/Pg 1 0 R
/K[4 ]
>>
endobj

9 0 obj
<</Type/StructElem
/S/P
/P 20 0 R
/Pg 1 0 R
/K[5 ]
>>
endobj

20 0 obj
<</Type/StructTreeRoot
/ParentTree 21 0 R
/K[4 0 R  5 0 R  6 0 R  7 0 R  8 0 R  9 0 R  ]
>>
endobj

21 0 obj
<</Nums[
0 [ 4 0 R 5 0 R 6 0 R 7 0 R 8 0 R 9 0 R ]
]>>
endobj

10 0 obj
<</Type/Pages
/Resources 17 0 R
/Kids[ 1 0 R ]
/Count 1>>
endobj

22 0 obj
<</Type/Catalog/Pages 10 0 R
/PageMode/UseOutlines
/OpenAction[1 0 R /FitBH 842]
/Outlines 18 0 R
/StructTreeRoot 20 0 R
/Lang(en-US)
/MarkInfo<</Marked true>>
>>
endobj

23 0 obj
<</Creator<FEFF00430061006C0063>
/Producer<FEFF004C0069006200720065004F00660066006900630065002000320034002E0032>
/CreationDate(D:20240901201251+02'00')>>
endobj

xref
0 24
0000000000 65535 f 
0000039032 00000 n 
0000000019 00000 n 
0000028382 00000 n 
0000039339 00000 n 
0000039409 00000 n 
0000039479 00000 n 
0000039549 00000 n 
0000039619 00000 n 
0000039689 00000 n 
0000039940 00000 n 
0000028404 00000 n 
0000037948 00000 n 
0000037970 00000 n 
0000038166 00000 n 
0000038631 00000 n 
0000038943 00000 n 
0000038976 00000 n 
0000039174 00000 n 
0000039230 00000 n 
0000039759 00000 n 
0000039868 00000 n 
0000040015 00000 n 
0000040195 00000 n 
trailer
<</Size 24/Root 22 0 R
/Info 23 0 R
/ID [ <EDE8148EB3505154EAA833007F92CF4E>
<EDE8148EB3505154EAA833007F92CF4E> ]
/DocChecksum /90374092332BC4EB6E22C6D11986C4CE
>>
startxref
40366
%%EOF
#13Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#12)
Re: scalability bottlenecks with (many) partitions (and more)

On Sun, Sep 1, 2024 at 3:30 PM Tomas Vondra <tomas@vondra.me> wrote:

I don't think that's possible with hard-coded size of the array - that
allocates the memory for everyone. We'd need to make it variable-length,
and while doing those benchmarks I think we actually already have a GUC
for that - max_locks_per_transaction tells us exactly what we need to
know, right? I mean, if I know I'll need ~1000 locks, why not to make
the fast-path array large enough for that?

I really like this idea. I'm not sure about exactly how many fast path
slots you should get for what value of max_locks_per_transaction, but
coupling the two things together in some way sounds smart.

Of course, the consequence of this would be making PGPROC variable
length, or having to point to a memory allocated separately (I prefer
the latter option, I think). I haven't done any experiments, but it
seems fairly doable - of course, not sure if it might be more expensive
compared to compile-time constants.

I agree that this is a potential problem but it sounds like the idea
works well enough that we'd probably still come out quite far ahead
even with a bit more overhead.

--
Robert Haas
EDB: http://www.enterprisedb.com

#14Tomas Vondra
tomas@vondra.me
In reply to: Robert Haas (#13)
4 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

On 9/2/24 01:53, Robert Haas wrote:

On Sun, Sep 1, 2024 at 3:30 PM Tomas Vondra <tomas@vondra.me> wrote:

I don't think that's possible with hard-coded size of the array - that
allocates the memory for everyone. We'd need to make it variable-length,
and while doing those benchmarks I think we actually already have a GUC
for that - max_locks_per_transaction tells us exactly what we need to
know, right? I mean, if I know I'll need ~1000 locks, why not to make
the fast-path array large enough for that?

I really like this idea. I'm not sure about exactly how many fast path
slots you should get for what value of max_locks_per_transaction, but
coupling the two things together in some way sounds smart.

I think we should keep that simple and make the cache large enough for
max_locks_per_transaction locks. That's the best information we have
about the expected number of locks. If the GUC is left at the default
value, that probably means the backends need that many locks on
average. Yes, maybe there's an occasional spike in one of the backends,
but then that means other backends need fewer locks, and so there's less
contention for the shared lock table.

Of course, it's possible to construct counter-examples to this. Say a
single backend that needs a lot of these locks. But how's that different
from every other fixed-size cache with eviction?

The one argument against tying this to max_locks_per_transaction is the
vastly different "per element" memory requirements. If you add one entry
to max_locks_per_transaction, that adds a LOCK, which is a whopping
152B. OTOH one fast-path entry is ~5B, give or take. That's a pretty
big difference, and if the locks fit into the shared lock table but
you'd like to allow more fast-path locks, having to increase
max_locks_per_transaction is not great - pretty wasteful.
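
Back-of-the-envelope, to illustrate: the shared lock table is sized for
max_locks_per_transaction * (max_connections + max_prepared_transactions)
entries, so with 1000 backends, raising the GUC by 16 just to gain 16
extra fast-path slots per backend costs roughly

    fast-path slots:    16 * ~5B * 1000 backends   ~  80kB
    shared lock table:  16 * 1000 * ~152B          ~ 2.4MB

i.e. ~30x more memory in the shared table, for locks that may never be
taken.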

OTOH I'd really hate to just add another GUC and hope the users will
magically know how to set it correctly. That's pretty unlikely, IMO. I
myself wouldn't know what a good value is, I think.

But say we add a GUC and set it to -1 by default, in which case it just
inherits the max_locks_per_transaction value. And then also provide some
basic metric about this fast-path cache, so that people can tune this?
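
In code that could be as simple as this sketch (the GUC / variable
names are made up):

    int    fast_path_lock_slots = -1;  /* -1 means inherit the value
                                        * of max_locks_per_transaction */

    static int
    EffectiveFastPathSlots(void)
    {
        if (fast_path_lock_slots < 0)
            return max_locks_per_xact;
        return fast_path_lock_slots;
    }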

I think just knowing the "hit ratio" would be enough, i.e. counters for
how often a lock fits into the fast-path array vs. how often we have to
promote it to the shared lock table, no?
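
Implementation-wise, I'm thinking of just two counters, bumped in
FastPathGrantRelationLock and on the fallback path, and exposed through
some stats view (again just a sketch, the names are made up):

    static uint64 fast_path_hits;    /* lock fit into fast-path array */
    static uint64 fast_path_misses;  /* had to use shared lock table */

    /* hit ratio = fast_path_hits / (fast_path_hits + fast_path_misses) */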

Of course, the consequence of this would be making PGPROC variable
length, or having to point to a memory allocated separately (I prefer
the latter option, I think). I haven't done any experiments, but it
seems fairly doable - of course, not sure if it might be more expensive
compared to compile-time constants.

I agree that this is a potential problem but it sounds like the idea
works well enough that we'd probably still come out quite far ahead
even with a bit more overhead.

OK, I did some quick tests on this, and I don't see any regressions.

Attached are 4 patches:

1) 0001 - original patch, with some minor fixes (remove init, which is
not necessary, that sort of thing)

2) 0002 - a bit of reworks, improving comments, structuring the macros a
little bit better, etc. But still compile-time constants.

3) 0003 - dynamic sizing, based on max_locks_per_transaction. It's a
bit ugly, because the size is calculated during shmem allocation - it
should happen earlier, but it's good enough for a PoC.

4) 0004 - introduce a separate GUC, this is mostly to allow testing of
different values without changing max_locks_per_transaction

I've only done that on my smaller 32-core machine, but for three simple
tests it looks like this (throughput using 16 clients):

mode test master 1 2 3 4
----------------------------------------------------------------
prepared count 1460 1477 1488 1490 1491
join 15556 24451 26044 25026 24237
pgbench 148187 151192 151688 150389 152681
----------------------------------------------------------------
simple count 1341 1351 1373 1374 1370
join 4643 5439 5459 5393 5345
pgbench 139763 141267 142796 141207 142600

Those are some simple benchmarks on 100 partitions, where the regular
pgbench and count(*) are expected not to improve, and the join is the
partitioned join this thread started with. Columns 1-4 correspond to
the attached patches, to show the impact of each one.

Translated to results relative to master:

mode test 1 2 3 4
-------------------------------------------------
prepared count 101% 102% 102% 102%
join 157% 167% 161% 156%
pgbench 102% 102% 101% 103%
-------------------------------------------------
simple count 101% 102% 102% 102%
join 117% 118% 116% 115%
pgbench 101% 102% 101% 102%

So pretty much no difference between the patches. A bit of noise, but
that's what I'd expect on this machine.

I'll do more testing on the big EPYC machine once it becomes available,
but from these results it seems pretty promising.

regards

--
Tomas Vondra

Attachments:

v20240902-0001-v1.patch (text/x-patch)
From a145f2f14fc4a995953563c5adfc72f365dbad8a Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Mon, 2 Sep 2024 00:55:13 +0200
Subject: [PATCH v20240902 1/4] v1

---
 src/backend/storage/lmgr/lock.c | 97 +++++++++++++++++++++++++--------
 src/include/storage/proc.h      |  9 +--
 2 files changed, 79 insertions(+), 27 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 6dbc41dae70..78e152a0b36 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -167,7 +167,7 @@ typedef struct TwoPhaseLockRecord
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
  */
-static int	FastPathLocalUseCount = 0;
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -187,20 +187,23 @@ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 /* Macros for manipulating proc->fpLockBits */
 #define FAST_PATH_BITS_PER_SLOT			3
 #define FAST_PATH_LOCKNUMBER_OFFSET		1
+#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 7883 + 4481) % FP_LOCK_GROUPS_PER_BACKEND)
+#define FAST_PATH_LOCK_INDEX(n)			((n) % FP_LOCK_SLOTS_PER_GROUP)
+#define FAST_PATH_LOCK_GROUP(n)			((n) / FP_LOCK_SLOTS_PER_GROUP)
 #define FAST_PATH_MASK					((1 << FAST_PATH_BITS_PER_SLOT) - 1)
 #define FAST_PATH_GET_BITS(proc, n) \
-	(((proc)->fpLockBits >> (FAST_PATH_BITS_PER_SLOT * n)) & FAST_PATH_MASK)
+	(((proc)->fpLockBits[(n)/16] >> (FAST_PATH_BITS_PER_SLOT * FAST_PATH_LOCK_INDEX(n))) & FAST_PATH_MASK)
 #define FAST_PATH_BIT_POSITION(n, l) \
 	(AssertMacro((l) >= FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((l) < FAST_PATH_BITS_PER_SLOT+FAST_PATH_LOCKNUMBER_OFFSET), \
-	 AssertMacro((n) < FP_LOCK_SLOTS_PER_BACKEND), \
-	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (n)))
+	 AssertMacro((n) < FP_LOCKS_PER_BACKEND), \
+	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (FAST_PATH_LOCK_INDEX(n))))
 #define FAST_PATH_SET_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
 #define FAST_PATH_CLEAR_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
 #define FAST_PATH_CHECK_LOCKMODE(proc, n, l) \
-	 ((proc)->fpLockBits & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
+	 ((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
 
 /*
  * The fast-path lock mechanism is concerned only with relation locks on
@@ -926,7 +929,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 * for now we don't worry about that case either.
 	 */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND)
+		FastPathLocalUseCounts[FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2)] < FP_LOCK_SLOTS_PER_GROUP)
 	{
 		uint32		fasthashcode = FastPathStrongLockHashPartition(hashcode);
 		bool		acquired;
@@ -1970,6 +1973,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	int			group;
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2063,9 +2067,14 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	 */
 	locallock->lockCleared = false;
 
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
 	/* Attempt fast release of any lock eligible for the fast path. */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount > 0)
+		FastPathLocalUseCounts[group] > 0)
 	{
 		bool		released;
 
@@ -2633,12 +2642,21 @@ LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent)
 static bool
 FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
+	uint32		i;
 	uint32		f;
-	uint32		unused_slot = FP_LOCK_SLOTS_PER_BACKEND;
+	uint32		unused_slot = FP_LOCKS_PER_BACKEND;
+
+	int			group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	/* Scan for existing entry for this relid, remembering empty slot. */
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		f = group * FP_LOCK_SLOTS_PER_GROUP + i;
+
+		Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+
 		if (FAST_PATH_GET_BITS(MyProc, f) == 0)
 			unused_slot = f;
 		else if (MyProc->fpRelId[f] == relid)
@@ -2650,11 +2668,11 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	}
 
 	/* If no existing entry, use any empty slot. */
-	if (unused_slot < FP_LOCK_SLOTS_PER_BACKEND)
+	if (unused_slot < FP_LOCKS_PER_BACKEND)
 	{
 		MyProc->fpRelId[unused_slot] = relid;
 		FAST_PATH_SET_LOCKMODE(MyProc, unused_slot, lockmode);
-		++FastPathLocalUseCount;
+		++FastPathLocalUseCounts[group];
 		return true;
 	}
 
@@ -2670,12 +2688,21 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 static bool
 FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
+	uint32		i;
 	uint32		f;
 	bool		result = false;
 
-	FastPathLocalUseCount = 0;
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	int			group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
+	FastPathLocalUseCounts[group] = 0;
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		f = group * FP_LOCK_SLOTS_PER_GROUP + i;
+
+		Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+
 		if (MyProc->fpRelId[f] == relid
 			&& FAST_PATH_CHECK_LOCKMODE(MyProc, f, lockmode))
 		{
@@ -2685,7 +2712,7 @@ FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 			/* we continue iterating so as to update FastPathLocalUseCount */
 		}
 		if (FAST_PATH_GET_BITS(MyProc, f) != 0)
-			++FastPathLocalUseCount;
+			++FastPathLocalUseCounts[group];
 	}
 	return result;
 }
@@ -2703,7 +2730,7 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 {
 	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
 	Oid			relid = locktag->locktag_field2;
-	uint32		i;
+	uint32		i, j, group;
 
 	/*
 	 * Every PGPROC that can potentially hold a fast-path lock is present in
@@ -2739,10 +2766,18 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 			continue;
 		}
 
-		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+		group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+		Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
+		for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 		{
 			uint32		lockmode;
 
+			f = group * FP_LOCK_SLOTS_PER_GROUP + j;
+
+			Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+
 			/* Look for an allocated slot matching the given relid. */
 			if (relid != proc->fpRelId[f] || FAST_PATH_GET_BITS(proc, f) == 0)
 				continue;
@@ -2793,14 +2828,22 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	PROCLOCK   *proclock = NULL;
 	LWLock	   *partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid			relid = locktag->locktag_field2;
-	uint32		f;
+	uint32		f, i;
+
+	int			group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
 
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
 		uint32		lockmode;
 
+		f = group * FP_LOCK_SLOTS_PER_GROUP + i;
+
+		Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+
 		/* Look for an allocated slot matching the given relid. */
 		if (relid != MyProc->fpRelId[f] || FAST_PATH_GET_BITS(MyProc, f) == 0)
 			continue;
@@ -2904,6 +2947,10 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	int			count = 0;
 	int			fast_count = 0;
 
+	int			group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
 	lockMethodTable = LockMethods[lockmethodid];
@@ -2940,7 +2987,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	 */
 	if (ConflictsWithRelationFastPath(locktag, lockmode))
 	{
-		int			i;
+		int			i, j;
 		Oid			relid = locktag->locktag_field2;
 		VirtualTransactionId vxid;
 
@@ -2979,10 +3026,14 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 				continue;
 			}
 
-			for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+			for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 			{
 				uint32		lockmask;
 
+				f = group * FP_LOCK_SLOTS_PER_GROUP + j;
+
+				Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+
 				/* Look for an allocated slot matching the given relid. */
 				if (relid != proc->fpRelId[f])
 					continue;
@@ -3642,7 +3693,7 @@ GetLockStatusData(void)
 
 		LWLockAcquire(&proc->fpInfoLock, LW_SHARED);
 
-		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; ++f)
+		for (f = 0; f < FP_LOCKS_PER_BACKEND; ++f)
 		{
 			LockInstanceData *instance;
 			uint32		lockbits = FAST_PATH_GET_BITS(proc, f);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index deeb06c9e01..f074266a48c 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,8 +83,9 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_SLOTS_PER_BACKEND 16
-
+#define		FP_LOCK_GROUPS_PER_BACKEND	64
+#define		FP_LOCK_SLOTS_PER_GROUP		16		/* don't change */
+#define		FP_LOCKS_PER_BACKEND		(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -292,8 +293,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits;		/* lock modes held for each fast-path slot */
-	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
+	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND];		/* lock modes held for each fast-path slot */
+	Oid			fpRelId[FP_LOCKS_PER_BACKEND]; /* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
 												 * lock */
-- 
2.46.0

v20240902-0002-rework.patch (text/x-patch)
From 16abf609b3b988e27abc1cf4c48e5949e29cd344 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Mon, 2 Sep 2024 15:37:41 +0200
Subject: [PATCH v20240902 2/4] rework

---
 src/backend/storage/lmgr/lock.c | 125 +++++++++++++++++++++++---------
 src/include/storage/proc.h      |   4 +-
 2 files changed, 92 insertions(+), 37 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 78e152a0b36..524aee863fd 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -184,19 +184,49 @@ static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Macros to calculate the group and index for a relation.
+ *
+ * The formula is a simple hash function, designed to spread the OIDs a bit,
+ * so that even contiguous values end up in different groups. In most cases
+ * there will be gaps anyway, but the multiplication should help a bit.
+ *
+ * The selected value (49157) is a prime not too close to 2^k, and it's
+ * small enough to not cause overflows (in 64-bit).
+ *
+ * XXX Maybe it'd be easier / cheaper to just do this in 32-bits? If we
+ * did (rel % 100000) or something like that first, that'd be enough to
+ * not wrap around. But even if it wrapped, would that be a problem?
+ */
+#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+
+/*
+ * Given a lock index (into the per-backend array), calculated using the
+ * FP_LOCK_SLOT_INDEX macro, calculate group and index (within the group).
+ */
+#define FAST_PATH_LOCK_GROUP(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) / FP_LOCK_SLOTS_PER_GROUP))
+#define FAST_PATH_LOCK_INDEX(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) % FP_LOCK_SLOTS_PER_GROUP))
+
+/* Calculate index in the whole per-backend array of lock slots. */
+#define FP_LOCK_SLOT_INDEX(group, index) \
+	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
+	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
+
 /* Macros for manipulating proc->fpLockBits */
 #define FAST_PATH_BITS_PER_SLOT			3
 #define FAST_PATH_LOCKNUMBER_OFFSET		1
-#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 7883 + 4481) % FP_LOCK_GROUPS_PER_BACKEND)
-#define FAST_PATH_LOCK_INDEX(n)			((n) % FP_LOCK_SLOTS_PER_GROUP)
-#define FAST_PATH_LOCK_GROUP(n)			((n) / FP_LOCK_SLOTS_PER_GROUP)
 #define FAST_PATH_MASK					((1 << FAST_PATH_BITS_PER_SLOT) - 1)
 #define FAST_PATH_GET_BITS(proc, n) \
 	(((proc)->fpLockBits[(n)/16] >> (FAST_PATH_BITS_PER_SLOT * FAST_PATH_LOCK_INDEX(n))) & FAST_PATH_MASK)
 #define FAST_PATH_BIT_POSITION(n, l) \
 	(AssertMacro((l) >= FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((l) < FAST_PATH_BITS_PER_SLOT+FAST_PATH_LOCKNUMBER_OFFSET), \
-	 AssertMacro((n) < FP_LOCKS_PER_BACKEND), \
+	 AssertMacro((n) < FP_LOCK_SLOTS_PER_BACKEND), \
 	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (FAST_PATH_LOCK_INDEX(n))))
 #define FAST_PATH_SET_LOCKMODE(proc, n, l) \
 	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
@@ -2642,20 +2672,25 @@ LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent)
 static bool
 FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		i;
-	uint32		f;
-	uint32		unused_slot = FP_LOCKS_PER_BACKEND;
+	uint32		unused_slot = FP_LOCK_SLOTS_PER_BACKEND;
+	uint32		i,
+				group;
 
-	int			group = FAST_PATH_LOCK_REL_GROUP(relid);
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	/* Scan for existing entry for this relid, remembering empty slot. */
 	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
-		f = group * FP_LOCK_SLOTS_PER_GROUP + i;
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
 
-		Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+		/* must not overflow the array of all locks for a backend */
+		Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
 
 		if (FAST_PATH_GET_BITS(MyProc, f) == 0)
 			unused_slot = f;
@@ -2668,7 +2703,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	}
 
 	/* If no existing entry, use any empty slot. */
-	if (unused_slot < FP_LOCKS_PER_BACKEND)
+	if (unused_slot < FP_LOCK_SLOTS_PER_BACKEND)
 	{
 		MyProc->fpRelId[unused_slot] = relid;
 		FAST_PATH_SET_LOCKMODE(MyProc, unused_slot, lockmode);
@@ -2688,20 +2723,25 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 static bool
 FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		i;
-	uint32		f;
 	bool		result = false;
+	uint32		i,
+				group;
 
-	int			group = FAST_PATH_LOCK_REL_GROUP(relid);
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	FastPathLocalUseCounts[group] = 0;
 	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
-		f = group * FP_LOCK_SLOTS_PER_GROUP + i;
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
 
-		Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+		/* must not overflow the array of all locks for a backend */
+		Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
 
 		if (MyProc->fpRelId[f] == relid
 			&& FAST_PATH_CHECK_LOCKMODE(MyProc, f, lockmode))
@@ -2730,7 +2770,7 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 {
 	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
 	Oid			relid = locktag->locktag_field2;
-	uint32		i, j, group;
+	uint32		i;
 
 	/*
 	 * Every PGPROC that can potentially hold a fast-path lock is present in
@@ -2741,7 +2781,8 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 	for (i = 0; i < ProcGlobal->allProcCount; i++)
 	{
 		PGPROC	   *proc = &ProcGlobal->allProcs[i];
-		uint32		f;
+		uint32		j,
+					group;
 
 		LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
 
@@ -2766,17 +2807,21 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 			continue;
 		}
 
+		/* Which FP group does the lock belong to? */
 		group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-		Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+		Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
 
 		for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 		{
 			uint32		lockmode;
+			uint32		f;
 
-			f = group * FP_LOCK_SLOTS_PER_GROUP + j;
+			/* index into the whole per-backend array */
+			f = FP_LOCK_SLOT_INDEX(group, j);
 
-			Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+			/* must not overflow the array of all locks for a backend */
+			Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
 
 			/* Look for an allocated slot matching the given relid. */
 			if (relid != proc->fpRelId[f] || FAST_PATH_GET_BITS(proc, f) == 0)
@@ -2828,21 +2873,26 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	PROCLOCK   *proclock = NULL;
 	LWLock	   *partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid			relid = locktag->locktag_field2;
-	uint32		f, i;
+	uint32		i,
+				group;
 
-	int			group = FAST_PATH_LOCK_REL_GROUP(relid);
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
 
 	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
 		uint32		lockmode;
+		uint32		f;
 
-		f = group * FP_LOCK_SLOTS_PER_GROUP + i;
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
 
-		Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+		/* must not overflow the array of all locks for a backend */
+		Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
 
 		/* Look for an allocated slot matching the given relid. */
 		if (relid != MyProc->fpRelId[f] || FAST_PATH_GET_BITS(MyProc, f) == 0)
@@ -2946,10 +2996,12 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	LWLock	   *partitionLock;
 	int			count = 0;
 	int			fast_count = 0;
+	uint32		group;
 
-	int			group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
 
-	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2987,7 +3039,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	 */
 	if (ConflictsWithRelationFastPath(locktag, lockmode))
 	{
-		int			i, j;
+		int			i;
 		Oid			relid = locktag->locktag_field2;
 		VirtualTransactionId vxid;
 
@@ -3004,7 +3056,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 		for (i = 0; i < ProcGlobal->allProcCount; i++)
 		{
 			PGPROC	   *proc = &ProcGlobal->allProcs[i];
-			uint32		f;
+			uint32		j;
 
 			/* A backend never blocks itself */
 			if (proc == MyProc)
@@ -3029,10 +3081,13 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 			for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 			{
 				uint32		lockmask;
+				uint32		f;
 
-				f = group * FP_LOCK_SLOTS_PER_GROUP + j;
+				/* index into the whole per-backend array */
+				f = FP_LOCK_SLOT_INDEX(group, j);
 
-				Assert(f >= 0 && f < FP_LOCKS_PER_BACKEND);
+				/* must not overflow the array of all locks for a backend */
+				Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
 
 				/* Look for an allocated slot matching the given relid. */
 				if (relid != proc->fpRelId[f])
@@ -3693,7 +3748,7 @@ GetLockStatusData(void)
 
 		LWLockAcquire(&proc->fpInfoLock, LW_SHARED);
 
-		for (f = 0; f < FP_LOCKS_PER_BACKEND; ++f)
+		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; ++f)
 		{
 			LockInstanceData *instance;
 			uint32		lockbits = FAST_PATH_GET_BITS(proc, f);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index f074266a48c..d988cfce99e 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -85,7 +85,7 @@ struct XidCache
  */
 #define		FP_LOCK_GROUPS_PER_BACKEND	64
 #define		FP_LOCK_SLOTS_PER_GROUP		16		/* don't change */
-#define		FP_LOCKS_PER_BACKEND		(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -294,7 +294,7 @@ struct PGPROC
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
 	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND];		/* lock modes held for each fast-path slot */
-	Oid			fpRelId[FP_LOCKS_PER_BACKEND]; /* slots for rel oids */
+	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
 												 * lock */
-- 
2.46.0

v20240902-0003-drive-this-by-max_locks_per_transaction.patchtext/x-patch; charset=UTF-8; name=v20240902-0003-drive-this-by-max_locks_per_transaction.patchDownload
From 4394b23800e1f0d96d571fac116c29ef0cf6d94a Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Mon, 2 Sep 2024 01:56:17 +0200
Subject: [PATCH v20240902 3/4] drive this by max_locks_per_transaction

---
 src/backend/storage/lmgr/lock.c | 34 ++++++++++++++++++-------
 src/backend/storage/lmgr/proc.c | 45 +++++++++++++++++++++++++++++++++
 src/include/storage/proc.h      | 10 +++++---
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 524aee863fd..14124875bf9 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -166,8 +166,13 @@ typedef struct TwoPhaseLockRecord
  * might be higher than the real number if another backend has transferred
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
+ *
+ * XXX Allocate a static array of the maximum size. We could have a pointer
+ * and then allocate just the right size to save a couple kB, but that does
+ * not seem worth the extra complexity of having to initialize it etc. This
+ * way it gets initialized automatically.
  */
-static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND_MAX];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -184,6 +189,17 @@ static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Number of fast-path locks per backend - size of the arrays in PGPROC.
+ * This is set only once during start, before initializing shared memory.
+ * After that it remains constant.
+ *
+ * XXX Right now this is sized based on max_locks_per_transaction GUC.
+ * We try to fit the expected number of locks into the cache, with some
+ * upper limit as a safety.
+ */
+int FastPathLockGroupsPerBackend = 0;
+
 /*
  * Macros to calculate the group and index for a relation.
  *
@@ -198,7 +214,7 @@ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
  * did (rel % 100000) or something like that first, that'd be enough to
  * not wrap around. But even if it wrapped, would that be a problem?
  */
-#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 49157) % FastPathLockGroupsPerBackend)
 
 /*
  * Given a lock index (into the per-backend array), calculated using the
@@ -213,7 +229,7 @@ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
 /* Calculate index in the whole per-backend array of lock slots. */
 #define FP_LOCK_SLOT_INDEX(group, index) \
-	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	(AssertMacro(((group) >= 0) && ((group) < FastPathLockGroupsPerBackend)), \
 	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
 	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
 
@@ -2100,7 +2116,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
 
-	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group >= 0 && group < FastPathLockGroupsPerBackend);
 
 	/* Attempt fast release of any lock eligible for the fast path. */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
@@ -2679,7 +2695,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	/* Scan for existing entry for this relid, remembering empty slot. */
 	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
@@ -2730,7 +2746,7 @@ FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	FastPathLocalUseCounts[group] = 0;
 	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
@@ -2810,7 +2826,7 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 		/* Which FP group does the lock belong to? */
 		group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-		Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+		Assert(group < FastPathLockGroupsPerBackend);
 
 		for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 		{
@@ -2879,7 +2895,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
 
@@ -3001,7 +3017,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ac66da8638f..c3d2856b151 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -113,6 +113,28 @@ ProcGlobalShmemSize(void)
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->subxidStates)));
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->statusFlags)));
 
+	/*
+	 * Calculate the number of fast-path lock groups. We allow anything
+	 * between 1 and 1024 groups, with the usual power-of-2 logic.
+	 *
+	 * XXX The 1 is the current value, 1024 is an arbitrary limit matching
+	 * max_locks_per_xact = 16k. The default is max_locks_per_xact = 64,
+	 * which means 4 groups by default.
+	 */
+	FastPathLockGroupsPerBackend = 1;
+	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
+	{
+		/* stop once we hit max_locks_per_xact */
+		if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
+			break;
+
+		FastPathLockGroupsPerBackend *= 2;
+	}
+
+	elog(LOG, "FastPathLockGroupsPerBackend = %d", FastPathLockGroupsPerBackend);
+
+	size = add_size(size, mul_size(TotalProcs, FastPathLockGroupsPerBackend * (sizeof(uint64) + sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP)));
+
 	return size;
 }
 
@@ -162,6 +184,8 @@ InitProcGlobal(void)
 				j;
 	bool		found;
 	uint32		TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
+	char	   *ptr,
+			   *endptr;
 
 	/* Create the ProcGlobal shared structure */
 	ProcGlobal = (PROC_HDR *)
@@ -211,12 +235,31 @@ InitProcGlobal(void)
 	ProcGlobal->statusFlags = (uint8 *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->statusFlags));
 	MemSet(ProcGlobal->statusFlags, 0, TotalProcs * sizeof(*ProcGlobal->statusFlags));
 
+	/*
+	 * Allocate arrays for fast-path locks. Those are variable-length, based
+	 * on max_locks_per_transaction, so can't be included in PGPROC.
+	 */
+	ptr = ShmemAlloc(TotalProcs * (FastPathLockGroupsPerBackend * (sizeof(uint64) + sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP)));
+	endptr = ptr + (TotalProcs * (FastPathLockGroupsPerBackend * (sizeof(uint64) + sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP)));
+	MemSet(ptr, 0, TotalProcs * (FastPathLockGroupsPerBackend * (sizeof(uint64) + sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP)));
+
+	elog(LOG, "ptrlen %lu", (endptr - ptr));
+
 	for (i = 0; i < TotalProcs; i++)
 	{
 		PGPROC	   *proc = &procs[i];
 
 		/* Common initialization for all PGPROCs, regardless of type. */
 
+		/*
+		 * Set the fast-path lock arrays.
+		 */
+		proc->fpLockBits = (uint64 *) ptr;
+		ptr += sizeof(uint64) * FastPathLockGroupsPerBackend;
+
+		proc->fpRelId = (Oid *) ptr;
+		ptr += sizeof(Oid) * FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP;
+
 		/*
 		 * Set up per-PGPROC semaphore, latch, and fpInfoLock.  Prepared xact
 		 * dummy PGPROCs don't need these though - they're never associated
@@ -278,6 +321,8 @@ InitProcGlobal(void)
 		pg_atomic_init_u64(&(proc->waitStart), 0);
 	}
 
+	Assert(endptr == ptr);
+
 	/*
 	 * Save pointers to the blocks of PGPROC structures reserved for auxiliary
 	 * processes and prepared transactions.
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index d988cfce99e..c9184cefccf 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,9 +83,11 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_GROUPS_PER_BACKEND	64
+extern PGDLLIMPORT int	FastPathLockGroupsPerBackend;
+#define		FP_LOCK_GROUPS_PER_BACKEND_MAX	1024
 #define		FP_LOCK_SLOTS_PER_GROUP		16		/* don't change */
-#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
+
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -293,8 +295,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND];		/* lock modes held for each fast-path slot */
-	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
+	uint64	   *fpLockBits;		/* lock modes held for each fast-path slot */
+	Oid		   *fpRelId;		/* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
 												 * lock */
-- 
2.46.0

v20240902-0004-separate-guc-to-allow-benchmarking.patchtext/x-patch; charset=UTF-8; name=v20240902-0004-separate-guc-to-allow-benchmarking.patchDownload
From 49b7bf39120535f2f258847aeb911922b7b0a192 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Mon, 2 Sep 2024 02:19:16 +0200
Subject: [PATCH v20240902 4/4] separate guc to allow benchmarking

---
 src/backend/storage/lmgr/proc.c     | 18 +++++++++---------
 src/backend/utils/misc/guc_tables.c | 10 ++++++++++
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c3d2856b151..b25699a94c6 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -121,15 +121,15 @@ ProcGlobalShmemSize(void)
 	 * max_locks_per_xact = 16k. The default is max_locks_per_xact = 64,
 	 * which means 4 groups by default.
 	 */
-	FastPathLockGroupsPerBackend = 1;
-	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
-	{
-		/* stop once we hit max_locks_per_xact */
-		if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
-			break;
-
-		FastPathLockGroupsPerBackend *= 2;
-	}
+//	FastPathLockGroupsPerBackend = 1;
+//	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
+//	{
+//		/* stop once we hit max_locks_per_xact */
+//		if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
+//			break;
+//
+//		FastPathLockGroupsPerBackend *= 2;
+//	}
 
 	elog(LOG, "FastPathLockGroupsPerBackend = %d", FastPathLockGroupsPerBackend);
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 521ec5591c8..a6d4e0a8905 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2788,6 +2788,16 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"fastpath_lock_groups", PGC_POSTMASTER, LOCK_MANAGEMENT,
+			gettext_noop("Sets the number of fast-path lock groups per backend."),
+			gettext_noop("Each group provides 16 fast-path lock slots.")
+		},
+		&FastPathLockGroupsPerBackend,
+		1, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"max_pred_locks_per_transaction", PGC_POSTMASTER, LOCK_MANAGEMENT,
 			gettext_noop("Sets the maximum number of predicate locks per transaction."),
-- 
2.46.0

#15Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#14)
Re: scalability bottlenecks with (many) partitions (and more)

On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra <tomas@vondra.me> wrote:

The one argument to not tie this to max_locks_per_transaction is the
vastly different "per element" memory requirements. If you add one entry
to max_locks_per_transaction, that adds LOCK which is a whopping 152B.
OTOH one fast-path entry is ~5B, give or take. That's a pretty big
difference, and it if the locks fit into the shared lock table, but
you'd like to allow more fast-path locks, having to increase
max_locks_per_transaction is not great - pretty wastefull.

OTOH I'd really hate to just add another GUC and hope the users will
magically know how to set it correctly. That's pretty unlikely, IMO. I
myself wouldn't know what a good value is, I think.

But say we add a GUC and set it to -1 by default, in which case it just
inherits the max_locks_per_transaction value. And then also provide some
basic metric about this fast-path cache, so that people can tune this?

All things being equal, I would prefer not to add another GUC for
this, but we might need it.

Doing some worst case math, suppose somebody has max_connections=1000
(which is near the upper limit of what I'd consider a sane setting)
and max_locks_per_transaction=10000 (ditto). The product is 10
million, so every 10 bytes of storage eats a gigabyte of RAM. Chewing
up 15GB of RAM when you could have chewed up only 0.5GB certainly
isn't too great. On the other hand, those values are kind of pushing
the limits of what is actually sane. If you imagine
max_locks_per_transaction=2000 rather than
max_locks_per_connection=10000, then it's only 3GB and that's
hopefully not a lot on the hopefully-giant machine where you're
running this.

I think just knowing the "hit ratio" would be enough, i.e. counters for
how often it fits into the fast-path array, and how often we had to
promote it to the shared lock table would be enough, no?

Yeah, probably. I mean, that won't tell you how big it needs to be,
but it will tell you whether it's big enough.

I wonder if we should be looking at further improvements in the lock
manager of some kind. For instance, imagine if we allocated storage
via DSM or DSA for cases where we need a really large number of Lock
entries. The downside of that is that we might run out of memory for
locks at runtime, which would perhaps suck, but you'd probably use
significantly less memory on average. Or, maybe we need an even bigger
rethink where we reconsider the idea that we take a separate lock for
every single partition instead of having some kind of hierarchy-aware
lock manager. I don't know. But this feels like very old, crufty tech.
There's probably something more state of the art that we could or
should be doing.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Tomas Vondra
tomas@vondra.me
In reply to: Robert Haas (#15)
Re: scalability bottlenecks with (many) partitions (and more)

On 9/3/24 17:06, Robert Haas wrote:

On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra <tomas@vondra.me> wrote:

The one argument to not tie this to max_locks_per_transaction is the
vastly different "per element" memory requirements. If you add one entry
to max_locks_per_transaction, that adds LOCK which is a whopping 152B.
OTOH one fast-path entry is ~5B, give or take. That's a pretty big
difference, and if the locks fit into the shared lock table but
you'd like to allow more fast-path locks, having to increase
max_locks_per_transaction is not great - pretty wasteful.

OTOH I'd really hate to just add another GUC and hope the users will
magically know how to set it correctly. That's pretty unlikely, IMO. I
myself wouldn't know what a good value is, I think.

But say we add a GUC and set it to -1 by default, in which case it just
inherits the max_locks_per_transaction value. And then also provide some
basic metric about this fast-path cache, so that people can tune this?

All things being equal, I would prefer not to add another GUC for
this, but we might need it.

Agreed.

Doing some worst case math, suppose somebody has max_connections=1000
(which is near the upper limit of what I'd consider a sane setting)
and max_locks_per_transaction=10000 (ditto). The product is 10
million, so every 10 bytes of storage eats a gigabyte of RAM. Chewing
up 15GB of RAM when you could have chewed up only 0.5GB certainly
isn't too great. On the other hand, those values are kind of pushing
the limits of what is actually sane. If you imagine
max_locks_per_transaction=2000 rather than
max_locks_per_transaction=10000, then it's only 3GB and that's
hopefully not a lot on the hopefully-giant machine where you're
running this.

Yeah, although I don't quite follow the math. With 1000/10000 settings,
why would that eat 15GB of RAM? I mean, that's 1.5GB, right?
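(1000 backends * 10000 locks/xact = 10M lock entries, and 10M entries
at ~150B per LOCK struct is ~1.5GB.)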

FWIW the actual cost is somewhat higher, because we seem to need ~400B
for every lock (not just the 150B for the LOCK struct). At least based
on a quick experiment. (Seems a bit high, right?).

Anyway, I agree this might be acceptable. If your transactions use this
many locks regularly, you probably need this setting anyway. If you only
need this many locks occasionally (so that you can keep the locks/xact
value low), it probably does not matter that much.

And if you're running a massively partitioned table on a tiny box, well,
I don't really think that's a particularly sane idea.

So I think I'm OK with just tying this to max_locks_per_transaction.

I think just knowing the "hit ratio" would be enough, i.e. counters for
how often it fits into the fast-path array, and how often we had to
promote it to the shared lock table would be enough, no?

Yeah, probably. I mean, that won't tell you how big it needs to be,
but it will tell you whether it's big enough.

True, but that applies to all "cache hit ratio" metrics (like for our
shared buffers). It'd be great to have something better, enough to tell
you how large the cache needs to be. But we don't :-(

I wonder if we should be looking at further improvements in the lock
manager of some kind. For instance, imagine if we allocated storage
via DSM or DSA for cases where we need a really large number of Lock
entries. The downside of that is that we might run out of memory for
locks at runtime, which would perhaps suck, but you'd probably use
significantly less memory on average. Or, maybe we need an even bigger
rethink where we reconsider the idea that we take a separate lock for
every single partition instead of having some kind of hierarchy-aware
lock manager. I don't know. But this feels like very old, crufty tech.
There's probably something more state of the art that we could or
should be doing.

Perhaps. I agree we'll probably need something more radical soon, not
just changes that aim to fix some rare exceptional case (which may be
annoying, but not particularly harmful for the complete workload).

For example, if we did what you propose, that might help when very few
transactions need a lot of locks. I don't mind saving memory in that
case, ofc. but is it a problem if those rare cases are a bit slower?
Shouldn't we focus more on cases where many locks are common? Because
people are simply going to use partitioning, a lot of indexes, etc?

So yeah, I agree we probably need a more fundamental rethink. I don't
think we can just keep optimizing the current approach, there's a limit
to how fast it can be. Whether it's not locking individual partitions, or
not locking some indexes, ... I don't know.

regards

--
Tomas Vondra

#17Jakub Wartak
jakub.wartak@enterprisedb.com
In reply to: Tomas Vondra (#16)
Re: scalability bottlenecks with (many) partitions (and more)

Hi Tomas!

On Tue, Sep 3, 2024 at 6:20 PM Tomas Vondra <tomas@vondra.me> wrote:

On 9/3/24 17:06, Robert Haas wrote:

On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra <tomas@vondra.me> wrote:

The one argument to not tie this to max_locks_per_transaction is the
vastly different "per element" memory requirements. If you add one entry
to max_locks_per_transaction, that adds LOCK which is a whopping 152B.
OTOH one fast-path entry is ~5B, give or take. That's a pretty big
difference, and if the locks fit into the shared lock table but
you'd like to allow more fast-path locks, having to increase
max_locks_per_transaction is not great - pretty wasteful.

OTOH I'd really hate to just add another GUC and hope the users will
magically know how to set it correctly. That's pretty unlikely, IMO. I
myself wouldn't know what a good value is, I think.

But say we add a GUC and set it to -1 by default, in which case it just
inherits the max_locks_per_transaction value. And then also provide some
basic metric about this fast-path cache, so that people can tune this?

All things being equal, I would prefer not to add another GUC for
this, but we might need it.

Agreed.

[..]

So I think I'm OK with just tying this to max_locks_per_transaction.

If that matters then the SLRU configurability effort added 7 GUCs
(with 3 scaling up based on shared_buffers) just to give high-end
users some relief, so one new GUC here shouldn't be such a big deal. We
could add a note to the LWLock/lock_manager wait event docs recommending
known-to-be-good values from this $thread (or ask the user to benchmark
it himself).

I think just knowing the "hit ratio" would be enough, i.e. counters for
how often it fits into the fast-path array, and how often we had to
promote it to the shared lock table would be enough, no?

Yeah, probably. I mean, that won't tell you how big it needs to be,
but it will tell you whether it's big enough.

True, but that applies to all "cache hit ratio" metrics (like for our
shared buffers). It'd be great to have something better, enough to tell
you how large the cache needs to be. But we don't :-(

My $0.02: the originating case that triggered those patches actually
started with LWLock/lock_manager waits being the top wait event. The
operator can cross-check (join) that against pg_locks, grouping by the
fastpath column and counting the 'f' entries. So, IMHO we have good
observability in this case (a rare thing to say!)
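
Something along these lines, as a rough point-in-time check (just a
sketch - it shows the current mix of held locks, not a true hit ratio
over time):

SELECT fastpath, count(*)
  FROM pg_locks
 WHERE locktype = 'relation'
 GROUP BY fastpath;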

I wonder if we should be looking at further improvements in the lock
manager of some kind. [..]

Perhaps. I agree we'll probably need something more radical soon, not
just changes that aim to fix some rare exceptional case (which may be
annoying, but not particularly harmful for the complete workload).

For example, if we did what you propose, that might help when very few
transactions need a lot of locks. I don't mind saving memory in that
case, ofc. but is it a problem if those rare cases are a bit slower?
Shouldn't we focus more on cases where many locks are common? Because
people are simply going to use partitioning, a lot of indexes, etc?

So yeah, I agree we probably need a more fundamental rethink. I don't
think we can just keep optimizing the current approach, there's a limit
to how fast it can be.

Please help me understand: are you both discussing potential far-future
improvements instead of this one? My question is really: is the patchset
good enough, or are you considering some other new effort instead?

BTW some other random questions:
Q1. I've been looking at
https://github.com/tvondra/pg-lock-scalability-results - those results
shouldn't be used for further discussion anymore, since they cover
earlier patches (including
0003-Add-a-memory-pool-with-adaptive-rebalancing.patch) and were
replaced by the benchmark data in this $thread, right?
Q2. Earlier attempts did contain a mempool patch to get those nice
numbers (or was that jemalloc or glibc tuning?). So were those recent
results in [1] still collected with 0003, or have you switched
completely to glibc/jemalloc tuning?

-J.

[1]: /messages/by-id/b8c43eda-0c3f-4cb4-809b-841fa5c40ada@vondra.me

#18Tomas Vondra
tomas@vondra.me
In reply to: Jakub Wartak (#17)
Re: scalability bottlenecks with (many) partitions (and more)

On 9/4/24 11:29, Jakub Wartak wrote:

Hi Tomas!

On Tue, Sep 3, 2024 at 6:20 PM Tomas Vondra <tomas@vondra.me> wrote:

On 9/3/24 17:06, Robert Haas wrote:

On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra <tomas@vondra.me> wrote:

The one argument to not tie this to max_locks_per_transaction is the
vastly different "per element" memory requirements. If you add one entry
to max_locks_per_transaction, that adds LOCK which is a whopping 152B.
OTOH one fast-path entry is ~5B, give or take. That's a pretty big
difference, and if the locks fit into the shared lock table but
you'd like to allow more fast-path locks, having to increase
max_locks_per_transaction is not great - pretty wasteful.

OTOH I'd really hate to just add another GUC and hope the users will
magically know how to set it correctly. That's pretty unlikely, IMO. I
myself wouldn't know what a good value is, I think.

But say we add a GUC and set it to -1 by default, in which case it just
inherits the max_locks_per_transaction value. And then also provide some
basic metric about this fast-path cache, so that people can tune this?

All things being equal, I would prefer not to add another GUC for
this, but we might need it.

Agreed.

[..]

So I think I'm OK with just tying this to max_locks_per_transaction.

If that matters then the SLRU configurability effort added 7 GUCs
(with 3 scaling up based on shared_buffers) just to give high-end
users some relief, so one new GUC here shouldn't be such a big deal. We
could add a note to the LWLock/lock_manager wait event docs recommending
known-to-be-good values from this $thread (or ask the user to benchmark
it himself).

TBH I'm skeptical we'll be able to tune those GUCs. Maybe it was the
right thing for the SLRU thread, I don't know - I haven't been following
that very closely. But my impression is that we often add a GUC when
we're not quite sure how to pick a good value. So we just shift the
responsibility to someone else, who doesn't know either.

I'd very much prefer not to do that here. Of course, it's challenging
because we can't easily resize these arrays, so even if we had some nice
heuristics to calculate the "optimal" number of fast-path slots, what
would we do with it ...

I think just knowing the "hit ratio" would be enough, i.e. counters for
how often it fits into the fast-path array, and how often we had to
promote it to the shared lock table would be enough, no?

Yeah, probably. I mean, that won't tell you how big it needs to be,
but it will tell you whether it's big enough.

True, but that applies to all "cache hit ratio" metrics (like for our
shared buffers). It'd be great to have something better, enough to tell
you how large the cache needs to be. But we don't :-(

My $0.02: the originating case that triggered those patches actually
started with LWLock/lock_manager waits being the top wait event. The
operator can cross-check (join) that against pg_locks, grouping by the
fastpath column and counting the 'f' entries. So, IMHO we have good
observability in this case (a rare thing to say!)

That's a good point. So if you had to give some instructions to users
what to measure / monitor, and how to adjust the GUC based on that, what
would your instructions be?

I wonder if we should be looking at further improvements in the lock
manager of some kind. [..]

Perhaps. I agree we'll probably need something more radical soon, not
just changes that aim to fix some rare exceptional case (which may be
annoying, but not particularly harmful for the complete workload).

For example, if we did what you propose, that might help when very few
transactions need a lot of locks. I don't mind saving memory in that
case, ofc. but is it a problem if those rare cases are a bit slower?
Shouldn't we focus more on cases where many locks are common? Because
people are simply going to use partitioning, a lot of indexes, etc?

So yeah, I agree we probably need a more fundamental rethink. I don't
think we can just keep optimizing the current approach, there's a limit
of fast it can be.

Please help me understand: are you both discussing potential far-future
improvements instead of this one? My question is really: is the patchset
good enough, or are you considering some other new effort instead?

I think it was mostly a brainstorming about alternative / additional
improvements in locking. The proposed patch does not change the locking
in any fundamental way, it merely optimizes one piece - we still acquire
exactly the same set of locks, exactly the same way.

AFAICS there's an agreement the current approach has limits, and with
the growing number of partitions we're hitting them already. That may
need rethinking the fundamental approach, but I think that should not
block improvements to the current approach.

Not to mention there's no proposal for such "fundamental rework" yet.

BTW some other random questions:
Q1. I've been looking at
https://github.com/tvondra/pg-lock-scalability-results - those results
shouldn't be used for further discussion anymore, since they cover
earlier patches (including
0003-Add-a-memory-pool-with-adaptive-rebalancing.patch) and were
replaced by the benchmark data in this $thread, right?

The github results are still valid - I shared them only 3 days ago. They
test both the mempool and the glibc tuning, to assess (and compare) the
benefits of each, but why would that make them obsolete?

By "results in this thread" I suppose you mean the couple numbers I
shared on September 2? Those were just very limited benchmarks to asses
if making the arrays variable-length (based on GUC) would make things
slower. And it doesn't, so the "full" github results still apply.

Q2. Earlier attempts did contain a mempool patch to get those nice
numbers (or was that jemalloc or glibc tuning?). So were those recent
results in [1] still collected with 0003, or have you switched
completely to glibc/jemalloc tuning?

The results pushed to github are all with glibc, and test four cases:

a) mempool patch not applied, no glibc tuning
b) mempool patch applied, no glibc tuning
c) mempool patch not applied, glibc tuning
d) mempool patch applied, glibc tuning

These are the 4 "column groups" in some of the pivot tables, to allow
comparing those cases. My interpretation of the results are

1) The mempool / glibc tuning has significant benefits, at least for
some workloads (where the locking patch alone doesn't help much).

2) There's very little difference between the mempool / glibc tuning.
The mempool does seem to have a small advantage.

3) The mempool / glibc tuning is irrelevant for non-glibc systems (e.g.
for FreeBSD which I think uses jemalloc or something like that).

I think the mempool might be interesting and useful for other reasons
(e.g. I initially wrote it to enforce a per-backend memory limit), but
you can get mostly the same caching benefits by tuning the glibc parameters.

So I'm focusing on the locking stuff.

regards

--
Tomas Vondra

#19Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Tomas Vondra (#16)
Re: scalability bottlenecks with (many) partitions (and more)

On Tue, 3 Sept 2024 at 18:20, Tomas Vondra <tomas@vondra.me> wrote:

FWIW the actual cost is somewhat higher, because we seem to need ~400B
for every lock (not just the 150B for the LOCK struct).

We do indeed allocate two PROCLOCKs for every LOCK, and allocate those
inside dynahash tables. That amounts to (152+2*64+3*16=) 328 bytes in
dynahash elements, and (3 * 8-16) = 24-48 bytes for the dynahash
buckets/segments, resulting in 352-376 bytes * NLOCKENTS() being
used[^1]. Does that align with your usage numbers, or are they
significantly larger?

At least based on a quick experiment. (Seems a bit high, right?).

Yeah, that does seem high, thanks for nerd-sniping me.

The 152 bytes of LOCK are mostly due to a combination of two
MAX_LOCKMODES-sized int[]s that are used to keep track of the number
of requested/granted locks of each level. As MAX_LOCKMODES = 10, these
arrays use a total of 2*4*10=80 bytes, with the remaining 72 spent on
tracking. MAX_BACKENDS sadly doesn't fit in int16, so we'll have to
keep using int[]s, but that doesn't mean we can't improve this size:

ISTM that MAX_LOCKMODES is 2 larger than it has to be: LOCKMODE=0 is
NoLock, which is never used or counted in these shared structures, and
the max lock mode supported by any of the supported lock methods is
AccessExclusiveLock (8). We can thus reduce MAX_LOCKMODES to 8,
reducing size of the LOCK struct by 16 bytes.
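
To make that arithmetic concrete, here's a tiny stand-alone sketch
(simplified stand-ins only, not the real LOCK layout):

#include <stdio.h>

/* just the two int[MAX_LOCKMODES] counter arrays from LOCK */
typedef struct { int requested[10]; int granted[10]; } counts_now;
typedef struct { int requested[8];  int granted[8];  } counts_trimmed;

int main(void)
{
	/* prints 80 and 64 - trimming MAX_LOCKMODES to 8 saves 16 bytes */
	printf("%zu %zu\n", sizeof(counts_now), sizeof(counts_trimmed));
	return 0;
}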

If some struct- and field packing is OK, then we could further reduce
the size of LOCK by an additional 8 bytes by resizing the LOCKMASK
type from int to int16 (we only use the first MaxLockMode (8) + 1
bits), and then storing the grant/waitMask fields (now 4 bytes total)
in the padding that's present at the end of the waitProcs struct. This
would depend on dclist not writing in its padding space, but I
couldn't find any user that did so, and most critically dclist_init
doesn't scribble in the padding with memset.

If these are both implemented, it would save 24 bytes, reducing the
struct to 128 bytes. :) [^2]

I also checked PROCLOCK: If it is worth further packing the struct, we
should probably look at whether it's worth replacing the PGPROC* typed
fields with ProcNumber -based ones, potentially in both PROCLOCK and
PROCLOCKTAG. When combined with int16-typed LOCKMASKs, either one of
these fields being replaced with ProcNumber would allow a reduction in
size by one MAXALIGN quantum, reducing the struct to 56 bytes, the
smallest I could get it to without ignoring type alignments.

Further shmem savings can be achieved by reducing the "10% safety
margin" added at the end of LockShmemSize, as I'm fairly sure the
memory used in shared hashmaps doesn't exceed the estimated amount,
and if it did then we should probably fix that part, rather than
requesting that (up to) 10% overhead here.
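(For reference, that margin is the "size = add_size(size, size / 10)"
at the end of LockShmemSize(), IIRC.)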

Altogether that'd save 40 bytes/lock entry on size, and ~35 bytes/lock
on the "safety margin", for a saving of (up to) 19% of our current
allocation. I'm not sure whether these tricks would help performance or
even hurt it, apart from smaller structs usually fitting better in CPU
caches.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[^1] NLOCKENTS() benefits from being a power of 2, or slightly below
one, as it's rounded up to a power of 2 when dynahash decides its
number of buckets to allocate.
[^2] Sadly this 2-cachelines alignment is lost due to dynahash's
HASHELEMENT prefix of elements. :(

#20David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#15)
Re: scalability bottlenecks with (many) partitions (and more)

On Wed, 4 Sept 2024 at 03:06, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra <tomas@vondra.me> wrote:

But say we add a GUC and set it to -1 by default, in which case it just
inherits the max_locks_per_transaction value. And then also provide some
basic metric about this fast-path cache, so that people can tune this?

All things being equal, I would prefer not to add another GUC for
this, but we might need it.

I think driving the array size from max_locks_per_transaction is a
good idea (rounded up to the next multiple of 16?). If someone comes
along one day and shows us a compelling case where some backend needs
more than its fair share of locks and performance is bad because of
that, then maybe we can consider adding a GUC then. Certainly, it's
much easier to add a GUC later if someone convinces us that it's a
good idea than it is to add it now and try to take it away in the
future if we realise it's not useful enough to keep.

David

#21Tomas Vondra
tomas@vondra.me
In reply to: Matthias van de Meent (#19)
Re: scalability bottlenecks with (many) partitions (and more)

On 9/4/24 16:25, Matthias van de Meent wrote:

On Tue, 3 Sept 2024 at 18:20, Tomas Vondra <tomas@vondra.me> wrote:

FWIW the actual cost is somewhat higher, because we seem to need ~400B
for every lock (not just the 150B for the LOCK struct).

We do indeed allocate two PROCLOCKs for every LOCK, and allocate those
inside dynahash tables. That amounts to (152+2*64+3*16=) 328 bytes in
dynahash elements, and (3 * 8-16) = 24-48 bytes for the dynahash
buckets/segments, resulting in 352-376 bytes * NLOCKENTS() being
used[^1]. Does that align with your usage numbers, or are they
significantly larger?

I see more like ~470B per lock. If I patch CalculateShmemSize to log the
shmem allocated, I get this:

max_connections=100 max_locks_per_transaction=1000 => 194264001
max_connections=100 max_locks_per_transaction=2000 => 241756967

and (((241756967-194264001)/100/1000)) = 474

Could be alignment of structs or something, not sure.

At least based on a quick experiment. (Seems a bit high, right?).

Yeah, that does seem high, thanks for nerd-sniping me.

The 152 bytes of LOCK are mostly due to a combination of two
MAX_LOCKMODES-sized int[]s that are used to keep track of the number
of requested/granted locks of each level. As MAX_LOCKMODES = 10, these
arrays use a total of 2*4*10=80 bytes, with the remaining 72 spent on
tracking. MAX_BACKENDS sadly doesn't fit in int16, so we'll have to
keep using int[]s, but that doesn't mean we can't improve this size:

ISTM that MAX_LOCKMODES is 2 larger than it has to be: LOCKMODE=0 is
NoLock, which is never used or counted in these shared structures, and
the max lock mode supported by any of the supported lock methods is
AccessExclusiveLock (8). We can thus reduce MAX_LOCKMODES to 8,
reducing size of the LOCK struct by 16 bytes.

If some struct- and field packing is OK, then we could further reduce
the size of LOCK by an additional 8 bytes by resizing the LOCKMASK
type from int to int16 (we only use the first MaxLockMode (8) + 1
bits), and then storing the grant/waitMask fields (now 4 bytes total)
in the padding that's present at the end of the waitProcs struct. This
would depend on dclist not writing in its padding space, but I
couldn't find any user that did so, and most critically dclist_init
doesn't scribble in the padding with memset.

If these are both implemented, it would save 24 bytes, reducing the
struct to 128 bytes. :) [^2]

I also checked PROCLOCK: If it is worth further packing the struct, we
should probably look at whether it's worth replacing the PGPROC* typed
fields with ProcNumber -based ones, potentially in both PROCLOCK and
PROCLOCKTAG. When combined with int16-typed LOCKMASKs, either one of
these fields being replaced with ProcNumber would allow a reduction in
size by one MAXALIGN quantum, reducing the struct to 56 bytes, the
smallest I could get it to without ignoring type alignments.

Further shmem savings can be achieved by reducing the "10% safety
margin" added at the end of LockShmemSize, as I'm fairly sure the
memory used in shared hashmaps doesn't exceed the estimated amount,
and if it did then we should probably fix that part, rather than
requesting that (up to) 10% overhead here.

Alltogether that'd save 40 bytes/lock entry on size, and ~35
bytes/lock on "safety margin", for a saving of (up to) 19% of our
current allocation. I'm not sure if these tricks would benefit with
performance or even be a demerit, apart from smaller structs usually
being better at fitting better in CPU caches.

Not sure either, but it seems worth exploring. If you do an experimental
patch for the LOCK size reduction, I can get some numbers.

I'm not sure about the safety margins. 10% does seem like quite a bit
of memory (it might not have been much in the past, but as instances
grow, that has probably changed).

regards

--
Tomas Vondra

#22Tomas Vondra
tomas@vondra.me
In reply to: David Rowley (#20)
Re: scalability bottlenecks with (many) partitions (and more)

On 9/4/24 17:12, David Rowley wrote:

On Wed, 4 Sept 2024 at 03:06, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Sep 2, 2024 at 1:46 PM Tomas Vondra <tomas@vondra.me> wrote:

But say we add a GUC and set it to -1 by default, in which case it just
inherits the max_locks_per_transaction value. And then also provide some
basic metric about this fast-path cache, so that people can tune this?

All things being equal, I would prefer not to add another GUC for
this, but we might need it.

I think driving the array size from max_locks_per_transaction is a
good idea (rounded up to the next multiple of 16?).

Maybe, although I was thinking we'd just use the regular doubling, to
get nice "round" numbers. It will likely overshoot a little bit (unless
people set the GUC to exactly 2^N), but I don't think that's a problem.
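
To illustrate the overshoot, the doubling from 0003 boils down to this
(stand-alone sketch, same logic as the patch):

#include <stdio.h>

#define FP_LOCK_SLOTS_PER_GROUP        16
#define FP_LOCK_GROUPS_PER_BACKEND_MAX 1024

static int groups_for(int max_locks_per_xact)
{
	int		groups = 1;

	/* double until the slots cover max_locks_per_xact (or we hit the cap) */
	while (groups < FP_LOCK_GROUPS_PER_BACKEND_MAX &&
		   groups * FP_LOCK_SLOTS_PER_GROUP < max_locks_per_xact)
		groups *= 2;
	return groups;
}

int main(void)
{
	int		settings[] = {16, 64, 100, 1000, 10000};

	/* e.g. 100 -> 8 groups (128 slots), 1000 -> 64 groups (1024 slots) */
	for (int i = 0; i < 5; i++)
		printf("max_locks_per_xact=%d -> %d groups (%d slots)\n",
			   settings[i], groups_for(settings[i]),
			   groups_for(settings[i]) * FP_LOCK_SLOTS_PER_GROUP);
	return 0;
}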

If someone comes along one day and shows us a compelling case where
some backend needs more than its fair share of locks and performance
is bad because of that, then maybe we can consider adding a GUC then.
Certainly, it's much easier to add a GUC later if someone convinces
us that it's a good idea than it is to add it now and try to take it
away in the future if we realise it's not useful enough to keep.

Yeah, I agree with this.

regards

--
Tomas Vondra

#23Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#16)
Re: scalability bottlenecks with (many) partitions (and more)

On Tue, Sep 3, 2024 at 12:19 PM Tomas Vondra <tomas@vondra.me> wrote:

Doing some worst case math, suppose somebody has max_connections=1000
(which is near the upper limit of what I'd consider a sane setting)
and max_locks_per_transaction=10000 (ditto). The product is 10
million, so every 10 bytes of storage eats a gigabyte of RAM. Chewing
up 15GB of RAM when you could have chewed up only 0.5GB certainly
isn't too great. On the other hand, those values are kind of pushing
the limits of what is actually sane. If you imagine
max_locks_per_transaction=2000 rather than
max_locks_per_transaction=10000, then it's only 3GB and that's
hopefully not a lot on the hopefully-giant machine where you're
running this.

Yeah, although I don't quite follow the math. With 1000/10000 settings,
why would that eat 15GB of RAM? I mean, that's 1.5GB, right?

Oh, right.

FWIW the actual cost is somewhat higher, because we seem to need ~400B
for every lock (not just the 150B for the LOCK struct). At least based
on a quick experiment. (Seems a bit high, right?).

Hmm, yes, that's unpleasant.

Perhaps. I agree we'll probably need something more radical soon, not
just changes that aim to fix some rare exceptional case (which may be
annoying, but not particularly harmful for the complete workload).

For example, if we did what you propose, that might help when very few
transactions need a lot of locks. I don't mind saving memory in that
case, ofc. but is it a problem if those rare cases are a bit slower?
Shouldn't we focus more on cases where many locks are common? Because
people are simply going to use partitioning, a lot of indexes, etc?

So yeah, I agree we probably need a more fundamental rethink. I don't
think we can just keep optimizing the current approach, there's a limit
to how fast it can be. Whether it's not locking individual partitions, or
not locking some indexes, ... I don't know.

I don't know, either. We don't have to decide right now; it's just
something to keep in mind.

--
Robert Haas
EDB: http://www.enterprisedb.com

#24Tomas Vondra
tomas@vondra.me
In reply to: Robert Haas (#23)
4 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

Hi,

Here's a bit more polished version of this patch series. I only propose
0001 and 0002 for eventual commit; the other two bits are just stuff to
help with benchmarking etc.

0001
----
increases the size of the arrays, but uses a hard-coded number of groups
(64, so 1024 locks) and leaves everything in PGPROC
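
Just to show how the relid-to-group mapping behaves, here's a
stand-alone sketch using the same formula as FAST_PATH_LOCK_REL_GROUP
(sequential OIDs, like those created by a partitioning script, land in
different groups):

#include <stdio.h>
#include <stdint.h>

#define FP_LOCK_GROUPS_PER_BACKEND 64	/* hard-coded in 0001 */

static unsigned fp_group(uint32_t relid)
{
	/* same formula as the FAST_PATH_LOCK_REL_GROUP macro */
	return (unsigned) (((uint64_t) relid * 49157) % FP_LOCK_GROUPS_PER_BACKEND);
}

int main(void)
{
	/* consecutive OIDs map to groups 0, 5, 10, 15, ... */
	for (uint32_t relid = 16384; relid < 16392; relid++)
		printf("relid %u -> group %u\n", relid, fp_group(relid));
	return 0;
}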

0002
----
Allocates that separately from PGPROC, and sets the number based on
max_locks_per_transaction

I think 0001 and 0002 should be in fairly good shape. There are a
couple of cosmetic things that bother me (e.g. the way it Asserts after
each FAST_PATH_LOCK_REL_GROUP seems distracting).

But other than that I think it's fine, so a review / opinions would be
very welcome.

0003
----
Adds a separate GUC to make benchmarking easier (without the impact of
changing the size of the lock table).

I think the agreement is to not have a new GUC, unless it turns out to
be necessary in the future. So 0003 was just to make benchmarking a bit
easier.

0004
----
This was a quick attempt to track the fraction of fast-path locks, and
adding the infrastructure is a mostly mechanical thing. But it turns out
it's not quite trivial to track why a lock did not use the fast path. It
might have been because it wouldn't fit, or maybe it's not eligible, or
maybe there's a stronger lock. It's not obvious how to count these to
help with evaluating the number of fast-path slots.

regards

--
Tomas Vondra

Attachments:

v20240905-0001-Increase-the-number-of-fast-path-lock-slot.patchtext/x-patch; charset=UTF-8; name=v20240905-0001-Increase-the-number-of-fast-path-lock-slot.patchDownload
From 6877dfa7cd94c9f541689d9fe211bdcfaf8bbbdc Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Mon, 2 Sep 2024 00:55:13 +0200
Subject: [PATCH v20240905 1/4] Increase the number of fast-path lock slots

The fast-path locking introduced in 9.2 allowed each backend to obtain
up to 16 relation locks, provided the lock is not exclusive etc. If the
backend needs to obtain more locks, it needs to put them into the lock
table in shared memory, which is considerably more expensive.

The limit of 16 entries was always rather low. We need to lock all
relations - not just tables, but also indexes. And for planning we need
to lock all relations that might be used by a query, not just those in
the final plan. So it was common to use all the fast-path slots even
with simple schemas and queries.

But as partitioning gets more widely used, with an ever increasing
number of partitions, this bottleneck is becoming easier to hit.
Especially on large machines with enough memory to keep the queried data
cached, and many cores to cause contention when accessing the shared
lock table.

This patch addresses that by increasing the number of fast-path slots
from 16 to 1024, structuring it as a 16-way set associative cache. The
cache is divided into groups of 16 slots, and each lock is mapped to
exactly one of those groups (by hashing the OID). Entries in each group
are processed by linear search etc.

We could treat the whole array as a single hash table, but that would
degrade as it gets full (the cache is in shared memory, so we can't
resize it easily to keep the load factor low). It would probably also
have worse locality, due to more random access.

If a group is full, we can simply insert the new lock into the shared
lock table. This is the same as for the original code with 16 slots. Of
course, if this happens too often, that reduces the benefit.

To map relids to groups we use trivial hash function of the form

    h(relid) = ((relid * P) mod N)

where P is a hard-coded prime number, and N is the number of groups.
This is fast and works quite well - the main purpose is to map relids to
different groups, so that we don't get "hot groups" while the rest of
the groups are almost empty. If the relids are already spread out, the
hash function is unlikely to group them. If the relids are sequential
(e.g. for tables created by a script), the multiplication will spread
them around.

Note: This hard-codes the number of groups to 64, which means 1024
fast-path locks. This shall be either configurable or even better
adjusted based on some existing GUC.
---
 src/backend/storage/lmgr/lock.c | 148 +++++++++++++++++++++++++++-----
 src/include/storage/proc.h      |   8 +-
 2 files changed, 132 insertions(+), 24 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 83b99a98f08..f41e4a33f06 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -167,7 +167,7 @@ typedef struct TwoPhaseLockRecord
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
  */
-static int	FastPathLocalUseCount = 0;
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -184,23 +184,56 @@ static int	FastPathLocalUseCount = 0;
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Macros to calculate the group and index for a relation.
+ *
+ * The formula is a simple hash function, designed to spread the OIDs a bit,
+ * so that even contiguous values end up in different groups. In most cases
+ * there will be gaps anyway, but the multiplication should help a bit.
+ *
+ * The selected value (49157) is a prime not too close to 2^k, and it's
+ * small enough to not cause overflows (in 64-bit).
+ *
+ * XXX Maybe it'd be easier / cheaper to just do this in 32-bits? If we
+ * did (rel % 100000) or something like that first, that'd be enough to
+ * not wrap around. But even if it wrapped, would that be a problem?
+ */
+#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+
+/*
+ * Given a lock index (into the per-backend array), calculated using the
+ * FP_LOCK_SLOT_INDEX macro, calculate group and index (within the group).
+ */
+#define FAST_PATH_LOCK_GROUP(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) / FP_LOCK_SLOTS_PER_GROUP))
+#define FAST_PATH_LOCK_INDEX(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) % FP_LOCK_SLOTS_PER_GROUP))
+
+/* Calculate index in the whole per-backend array of lock slots. */
+#define FP_LOCK_SLOT_INDEX(group, index) \
+	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
+	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
+
 /* Macros for manipulating proc->fpLockBits */
 #define FAST_PATH_BITS_PER_SLOT			3
 #define FAST_PATH_LOCKNUMBER_OFFSET		1
 #define FAST_PATH_MASK					((1 << FAST_PATH_BITS_PER_SLOT) - 1)
 #define FAST_PATH_GET_BITS(proc, n) \
-	(((proc)->fpLockBits >> (FAST_PATH_BITS_PER_SLOT * n)) & FAST_PATH_MASK)
+	(((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] >> (FAST_PATH_BITS_PER_SLOT * FAST_PATH_LOCK_INDEX(n))) & FAST_PATH_MASK)
 #define FAST_PATH_BIT_POSITION(n, l) \
 	(AssertMacro((l) >= FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((l) < FAST_PATH_BITS_PER_SLOT+FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((n) < FP_LOCK_SLOTS_PER_BACKEND), \
-	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (n)))
+	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (FAST_PATH_LOCK_INDEX(n))))
 #define FAST_PATH_SET_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
 #define FAST_PATH_CLEAR_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
 #define FAST_PATH_CHECK_LOCKMODE(proc, n, l) \
-	 ((proc)->fpLockBits & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
+	 ((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
 
 /*
  * The fast-path lock mechanism is concerned only with relation locks on
@@ -926,7 +959,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 * for now we don't worry about that case either.
 	 */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND)
+		FastPathLocalUseCounts[FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2)] < FP_LOCK_SLOTS_PER_GROUP)
 	{
 		uint32		fasthashcode = FastPathStrongLockHashPartition(hashcode);
 		bool		acquired;
@@ -1970,6 +2003,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	int			group;
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2063,9 +2097,14 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	 */
 	locallock->lockCleared = false;
 
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
+	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+
 	/* Attempt fast release of any lock eligible for the fast path. */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount > 0)
+		FastPathLocalUseCounts[group] > 0)
 	{
 		bool		released;
 
@@ -2633,12 +2672,26 @@ LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent)
 static bool
 FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		f;
 	uint32		unused_slot = FP_LOCK_SLOTS_PER_BACKEND;
+	uint32		i,
+				group;
+
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	/* Scan for existing entry for this relid, remembering empty slot. */
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
+		/* must not overflow the array of all locks for a backend */
+		Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
+
 		if (FAST_PATH_GET_BITS(MyProc, f) == 0)
 			unused_slot = f;
 		else if (MyProc->fpRelId[f] == relid)
@@ -2654,7 +2707,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	{
 		MyProc->fpRelId[unused_slot] = relid;
 		FAST_PATH_SET_LOCKMODE(MyProc, unused_slot, lockmode);
-		++FastPathLocalUseCount;
+		++FastPathLocalUseCounts[group];
 		return true;
 	}
 
@@ -2670,12 +2723,26 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 static bool
 FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		f;
 	bool		result = false;
+	uint32		i,
+				group;
+
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	FastPathLocalUseCount = 0;
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+
+	FastPathLocalUseCounts[group] = 0;
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
+		/* must not overflow the array of all locks for a backend */
+		Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
+
 		if (MyProc->fpRelId[f] == relid
 			&& FAST_PATH_CHECK_LOCKMODE(MyProc, f, lockmode))
 		{
@@ -2685,7 +2752,7 @@ FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 			/* we continue iterating so as to update FastPathLocalUseCount */
 		}
 		if (FAST_PATH_GET_BITS(MyProc, f) != 0)
-			++FastPathLocalUseCount;
+			++FastPathLocalUseCounts[group];
 	}
 	return result;
 }
@@ -2714,7 +2781,8 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 	for (i = 0; i < ProcGlobal->allProcCount; i++)
 	{
 		PGPROC	   *proc = &ProcGlobal->allProcs[i];
-		uint32		f;
+		uint32		j,
+					group;
 
 		LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
 
@@ -2739,9 +2807,21 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 			continue;
 		}
 
-		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+		/* Which FP group does the lock belong to? */
+		group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+		Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+
+		for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 		{
 			uint32		lockmode;
+			uint32		f;
+
+			/* index into the whole per-backend array */
+			f = FP_LOCK_SLOT_INDEX(group, j);
+
+			/* must not overflow the array of all locks for a backend */
+			Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
 
 			/* Look for an allocated slot matching the given relid. */
 			if (relid != proc->fpRelId[f] || FAST_PATH_GET_BITS(proc, f) == 0)
@@ -2793,13 +2873,26 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	PROCLOCK   *proclock = NULL;
 	LWLock	   *partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid			relid = locktag->locktag_field2;
-	uint32		f;
+	uint32		i,
+				group;
+
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
 
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
 		uint32		lockmode;
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
+		/* must not overflow the array of all locks for a backend */
+		Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
 
 		/* Look for an allocated slot matching the given relid. */
 		if (relid != MyProc->fpRelId[f] || FAST_PATH_GET_BITS(MyProc, f) == 0)
@@ -2903,6 +2996,12 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	LWLock	   *partitionLock;
 	int			count = 0;
 	int			fast_count = 0;
+	uint32		group;
+
+	/* Which FP group does the lock belong to? */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
+	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2957,7 +3056,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 		for (i = 0; i < ProcGlobal->allProcCount; i++)
 		{
 			PGPROC	   *proc = &ProcGlobal->allProcs[i];
-			uint32		f;
+			uint32		j;
 
 			/* A backend never blocks itself */
 			if (proc == MyProc)
@@ -2979,9 +3078,16 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 				continue;
 			}
 
-			for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+			for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 			{
 				uint32		lockmask;
+				uint32		f;
+
+				/* index into the whole per-backend array */
+				f = FP_LOCK_SLOT_INDEX(group, j);
+
+				/* must not overflow the array of all locks for a backend */
+				Assert(f < FP_LOCK_SLOTS_PER_BACKEND);
 
 				/* Look for an allocated slot matching the given relid. */
 				if (relid != proc->fpRelId[f])
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index deeb06c9e01..845058da9fa 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,8 +83,9 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_SLOTS_PER_BACKEND 16
-
+#define		FP_LOCK_GROUPS_PER_BACKEND	64
+#define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -292,7 +293,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits;		/* lock modes held for each fast-path slot */
+	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND]; /* lock modes held for
+														 * each fast-path slot */
 	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
-- 
2.46.0

v20240905-0002-Size-fast-path-slots-using-max_locks_per_t.patch
From 9eaa679b5adea3a842eb944927d77f3d447646fe Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 5 Sep 2024 18:14:09 +0200
Subject: [PATCH v20240905 2/4] Size fast-path slots using
 max_locks_per_transaction

Instead of using a hard-coded value of 64 groups (1024 fast-path slots),
determine the value based on the max_locks_per_transaction GUC. The size
is calculated at startup, before allocating shared memory.

The default max_locks_per_transaction value is 64, which means 4
fast-path groups (64 slots) by default.

The max_locks_per_transaction GUC is the best information we have about
how many locks to expect per backend, but its main purpose is to size
the shared lock table. It is often set to an average number of locks
needed by a backend, while some backends may need substantially more
locks.

This means fast-path capacity calculated from max_locks_per_transaction
may not be sufficient for those lock-hungry backends, forcing them to
use the shared lock table. If that is a problem, the only solution is to
increase the GUC, even if the capacity of the shared lock table was
already sufficient. That is not free, because each lock in the shared
lock table requires almost 500B.

The assumption is this is not an issue. Either there are only a few of
those lock-intensive backends, in which case contention on the shared
lock table is not an issue. Or there are enough of them to actually need
a higher max_locks_per_transaction value.

It may turn out we actually need a separate GUC for fast-path locking,
but let's not add one until we're sure that's actually the case.

An alternative approach might be to size the fast-path arrays for a
multiple of max_locks_per_transaction. The cost of adding a fast-path
slot is much lower (only ~5B compared to ~500B for shared lock table),
so this would be cheaper than increasing max_locks_per_transaction. But
it's not clear what multiple of max_locks_per_transaction to use.
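For illustration, the rounding logic (see InitializeFastPathLocks below)
starts at one group of 16 slots and doubles the group count until the
fast-path capacity reaches max_locks_per_transaction, capped at 1024
groups:

    max_locks_per_transaction =    64  ->    4 groups =    64 slots (default)
    max_locks_per_transaction =   100  ->    8 groups =   128 slots
    max_locks_per_transaction =  1024  ->   64 groups =  1024 slots
    max_locks_per_transaction = 16384  -> 1024 groups = 16384 slots (max)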
---
 src/backend/bootstrap/bootstrap.c   |  2 ++
 src/backend/postmaster/postmaster.c |  5 +++
 src/backend/storage/lmgr/lock.c     | 34 +++++++++++++++------
 src/backend/storage/lmgr/proc.c     | 47 +++++++++++++++++++++++++++++
 src/backend/tcop/postgres.c         |  3 ++
 src/backend/utils/init/postinit.c   | 34 +++++++++++++++++++++
 src/include/miscadmin.h             |  1 +
 src/include/storage/proc.h          | 11 ++++---
 8 files changed, 123 insertions(+), 14 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 7637581a184..ed59dfce893 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -309,6 +309,8 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 
 	InitializeMaxBackends();
 
+	InitializeFastPathLocks();
+
 	CreateSharedMemoryAndSemaphores();
 
 	/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 96bc1d1cfed..f4a16595d7f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -903,6 +903,11 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	InitializeMaxBackends();
 
+	/*
+	 * Also calculate the size of the fast-path lock arrays in PGPROC.
+	 */
+	InitializeFastPathLocks();
+
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index f41e4a33f06..134cd8a6e34 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -166,8 +166,13 @@ typedef struct TwoPhaseLockRecord
  * might be higher than the real number if another backend has transferred
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
+ *
+ * XXX Allocate a static array of the maximum size. We could have a pointer
+ * and then allocate just the right size to save a couple kB, but that does
+ * not seem worth the extra complexity of having to initialize it etc. This
+ * way it gets initialized automatically.
  */
-static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND_MAX];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -184,6 +189,17 @@ static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Number of fast-path locks per backend - size of the arrays in PGPROC.
+ * This is set only once during start, before initializing shared memory,
+ * and remains constant after that.
+ *
+ * We set the limit based on max_locks_per_transaction GUC, because that's
+ * the best information about expected number of locks per backend we have.
+ * See InitializeFastPathLocks for details.
+ */
+int			FastPathLockGroupsPerBackend = 0;
+
 /*
  * Macros to calculate the group and index for a relation.
  *
@@ -198,7 +214,7 @@ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
  * did (rel % 100000) or something like that first, that'd be enough to
  * not wrap around. But even if it wrapped, would that be a problem?
  */
-#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+#define FAST_PATH_LOCK_REL_GROUP(rel) 	(((uint64) (rel) * 49157) % FastPathLockGroupsPerBackend)
 
 /*
  * Given a lock index (into the per-backend array), calculated using the
@@ -213,7 +229,7 @@ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
 /* Calculate index in the whole per-backend array of lock slots. */
 #define FP_LOCK_SLOT_INDEX(group, index) \
-	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	(AssertMacro(((group) >= 0) && ((group) < FastPathLockGroupsPerBackend)), \
 	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
 	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
 
@@ -2100,7 +2116,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
 
-	Assert(group >= 0 && group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group >= 0 && group < FastPathLockGroupsPerBackend);
 
 	/* Attempt fast release of any lock eligible for the fast path. */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
@@ -2679,7 +2695,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	/* Scan for existing entry for this relid, remembering empty slot. */
 	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
@@ -2730,7 +2746,7 @@ FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	FastPathLocalUseCounts[group] = 0;
 	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
@@ -2810,7 +2826,7 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 		/* Which FP group does the lock belong to? */
 		group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-		Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+		Assert(group < FastPathLockGroupsPerBackend);
 
 		for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 		{
@@ -2879,7 +2895,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
 
@@ -3001,7 +3017,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	/* Which FP group does the lock belong to? */
 	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
 
-	Assert(group < FP_LOCK_GROUPS_PER_BACKEND);
+	Assert(group < FastPathLockGroupsPerBackend);
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ac66da8638f..a91b6f8a6c0 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -103,6 +103,8 @@ ProcGlobalShmemSize(void)
 	Size		size = 0;
 	Size		TotalProcs =
 		add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+	Size		fpLockBitsSize,
+				fpRelIdSize;
 
 	/* ProcGlobal */
 	size = add_size(size, sizeof(PROC_HDR));
@@ -113,6 +115,18 @@ ProcGlobalShmemSize(void)
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->subxidStates)));
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->statusFlags)));
 
+	/*
+	 * fast-path lock arrays
+	 *
+	 * XXX The explicit alignment may not be strictly necessary, as both
+	 * values are already multiples of 8 bytes, which is what MAXALIGN does.
+	 * But better to make that obvious.
+	 */
+	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+
+	size = add_size(size, mul_size(TotalProcs, (fpLockBitsSize + fpRelIdSize)));
+
 	return size;
 }
 
@@ -162,6 +176,10 @@ InitProcGlobal(void)
 				j;
 	bool		found;
 	uint32		TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
+	char	   *fpPtr,
+			   *fpEndPtr PG_USED_FOR_ASSERTS_ONLY;
+	Size		fpLockBitsSize,
+				fpRelIdSize;
 
 	/* Create the ProcGlobal shared structure */
 	ProcGlobal = (PROC_HDR *)
@@ -211,12 +229,38 @@ InitProcGlobal(void)
 	ProcGlobal->statusFlags = (uint8 *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->statusFlags));
 	MemSet(ProcGlobal->statusFlags, 0, TotalProcs * sizeof(*ProcGlobal->statusFlags));
 
+	/*
+	 * Allocate arrays for fast-path locks. Those are variable-length, so
+	 * can't be included in PGPROC. We allocate a separate piece of shared
+	 * memory and then divide that between backends.
+	 */
+	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+
+	fpPtr = ShmemAlloc(TotalProcs * (fpLockBitsSize + fpRelIdSize));
+	MemSet(fpPtr, 0, TotalProcs * (fpLockBitsSize + fpRelIdSize));
+
+	/* For asserts checking we did not overflow. */
+	fpEndPtr = fpPtr + (TotalProcs * (fpLockBitsSize + fpRelIdSize));
+
 	for (i = 0; i < TotalProcs; i++)
 	{
 		PGPROC	   *proc = &procs[i];
 
 		/* Common initialization for all PGPROCs, regardless of type. */
 
+		/*
+		 * Set the fast-path lock arrays, and move the pointer. We interleave
+		 * the two arrays, to keep at least some locality.
+		 */
+		proc->fpLockBits = (uint64 *) fpPtr;
+		fpPtr += fpLockBitsSize;
+
+		proc->fpRelId = (Oid *) fpPtr;
+		fpPtr += fpRelIdSize;
+
+		Assert(fpPtr <= fpEndPtr);
+
 		/*
 		 * Set up per-PGPROC semaphore, latch, and fpInfoLock.  Prepared xact
 		 * dummy PGPROCs don't need these though - they're never associated
@@ -278,6 +322,9 @@ InitProcGlobal(void)
 		pg_atomic_init_u64(&(proc->waitStart), 0);
 	}
 
+	/* We should have consumed exactly the expected amount of memory. */
+	Assert(fpPtr == fpEndPtr);
+
 	/*
 	 * Save pointers to the blocks of PGPROC structures reserved for auxiliary
 	 * processes and prepared transactions.
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea1135..f54ae00abca 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4166,6 +4166,9 @@ PostgresSingleUserMain(int argc, char *argv[],
 	/* Initialize MaxBackends */
 	InitializeMaxBackends();
 
+	/* Initialize size of fast-path lock cache. */
+	InitializeFastPathLocks();
+
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 3b50ce19a2c..1faf756c8d8 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -557,6 +557,40 @@ InitializeMaxBackends(void)
 						   MAX_BACKENDS)));
 }
 
+/*
+ * Initialize the number of fast-path lock slots in PGPROC.
+ *
+ * This must be called after modules have had the chance to alter GUCs in
+ * shared_preload_libraries and before shared memory size is determined.
+ *
+ * The default max_locks_per_xact=64 means 4 groups by default.
+ *
+ * We allow anything between 1 and 1024 groups, with the usual power-of-2
+ * logic. The 1 is the "old" value before allowing multiple groups, 1024
+ * is an arbitrary limit (matching max_locks_per_xact = 16k). Values over
+ * 1024 are unlikely to be beneficial - we're likely to hit other
+ * bottlenecks long before that.
+ */
+void
+InitializeFastPathLocks(void)
+{
+	Assert(FastPathLockGroupsPerBackend == 0);
+
+	/* we need at least one group */
+	FastPathLockGroupsPerBackend = 1;
+
+	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
+	{
+		/* stop once we exceed max_locks_per_xact */
+		if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
+			break;
+
+		FastPathLockGroupsPerBackend *= 2;
+	}
+
+	Assert(FastPathLockGroupsPerBackend <= FP_LOCK_GROUPS_PER_BACKEND_MAX);
+}
+
 /*
  * Early initialization of a backend (either standalone or under postmaster).
  * This happens even before InitPostgres.
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 25348e71eb9..e26d108a470 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -475,6 +475,7 @@ extern PGDLLIMPORT ProcessingMode Mode;
 #define INIT_PG_OVERRIDE_ROLE_LOGIN		0x0004
 extern void pg_split_opts(char **argv, int *argcp, const char *optstr);
 extern void InitializeMaxBackends(void);
+extern void InitializeFastPathLocks(void);
 extern void InitPostgres(const char *in_dbname, Oid dboid,
 						 const char *username, Oid useroid,
 						 bits32 flags,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 845058da9fa..0e55c166529 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,9 +83,11 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_GROUPS_PER_BACKEND	64
+extern PGDLLIMPORT int FastPathLockGroupsPerBackend;
+#define		FP_LOCK_GROUPS_PER_BACKEND_MAX	1024
 #define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
-#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
+
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -293,9 +295,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND]; /* lock modes held for
-														 * each fast-path slot */
-	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
+	uint64	   *fpLockBits;		/* lock modes held for each fast-path slot */
+	Oid		   *fpRelId;		/* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
 												 * lock */
-- 
2.46.0

v20240905-0003-separate-guc-to-allow-benchmarking.patch
From d9f3deaa518a673e4dc8df1ff6e40f47c2637e5e Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 5 Sep 2024 16:52:26 +0200
Subject: [PATCH v20240905 3/4] separate guc to allow benchmarking

---
 src/backend/bootstrap/bootstrap.c   |  2 --
 src/backend/postmaster/postmaster.c |  5 -----
 src/backend/tcop/postgres.c         |  3 ---
 src/backend/utils/init/postinit.c   | 34 -----------------------------
 src/backend/utils/misc/guc_tables.c | 10 +++++++++
 src/include/miscadmin.h             |  1 -
 6 files changed, 10 insertions(+), 45 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index ed59dfce893..7637581a184 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -309,8 +309,6 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 
 	InitializeMaxBackends();
 
-	InitializeFastPathLocks();
-
 	CreateSharedMemoryAndSemaphores();
 
 	/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index f4a16595d7f..96bc1d1cfed 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -903,11 +903,6 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	InitializeMaxBackends();
 
-	/*
-	 * Also calculate the size of the fast-path lock arrays in PGPROC.
-	 */
-	InitializeFastPathLocks();
-
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f54ae00abca..8bc6bea1135 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4166,9 +4166,6 @@ PostgresSingleUserMain(int argc, char *argv[],
 	/* Initialize MaxBackends */
 	InitializeMaxBackends();
 
-	/* Initialize size of fast-path lock cache. */
-	InitializeFastPathLocks();
-
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 1faf756c8d8..3b50ce19a2c 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -557,40 +557,6 @@ InitializeMaxBackends(void)
 						   MAX_BACKENDS)));
 }
 
-/*
- * Initialize the number of fast-path lock slots in PGPROC.
- *
- * This must be called after modules have had the chance to alter GUCs in
- * shared_preload_libraries and before shared memory size is determined.
- *
- * The default max_locks_per_xact=64 means 4 groups by default.
- *
- * We allow anything between 1 and 1024 groups, with the usual power-of-2
- * logic. The 1 is the "old" value before allowing multiple groups, 1024
- * is an arbitrary limit (matching max_locks_per_xact = 16k). Values over
- * 1024 are unlikely to be beneficial - we're likely to hit other
- * bottlenecks long before that.
- */
-void
-InitializeFastPathLocks(void)
-{
-	Assert(FastPathLockGroupsPerBackend == 0);
-
-	/* we need at least one group */
-	FastPathLockGroupsPerBackend = 1;
-
-	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
-	{
-		/* stop once we exceed max_locks_per_xact */
-		if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
-			break;
-
-		FastPathLockGroupsPerBackend *= 2;
-	}
-
-	Assert(FastPathLockGroupsPerBackend <= FP_LOCK_GROUPS_PER_BACKEND_MAX);
-}
-
 /*
  * Early initialization of a backend (either standalone or under postmaster).
  * This happens even before InitPostgres.
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 686309db58b..cef6341979f 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2788,6 +2788,16 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"fastpath_lock_groups", PGC_POSTMASTER, LOCK_MANAGEMENT,
+			gettext_noop("Sets the maximum number of locks per transaction."),
+			gettext_noop("number of groups in the fast-path lock array.")
+		},
+		&FastPathLockGroupsPerBackend,
+		1, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"max_pred_locks_per_transaction", PGC_POSTMASTER, LOCK_MANAGEMENT,
 			gettext_noop("Sets the maximum number of predicate locks per transaction."),
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index e26d108a470..25348e71eb9 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -475,7 +475,6 @@ extern PGDLLIMPORT ProcessingMode Mode;
 #define INIT_PG_OVERRIDE_ROLE_LOGIN		0x0004
 extern void pg_split_opts(char **argv, int *argcp, const char *optstr);
 extern void InitializeMaxBackends(void);
-extern void InitializeFastPathLocks(void);
 extern void InitPostgres(const char *in_dbname, Oid dboid,
 						 const char *username, Oid useroid,
 						 bits32 flags,
-- 
2.46.0

v20240905-0004-lock-stats.patch
From 6fbe413d86ecb1dca6acf939ab06550290ec337b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Tue, 3 Sep 2024 19:27:16 +0200
Subject: [PATCH v20240905 4/4] lock stats

---
 src/backend/catalog/system_views.sql      |   6 +
 src/backend/storage/lmgr/lock.c           |  18 +++
 src/backend/utils/activity/Makefile       |   1 +
 src/backend/utils/activity/pgstat.c       |  19 +++
 src/backend/utils/activity/pgstat_locks.c | 134 ++++++++++++++++++++++
 src/backend/utils/adt/pgstatfuncs.c       |  18 +++
 src/include/catalog/pg_proc.dat           |  13 +++
 src/include/pgstat.h                      |  21 +++-
 src/include/utils/pgstat_internal.h       |  22 ++++
 9 files changed, 251 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/utils/activity/pgstat_locks.c

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a18..f5aecf14365 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1134,6 +1134,12 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_locks AS
+    SELECT
+        pg_stat_get_fplocks_num_inserted() AS num_inserted,
+        pg_stat_get_fplocks_num_overflowed() AS num_overflowed,
+        pg_stat_get_fplocks_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_checkpointer AS
     SELECT
         pg_stat_get_checkpointer_num_timed() AS num_timed,
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 134cd8a6e34..ecaf64b614c 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -39,6 +39,7 @@
 #include "access/xlogutils.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
@@ -964,6 +965,23 @@ LockAcquireExtended(const LOCKTAG *locktag,
 		log_lock = true;
 	}
 
+	/*
+	 * See if an eligible lock would fit into the fast path cache or not.
+	 * This is not quite correct, for two reasons. Firstly, eligible locks
+	 * may end up requiring a regular lock because of a strong lock being
+	 * held by someone else. Secondly, the count can be a bit stale, if
+	 * some other backend promoted some of our fast-path locks.
+	 *
+	 * XXX Worth counting non-eligible locks too?
+	 */
+	if (EligibleForRelationFastPath(locktag, lockmode))
+	{
+		if (FastPathLocalUseCounts[FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2)] < FP_LOCK_SLOTS_PER_GROUP)
+			++PendingFastPathLockStats.num_inserted;
+		else
+			++PendingFastPathLockStats.num_overflowed;
+	}
+
 	/*
 	 * Attempt to take lock via fast path, if eligible.  But if we remember
 	 * having filled up the fast path array, we don't attempt to make any
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index b9fd66ea17c..4b595f304d0 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -25,6 +25,7 @@ OBJS = \
 	pgstat_database.o \
 	pgstat_function.o \
 	pgstat_io.o \
+	pgstat_locks.o \
 	pgstat_relation.o \
 	pgstat_replslot.o \
 	pgstat_shmem.o \
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 178b5ef65aa..39475c5915f 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -81,6 +81,7 @@
  * - pgstat_database.c
  * - pgstat_function.c
  * - pgstat_io.c
+ * - pgstat_locks.c
  * - pgstat_relation.c
  * - pgstat_replslot.c
  * - pgstat_slru.c
@@ -446,6 +447,21 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_all_cb = pgstat_wal_reset_all_cb,
 		.snapshot_cb = pgstat_wal_snapshot_cb,
 	},
+
+	[PGSTAT_KIND_FPLOCKS] = {
+		.name = "fp-locks",
+
+		.fixed_amount = true,
+
+		.snapshot_ctl_off = offsetof(PgStat_Snapshot, fplocks),
+		.shared_ctl_off = offsetof(PgStat_ShmemControl, fplocks),
+		.shared_data_off = offsetof(PgStatShared_FastPathLocks, stats),
+		.shared_data_len = sizeof(((PgStatShared_FastPathLocks *) 0)->stats),
+
+		.init_shmem_cb = pgstat_fplocks_init_shmem_cb,
+		.reset_all_cb = pgstat_fplocks_reset_all_cb,
+		.snapshot_cb = pgstat_fplocks_snapshot_cb,
+	},
 };
 
 /*
@@ -739,6 +755,9 @@ pgstat_report_stat(bool force)
 	/* flush SLRU stats */
 	partial_flush |= pgstat_slru_flush(nowait);
 
+	/* flush lock stats */
+	partial_flush |= pgstat_fplocks_flush(nowait);
+
 	last_flush = now;
 
 	/*
diff --git a/src/backend/utils/activity/pgstat_locks.c b/src/backend/utils/activity/pgstat_locks.c
new file mode 100644
index 00000000000..99a5d5259da
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_locks.c
@@ -0,0 +1,134 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_locks.c
+ *	  Implementation of locks statistics.
+ *
+ * This file contains the implementation of lock statistics. It is kept
+ * separate from pgstat.c to enforce the line between the statistics access /
+ * storage implementation and the details about individual types of
+ * statistics.
+ *
+ * Copyright (c) 2001-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_locks.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+
+PgStat_FastPathLockStats PendingFastPathLockStats = {0};
+
+
+
+/*
+ * Do we have any locks to report?
+ */
+static bool
+pgstat_have_pending_locks(void)
+{
+	return (PendingFastPathLockStats.num_inserted > 0) ||
+		   (PendingFastPathLockStats.num_overflowed > 0);
+}
+
+
+/*
+ * If nowait is true, this function returns true if the lock could not be
+ * acquired. Otherwise return false.
+ */
+bool
+pgstat_fplocks_flush(bool nowait)
+{
+	PgStatShared_FastPathLocks *stats_shmem = &pgStatLocal.shmem->fplocks;
+
+	Assert(IsUnderPostmaster || !IsPostmasterEnvironment);
+	Assert(pgStatLocal.shmem != NULL &&
+		   !pgStatLocal.shmem->is_shutdown);
+
+	/*
+	 * This function can be called even if nothing at all has happened. Avoid
+	 * taking lock for nothing in that case.
+	 */
+	if (!pgstat_have_pending_locks())
+		return false;
+
+	if (!nowait)
+		LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE);
+	else if (!LWLockConditionalAcquire(&stats_shmem->lock, LW_EXCLUSIVE))
+		return true;
+
+#define FPLOCKS_ACC(fld) stats_shmem->stats.fld += PendingFastPathLockStats.fld
+	FPLOCKS_ACC(num_inserted);
+	FPLOCKS_ACC(num_overflowed);
+#undef FPLOCKS_ACC
+
+	LWLockRelease(&stats_shmem->lock);
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&PendingFastPathLockStats, 0, sizeof(PendingFastPathLockStats));
+
+	return false;
+}
+
+/*
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * a pointer to the fast-path lock statistics struct.
+ */
+PgStat_FastPathLockStats *
+pgstat_fetch_stat_fplocks(void)
+{
+	pgstat_snapshot_fixed(PGSTAT_KIND_FPLOCKS);
+
+	return &pgStatLocal.snapshot.fplocks;
+}
+
+void
+pgstat_fplocks_init_shmem_cb(void *stats)
+{
+	PgStatShared_FastPathLocks *stats_shmem = (PgStatShared_FastPathLocks *) stats;
+
+	LWLockInitialize(&stats_shmem->lock, LWTRANCHE_PGSTATS_DATA);
+}
+
+void
+pgstat_fplocks_reset_all_cb(TimestampTz ts)
+{
+	PgStatShared_FastPathLocks *stats_shmem = &pgStatLocal.shmem->fplocks;
+
+	/* see explanation above PgStatShared_FastPathLocks for the reset protocol */
+	LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE);
+	pgstat_copy_changecounted_stats(&stats_shmem->reset_offset,
+									&stats_shmem->stats,
+									sizeof(stats_shmem->stats),
+									&stats_shmem->changecount);
+	stats_shmem->stats.stat_reset_timestamp = ts;
+	LWLockRelease(&stats_shmem->lock);
+}
+
+void
+pgstat_fplocks_snapshot_cb(void)
+{
+	PgStatShared_FastPathLocks *stats_shmem = &pgStatLocal.shmem->fplocks;
+	PgStat_FastPathLockStats *reset_offset = &stats_shmem->reset_offset;
+	PgStat_FastPathLockStats reset;
+
+	pgstat_copy_changecounted_stats(&pgStatLocal.snapshot.fplocks,
+									&stats_shmem->stats,
+									sizeof(stats_shmem->stats),
+									&stats_shmem->changecount);
+
+	LWLockAcquire(&stats_shmem->lock, LW_SHARED);
+	memcpy(&reset, reset_offset, sizeof(stats_shmem->stats));
+	LWLockRelease(&stats_shmem->lock);
+
+	/* compensate by reset offsets */
+#define FPLOCKS_COMP(fld) pgStatLocal.snapshot.fplocks.fld -= reset.fld;
+	FPLOCKS_COMP(num_inserted);
+	FPLOCKS_COMP(num_overflowed);
+#undef FPLOCKS_COMP
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 97dc09ac0d9..dcd4957777d 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1261,6 +1261,24 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_stat_bgwriter()->buf_alloc);
 }
 
+Datum
+pg_stat_get_fplocks_num_inserted(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_fplocks()->num_inserted);
+}
+
+Datum
+pg_stat_get_fplocks_num_overflowed(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_fplocks()->num_overflowed);
+}
+
+Datum
+pg_stat_get_fplocks_stat_reset_time(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_fplocks()->stat_reset_timestamp);
+}
+
 /*
 * When adding a new column to the pg_stat_io view, add a new enum value
 * here above IO_NUM_COLUMNS.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ff5436acacf..242aea463ae 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5986,6 +5986,19 @@
   provolatile => 'v', prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_subscription_stats' },
 
+{ oid => '6095', descr => 'statistics: number of acquired fast-path locks',
+  proname => 'pg_stat_get_fplocks_num_inserted', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_fplocks_num_inserted' },
+
+{ oid => '6096', descr => 'statistics: number of not acquired fast-path locks',
+  proname => 'pg_stat_get_fplocks_num_overflowed', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_fplocks_num_overflowed' },
+
+{ oid => '6097', descr => 'statistics: last reset for the fast-path locks',
+  proname => 'pg_stat_get_fplocks_stat_reset_time', provolatile => 's',
+  proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
+  prosrc => 'pg_stat_get_fplocks_stat_reset_time' },
+
 { oid => '3163', descr => 'current trigger depth',
   proname => 'pg_trigger_depth', provolatile => 's', proparallel => 'r',
   prorettype => 'int4', proargtypes => '', prosrc => 'pg_trigger_depth' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2c91168a1..f66b189f8df 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -57,9 +57,10 @@
 #define PGSTAT_KIND_IO	9
 #define PGSTAT_KIND_SLRU	10
 #define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_FPLOCKS	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
-#define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
+#define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_FPLOCKS
 #define PGSTAT_KIND_BUILTIN_SIZE (PGSTAT_KIND_BUILTIN_MAX + 1)
 
 /* Custom stats kinds */
@@ -303,6 +304,13 @@ typedef struct PgStat_CheckpointerStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_CheckpointerStats;
 
+typedef struct PgStat_FastPathLockStats
+{
+	PgStat_Counter num_inserted;
+	PgStat_Counter num_overflowed;
+	TimestampTz stat_reset_timestamp;
+} PgStat_FastPathLockStats;
+
 
 /*
  * Types related to counting IO operations
@@ -538,6 +546,10 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern void pgstat_report_bgwriter(void);
 extern PgStat_BgWriterStats *pgstat_fetch_stat_bgwriter(void);
 
+/*
+ * Functions in pgstat_locks.c
+ */
+extern PgStat_FastPathLockStats *pgstat_fetch_stat_fplocks(void);
 
 /*
  * Functions in pgstat_checkpointer.c
@@ -811,4 +823,11 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
 extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
 
 
+/*
+ * Variables in pgstat_locks.c
+ */
+
+/* updated directly by fast-path locking */
+extern PGDLLIMPORT PgStat_FastPathLockStats PendingFastPathLockStats;
+
 #endif							/* PGSTAT_H */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 25820cbf0a6..0627983846c 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -340,6 +340,15 @@ typedef struct PgStatShared_BgWriter
 	PgStat_BgWriterStats reset_offset;
 } PgStatShared_BgWriter;
 
+typedef struct PgStatShared_FastPathLocks
+{
+	/* lock protects ->reset_offset as well as stats->stat_reset_timestamp */
+	LWLock		lock;
+	uint32		changecount;
+	PgStat_FastPathLockStats stats;
+	PgStat_FastPathLockStats reset_offset;
+} PgStatShared_FastPathLocks;
+
 typedef struct PgStatShared_Checkpointer
 {
 	/* lock protects ->reset_offset as well as stats->stat_reset_timestamp */
@@ -453,6 +462,7 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_IO io;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
+	PgStatShared_FastPathLocks fplocks;
 
 	/*
 	 * Custom stats data with fixed-numbered objects, indexed by (PgStat_Kind
@@ -487,6 +497,8 @@ typedef struct PgStat_Snapshot
 
 	PgStat_WalStats wal;
 
+	PgStat_FastPathLockStats fplocks;
+
 	/*
 	 * Data in snapshot for custom fixed-numbered statistics, indexed by
 	 * (PgStat_Kind - PGSTAT_KIND_CUSTOM_MIN).  Each entry is allocated in
@@ -704,6 +716,16 @@ extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
 extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
 
 
+/*
+ * Functions in pgstat_locks.c
+ */
+
+extern bool pgstat_fplocks_flush(bool);
+extern void pgstat_fplocks_init_shmem_cb(void *stats);
+extern void pgstat_fplocks_reset_all_cb(TimestampTz ts);
+extern void pgstat_fplocks_snapshot_cb(void);
+
+
 /*
  * Variables in pgstat.c
  */
-- 
2.46.0

#25Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#18)
Re: scalability bottlenecks with (many) partitions (and more)

On 9/4/24 13:15, Tomas Vondra wrote:

On 9/4/24 11:29, Jakub Wartak wrote:

Hi Tomas!

...

My $0.02 cents: the originating case that triggered those patches,
actually started with LWLock/lock_manager waits being the top#1. The
operator can cross check (join) that with a group by pg_locks.fastpath
(='f'), count(*). So, IMHO we have good observability in this case
(rare thing to say!)

That's a good point. So if you had to give some instructions to users
what to measure / monitor, and how to adjust the GUC based on that, what
would your instructions be?

After thinking about this a bit more, I'm actually wondering if this
source of information is sufficient. Firstly, it's just a snapshot of a
single instance, and it's not quite trivial to get a summary for a
longer time period (people would have to sample it often enough, etc.).
Doable, but much less convenient than the cumulative counters.

But for the sampling, doesn't this produce skewed data? Imagine you have
a workload with very short queries (which is when fast-path matters), so
you're likely to see the backend while it's obtaining the locks. If the
fast-path locks are much faster to acquire (kinda the whole point), we're
more likely to catch the backend while it's obtaining the regular locks.

Let's say the backend needs 1000 locks, and 500 of those fit into the
fast-path array. We're likely to see the 500 fast-path locks already
acquired, and a random fraction of the 500 non-fast-path locks. So in
the end you'll see backends needing 500 fast-path locks and 250 regular
locks. That doesn't seem terrible, but I guess the difference can be
made even larger.

regards

--
Tomas Vondra

#26Jakub Wartak
jakub.wartak@enterprisedb.com
In reply to: Tomas Vondra (#25)
Re: scalability bottlenecks with (many) partitions (and more)

On Thu, Sep 5, 2024 at 7:33 PM Tomas Vondra <tomas@vondra.me> wrote:

My $0.02 cents: the originating case that triggered those patches,
actually started with LWLock/lock_manager waits being the top#1. The
operator can cross check (join) that with a group by pg_locks.fastpath
(='f'), count(*). So, IMHO we have good observability in this case
(rare thing to say!)

That's a good point. So if you had to give some instructions to users
what to measure / monitor, and how to adjust the GUC based on that, what
would your instructions be?

After thinking about this a bit more, I'm actually wondering if this
source of information is sufficient. Firstly, it's just a snapshot of a
single instance, and it's not quite trivial to get a summary for a
longer time period (people would have to sample it often enough, etc.).
Doable, but much less convenient than the cumulative counters.

OK, so answering previous question:

Probably just monitor pg_stat_activity (grouping on wait events with
count(*)) together with pg_locks, grouped per-pid and by fastpath. Even
simple observations with \watch 0.1 are good enough to confirm/deny the
root cause in practice, even for short bursts while it is happening.
When deploying monitoring for a longer time (with, say, a 1s sample),
you would sooner or later get the __high water mark__, and could allow
as many fast-path slots as there are locks occurring for the affected
PIDs (or double that amount) as a starting point.
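To make that concrete, something along these lines (just a sketch of
the kind of sampling meant here, using the standard pg_stat_activity
and pg_locks views; the exact filters are up to the operator):

    -- what are the active backends waiting on right now?
    SELECT wait_event_type, wait_event, count(*)
      FROM pg_stat_activity
     WHERE state = 'active'
     GROUP BY 1, 2
     ORDER BY 3 DESC;

    -- per-backend split of fast-path vs. regular relation locks
    SELECT pid, fastpath, count(*)
      FROM pg_locks
     WHERE locktype = 'relation'
     GROUP BY pid, fastpath
     ORDER BY pid, fastpath;

Sampling the second query over time gives the per-PID high water mark
used for the sizing suggestion above.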

But for the sampling, doesn't this produce skewed data? Imagine you have
a workload with very short queries (which is when fast-path matters), so
you're likely to see the backend while it's obtaining the locks. If the
fast-path locks are much faster to acquire (kinda the whole point), we're
more likely to catch the backend while it's obtaining the regular locks.

Let's say the backend needs 1000 locks, and 500 of those fit into the
fast-path array. We're likely to see the 500 fast-path locks already
acquired, and a random fraction of the 500 non-fast-path locks. So in
the end you'll see backends needing 500 fast-path locks and 250 regular
locks. That doesn't seem terrible, but I guess the difference can be
made even larger.

... it doesn't need to be perfect data to act on, right? We may just
need the information that it is happening (well, we do have that). Maybe
it's too pragmatic a point of view, but wasting some bits of memory on
this, while still being able to control how much it allocates in the
end, is a much better situation than today, where we have no control
and waste crazy CPU time on all those futex() syscalls and context
switches.

Another angle is that if you see the SQL causing it, it is most often
going to be attributed to partitioning and people ending up accessing
way too many partitions (thousands) without proper partition pruning -
sometimes it even triggers re-partitioning of the said tables. So
maybe the realistic "fastpath sizing" should assume something that
supports:
a) the usual number of tables in JOINs (just a few of them, which is
fine like today) -> ok
b) 1-month interval partitions for, let's say, 5 years (12*5 = 60),
joined to another table like that - gives what, max 120? -> so users
doing SELECT * FROM such_table will probably already have set
max_locks_per_xact to something higher
c) HASH partitioning up to the VCPU counts seen in the wild (say 64 or
128, so about the same as above?)
d) we probably should not care here at all if somebody wants daily
partitioning across years with HASH (sub)partitions without partition
pruning -> that has nothing to do with being "fast" anyway

Judging from the current reports, people have configured
max_locks_per_xact like this: ~75% have it at the default (64), 10% have
1024, 5% have 128, and the rest (~10%) have anything between 100 and a
few thousand, with extreme one-offs @ 25k (a wild misconfiguration,
judging from the other specs).

BTW: you probably need to register this $thread into CF for others to
see too (it's not there?)

-J.

#27Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#24)
5 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

Hi,

I've spent quite a bit of time trying to identify cases where having
more fast-path lock slots could be harmful, without any luck. I started
with the EPYC machine I used for the earlier tests, but found nothing,
except for a couple of cases unrelated to this patch, as they affect
even runs without the patch applied at all. More likely random noise, or
maybe some issue with the VM (or differences from the VM used earlier).
I pushed the results to github [1] anyway, if anyone wants to look.

So I switched to my smaller machines, and ran a simple test on master,
with the hard-coded arrays, and with the arrays moved out of PGPROC (and
sized per max_locks_per_transaction).

I was looking for regressions, so I wanted to test a case that can't
benefit from fast-path locking, while paying the costs. So I decided to
do pgbench -S with 4 partitions, because that fits into the 16 slots we
had before, and scale 1 to keep everything in memory. And then did a
couple read-only runs, first with 64 locks/transaction (default), then
with 1024 locks/transaction.

Attached is a shell script I used to collect this - it creates and
removes clusters, so be careful. Should be fairly obvious what it tests
and how.

The results for max_locks_per_transaction=64 look like this (the numbers
are throughput):

machine  mode      clients   master  built-in  with-guc
--------------------------------------------------------
i5       prepared        1    14970     14991     14981
                         4    51638     51615     51388
         simple          1    14042     14136     14008
                         4    48705     48572     48457
--------------------------------------------------------
xeon     prepared        1    13213     13330     13170
                         4    49280     49191     49263
                        16   151413    152268    151560
         simple          1    12250     12291     12316
                         4    45910     46148     45843
                        16   141774    142165    142310

And compared to master

machine  mode      clients  built-in  with-guc
-----------------------------------------------
i5       prepared        1   100.14%   100.08%
                         4    99.95%    99.51%
         simple          1   100.67%    99.76%
                         4    99.73%    99.49%
-----------------------------------------------
xeon     prepared        1   100.89%    99.68%
                         4    99.82%    99.97%
                        16   100.56%   100.10%
         simple          1   100.34%   100.54%
                         4   100.52%    99.85%
                        16   100.28%   100.38%

So, no difference whatsoever - it's +/- 0.5%, well within random noise.
And with max_locks_per_transaction=1024 the story is exactly the same:

machine  mode      clients   master  built-in  with-guc
--------------------------------------------------------
i5       prepared        1    15000     14928     14948
                         4    51498     51351     51504
         simple          1    14124     14092     14065
                         4    48531     48517     48351
--------------------------------------------------------
xeon     prepared        1    13384     13325     13290
                         4    49257     49309     49345
                        16   151668    151940    152201
         simple          1    12357     12351     12363
                         4    46039     46126     46201
                        16   141851    142402    142427

machine  mode      clients  built-in  with-guc
-----------------------------------------------
i5       prepared        1    99.52%    99.65%
                         4    99.71%   100.01%
         simple          1    99.77%    99.58%
                         4    99.97%    99.63%
-----------------------------------------------
xeon     prepared        1    99.56%    99.30%
                         4   100.11%   100.18%
                        16   100.18%   100.35%
         simple          1    99.96%   100.05%
                         4   100.19%   100.35%
                        16   100.39%   100.41%

With max_locks_per_transaction=1024, it's fair to expect the fast-path
locking to be quite beneficial. Of course, it's possible the GUC is set
this high only because of some rare issue (say, to run pg_dump, which
needs to lock everything).

I did look at docs if anything needs updating, but I don't think so. The
SGML docs only talk about fast-path locking at fairly high level, not
about how many we have etc. Same for src/backend/storage/lmgr/README,
which is focusing on the correctness of fast-path locking, and that's
not changed by this patch.

I also cleaned up (removed) some of the Asserts checking that we got a
valid group / slot index. I don't think this really helped in practice,
once I added asserts to the macros.

Anyway, at this point I'm quite happy with this improvement. I didn't
have any clear plan when to commit this, but I'm considering doing so
sometime next week, unless someone objects or asks for some additional
benchmarks etc.

One thing I'm not quite sure about yet is whether to commit this as a
single change, or the way the attached patches do that, with the first
patch keeping the larger array in PGPROC and the second patch making it
separate and sized on max_locks_per_transaction ... Opinions?

regards

[1]: https://github.com/tvondra/pg-lock-scalability-results

--
Tomas Vondra

Attachments:

v20240912-0001-Increase-the-number-of-fast-path-lock-slot.patch
From 7ae67a162fdcb80746bed45260fa937fc025b08b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 12 Sep 2024 23:09:41 +0200
Subject: [PATCH v20240912 1/2] Increase the number of fast-path lock slots

The fast-path locking introduced in 9.2 allowed each backend to acquire
up to 16 relation locks cheaply, provided the lock level allows that.
If a backend needs to hold more locks, it has to insert them into the
regular lock table in shared memory. This is considerably more
expensive, and on many-core systems may be subject to contention.

The limit of 16 entries was always rather low, even with simple queries
and schemas with only a few tables. We have to lock all relations - not
just tables, but also indexes, views, etc. Moreover, for planning we
need to lock all relations that might be used in the plan, not just
those that actually get used in the final plan. It only takes a couple
tables with multiple indexes to need more than 16 locks. It was quite
common to fill all fast-path slots.

As partitioning gets used more widely, with more and more partitions,
this limit is trivial to hit, with complex queries easily using hundreds
or even thousands of locks. For workloads doing a lot of I/O this is not
noticeable, but on large machines with enough RAM to keep the data in
memory, the access to the shared lock table may be a serious issue.

This patch improves this by increasing the number of fast-path slots
from 16 to 1024. The slots remain in PGPROC, and are organized as an
array of 16-slot groups (each group being effectively a clone of the
original fast-path approach). Instead of accessing this as a big hash
table with open addressing, we treat this as a 16-way set associative
cache. Each relation (identified by a "relid" OID) is mapped to a
particular 16-slot group by calculating a hash

    h(relid) = ((relid * P) mod N)

where P is a hard-coded prime, and N is the number of groups. This is
not a great hash function, but it works well enough - the main purpose
is to prevent "hot groups" with runs of consecutive OIDs, which might
fill some of the fast-path groups. The multiplication by P ensures that.
If the OIDs are already spread out, the hash should not group them.

The groups are processed by linear search. With only 16 entries this is
cheap, and the groups have very good locality.

Treating this as a simple hash table with open addressing would not be
efficient, especially once the hash table is getting almost full. The
usual solution is to grow the table, but for hash tables in shared
memory that's not trivial. It would also have worse locality, due to
more random access.

Luckily, fast-path locking already has a simple solution to deal with a
full hash table. The lock can be simply inserted into the shared lock
table, just like before. Of course, if this happens too often, that
reduces the benefit of fast-path locking.

This patch hard-codes the number of groups to 64, which means 1024
fast-path locks. As all the information is still stored in PGPROC, this
grows PGPROC by about 4.5kB (from ~840B to ~5kB). This is a trade off
exchanging memory for cheaper locking.

Ultimately, the number of fast-path slots should not be hard coded, but
adjustable based on what the workload does, perhaps using a GUC. That
however means it can't be stored in PGPROC directly.
---
 src/backend/storage/lmgr/lock.c | 118 ++++++++++++++++++++++++++------
 src/include/storage/proc.h      |   8 ++-
 2 files changed, 102 insertions(+), 24 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 83b99a98f08..d053ae0c409 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -167,7 +167,7 @@ typedef struct TwoPhaseLockRecord
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
  */
-static int	FastPathLocalUseCount = 0;
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -184,23 +184,53 @@ static int	FastPathLocalUseCount = 0;
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Macros to calculate the group and index for a relation.
+ *
+ * The formula is a simple hash function, designed to spread the OIDs a bit,
+ * so that even contiguous values end up in different groups. In most cases
+ * there will be gaps anyway, but the multiplication should help a bit.
+ *
+ * The selected value (49157) is a prime not too close to 2^k, and it's
+ * small enough to not cause overflows (in 64-bit).
+ */
+#define FAST_PATH_LOCK_REL_GROUP(rel) \
+	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+
+/* Calculate index in the whole per-backend array of lock slots. */
+#define FP_LOCK_SLOT_INDEX(group, index) \
+	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
+	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
+
+/*
+ * Given a lock index (into the per-backend array), calculated using the
+ * FP_LOCK_SLOT_INDEX macro, calculate group and index (within the group).
+ */
+#define FAST_PATH_LOCK_GROUP(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) / FP_LOCK_SLOTS_PER_GROUP))
+#define FAST_PATH_LOCK_INDEX(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) % FP_LOCK_SLOTS_PER_GROUP))
+
 /* Macros for manipulating proc->fpLockBits */
 #define FAST_PATH_BITS_PER_SLOT			3
 #define FAST_PATH_LOCKNUMBER_OFFSET		1
 #define FAST_PATH_MASK					((1 << FAST_PATH_BITS_PER_SLOT) - 1)
 #define FAST_PATH_GET_BITS(proc, n) \
-	(((proc)->fpLockBits >> (FAST_PATH_BITS_PER_SLOT * n)) & FAST_PATH_MASK)
+	(((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] >> (FAST_PATH_BITS_PER_SLOT * FAST_PATH_LOCK_INDEX(n))) & FAST_PATH_MASK)
 #define FAST_PATH_BIT_POSITION(n, l) \
 	(AssertMacro((l) >= FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((l) < FAST_PATH_BITS_PER_SLOT+FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((n) < FP_LOCK_SLOTS_PER_BACKEND), \
-	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (n)))
+	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (FAST_PATH_LOCK_INDEX(n))))
 #define FAST_PATH_SET_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
 #define FAST_PATH_CLEAR_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
 #define FAST_PATH_CHECK_LOCKMODE(proc, n, l) \
-	 ((proc)->fpLockBits & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
+	 ((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
 
 /*
  * The fast-path lock mechanism is concerned only with relation locks on
@@ -926,7 +956,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 * for now we don't worry about that case either.
 	 */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND)
+		FastPathLocalUseCounts[FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2)] < FP_LOCK_SLOTS_PER_GROUP)
 	{
 		uint32		fasthashcode = FastPathStrongLockHashPartition(hashcode);
 		bool		acquired;
@@ -1970,6 +2000,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	int			group;
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2063,9 +2094,12 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	 */
 	locallock->lockCleared = false;
 
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
 	/* Attempt fast release of any lock eligible for the fast path. */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount > 0)
+		FastPathLocalUseCounts[group] > 0)
 	{
 		bool		released;
 
@@ -2633,12 +2667,21 @@ LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent)
 static bool
 FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		f;
 	uint32		unused_slot = FP_LOCK_SLOTS_PER_BACKEND;
+	uint32		i,
+				group;
+
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
 	/* Scan for existing entry for this relid, remembering empty slot. */
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
 		if (FAST_PATH_GET_BITS(MyProc, f) == 0)
 			unused_slot = f;
 		else if (MyProc->fpRelId[f] == relid)
@@ -2654,7 +2697,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	{
 		MyProc->fpRelId[unused_slot] = relid;
 		FAST_PATH_SET_LOCKMODE(MyProc, unused_slot, lockmode);
-		++FastPathLocalUseCount;
+		++FastPathLocalUseCounts[group];
 		return true;
 	}
 
@@ -2670,12 +2713,21 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 static bool
 FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		f;
 	bool		result = false;
+	uint32		i,
+				group;
 
-	FastPathLocalUseCount = 0;
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	FastPathLocalUseCounts[group] = 0;
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
 		if (MyProc->fpRelId[f] == relid
 			&& FAST_PATH_CHECK_LOCKMODE(MyProc, f, lockmode))
 		{
@@ -2685,7 +2737,7 @@ FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 			/* we continue iterating so as to update FastPathLocalUseCount */
 		}
 		if (FAST_PATH_GET_BITS(MyProc, f) != 0)
-			++FastPathLocalUseCount;
+			++FastPathLocalUseCounts[group];
 	}
 	return result;
 }
@@ -2714,7 +2766,8 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 	for (i = 0; i < ProcGlobal->allProcCount; i++)
 	{
 		PGPROC	   *proc = &ProcGlobal->allProcs[i];
-		uint32		f;
+		uint32		j,
+					group;
 
 		LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
 
@@ -2739,9 +2792,16 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 			continue;
 		}
 
-		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+		/* fast-path group the lock belongs to */
+		group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+		for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 		{
 			uint32		lockmode;
+			uint32		f;
+
+			/* index into the whole per-backend array */
+			f = FP_LOCK_SLOT_INDEX(group, j);
 
 			/* Look for an allocated slot matching the given relid. */
 			if (relid != proc->fpRelId[f] || FAST_PATH_GET_BITS(proc, f) == 0)
@@ -2793,13 +2853,21 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	PROCLOCK   *proclock = NULL;
 	LWLock	   *partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid			relid = locktag->locktag_field2;
-	uint32		f;
+	uint32		i,
+				group;
+
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
 	LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
 
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
 		uint32		lockmode;
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
 
 		/* Look for an allocated slot matching the given relid. */
 		if (relid != MyProc->fpRelId[f] || FAST_PATH_GET_BITS(MyProc, f) == 0)
@@ -2903,6 +2971,10 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	LWLock	   *partitionLock;
 	int			count = 0;
 	int			fast_count = 0;
+	uint32		group;
+
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2957,7 +3029,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 		for (i = 0; i < ProcGlobal->allProcCount; i++)
 		{
 			PGPROC	   *proc = &ProcGlobal->allProcs[i];
-			uint32		f;
+			uint32		j;
 
 			/* A backend never blocks itself */
 			if (proc == MyProc)
@@ -2979,9 +3051,13 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 				continue;
 			}
 
-			for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+			for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 			{
 				uint32		lockmask;
+				uint32		f;
+
+				/* index into the whole per-backend array */
+				f = FP_LOCK_SLOT_INDEX(group, j);
 
 				/* Look for an allocated slot matching the given relid. */
 				if (relid != proc->fpRelId[f])
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index deeb06c9e01..845058da9fa 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,8 +83,9 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_SLOTS_PER_BACKEND 16
-
+#define		FP_LOCK_GROUPS_PER_BACKEND	64
+#define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -292,7 +293,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits;		/* lock modes held for each fast-path slot */
+	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND]; /* lock modes held for
+														 * each fast-path slot */
 	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
-- 
2.46.0

v20240912-0002-Set-fast-path-slots-using-max_locks_per_tr.patch (text/x-patch)
From 1e3be15e39aadc58db4c9be86cfee64f0395dfd4 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 12 Sep 2024 23:09:50 +0200
Subject: [PATCH v20240912 2/2] Set fast-path slots using
 max_locks_per_transaction

Instead of using a hard-coded value of 64 groups (1024 fast-path slots),
determine the value based on max_locks_per_transaction GUC. This size
is calculated at startup, before allocating shared memory.

The default value of max_locks_per_transaction is 64, which means
4 groups of fast-path locks.

The purpose of the max_locks_per_transaction GUC is to size the shared
lock table, but it's the best information about the expected number of
locks available. It is often set to an average number of locks needed by
a backend, but some backends may need substantially fewer/more locks.

This means fast-path capacity calculated from max_locks_per_transaction
may not be sufficient for some backends, forcing use of the shared lock
table. The assumption is this is not a major issue - there can't be too
many such backends, otherwise the max_locks_per_transaction would
need to be higher anyway (resolving the fast-path issue too).

If that happens to be a problem, the only solution is to increase the
GUC, even if the shared lock table had sufficient capacity. That is not
free, because each lock in the shared lock table requires about 500B.
With many backends this may be a substantial amount of memory, but then
again - that should only happen on machines with plenty of memory.

In the future we can consider a separate GUC for the number of fast-path
slots, but let's try without one first.

An alternative solution might be to size the fast-path arrays for a
multiple of max_locks_per_transaction. The cost of adding a fast-path
slot is much lower (only ~5B compared to ~500B per entry), so this would
be cheaper than increasing max_locks_per_transaction. But it's not clear
what multiple of max_locks_per_transaction to use.
---
 src/backend/bootstrap/bootstrap.c   |  2 ++
 src/backend/postmaster/postmaster.c |  5 +++
 src/backend/storage/lmgr/lock.c     | 28 +++++++++++++----
 src/backend/storage/lmgr/proc.c     | 47 +++++++++++++++++++++++++++++
 src/backend/tcop/postgres.c         |  3 ++
 src/backend/utils/init/postinit.c   | 34 +++++++++++++++++++++
 src/include/miscadmin.h             |  1 +
 src/include/storage/proc.h          | 11 ++++---
 8 files changed, 120 insertions(+), 11 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 7637581a184..ed59dfce893 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -309,6 +309,8 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 
 	InitializeMaxBackends();
 
+	InitializeFastPathLocks();
+
 	CreateSharedMemoryAndSemaphores();
 
 	/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 96bc1d1cfed..f4a16595d7f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -903,6 +903,11 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	InitializeMaxBackends();
 
+	/*
+	 * Also calculate the size of the fast-path lock arrays in PGPROC.
+	 */
+	InitializeFastPathLocks();
+
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index d053ae0c409..505aa52668e 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -166,8 +166,13 @@ typedef struct TwoPhaseLockRecord
  * might be higher than the real number if another backend has transferred
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
+ *
+ * XXX Allocate a static array of the maximum size. We could have a pointer
+ * and then allocate just the right size to save a couple kB, but that does
+ * not seem worth the extra complexity of having to initialize it etc. This
+ * way it gets initialized automatically.
  */
-static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND_MAX];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -184,6 +189,17 @@ static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Number of fast-path locks per backend - size of the arrays in PGPROC.
+ * This is set only once during start, before initializing shared memory,
+ * and remains constant after that.
+ *
+ * We set the limit based on max_locks_per_transaction GUC, because that's
+ * the best information about expected number of locks per backend we have.
+ * See InitializeFastPathLocks for details.
+ */
+int			FastPathLockGroupsPerBackend = 0;
+
 /*
  * Macros to calculate the group and index for a relation.
  *
@@ -195,11 +211,11 @@ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
  * small enough to not cause overflows (in 64-bit).
  */
 #define FAST_PATH_LOCK_REL_GROUP(rel) \
-	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+	(((uint64) (rel) * 49157) % FastPathLockGroupsPerBackend)
 
 /* Calculate index in the whole per-backend array of lock slots. */
 #define FP_LOCK_SLOT_INDEX(group, index) \
-	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	(AssertMacro(((group) >= 0) && ((group) < FastPathLockGroupsPerBackend)), \
 	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
 	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
 
@@ -2973,9 +2989,6 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	int			fast_count = 0;
 	uint32		group;
 
-	/* fast-path group the lock belongs to */
-	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
-
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
 	lockMethodTable = LockMethods[lockmethodid];
@@ -3005,6 +3018,9 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	partitionLock = LockHashPartitionLock(hashcode);
 	conflictMask = lockMethodTable->conflictTab[lockmode];
 
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
 	/*
 	 * Fast path locks might not have been entered in the primary lock table.
 	 * If the lock we're dealing with could conflict with such a lock, we must
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ac66da8638f..a91b6f8a6c0 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -103,6 +103,8 @@ ProcGlobalShmemSize(void)
 	Size		size = 0;
 	Size		TotalProcs =
 		add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+	Size		fpLockBitsSize,
+				fpRelIdSize;
 
 	/* ProcGlobal */
 	size = add_size(size, sizeof(PROC_HDR));
@@ -113,6 +115,18 @@ ProcGlobalShmemSize(void)
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->subxidStates)));
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->statusFlags)));
 
+	/*
+	 * fast-path lock arrays
+	 *
+	 * XXX The explicit alignment may not be strictly necessary, as both
+	 * values are already multiples of 8 bytes, which is what MAXALIGN does.
+	 * But better to make that obvious.
+	 */
+	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+
+	size = add_size(size, mul_size(TotalProcs, (fpLockBitsSize + fpRelIdSize)));
+
 	return size;
 }
 
@@ -162,6 +176,10 @@ InitProcGlobal(void)
 				j;
 	bool		found;
 	uint32		TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
+	char	   *fpPtr,
+			   *fpEndPtr PG_USED_FOR_ASSERTS_ONLY;
+	Size		fpLockBitsSize,
+				fpRelIdSize;
 
 	/* Create the ProcGlobal shared structure */
 	ProcGlobal = (PROC_HDR *)
@@ -211,12 +229,38 @@ InitProcGlobal(void)
 	ProcGlobal->statusFlags = (uint8 *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->statusFlags));
 	MemSet(ProcGlobal->statusFlags, 0, TotalProcs * sizeof(*ProcGlobal->statusFlags));
 
+	/*
+	 * Allocate arrays for fast-path locks. Those are variable-length, so
+	 * can't be included in PGPROC. We allocate a separate piece of shared
+	 * memory and then divide that between backends.
+	 */
+	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+
+	fpPtr = ShmemAlloc(TotalProcs * (fpLockBitsSize + fpRelIdSize));
+	MemSet(fpPtr, 0, TotalProcs * (fpLockBitsSize + fpRelIdSize));
+
+	/* For asserts checking we did not overflow. */
+	fpEndPtr = fpPtr + (TotalProcs * (fpLockBitsSize + fpRelIdSize));
+
 	for (i = 0; i < TotalProcs; i++)
 	{
 		PGPROC	   *proc = &procs[i];
 
 		/* Common initialization for all PGPROCs, regardless of type. */
 
+		/*
+		 * Set the fast-path lock arrays, and move the pointer. We interleave
+		 * the two arrays, to keep at least some locality.
+		 */
+		proc->fpLockBits = (uint64 *) fpPtr;
+		fpPtr += fpLockBitsSize;
+
+		proc->fpRelId = (Oid *) fpPtr;
+		fpPtr += fpRelIdSize;
+
+		Assert(fpPtr <= fpEndPtr);
+
 		/*
 		 * Set up per-PGPROC semaphore, latch, and fpInfoLock.  Prepared xact
 		 * dummy PGPROCs don't need these though - they're never associated
@@ -278,6 +322,9 @@ InitProcGlobal(void)
 		pg_atomic_init_u64(&(proc->waitStart), 0);
 	}
 
+	/* We should have consumed exactly the expected amount of memory. */
+	Assert(fpPtr == fpEndPtr);
+
 	/*
 	 * Save pointers to the blocks of PGPROC structures reserved for auxiliary
 	 * processes and prepared transactions.
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea1135..f54ae00abca 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4166,6 +4166,9 @@ PostgresSingleUserMain(int argc, char *argv[],
 	/* Initialize MaxBackends */
 	InitializeMaxBackends();
 
+	/* Initialize size of fast-path lock cache. */
+	InitializeFastPathLocks();
+
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 3b50ce19a2c..1faf756c8d8 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -557,6 +557,40 @@ InitializeMaxBackends(void)
 						   MAX_BACKENDS)));
 }
 
+/*
+ * Initialize the number of fast-path lock slots in PGPROC.
+ *
+ * This must be called after modules have had the chance to alter GUCs in
+ * shared_preload_libraries and before shared memory size is determined.
+ *
+ * The default max_locks_per_xact=64 means 4 groups.
+ *
+ * We allow anything between 1 and 1024 groups, with the usual power-of-2
+ * logic. The 1 is the "old" value before allowing multiple groups, 1024
+ * is an arbitrary limit (matching max_locks_per_xact = 16k). Values over
+ * 1024 are unlikely to be beneficial - we're likely to hit other
+ * bottlenecks long before that.
+ */
+void
+InitializeFastPathLocks(void)
+{
+	Assert(FastPathLockGroupsPerBackend == 0);
+
+	/* we need at least one group */
+	FastPathLockGroupsPerBackend = 1;
+
+	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
+	{
+		/* stop once we have enough slots for max_locks_per_xact */
+		if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
+			break;
+
+		FastPathLockGroupsPerBackend *= 2;
+	}
+
+	Assert(FastPathLockGroupsPerBackend <= FP_LOCK_GROUPS_PER_BACKEND_MAX);
+}
+
 /*
  * Early initialization of a backend (either standalone or under postmaster).
  * This happens even before InitPostgres.
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 25348e71eb9..e26d108a470 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -475,6 +475,7 @@ extern PGDLLIMPORT ProcessingMode Mode;
 #define INIT_PG_OVERRIDE_ROLE_LOGIN		0x0004
 extern void pg_split_opts(char **argv, int *argcp, const char *optstr);
 extern void InitializeMaxBackends(void);
+extern void InitializeFastPathLocks(void);
 extern void InitPostgres(const char *in_dbname, Oid dboid,
 						 const char *username, Oid useroid,
 						 bits32 flags,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 845058da9fa..0e55c166529 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,9 +83,11 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_GROUPS_PER_BACKEND	64
+extern PGDLLIMPORT int FastPathLockGroupsPerBackend;
+#define		FP_LOCK_GROUPS_PER_BACKEND_MAX	1024
 #define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
-#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
+
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -293,9 +295,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND]; /* lock modes held for
-														 * each fast-path slot */
-	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
+	uint64	   *fpLockBits;		/* lock modes held for each fast-path slot */
+	Oid		   *fpRelId;		/* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
 												 * lock */
-- 
2.46.0

results-1024.csv (text/csv)
run-lock-test.sh (application/x-shellscript)
results-64.csv (text/csv)
#28Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#27)
2 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

Turns out there was a bug in EXEC_BACKEND mode, causing failures on the
Windows machine in CI. AFAIK the reason is pretty simple - the backends
don't see the number of fast-path groups the postmaster calculated from
max_locks_per_transaction.

Fixed that by calculating it again in AttachSharedMemoryStructs, which
seems to have done the trick. With this the CI builds pass just fine,
but I'm not sure if EXEC_BACKEND builds may have some other issues with the
PGPROC changes. Could it happen that the shared memory gets mapped
differently, in which case the pointers might need to be adjusted?

regards

--
Tomas Vondra

Attachments:

v20240913-0002-Set-fast-path-slots-using-max_locks_per_tr.patch (text/x-patch)
From 46c2ec00017821a32b0f9fba8e56b2ad46d9d239 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 12 Sep 2024 23:09:50 +0200
Subject: [PATCH v20240913 2/2] Set fast-path slots using
 max_locks_per_transaction

Instead of using a hard-coded value of 64 groups (1024 fast-path slots),
determine the value based on max_locks_per_transaction GUC. This size
is calculated at startup, before allocating shared memory.

The default value of max_locks_per_transaction is 64, which means
4 groups of fast-path locks.

The purpose of the max_locks_per_transaction GUC is to size the shared
lock table, but it's the best information about the expected number of
locks available. It is often set to an average number of locks needed by
a backend, but some backends may need substantially fewer/more locks.

This means fast-path capacity calculated from max_locks_per_transaction
may not be sufficient for some backends, forcing use of the shared lock
table. The assumption is this is not a major issue - there can't be too
many such backends, otherwise the max_locks_per_transaction would
need to be higher anyway (resolving the fast-path issue too).

If that happens to be a problem, the only solution is to increase the
GUC, even if the shared lock table had sufficient capacity. That is not
free, because each lock in the shared lock table requires about 500B.
With many backends this may be a substantial amount of memory, but then
again - that should only happen on machines with plenty of memory.

In the future we can consider a separate GUC for the number of fast-path
slots, but let's try without one first.

An alternative solution might be to size the fast-path arrays for a
multiple of max_locks_per_transaction. The cost of adding a fast-path
slot is much lower (only ~5B compared to ~500B per entry), so this would
be cheaper than increasing max_locks_per_transaction. But it's not clear
what multiple of max_locks_per_transaction to use.
---
 src/backend/bootstrap/bootstrap.c   |  2 ++
 src/backend/postmaster/postmaster.c |  5 +++
 src/backend/storage/ipc/ipci.c      |  6 ++++
 src/backend/storage/lmgr/lock.c     | 28 +++++++++++++----
 src/backend/storage/lmgr/proc.c     | 47 +++++++++++++++++++++++++++++
 src/backend/tcop/postgres.c         |  3 ++
 src/backend/utils/init/postinit.c   | 34 +++++++++++++++++++++
 src/include/miscadmin.h             |  1 +
 src/include/storage/proc.h          | 11 ++++---
 9 files changed, 126 insertions(+), 11 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 7637581a184..ed59dfce893 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -309,6 +309,8 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 
 	InitializeMaxBackends();
 
+	InitializeFastPathLocks();
+
 	CreateSharedMemoryAndSemaphores();
 
 	/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 96bc1d1cfed..f4a16595d7f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -903,6 +903,11 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	InitializeMaxBackends();
 
+	/*
+	 * Also calculate the size of the fast-path lock arrays in PGPROC.
+	 */
+	InitializeFastPathLocks();
+
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 6caeca3a8e6..10fc18f2529 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -178,6 +178,12 @@ AttachSharedMemoryStructs(void)
 	Assert(MyProc != NULL);
 	Assert(IsUnderPostmaster);
 
+	/*
+	 * In EXEC_BACKEND mode, backends don't inherit the number of fast-path
+	 * groups we calculated before setting the shmem up, so recalculate it.
+	 */
+	InitializeFastPathLocks();
+
 	CreateOrAttachShmemStructs();
 
 	/*
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index d053ae0c409..505aa52668e 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -166,8 +166,13 @@ typedef struct TwoPhaseLockRecord
  * might be higher than the real number if another backend has transferred
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
+ *
+ * XXX Allocate a static array of the maximum size. We could have a pointer
+ * and then allocate just the right size to save a couple kB, but that does
+ * not seem worth the extra complexity of having to initialize it etc. This
+ * way it gets initialized automatically.
  */
-static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND_MAX];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -184,6 +189,17 @@ static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Number of fast-path locks per backend - size of the arrays in PGPROC.
+ * This is set only once during start, before initializing shared memory,
+ * and remains constant after that.
+ *
+ * We set the limit based on max_locks_per_transaction GUC, because that's
+ * the best information about expected number of locks per backend we have.
+ * See InitializeFastPathLocks for details.
+ */
+int			FastPathLockGroupsPerBackend = 0;
+
 /*
  * Macros to calculate the group and index for a relation.
  *
@@ -195,11 +211,11 @@ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
  * small enough to not cause overflows (in 64-bit).
  */
 #define FAST_PATH_LOCK_REL_GROUP(rel) \
-	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+	(((uint64) (rel) * 49157) % FastPathLockGroupsPerBackend)
 
 /* Calculate index in the whole per-backend array of lock slots. */
 #define FP_LOCK_SLOT_INDEX(group, index) \
-	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	(AssertMacro(((group) >= 0) && ((group) < FastPathLockGroupsPerBackend)), \
 	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
 	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
 
@@ -2973,9 +2989,6 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	int			fast_count = 0;
 	uint32		group;
 
-	/* fast-path group the lock belongs to */
-	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
-
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
 	lockMethodTable = LockMethods[lockmethodid];
@@ -3005,6 +3018,9 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	partitionLock = LockHashPartitionLock(hashcode);
 	conflictMask = lockMethodTable->conflictTab[lockmode];
 
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
 	/*
 	 * Fast path locks might not have been entered in the primary lock table.
 	 * If the lock we're dealing with could conflict with such a lock, we must
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ac66da8638f..a91b6f8a6c0 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -103,6 +103,8 @@ ProcGlobalShmemSize(void)
 	Size		size = 0;
 	Size		TotalProcs =
 		add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+	Size		fpLockBitsSize,
+				fpRelIdSize;
 
 	/* ProcGlobal */
 	size = add_size(size, sizeof(PROC_HDR));
@@ -113,6 +115,18 @@ ProcGlobalShmemSize(void)
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->subxidStates)));
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->statusFlags)));
 
+	/*
+	 * fast-path lock arrays
+	 *
+	 * XXX The explicit alignment may not be strictly necessary, as both
+	 * values are already multiples of 8 bytes, which is what MAXALIGN does.
+	 * But better to make that obvious.
+	 */
+	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+
+	size = add_size(size, mul_size(TotalProcs, (fpLockBitsSize + fpRelIdSize)));
+
 	return size;
 }
 
@@ -162,6 +176,10 @@ InitProcGlobal(void)
 				j;
 	bool		found;
 	uint32		TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
+	char	   *fpPtr,
+			   *fpEndPtr PG_USED_FOR_ASSERTS_ONLY;
+	Size		fpLockBitsSize,
+				fpRelIdSize;
 
 	/* Create the ProcGlobal shared structure */
 	ProcGlobal = (PROC_HDR *)
@@ -211,12 +229,38 @@ InitProcGlobal(void)
 	ProcGlobal->statusFlags = (uint8 *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->statusFlags));
 	MemSet(ProcGlobal->statusFlags, 0, TotalProcs * sizeof(*ProcGlobal->statusFlags));
 
+	/*
+	 * Allocate arrays for fast-path locks. Those are variable-length, so
+	 * can't be included in PGPROC. We allocate a separate piece of shared
+	 * memory and then divide that between backends.
+	 */
+	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+
+	fpPtr = ShmemAlloc(TotalProcs * (fpLockBitsSize + fpRelIdSize));
+	MemSet(fpPtr, 0, TotalProcs * (fpLockBitsSize + fpRelIdSize));
+
+	/* For asserts checking we did not overflow. */
+	fpEndPtr = fpPtr + (TotalProcs * (fpLockBitsSize + fpRelIdSize));
+
 	for (i = 0; i < TotalProcs; i++)
 	{
 		PGPROC	   *proc = &procs[i];
 
 		/* Common initialization for all PGPROCs, regardless of type. */
 
+		/*
+		 * Set the fast-path lock arrays, and move the pointer. We interleave
+		 * the two arrays, to keep at least some locality.
+		 */
+		proc->fpLockBits = (uint64 *) fpPtr;
+		fpPtr += fpLockBitsSize;
+
+		proc->fpRelId = (Oid *) fpPtr;
+		fpPtr += fpRelIdSize;
+
+		Assert(fpPtr <= fpEndPtr);
+
 		/*
 		 * Set up per-PGPROC semaphore, latch, and fpInfoLock.  Prepared xact
 		 * dummy PGPROCs don't need these though - they're never associated
@@ -278,6 +322,9 @@ InitProcGlobal(void)
 		pg_atomic_init_u64(&(proc->waitStart), 0);
 	}
 
+	/* We should have consumed exactly the expected amount of memory. */
+	Assert(fpPtr == fpEndPtr);
+
 	/*
 	 * Save pointers to the blocks of PGPROC structures reserved for auxiliary
 	 * processes and prepared transactions.
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea1135..f54ae00abca 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4166,6 +4166,9 @@ PostgresSingleUserMain(int argc, char *argv[],
 	/* Initialize MaxBackends */
 	InitializeMaxBackends();
 
+	/* Initialize size of fast-path lock cache. */
+	InitializeFastPathLocks();
+
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 3b50ce19a2c..1faf756c8d8 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -557,6 +557,40 @@ InitializeMaxBackends(void)
 						   MAX_BACKENDS)));
 }
 
+/*
+ * Initialize the number of fast-path lock slots in PGPROC.
+ *
+ * This must be called after modules have had the chance to alter GUCs in
+ * shared_preload_libraries and before shared memory size is determined.
+ *
+ * The default max_locks_per_xact=64 means 4 groups.
+ *
+ * We allow anything between 1 and 1024 groups, with the usual power-of-2
+ * logic. The 1 is the "old" value before allowing multiple groups, 1024
+ * is an arbitrary limit (matching max_locks_per_xact = 16k). Values over
+ * 1024 are unlikely to be beneficial - we're likely to hit other
+ * bottlenecks long before that.
+ */
+void
+InitializeFastPathLocks(void)
+{
+	Assert(FastPathLockGroupsPerBackend == 0);
+
+	/* we need at least one group */
+	FastPathLockGroupsPerBackend = 1;
+
+	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
+	{
+		/* stop once we have enough slots for max_locks_per_xact */
+		if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
+			break;
+
+		FastPathLockGroupsPerBackend *= 2;
+	}
+
+	Assert(FastPathLockGroupsPerBackend <= FP_LOCK_GROUPS_PER_BACKEND_MAX);
+}
+
 /*
  * Early initialization of a backend (either standalone or under postmaster).
  * This happens even before InitPostgres.
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 25348e71eb9..e26d108a470 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -475,6 +475,7 @@ extern PGDLLIMPORT ProcessingMode Mode;
 #define INIT_PG_OVERRIDE_ROLE_LOGIN		0x0004
 extern void pg_split_opts(char **argv, int *argcp, const char *optstr);
 extern void InitializeMaxBackends(void);
+extern void InitializeFastPathLocks(void);
 extern void InitPostgres(const char *in_dbname, Oid dboid,
 						 const char *username, Oid useroid,
 						 bits32 flags,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 845058da9fa..0e55c166529 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,9 +83,11 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_GROUPS_PER_BACKEND	64
+extern PGDLLIMPORT int FastPathLockGroupsPerBackend;
+#define		FP_LOCK_GROUPS_PER_BACKEND_MAX	1024
 #define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
-#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
+
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -293,9 +295,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND]; /* lock modes held for
-														 * each fast-path slot */
-	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
+	uint64	   *fpLockBits;		/* lock modes held for each fast-path slot */
+	Oid		   *fpRelId;		/* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
 												 * lock */
-- 
2.46.0

v20240913-0001-Increase-the-number-of-fast-path-lock-slot.patch (text/x-patch)
From 45a2111c51016386a31d766700b23d9d88ff6c0b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 12 Sep 2024 23:09:41 +0200
Subject: [PATCH v20240913 1/2] Increase the number of fast-path lock slots

The fast-path locking introduced in 9.2 allowed each backend to acquire
up to 16 relation locks cheaply, provided the lock level allows that.
If a backend needs to hold more locks, it has to insert them into the
regular lock table in shared memory. This is considerably more
expensive, and on many-core systems may be subject to contention.

The limit of 16 entries was always rather low, even with simple queries
and schemas with only a few tables. We have to lock all relations - not
just tables, but also indexes, views, etc. Moreover, for planning we
need to lock all relations that might be used in the plan, not just
those that actually get used in the final plan. It only takes a couple
tables with multiple indexes to need more than 16 locks. It was quite
common to fill all fast-path slots.

As partitioning gets used more widely, with more and more partitions,
this limit is trivial to hit, with complex queries easily using hundreds
or even thousands of locks. For workloads doing a lot of I/O this is not
noticeable, but on large machines with enough RAM to keep the data in
memory, the access to the shared lock table may be a serious issue.

This patch improves this by increasing the number of fast-path slots
from 16 to 1024. The slots remain in PGPROC, and are organized as an
array of 16-slot groups (each group being effectively a clone of the
original fast-path approach). Instead of accessing this as a big hash
table with open addressing, we treat this as a 16-way set associative
cache. Each relation (identified by a "relid" OID) is mapped to a
particular 16-slot group by calculating a hash

    h(relid) = ((relid * P) mod N)

where P is a hard-coded prime, and N is the number of groups. This is
not a great hash function, but it works well enough - the main purpose
is to prevent "hot groups" with runs of consecutive OIDs, which might
fill some of the fast-path groups. The multiplication by P ensures that.
If the OIDs are already spread out, the hash should not group them.

The groups are processed by linear search. With only 16 entries this is
cheap, and the groups have very good locality.

Treating this as a simple hash table with open addressing would not be
efficient, especially once the hash table is getting almost full. The
usual solution is to grow the table, but for hash tables in shared
memory that's not trivial. It would also have worse locality, due to
more random access.

Luckily, fast-path locking already has a simple solution to deal with a
full hash table. The lock can be simply inserted into the shared lock
table, just like before. Of course, if this happens too often, that
reduces the benefit of fast-path locking.

This patch hard-codes the number of groups to 64, which means 1024
fast-path locks. As all the information is still stored in PGPROC, this
grows PGPROC by about 4.5kB (from ~840B to ~5kB). This is a trade off
exchanging memory for cheaper locking.

Ultimately, the number of fast-path slots should not be hard coded, but
adjustable based on what the workload does, perhaps using a GUC. That
however means it can't be stored in PGPROC directly.
---
 src/backend/storage/lmgr/lock.c | 118 ++++++++++++++++++++++++++------
 src/include/storage/proc.h      |   8 ++-
 2 files changed, 102 insertions(+), 24 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 83b99a98f08..d053ae0c409 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -167,7 +167,7 @@ typedef struct TwoPhaseLockRecord
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
  */
-static int	FastPathLocalUseCount = 0;
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -184,23 +184,53 @@ static int	FastPathLocalUseCount = 0;
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Macros to calculate the group and index for a relation.
+ *
+ * The formula is a simple hash function, designed to spread the OIDs a bit,
+ * so that even contiguous values end up in different groups. In most cases
+ * there will be gaps anyway, but the multiplication should help a bit.
+ *
+ * The selected value (49157) is a prime not too close to 2^k, and it's
+ * small enough to not cause overflows (in 64-bit).
+ */
+#define FAST_PATH_LOCK_REL_GROUP(rel) \
+	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+
+/* Calculate index in the whole per-backend array of lock slots. */
+#define FP_LOCK_SLOT_INDEX(group, index) \
+	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
+	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
+
+/*
+ * Given a lock index (into the per-backend array), calculated using the
+ * FP_LOCK_SLOT_INDEX macro, calculate group and index (within the group).
+ */
+#define FAST_PATH_LOCK_GROUP(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) / FP_LOCK_SLOTS_PER_GROUP))
+#define FAST_PATH_LOCK_INDEX(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) % FP_LOCK_SLOTS_PER_GROUP))
+
 /* Macros for manipulating proc->fpLockBits */
 #define FAST_PATH_BITS_PER_SLOT			3
 #define FAST_PATH_LOCKNUMBER_OFFSET		1
 #define FAST_PATH_MASK					((1 << FAST_PATH_BITS_PER_SLOT) - 1)
 #define FAST_PATH_GET_BITS(proc, n) \
-	(((proc)->fpLockBits >> (FAST_PATH_BITS_PER_SLOT * n)) & FAST_PATH_MASK)
+	(((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] >> (FAST_PATH_BITS_PER_SLOT * FAST_PATH_LOCK_INDEX(n))) & FAST_PATH_MASK)
 #define FAST_PATH_BIT_POSITION(n, l) \
 	(AssertMacro((l) >= FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((l) < FAST_PATH_BITS_PER_SLOT+FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((n) < FP_LOCK_SLOTS_PER_BACKEND), \
-	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (n)))
+	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (FAST_PATH_LOCK_INDEX(n))))
 #define FAST_PATH_SET_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
 #define FAST_PATH_CLEAR_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
 #define FAST_PATH_CHECK_LOCKMODE(proc, n, l) \
-	 ((proc)->fpLockBits & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
+	 ((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
 
 /*
  * The fast-path lock mechanism is concerned only with relation locks on
@@ -926,7 +956,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 * for now we don't worry about that case either.
 	 */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND)
+		FastPathLocalUseCounts[FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2)] < FP_LOCK_SLOTS_PER_GROUP)
 	{
 		uint32		fasthashcode = FastPathStrongLockHashPartition(hashcode);
 		bool		acquired;
@@ -1970,6 +2000,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	int			group;
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2063,9 +2094,12 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	 */
 	locallock->lockCleared = false;
 
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
 	/* Attempt fast release of any lock eligible for the fast path. */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount > 0)
+		FastPathLocalUseCounts[group] > 0)
 	{
 		bool		released;
 
@@ -2633,12 +2667,21 @@ LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent)
 static bool
 FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		f;
 	uint32		unused_slot = FP_LOCK_SLOTS_PER_BACKEND;
+	uint32		i,
+				group;
+
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
 	/* Scan for existing entry for this relid, remembering empty slot. */
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
 		if (FAST_PATH_GET_BITS(MyProc, f) == 0)
 			unused_slot = f;
 		else if (MyProc->fpRelId[f] == relid)
@@ -2654,7 +2697,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	{
 		MyProc->fpRelId[unused_slot] = relid;
 		FAST_PATH_SET_LOCKMODE(MyProc, unused_slot, lockmode);
-		++FastPathLocalUseCount;
+		++FastPathLocalUseCounts[group];
 		return true;
 	}
 
@@ -2670,12 +2713,21 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 static bool
 FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		f;
 	bool		result = false;
+	uint32		i,
+				group;
 
-	FastPathLocalUseCount = 0;
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	FastPathLocalUseCounts[group] = 0;
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
 		if (MyProc->fpRelId[f] == relid
 			&& FAST_PATH_CHECK_LOCKMODE(MyProc, f, lockmode))
 		{
@@ -2685,7 +2737,7 @@ FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 			/* we continue iterating so as to update FastPathLocalUseCount */
 		}
 		if (FAST_PATH_GET_BITS(MyProc, f) != 0)
-			++FastPathLocalUseCount;
+			++FastPathLocalUseCounts[group];
 	}
 	return result;
 }
@@ -2714,7 +2766,8 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 	for (i = 0; i < ProcGlobal->allProcCount; i++)
 	{
 		PGPROC	   *proc = &ProcGlobal->allProcs[i];
-		uint32		f;
+		uint32		j,
+					group;
 
 		LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
 
@@ -2739,9 +2792,16 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 			continue;
 		}
 
-		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+		/* fast-path group the lock belongs to */
+		group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+		for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 		{
 			uint32		lockmode;
+			uint32		f;
+
+			/* index into the whole per-backend array */
+			f = FP_LOCK_SLOT_INDEX(group, j);
 
 			/* Look for an allocated slot matching the given relid. */
 			if (relid != proc->fpRelId[f] || FAST_PATH_GET_BITS(proc, f) == 0)
@@ -2793,13 +2853,21 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	PROCLOCK   *proclock = NULL;
 	LWLock	   *partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid			relid = locktag->locktag_field2;
-	uint32		f;
+	uint32		i,
+				group;
+
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
 	LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
 
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
 		uint32		lockmode;
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
 
 		/* Look for an allocated slot matching the given relid. */
 		if (relid != MyProc->fpRelId[f] || FAST_PATH_GET_BITS(MyProc, f) == 0)
@@ -2903,6 +2971,10 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	LWLock	   *partitionLock;
 	int			count = 0;
 	int			fast_count = 0;
+	uint32		group;
+
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2957,7 +3029,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 		for (i = 0; i < ProcGlobal->allProcCount; i++)
 		{
 			PGPROC	   *proc = &ProcGlobal->allProcs[i];
-			uint32		f;
+			uint32		j;
 
 			/* A backend never blocks itself */
 			if (proc == MyProc)
@@ -2979,9 +3051,13 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 				continue;
 			}
 
-			for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+			for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 			{
 				uint32		lockmask;
+				uint32		f;
+
+				/* index into the whole per-backend array */
+				f = FP_LOCK_SLOT_INDEX(group, j);
 
 				/* Look for an allocated slot matching the given relid. */
 				if (relid != proc->fpRelId[f])
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index deeb06c9e01..845058da9fa 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,8 +83,9 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_SLOTS_PER_BACKEND 16
-
+#define		FP_LOCK_GROUPS_PER_BACKEND	64
+#define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -292,7 +293,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits;		/* lock modes held for each fast-path slot */
+	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND]; /* lock modes held for
+														 * each fast-path slot */
 	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
-- 
2.46.0
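
To make the loops in the diff easier to follow, here is a minimal standalone sketch of the two-level addressing they rely on. The modulo hash is only an assumption for illustration - the patch's actual FAST_PATH_LOCK_REL_GROUP macro may compute the group differently - but FP_LOCK_SLOT_INDEX pretty much has to be group * 16 + slot for fpRelId[] to remain a flat array:

#include <stdint.h>

#define FP_LOCK_GROUPS_PER_BACKEND	64
#define FP_LOCK_SLOTS_PER_GROUP		16

/*
 * Map a relation OID to the fast-path group it may occupy.
 * (Stand-in hash; the macro in the patch may differ.)
 */
static inline uint32_t
fp_lock_rel_group(uint32_t relid)
{
	return ((uint64_t) relid * 49157) % FP_LOCK_GROUPS_PER_BACKEND;
}

/*
 * Index into the whole per-backend fpRelId[] array, as in the
 * "f = FP_LOCK_SLOT_INDEX(group, j)" lines in the diff above.
 */
static inline uint32_t
fp_lock_slot_index(uint32_t group, uint32_t slot)
{
	return group * FP_LOCK_SLOTS_PER_GROUP + slot;
}

The point is that a lookup now scans only the 16 slots of one group rather than all FP_LOCK_SLOTS_PER_BACKEND slots, so the loops stay bounded at 16 iterations even with 64 groups.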

#29Jakub Wartak
jakub.wartak@enterprisedb.com
In reply to: Tomas Vondra (#28)
Re: scalability bottlenecks with (many) partitions (and more)

On Fri, Sep 13, 2024 at 1:45 AM Tomas Vondra <tomas@vondra.me> wrote:

> [..]
>
> Anyway, at this point I'm quite happy with this improvement. I didn't
> have any clear plan when to commit this, but I'm considering doing so
> sometime next week, unless someone objects or asks for some additional
> benchmarks etc.

Thank you very much for working on this :)

The only concern that comes to my mind is that we could blow up L2
caches. Fun fact: if we are growing PGPROC by ~6.3x, that's going to be
one or two extra 2MB huge pages at the common max_connections=1000 on
x86_64 (830kB -> ~5.1MB), and indeed:

# without patch:
postgres@hive:~$ /usr/pgsql18/bin/postgres -D /tmp/pg18 -C
shared_memory_size_in_huge_pages
177

# with patch:
postgres@hive:~$ /usr/pgsql18/bin/postgres -D /tmp/pg18 -C
shared_memory_size_in_huge_pages
178
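
As a quick sanity check of those numbers, the growth can be estimated from the struct change in the patch (fpLockBits: 1 -> 64 uint64s, fpRelId: 16 -> 1024 Oids); the little program below is just that arithmetic:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* per-backend fast-path state before and after the patch */
	size_t	old_fp = sizeof(uint64_t) + 16 * sizeof(uint32_t);			/* 72 B */
	size_t	new_fp = 64 * sizeof(uint64_t) + 1024 * sizeof(uint32_t);	/* 4608 B */
	int		max_connections = 1000;

	printf("extra fast-path state per backend: %zu bytes\n",
		   new_fp - old_fp);
	printf("extra shared memory at max_connections=%d: ~%.1f MB\n",
		   max_connections,
		   (double) (new_fp - old_fp) * max_connections / (1024 * 1024));
	return 0;
}

That's ~4.5kB of extra fast-path state per backend, i.e. ~4.3MB at 1000 connections, which lines up with the 830kB -> ~5.1MB growth above.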

So playing Devil's advocate, the worst situation that could possibly
hurt (?) would be:
* memory size of the PGPROC working set >> L2_cache (thus very high
max_connections),
* an insane number of working sessions on CPU (sessions >> VCPUs) -
sadly happens to some,
* those sessions not competing for the same OIDs - just fetching this
new big fpLockBits[] structure - so probing a lot for lots of OIDs, but
*NOT* having to use the futex() syscall [so not paying that syscall
price],
* no huge pages (to cause dTLB misses).

Then maybe(?) one could observe further degradation of dTLB misses in
the perf-stat counters under some microbenchmark, but measuring that
requires isolated physical hardware. Maybe that would actually be noise
due to the overhead of the context switches themselves. Just trying to
think out loud about what a big PGPROC could cause here. But this is
already an unhealthy and non-steady state of the system, so IMHO we are
good, unless someone comes up with a better (more evil) idea.

> I did look at docs if anything needs updating, but I don't think so.
> The SGML docs only talk about fast-path locking at fairly high level,
> not about how many we have etc.

Well, the only thing I could think of was to add to the
"max_locks_per_transaction" GUC in doc/src/sgml/config.sgml that "it is
also used as an advisory for the number of groups used in the lock
manager's fast-path implementation" (that is, without going into
further discussion, as even the pg_locks discussion in
doc/src/sgml/system-views.sgml simply uses that term). See the sketch
below for what that could mean in practice.
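
For illustration only, such an "advisory" could mean picking the smallest power-of-two group count whose total slot capacity covers the GUC value, with some cap - a purely hypothetical sketch, with names not taken from the patch:

/*
 * Hypothetical helper, not from the patch: derive the number of
 * fast-path groups from the max_locks_per_transaction GUC, rounding
 * up to a power of two and capping at 1024 groups.
 */
int
fp_groups_from_guc(int max_locks_per_transaction)
{
	int		groups = 1;

	/* each group holds 16 fast-path slots */
	while (groups * 16 < max_locks_per_transaction && groups < 1024)
		groups *= 2;

	return groups;
}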

-J.

#30Tomas Vondra
tomas@vondra.me
In reply to: Jakub Wartak (#29)
Re: scalability bottlenecks with (many) partitions (and more)

On 9/16/24 15:11, Jakub Wartak wrote:

> [..]
>
> So playing Devil's advocate, the worst situation that could possibly
> hurt (?) would be:
> * memory size of the PGPROC working set >> L2_cache (thus very high
> max_connections),
> * an insane number of working sessions on CPU (sessions >> VCPUs),
> * those sessions not competing for the same OIDs - just probing the
> new big fpLockBits[] structure for lots of OIDs,
> * no huge pages (to cause dTLB misses).
>
> [..]

I've been thinking about such cases too, but I don't think they can
really happen in practice, because:

- How likely is it that the sessions will need a lot of OIDs, but not
the same ones? And why would it matter that the OIDs are not the same?
I don't think it does, unless one of the sessions needs an exclusive
lock, at which point the optimization doesn't apply anyway.

- If having more fast-path slots means we don't fit into the L2 cache,
would we fit into L2 without them? I don't think so - if there really
are that many locks, we'd have to add them to the shared lock table,
and there's a lot of extra stuff to keep in memory (relcaches, ...).

This is pretty much one of the cases I focused on in my benchmarking,
and I've yet to see any regression.

>> I did look at docs if anything needs updating, but I don't think so.
>> The SGML docs only talk about fast-path locking at fairly high
>> level, not about how many we have etc.
>
> Well, the only thing I could think of was to add to the
> "max_locks_per_transaction" GUC in doc/src/sgml/config.sgml that "it
> is also used as an advisory for the number of groups used in the lock
> manager's fast-path implementation" [..]

Thanks, I'll consider mentioning this in max_locks_per_transaction.
Also, I think there's a place calculating the amount of per-connection
memory, so maybe that needs to be updated too.

regards

--
Tomas Vondra

#31Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#30)
3 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

I've spent the last couple of days doing all kinds of experiments
trying to find regressions caused by the patch, but without success.
Which is good.

Attached is a script that just runs a simple pgbench on a tiny table,
with no or very few partitions. The idea is that this will fit into
shared buffers (thus no I/O) and into the 16 fast-path slots we have
now. It can't benefit from the patch - it can only get worse, if having
more fast-path slots hurts.

I ran this on my two machines, and in both cases the results are within
+/- 1% of master for all combinations of parameters (clients, mode,
number of partitions, ...). In most cases it's actually much closer,
particularly with the default max_locks_per_transaction value.

For higher values of the GUC, I think it's fine too - the differences
are perhaps a bit larger (~1.5%), but it's clearly hardware specific (i5
gets a bit faster, xeon a bit slower). And I'm pretty sure people who
increased that GUC value likely did that because of locking many rels,
and so will actually benefit from the increased fast-path capacity.

At this point I'm pretty happy and confident the patch is fine. Unless
someone objects, I'll get it committed after going over it one more
time. I decided to commit it as a single change - it would be weird to
have an intermediate state with larger arrays in PGPROC when that's not
something we actually want.

I still haven't found any places in the docs that should mention this,
except for the bit about the max_locks_per_transaction GUC. There's
nothing in SGML mentioning the details of fast-path locking. I thought
we had some formula to calculate per-connection memory, but I think I
confused that with the shmem formulas we had in "Managing Kernel
Resources". But even that no longer mentions max_connections in master.

regards

--
Tomas Vondra

Attachments:

lock-test.ods (application/vnd.oasis.opendocument.spreadsheet)
w�R��"���{�|�����"-b�w�R��"�*7|�D7�4uk�����x�J�r��"�*w|*n
<�}���H���w���dYEVUn�F�m��0OmZ��-DE�����"����O�7����lt#����mn�\�~H)R�����yt������2Z���Q�4�R�)U�����M�R���#L1^���"�*7}:������6-H��u;8�p�)r�����
����VOmZ�&�u�}���"����-��mt;����:���3t��GOB�Po&�g�n���{��Mr]�>�t{��@N�S�T��o��G;omZ��rU��N9EN�S�[������"M-$��_r��"��w}k���om^�x(�������"���}j��J��yAV�k+�U�4�T$IU��sA����� ���9b����@R}�B7�Ws	�g��n�7�����M
�D����;�)EJ�m��}�o}OmZ��s�O&�R��"���}��;��_�[����c����"�H��m_7W���[������������"����]��S7���!���wDv�9����"�*�}S�:��wk�������r>
�9�fN}�r���������K�d��c(x}�KOF�QdT���C{x}(�B��V3W_���@B�P$T����o�d�S�$��Q8����"�H��=���4��������NA���)EJUn���v����M��IW7�l�a ��)r�r�g�����6-����:���9EN��S�9y\�����x>�YATzs���w����"����=��~���c�5��}S���"����M��w�<�yAz�1��x���"����]���
;�tk����4�y���9ENUo��(���f�1�����S���"�H��}_������Y�g�-��
���I�=��G����������N��g>��u�����E�
����M�g��[�4?��"�J6|c�n����6-����\|�<E@�����H�6��� �<�
�1�4P�w�p�����P�6-�m�wp��KO@PT�B�������i^�m�
E�Z;DEDm���~�eZYM�x0�4O��������u[<�-������{���4PU����uxe^mkjw��q'��p*�����`�wk���l����a���"��P�<��z��S��W��7���0QDU�����D�P��M2�
���z��"��6x6���[��G��ob���"��������E�_�nm^���������+��������y��1xc0�}LK=�����B�bq�3��iuTf�%h��2QDU���{�����aa�{ksw�����@<O�S�./���U��� c6�mN�k ���oGT�6oz�����_��J�A�4PU��SE�omZ�f�w�����!EH��D���� s�k���"=��WA������E��u����KeZ���'Qx���@@PT�>/�mt������t����Q���"�����i��ao�nmZ�!m����<
�!EH��L�4pW��M����4�_��@LS�T�v���j���[�%���:DQET���,�
{tk�����h��we���"����O-���y�����
�*Mz��{)B��*�����n�����s�h]��#�/=�D<Oe>�����f�-��r��@HR�T��������|j����:��{��@LS�T����M%��R�6-�G�s��qb��"�
�{���[�om^���Z�������"�����5.���:��=�iA�����|�e ��)b�v��<}~k��,����[�{�^b��"�J7}����b�������v�/EDQ�{>��N���� ��Vw�z����"���M_���+�[�d,m]Vto��@PTU���=9uk��h3���W��@P}�V��o�%�����M������� M��
N�B��"���|�t��2�6-��6
��;�DED���L�9x��M�|/
���4R�!U��3u]����6-���bmp��i ��)b�p��e�v|_�C��_/��������"�
�|����2�6-�5.�68�w�	)B���������+xw�C��f�����_E@Pe�=_s��]��� ��}Nt��e �)B�l��c�~���yA�
�=Uz"��"�
�{=������6+��j��n�r�)b��*������E��iA���njp�_1ELS��=]��~�{j���h��8���'���B�3��{=zT��M����_L�b��"�jw|��?�y��5�{���4QDU�����'�[�$��MG����1EL�o�8l|k��,i�|+���AEP�������|OmZ�f}���5TAU���<�~k���hrx�W��@P}������G�Gf`�F��O�4�[�������� �\���C8$	EB�l�bF/B���!��>Bz��l"�jV{�wlt��S�d�!6'��;
$	���*��Y�0��iA�������_z��"���y]t/��qV�6�D�I�r��"���y��|_�C�D�������i ���72�g���y�}�{Z��� r�]�����"�H��m�/�	�1��yAzY�����"�H�����t���M2��m�X�xH(��	U������/����I=����:��2�QdU���2����� }7��<Gz�(2������2����[�D�ls,�e��'�H�7�3g�7z����[����X��:dEF���t��n�|h���j]-���2��"�
�z;�M,Om^���n�v��$	�����������=�i��MU:���4�R�)U���X���[�d_K�y��(2�������gnmZ�f�-�d�w1����_4O\��{��=�iA��G���4�R�)U���i��K�`�_����{�H(�����Y_������^T|H)R�����Y�:`N=�iA�uq����5�S�9U���*\j�#N������C8$IERUn�bwSt���M��u�}���9EN��S�8e\��s���2-DX�+|������"��v|�T�S�e^ocD��8$	EB��������r#�v��a������~H(�����u�S�	�[�dGs�b�y��@N�S�T�~/�z����qm���<u�)r������\K��R�6-��vm������@N�So��gN�����?�yE�j��K�����@N�S�T���v��t>�iA����=�i ��)r�v��wI'��:���{����"����m����mZ�1��vp��k �H*��|��x���M�F[��LuH*���*��-�
��� ���Z���_I�]�Z}c��K�=s�u���m��S�6-��f�5�s����"�H��m_�)�%sk��HS���4?R��"��v}�:���OmZ�9�Xsv���4�R�)U��3�n���L��� >��1�-	���"����]_�6�!�[�e�uq�9���AV�UdU��/��OP��� �f_]���'�H�7S����v}�C��.m��aos�������"�H��]��!��~k������o�:
�)EJ���tL�
�����mo	<��'��(2�r��C|��\��� �7�i
^�u�)r������5TN��N���k��>�#�0�QdU���e����[��v2B�
����"������<���	~s�C�DV4�	^�~�I)R��*��u�UKB��O�f�>�,?2��"�jw|�;zg�S�D��<����r��"���|s��;��� ���<�yH*������
�<�i!4���m_R��"��w}!���|j����H���|���I���������?�����������?��8r\��{���nmZ��[L�\#z�("��*�����
��7��6-�Y3�
�|���"������5<,uk������%�b��@DQ������������M2gG�����("��*[�u�6����E�1�H������"����������eZ����f{�:
�N@}��q�V������[�d��&��u�("��*���2_`�wk��x�C�`/�OEDQE[=C���/eb��>�����x"��O��]
�����Ym�^z��("��*��E�5�+����a������4/!EHRe�<]1�
-OmZ����j���@HR���g�W��"���6-��6����)B��������1��iA����x)B���������nmV���zlst!��PR�������Gr�_E�K��of����Mz0`�-�Y�������.�k�n1������[���aU��Uy2#R	�������89��E�-�*�Y[��� 1��JL
���i���E����J5z��� 1��JL
��1E��/�~F�0s��� 1��0�@���@�hk��)_���m'#
`:��]���
S�����9��t�g�����p/J���*HL%�S'}L���R.�^F���1`X$�S����>&���[���0Sb�6�<$�T	���>��Wx0��(�fEM�P5�{]��JX%����*X������fD�8PN��T	�k��7 ���	#1�h{��E�k��UAb*1��8�����Y���SA���X$�Q����>�F]�y�v3�^�!O?�R	����i*�]����C�6:^$�T	���>%jl0�h��RT�Bpo�� A��JP
����i":��n��3B�U�{
�	���Au����@���&e?Z��h�����TB*!5x�'���}������Xc�v��U�*A5x����7������0���X��JP%����P1��s�v3RkQd��C�*HT%�U��~h5~f���g���x���X��JT%����j~pM��&J�d��}����3�����d&����<n������9�A��8!��9$�S���3?5vn�Y����Z��x�P�UAb*1��8�c�p��Y���b�(x��� 1��JL
��15����h{���Z��S��U�*A5t���L�_~�����J����HX%�VC�}��7N�^&�[i$,Q/���TB���������}2��p�v3"V�@�3T���Tb*15p�'V9�,a�v3BT�[�D��>!��JH
��Qu�`��E��qi���)�uAb*1��:�C�pG�Y��p��m3�*HP%�TC�}J�Tb[fm��}nT�c�����T	���I��7N��~m7#��rk�A�.HP%��
����<p�T�y���fD�0H�`��UA�*A��<���H_����AQ�h�w,HP%�T��~f=����g�1�O�����Tbj��o�3m�r��3R��������DU�j��������@�L!W��DU�j����Q�h�q,����p]���AT�����O��h����m��_�;��
��{�	���L�Ul����S�)�4j����F�y���T*l5���*HF%��Q��|L�\��h��B&,1b��Q��g���3>F
�F-�nFH�4D�=����TR*)5n��@��|�����i�� ��I�$U�j�tO����m7#(�]����
�RI��R�w���	q�����fB�T�x��cA�)��|5�S��;_���`��-��uA2*��5������Z��6���Z��;���d��g�����%��~&��q��UA*	��7����=��e��2�`��!�f]��JJ%��M��*F{K-�nF�"���|*HJ%��J���[<p����'���4+�V1���RI�����^����m7#H�r
��� )��JJ
��1���I����G
<$��RI��>�`'�E����h��*HN%��Sc�|fn�9k�iT�s�����Tr*95v��-�>�h�1,�����uAr��q
��Q�3�i<n�W�ZpE���f���x��W�� 9��JN����	{t}�A��j�N����Tr*95r������}����)L��2���Tr*95v���(��Cm?#Z��Zt��X��JR%�����!�`�v�A��j#��ZW$��TI��S�
M�g�-�nF�I'	�S��TI����w"���	StS��}��42j]��JF%�FN��D���m?#Z��L�� 9��JN���Qu����h����q�)��|;�C����E���b`6�Q��JN%������j������+
��� I��JR�����*%��E���R�j��\��$U��������'}h$���e�
(����T��JR%�F����-�=��M�*ops(HR%��T�'~�9�h{i��b���O��Tr*9������������
Uh��E���dU�j����m7#�D ���U��d�3����k�m7#"��y���� Y��Xe<�v>������F������,E<���S�������X����fD�Tfm�^����Tr*95r���b���m7#�E�bow���Tr*95v��D{����L@A%����uAR*)��;�CP�h����~V
�J8�YU$��VI��S�
�Rp6}�v3"\��Q���$U����������~���E����|�[$��S���S?1��M_�=���S�uAr*9��9��Z1��sR�3!���/�� ��JF��������E���j1/�<��S��������$/�NF���n�n�+}r*9��;�#���s�nF�J�\$��TW'�m�G:�������E�r��P�^VI��TRj������Y���|�(G��;$��TI����P�-5'm?#\�%�4o�ON%��S�g~�j����nF*�L=W�X��JV%�����M���Q���f���8H����dU�*Y5~�G����fm7#lE�L�I��`#�����jx��o�����������tw�����\{|������������������}�8{��4w���i��x�������>,�_��p������������?��?���?po���_nu����>{�6�m�0v'��SG��������o�����_6��#������a*qi}�������o~z�%�����7���Un��p�J���'��;������m2_�:����������=��������G���_�><��u�KP���q�w�/�G?���z�y�g��Oo�7�G'����������{p��v�?�p�����w��;\��^N~���,U�8Mu���4
��|t;����:�P<�Z����z�7���y2���z�D��������{�
r�Xc�Y��7-5��[��L���9k;��*T8x9&q/<�{V��'D��k�^D�>�����?���?������o�k�7��Ir�|����������_�[�<����{�W?��uo��?���/���x����!���I�V�������%����D�7]w/h��}�az������G�u3����^
*�Ji1N�^Z��������FG�I���'_��\?���F�Q������W����wo|y����������!�����I[�/\���C��C�^2��� 
O���mVg����C����E�k�y�����e�g����9{y�wwvn�����3C�ax��!r���.�:��
���9������_8gpZ�����z�n���X���Aw`��cq�=9��~�	6<�{��
���o�X�!4�e���PY/A6��kT�3��� ����������QC����m��]���g�/7{���95��!�So�*�8�����j�{l@P$��I���4 
vy��{��eB�h�����U^�V��j����Zb��w���5�i=�/����x����]�]�fN���>�]�]��������S3Ta>����.��x7{����iX���*�m����_���������?�����+&d
~���k�#���6i�<�����������t� �"ogv����S8������S&j�S�'�.�����]Di�jU���;���wa�2{q��V�O*��I�P�����8�����}~+Qd��Pi�|��V����u�7������9+m�T<�=��+`���m�����Q��,��������������j@����n�|�T�U
���{�>�~	�UPP+�����]������T�����/����2@��_n�>�� ��**����^L���
�������>�Cjt�Q��\�)66d�����ikz���T��$�4Ia�g��uX�������W�]M�y�D�d�A�&F�����yp��������.YL*Y�s����g�1������]z0k�Dj��i	_��9��u����w����dB�f�uW��q<���;���v{U%��Fu��M����[Yf�ug^�9~�.���l���������P\7
�S�>�$����A�����l��/�� ��A��k_���r���h$�q�����n�����������Vo�/�I�E7]=��9�����w�Z��������&[��.;;�L�b����$�e���0I��5e@��p��������������}]oX���_	���I)~�v�����h/`�}�X���dH�$�����/KQ��{������b(�nU�H>j=gO�&�4Q�,�y�j��y��k6w�pm�N>~�-����I4��h]��QJm:aG��d�a�����i��p��h-�03���KfG�h�_i��������&�A��f��Nm����Y��E��l�Qw
e�z��/�_�d��6,���s�3��rX��o��������2�Xd�6/h��}�C3�SO�Q��E���K��jb	������G��Kx/�2��4E��T��{��XJ�1?�T|2�%Wn�m�kgn�D�����;V�_�[�$��������1���g��$����&����s}����]����9�E����i�Qj����1�~����%��3c&���2:��d'��=�%Wr�fiY�Lc�F�g��N���g�������8H�d4�;y�� g��7�n��MF���8���c�g�����L�
�3Y����#r��:�]��}�����J-�-�0;h>
K�[c�9o#���s�\9����Z�&a����C;y��-g���z���<���z���c���&�W����|�<�N�~v��hQh>�H>��,��a��|���%|�����4g���)����]�KL�X�l�D��-����\	vhKMZKV��}2��'�e�Nr���f}g�������}v�7�������E�W�,4�K,T~�)�)�c10��E�h��vm�.��a�:y-���XI'`����}��1.���0V�}�
dmS
�/�������R�����J�T�\�v��_�������E��	����Y��i���L���[�v��q	<�X�Ag���L�6�"�������h�Um1��*YXF�,�W��L�F���:�A>���dJ�O����C�h�6����9<�)}���l���g�hL�<�`:��V��m��dj������G������������Jt��q�=�<�q�vA���XE��Jvl������^��i0�z�G�]�����5�F�Ex'�+Uo�k)���Z�q�,t��0�Z�j��"N���L�X�L��1����f}c�������8t��gB}*����]&V����}�I`>�48��d���A�Y��h��A�E�������{o������.�s����4.��+�7G��4����\)t�Y/���;;�d��7���7Z�,Zq��c���ZtV�W���;�8�����o?�3x�6t*���g4�
_6�1/�H���TL.�(;�s�MR� �X���v�0�u->�0�� �F��o�!|��W"�D�S}3�i��=��	���D>��������X2|j�����Y������jN$����	��?Q6���v�aP��l���"}p��8�
'�V�c!����p�U�������\m�G��c?��%|lf�X�
�C����������Pw	��<6	�������W��������+7�1��'#�Q���m���f��z�Ns�8~��=�������Oxr�^���E��L��s�s���[�~w,��d��Y�ZH��j�+��h��*|lFX�	
�Y�wq�+)o"���$DpE�s��h��7���6Z��hW��e.R�gt�����������[��LKJ��|�����3�vD�:z��0�� L�/�G��1&WKE����Eeg�F�%�2����~���/�8����7��Pp��4nX��������4KU�n<k�NS6����������L��;���_b����V'�Z��l���l�x��^d�6�x�n�H�R�9��Q�{���,�
#mK�`�me&��^lC�}�"W2�&�Y\$���1�I���j��M�g$Y�����X�;Q��Y�=o���z�����(����4x�P{d2u"�E���3�W��
�a��"}0���'W�cs��N�=�%|�7��e<d�G��Rx�d5�i����\�}x�_�<�5��0�-in��N�����6Y��p�kf�!��S��c��>����
H�v!c.��M�D�B�g�����j��(6����N��>�XR�������DJe[^6��h�F_v%�2�z#V�������7�P�"�L��`s��7���6Y����)����f�5�d���6�����K�Y�q�sB���]��xQ��N��u����"{,���{�2����9�������hon���dd�G`�/�d�����@<�Q��#��ZM��,E��h��5w��y3�>��i��I��@��^���'�c�:Y{�������L���F<
f���D��d�@����
������h�@�9����Q�������K����)�����i��a��$�|5ui�t&SS�Ip������j��M�gZ�io�]gHds��7�����=u��q=UfFy�~�����_�A[U2F��%�"�:�������7-S�_Uf]���G��QG�kUh�DV`3��2O#l�t=��>�y+��L����pkJ�|����C.F}������hJ��o��"d�RB}�Z�z���z���S������=�}����\K�(��"|�}��l�%|0���Jtn��KW�L*�Ut������r
��Lt��JeD./�����ZG6Q�Pca�t�I���T�����~�t!S�4��6�>�����������w�#s@��u����z.[���q�N���m��H����i�>���4�
.����j�u&��cNeR����d�Ez��R��&r�(��,F�Y�N�C��Z�j��T���-P�Ix�>��������N&5�Kw����
��:!�{����8��$F�7��S����2vp_�%H������,��aPiS��h����vbr��aG6q���]�U��\_�.S��Y������I"��>v"K��CK�Z�z�������m>9��9%����f���V��#���y���c��������f�<��Lm��}>��jON�k����D��}8yd���M�0�lR��3i�
1���Z�j����BYU�r��vB���<���n.�'��O����M~�Y�9��'�N.�=��)����x�e�K�kn	�G;UV���1������>~�G�����>�q���F���..s���M=!��$����~;E�+G�&�+6�5���������\9z����ytW
isQ�>���.��
�:^�a���n����}�F��m�Mo��Hq�W�������	�����(Y�����%��[isd��e�*,���@!s`���[Ys��_*�LQZ!�����f���&���zRs���>�����h,W5�W��U�7��>��Y���������;e������t�FNz:h�.�:f���I�o������K�9��!��)%Oe��i���ce��d}f�d���S%�<�;�u�9���Z�������=���O�jvs��<��:3
����M�����,�]��S0B/t�h��O�S�d�-B���>�s�x�T�G�jZ4�Dw��}j����,�����DG6�xp��"�����S��Dg���&�3��������c�;��+����7�$5[y���~����7�����7�gW'��������h����_.�On���rqrv~~{qw����������������?���e����]\����������I��+���m���=�{{vu��?�\\������Z���\�n��?������������V�{X;������>���Y)�|p����w��?��������
<?�������7�w?>w����N�]^���,�p�������/�tv��3���:{�����YsR���/>�d�v�7zz��������������b���2�> ��d�4 ���i�W?��_n���O�o���^����������n�v���������/_|{vu�������������O���_������(�������}��?�g������;��������a����~������:��/~��O��o���~�������,k?�|�>�?�������w��g�o����������~�������x����������]|�������Z������O���|qu��%����/��?tn�����|vw_�����wWg?�|q���������~����V������~wq�A�'��jO.���������������������z:��|r�������y}��O���ew�|���ws{�snn�/n_�8�{{q}��>n������A]Wg?��������uk�\]^_���� �j�?������J^�e�=���/�?*W7o�z����B`@N	5F�N��|���������������^?��G��~��\��G��U��������������o_�Mti����7�����~]�;��_���=�WrH���\�<;�����o�����~m�Y�����k?&o��h|���@Gd���^�*�����>�������\*&����g����,��TP�D��9a���h�����C���| ���UC�z�W=u�gGi7���L
�i�H��[���_z����N�;u�^��.��6��1z�n<�=x����Dz��	�W�.�vm��v��������F|�*I�v��������P������c��U��\u�v��t���
y�(��UA��
{�i�E��������h3��e�uqTY�t�vqA�w������!Wfm+�i���l��v��s���;&	�
HE�`	���Z[&��D�����wFz��c:�$u���
��zo�"���Sj�75z%j���	r�	X�R����|��v�-F�3=�=����H��v�&d�D�� )��D
��()��*#}�"�'�����e����cj{Gz&�����[���! $�a��	�� �F|���#+������q��iO�����/����+S�����y��wc"�?���������yJ;�%��3:2��
�.�x��v�9�����-����[G�EB��*�{�|����D�����3�~a`O�2#+z��2�S����E;��"cG���Y�X�~�~�?��\E��Y���rdtZ��;��`Ef��p6��p����hvG�G�e�]cdo�24G���Wst�����rB�Ro�t�~L���3����wo���4�N�&`��s$��>hI�K`FC�A�2�����7�(����"�b���s#�:��"W�.�Dv�
��B:�E��*�6�~�������dyS%�
��D�)�iH���pu$�UY��Db�5���z"31FH�A�������n������	����\�@���8!Q���#}���A�HVV�D���c���!8�C����"s�a�\Oi�;��X�[����j�j��K6��=	�HBr�$#��)H�H\O*��[8pe���4��:
����y�4�9u��c�@���<[te�����u��4�##���QjoH^�4!hT7�V����d���H���h:�V��{]��c�S�h}w�+�C�k�W�N�������������7X���:���2	���d�+����q}
�#25���;�`�{~�5�-�h��G�Y�H��7�5$Vv�;K�8g�r?"�$�1Ch5o���1�sXM���}�5rX�K�j����d���@t�5N@��@^�E;���R����ve��,,�v�
iP�g���H�D�YE�=�@\��x������Wv��1D�Z�~Af��0?`5�	���=��v�Y"�
:��Z"uq�Vs���4����:4?���6��"k�g�9��V��kZR;���iB�oJ\v[X����$b��0p
��%�Hp�;��)������5#��d���b��d|3r`e��	���59��A��H��A��WC#w�F��:Qa��x����`�v�;�C�c���}�u��
��5nH���Di������b��<3�&`vx�U��#��U8_�ZcfNB36$��
��b��8cG����H���{-1�nH��UA�>#��&
���XfB��1n��� ����-#��2�"_�b@��E;pe��)&���
������<mX�H��"��M	�%T��5Sh$��|(�^��L�{+� �E2��B}�B}�"�M�a����*�H��1'n���j���!�0�%�3Y�������L�@6�E;�4�#OS�v�2��#c2w�]�'R{y����6O"\�7`��;�I����wr�zW����xG�It(��w��{Q3��zw ������1��#�R�@f��o������^);R���>��x�h�-@&��x�)����
qD�b#���t�����wQ�Q��H
���D��ip�n�h��ub`�E��(H�Y� �/,����'��H�E;��"'�9�_��8��	B�L
J`P����;���x0�U1����`B����	F�
�T2����i�=#
��������������.H�@����8����6 �/��%X� ����Q�9�vC����8��b�B;��)�)(������d�%CvF�!��!�Ua��do��=����(l��D��#g���^��Y
�sA��
K����	�|X2s��	�Ld��3�}
���-q��-�<����s�N����:��������qxv2V���r�����]��M�8��]���+�Fj��{�������s�v����6���q��1�m���p\����f���AuC-���J;.s�^����@��nH�@�����WI)����
��(�8o�
��aw��t����;�z�K;���#�W)`w�����j��������gw�UPV�H���@uI��#��U�@j~���3dGV8J;���WLUg���p���S��Y�`j��w��X[K;2���b,��L~(5[��Y���"�#�3a8�iiGV!8���#���b(�@��8w����{6�iGV!��,�v����A���p�@���=aL��7RpXh�Tdf#YEM��ZK;te���j�L"+�	D�������g��J�n$f+;�V������!���p|��]�����.�������u�����q�LW��(�g���P���U�h��o4����O#�R�L`��@N�|�FG���'�wD�o��gk8�&��`{&[��2���Q�-�x�l	|�e���D2�q�v��s�$x�������3��S"'�s�Lr�W
\�}iG�b��H��|�&��22������,8���|�&#+��v��H�]�|�g�p�?J;{�h~U����������\ /?� +�Y3kq{]�> _niG���w� ;SS�����q�����GS��U�"�R�}�)\����$q��qsJ;N�"���@vR��B���������W)���*@�e�"�a�@�R���Aw;4��c8*��7�&2��H�IZv��5`�zZ�����p�34�3d�q"�bs���(����������E��{����]0���(�TK�������-,�,��R�h�����c��=����,2*����
�L6�g�7uH2��U��������9���s���N�:��b7��{;*I+V��qz��'���yg_:�1U��4zf�[���[���p�@���\j�uq�y����jqufW�:\�������k�z���h\��Nr���p{�Fy1�fM\��.x���r��`b���#}���y��]p���s�0�)����j���=��r��)���p��I���Q��W1t�r�nj�:��q��l��1���9�7)�E5s��F0����Wm���M��+��U���:ga�8N7ut�����ip��C�<��8���1O��g���X�n:`<c:�7������z�����	ySM9v��7-���
�9�.FzfI�bV`|m����32f�����sv���=t�8�nr����Wt�6�q|�C'=%7��d���AF��f���������^�7���-��Z�P���F��0�o�����sR���t�&=�h�<6����m���������E�uO���-Q+�`7���U�4�Bf���$yg��l-���
��!��7��{�
���j�QI�]�!=�c�rQ|�
'n����j���E���L>��`7����U���N��D�J�C����	���$�-U�\^�p�<
T�t+���0��>����������=o�9w"����������Y�����A����:������������`F���\�����
��ojIs�d^*\���rv�t�n�2P���%�8�ryp�^�����T�X
*Lyu�����P�H������a��81��O������o'y��t���\����pb��T��.��R�i�P,��c\G8�"�zk9�
�&Eoq;���{�w�/F0������q���$���>~��3s�@t���1�����sX��S<t��=��Q~F��4�u�9i��1�n����1�\�4�����Xp�W1�\}<�������41�/��T7A���o���~E�=9��+:��\E��9���ec|�|��s���O2&���B9�L<&�������Wtpe��W1�d���O��e��>N����Tn���	��z���yYd� :s@��:h�1������e���)�l�����<1e\oQL9�T�+SS����O9���u^C�����CcZ�H���E�w;�A��C'��v�,�s���FsT]d�� �-+3
VP{��2�bsL/Y�%�x�zv�����I����N��)2~�O������������~�6�����}��A�p*[�\�O���`7iE��qe��}��*�Vr�����"���Rd9�l�;g����8��N�y��Ki��1����)�\��� !����Ji�-�2���S��:t������\�b��]�N�
�<��~��yT1�.���Opf\�*�5�z�v�UU;R����7)U�����p���j���������jdl��$6w����j�q�<�.\���I;��.q�+y"�9F��s��T'�:�}L��*��������V��_���;�qS_�K$5����~�q���\�:5����M�n��8V>������
�����q_������SS8����^*�]'V�LM�
�����'���9tN)!5���g�_^\7Wj�~{�~8tn�cj.�99�����urp�������� �i%W��i%�i�Z��������z��F����B�y
*p:_�q{��M�=t2V� k�dt�Iz��,:Y�j����|S����8�.]�ge&H&��:IV
&�hi����~�pL���������cB��n\���s�.���:t�#X'k��d5u���CY����]{��3��3��D�N�4!�&Z}T�@W��#
�\�i�c��$A��I�����~:���/oSsZ����v]������ ;�����\>���c25�_��d����lS�{����<1�Ff
�:X5��J�n�)X=4Sn�z��|3;����J3P���X���j��u7k262i6`����Ff������fV��������7��d��������A��4_;��Q��������I�����ov����M�H�Y��~y���l���^X���z���9�u��YY�/'����]M�TC�RZ�5��8������/����`'}�I�v8���[��+`���J,w"�P�.�J���E�Y��$;�����5����Mu�T��/'���P�s�'n���.}Hea_5�B
dY����P=�P3J�dV<�9�a��������#�2��&gd�z��	f7"�h/n�$��U��Z\7)��M*A�u������#�^
��uKaqp,�F����C=�XN5��5'�B����)��g�G��'�
��b3�3u���ki�^�ki�^��lp��J��i?�e��:wf��a�s�;�PW��V�'��I�vo0N'UZ2��%��`'F�u����1 ��\_����1A&zf2�]M���p�u_g���Y���Jn��@�r�v��5`�N��u�lr�d��~R�~��kL�5����9K<��\~.�)V`���
B��QA�<9�����+CV9J�����R.7P�]/%�"�9X�+[�5P�3�^��������0�G�"�3��f+T��L�3�r@-�*R��J@l}�*�c�.���Rq�i�r�}Po`�[�!5�j��j�i����o�[b���O1�`}�^��i�?m-0Fm���m	��}
�ytm�����c;�Q>l��_���s�K���pV�3�XP��S��4d�t������.��.�I]�o����.r�vW�v����V��=`m��Mc��c�����1�9�>r�z��������r^�N��`�g$��>B*�������u�Q�����>`W�h�+S(:��8f �i�TSn�}���:�A�E���:��=�`����5�`�41`�dr��^K����5;`�f���5�
<��]�T��l5v������B�>���;
z�;����=Z�Ot�����K�9-�5n�K��Pb���b��Zb�rG�
Ww+��2�%>��mI$�]\y�\��$1E��*.�8t�;�h�\���A������*��*�%NB(9�ne&9����'r�����p�K��Zr*����li�F��A�)�p���u��:��+����R��l�&��Tz]��i��2U��YG-P����gt|���<��r�zL����Ir��e�\N�����n�2��Z
�[
�
-��v���Sv�{7��<1�\7Z���e�������F�M��v���������2NN�-�=H?]<�����3�
��s}���'���lNA���������U�����r�����,��z�g�Sn���8��/_�:$�7��l�b�����B{�B�c�WyOC�9�
���P���d@V`&?BA�I��cHK������9Z��<�Jn���VrA���q�_Kp���mq1����Pq�[*���w�,�����������Jy�<�=��<��a��*�5��k�����J�q� �8�&�W���h?��K%]Ik����=$��9������V���r�p*���39x�������%���@"�~��
<H{��&a@�N�\LAD�$���Jir�E�2��c��4n����T_\MD�"�c��9�6��z��1�Tj�A�*}�q���:G8x����^�H���������
j:��$���z���U�i�8��'������M�Pila����W�+�a����`�������M�9_-l�h���gb�E*����V��w�b��P1����b��3SK1��S@q���:��8�S�4����`�o9��)1������m�X���i�j� ���������jm`��Z�r�{_���
o�u&�[_s'g����X����P�Vm�9�
�Q��f����\_w0��|
W��f��#��m4��T���l'XwQ���.\�����8�����w�L��;�U�$������6'���Kr��5�s�\5?\<��>����gp���
��T����4BdP��t"~�{�#4��\��_4���B;��,���U��#����}�<��,05Q�)"���#���F��,�����C�p��EL�>Z�r]���9^�K��z	�`�;b�
R��:��%�2 K:R��r���H?x��IS�K�IH-2Q�f`�%���G�r]�xtqN��s�2H�\dxgc����g�fA���O������~��%��Sl
Z6����#Op\��!�q%'{z�6EG����R}��F�����I�`r4oR8
�(|��������`�_��z��u@F	)!%��*�Rp.�����	����n���y����8�B�q�X���?m�
)7�����q��]Iqr�����8YA���L��z�C�=H��*P�_�>r�$1���JU������J&Gk��B�.���9��������K@xn[����;P�L��|��'���;�����M��Fes��T�;�*
���W]������������Y��(��-�8������F�*��u�����o����A��t������|����V����I��S�$x���M�����cO.lH�y��M������H�i���I��!;���[���.�$4h`Z4}$�������(���[j`���4
��x�#��3L�s�`�(��zC��t��� ��4<����yD���A
dd��
v8d��!��~Q��������_�����$M9x��:r��Kv�e\u���SH��L���F�^�T�y�d�5��!M���'�	����L�Z��nH��l��hp�G�%w��K�2I�zf��[���dY���b����|LN�����de6{���r�8�yb��?�K��/	ptZ�����Ip'����C�J��5G���J�;&���8d���� �6+�7-`�nV�+�	�1d��s,�n�*"���n����%+m�vhd�����(�R��Hd���%����;�lr�8U(��:x0`�p�C�'f\�f:r��~�
)�,f�(��C;�;��m��I5�I�y��Z���	��`����p��y���`�s�|�r
�v.���y<@n�K�)��	�����-!���,�F�.��x���TP���F2��P�SGZ��I	��.0��jR�����5�#�B�����Db�(���,��@3
���.e������h�9������uz��r^�
�n�l26�inc�=������q��D���y�<n���X�����?�t� ����n�[YQ�n�>��v
����1��z����M�5f�I�
�m
�q[Rv�m'��Kn��G�t�4N�]���Y�f���r���@�|w�&O�	FUr7���\b��
��p��&!�{4w$=��Pq&��&Q������ ��Q`����b�9�}{. �iqB�u�3�Bu*W~9xp������
j��������������>�����j�s�� 9��#xH�����,��ah����'f�;����H�����'������||�������������=������//O���|������/���_���/�}����Q��{?=~z�����������^�>=}�����_�>}����}������J?��������	������_~�������<��_���E���o�}~�������?�����~�����o���^>~������~�������������{y����{z���/�_�����������PKx���r0��+PK��1Ymeta.xml��_o�0���)��W06�4M�:�R3io�o�7cG�4������m�����{�pqwju��S���D1
�+�iJ�ksf���R��^	������|�Zm:�8)u��x�����D�5��q�,k_��
�7hV���p[w���n��^p{�X�W��cMg)�V����H
Q�ID����2��MF�<�cwA��gp�b���`x:������������T-��V��X8���&�������q���N2��QBB�4]�������R��,��*�i��y�lr��2M({7�U=n������iv�@�����V���<�
p���zP;?�����F���>(�����t��������cF�����2��������)�?ln�Ox��V�:��*Of�~��^��o��^i�.��5�������+�u_�4��������(@��F����nwC��#\��?�����?PKZ�bA��PK��1Ysettings.xml�Z�r�8}��H����8�	T`��B��!�eJ��D����q�~�fg������6��V��������G�f�f�Z(}R
G@�b:���~|Q�R�����������*�H��D�=BE�at�'�B�i�!�E�"DE:�]u��n]I�-��	����TJ�R,�a�)<����X*�����US����1�B����(����������-G�(����4'������.�cY|c	^l����XX����0���V���m��&�q@}�V���jSY�)���A>|c���r�	ZUN�����'����?�]��m{�?���9��� �<MI��G��a�]�SH���P�=��!Mc�5=��&�+E�v���GI���@����p�F��=&���>n2�i��}���	�s6d��x�O�4���LL�}2�J�=G�
�Hs�q��&+�:��y{1��#?e�v ��&����6@�#�]�������-��I�f�����d�L�sy?>&,��	r�4YcDDFa`�	�[LAgT��?� ��(G2n	0)�
p��hb:�8�9�gm,��Db
I��wnx��Z�F�R��g��-(]-e
�������N2��b?���_{��I��-&�|�Fc�d3F�q���D�t8#�F<uowr��8aWN�$���os��X�8K��'���d^�:�^@��C�9U������c�#�otI��DU�2������K���X|Y�jV�A"!�>�sP�#��y6#���ec!I4��dJb�`:i�������}f !a���!`���[n������9#���o���
o��gX���'���gu��6����3����t/=����4tY��@���d�"�(����h����1���n ���]}�D�~�$�>����$S�4;|c�8r$��|"���g�M�WQ����q���^'b�q��/�]��7�E�c���usK�yk�#�G
���s0R����&��0����u�X������9E<�d�8M��PG	?�"�����+��=	�����]$���)L~��!��2J�����xyi5�P��9Z��IL��u�<d�^��p6�#��uL�
�Pk�YT\b{�
�&V����A�����X���I����P��4����iS3�v5M3�26Z��dj��Ghx��;eC��k�)�?8	�� jx�
���4,�A}�P#z��ugN��E�����|��������Z7�P�F*	���_�cW�c�+%������&�4����g�g�m�^��,p���5P��Kz}�������s�6���j�X-��]<������Ah����g3��4����r�j���b��\��n_1����t�b�8���A������t��u���,cp=�?�]��B�h�������G;���T��M�a��D3�O�������{o��A���w�����K�yVrs��GT���,��7�-��P��������N�\���5�W����n�m���N�����n�d�����P�8%M-N��[L��\�PKhI}�
-PK��1Ytbr��3�3Thumbnails/thumbnail.png�PNG


IHDR�e2ROfPLTE###***333<<<CCCKKKTTT[[[ccckkkttt{{{���������������������������������������������������3(� IDATx��]	b�(!���/��/7T�{��{ZeT���iw�c
%�"�����h��u���c����}����W�sQ�<^q|��@F���0�/W��_?��z!�Yk�?�@~|1nW@����V���Z���B�s�3P������O�`�����~��#��s2�����m����������������sP�o�W�Fn?����^�_[�ls��=�x+=�Kp�2��@��&���M8���@xr���U9[+|tf������@R���]Tx��`�������3��0��~7���	���xvd�T*��5�?�"�6�nE�2W8�/=�U�G�y�{���^���K�{�	Mk?�����fK��+����/|X26k���R�`
���_3����.K�����K:�
1��!����8"����k������=�9,f�<�>��
]��y��g%�y�X�g�-��7�:��)���^����hZ"�'���Uv�V
������z`��������xYK6^��
��3��OE��e��A4w	o���+\<�1���+����$"����~��&����3x����Hdx�>�?��6:���p$<��g��/��g	�j����_��F��1�HO�V���?WX��Y
���apw�Ap3����c~���8����"���#���,��%_�N��#�9�&��[�����x��x��X|��X=�>�~@J����;j���/��?�����/�������P�0?V �!N$�/��E����uz�7��U�f
8xI�|�~�g��l!fL��Y�J�f��La�45������T��]�5�33s*��'2��-f������_����r@a2�}
�`�_��w|�p}k^�w�'������C�gd�]90;C>�4����-���/`�|���P2e���7�S�}=�?����M[Qp���S�H�F��������bh
,���^m3@.>f��&Z�������s�Y��"|�[h��j�/�x��B����� ���pF�0�:��(�]�f>Pb
0�
633��C����Nqa��9��y�-������M����,kX[�c
P��}�4��_�#��4�S�]���5�55�K�) ��"����6�,n"�����S%_�&����L�SV��g��y�V�&D�G$/3�=�|^�O�������O�����r�S���zKe<�+���(�6�]��<�������0�B��K�W�������U��1d<���~m������T��)�Mq�����7�o~]-e�O����;:3��5���?��L,q�� /	\i��cs��	���_+���������^3)����W�w�wU?�h����^=�������h����Rp7��t(
���nW����;�h9�r�0�7@�>G��W{���2>@�����[���]�	)�O�w��}��6[����l	�\L-n�7�w�Uym�0�y��[5���:��V����V���l������E��z�D
OE�a�����.���2Yy���=��h0c����������E��o��\&�-�'��P����i`gFX,�Cbo9�;Be�g,u�����q�C��_2\��,�L���Q�7�X��c�`
zg���U��������_�����x4����T��E���9��{\�@7<�\b���YD���\�
ySX�"�gJ���0��12i�B��k
��-s)��D=T�'��V�{b��8;����4�8.;z�a�-���I��t]b
@T#d�����<�M>P'7 �nk����0�NwHKA�7b??(mI9=�=���e�������RPV��Y�d�tfm=U>�oo�n*�=j����6��'��5���f����*C��6�����LO��2��!�`^m�<��O<4�P!���c���da{��yl1S �9�������Sd�7$��p|mWk��h���{�5@k}�h���O�����L��fg�C�T��J��E���=�FP�����&Y���>p�����P���rfb0�>��P����h��_1��x��`=p/`�r*�;)��1@��b/�	������{��C��a��8���-+�c3���Z�z�0c���8��.�����R�K�M�������q�j �'�"P4
�����������z�s��:{�S�����ca����� �>I��W�y�,-�=���hEx`��������n�>��.?��.���1�����i�:�
��tHe���7��w�+� #��U��U$����Y1�<�I�^%x��.�	P!
|�^���A�3��9,DU:"����C1���c����K�BG��=��av>'�*��'f�� E����8=�u@A��C	��
�0o�g���7?�
wz$ `�:g����\@�i���n�_����c
 J�"��;|"�4�H�/C	��Q�#�0+]`)�kU�d�(4�F��	���������>��(>8�3�#T8���vP����K1��)af�@��%�����A/C�As��!
�LOg/��/���#zV4��&)T�_A%0����a
*�8��
����)X�z�X7�kOf�g��v?���� ����EA��2��1��x��
��.���@{��|��f��e���t�X���""�Qn�~\+�I��"�Q�2�q�S�l[��REQ4�{Q �����G��_���o^A���DXTx�J������q��{.�P�(zLCl\HT9
����'�b2��z'4(KGz�z������4�'U��'�0�m�6�i�0�Z�!���6=����(�&�L��<�����5�x������cq��b��j�v@�Q����N�<���_�2e
��@A� \1T;]�K��6��l��;c��`���.��wP�?��V~�P$��-�0�"��Pd��P���M��Y��8H���z���xK�0DA��U��u#Z+)[�0��rp��a��V��x8JY��	.J����O�}I�8s��f�/�����A���x�"PP#�aY�a�(fu��j��6a��%L$M������ �c�m#�n#���%�����B�3��#��"�5��-��B�'Y����%�����m�h�u���,��"��5��X�G��KW��,V��2�S����3��a�2h6���+%���q���T��2B�7���b���J[*�	�C������X���8S�������D}����+H��"w.���|�~�U�"�9�q��3���"}�g��P��m2���O
��^Qj�R�-��pF�
+XYO����C_��"�������on|.1�bsf�+��H�D}�����@�}
0��^v������,)��z�Yl{�i�`r�f��A�zEZ6�O/���nG�R���
`=�~0�%�_c
�������?p#oZ������������"@��)����x�?
�y�4x��Tc��!���Z ��'|�8DQ�>)��3�f'5���v��#��'�4���(�}�r�;���p�a����x\(=�<|����~@=q���Q+�g� �:����S�
��)4U)
��@A�d���<z$�h��y4�y4'�hM��d�]2���<Z�\���zX)2���������:����3���<zz�y�"�h���J��>���5��_a��<����<�)�h�0���<��G��Gwd��y�z��8�P5��@A P(�(N���&�����Aq<�uvz2��@6�����3��v�`y���#|:%LI�*�F�����s�������6���s������c������S��K�L����tR�)�X!@�U��v�@j+�$L<��(c�VI�����<,f�2�g�@���A��<hW�;����
)��/��|J���*�����d������*�G7k���b�C��CGH
�:	���P�F(x�]����C%pd�t��k7�����G�c������5��?�D��1���<B��s�|��V�
)T��aYkf�3\�E���2CR���	E�9����t#K��Q�������������Y!���zEf�(����J�;B��6��h�~>'�W��)��I������XT�r��U���:H�	v�p�"N��F���w2�~�4�8
�'�0�Q*���L�?
E��9�`��������E�HD&���Lp�3v��A����L�OC�j��+#���y����W�T�=������Gb���S��l�;��a�:�g�D[q�Y,������]��O	���/&(�I�m�r����(�) gU����
������� 1y�C^���J/�J����/$�����
#d��pU�2]�E1R4�?�M�i��YaZ+��5�a�SF�n�e�����[t�w�#Y�Dm�-���[������A��A�b}���o� �j;e
��@A� h�$�H��B��E�H��B�
	E��-Y��Pd�e���I����B�����]�\I(R����Ej��H(��d��/��)�'&�P��B��a���z�y4	E�,�P$	E�P���"��E��P�"�����/.��P����Y^H(�BS5��@A� P(���~;z�s��:{�S�����K���@�}����v!����l��B���C0�I�/���Vpo7Kq���c�}���"a�To<�`7Xw�!���~����-���Ff(���U$�@{��k~=�}V�"5��5��B�,p���'BH�	��`������k���"���c����K��7'"{����|N T�����LPV�����sV�+�\��������l�=E�����7��a����NLN�#��i��x5S����6>h����Dy:[�o�3!���P����QBu0��``)�k�2q�{�:Yk��}�xywHY�<]�f�P0��>�J���^Q()af�@�� L�G��1�e�}���X�Q�
N`#�B~_���=+�V_��~U
-�^� k��
p�-):Y7�S0�N���a,X���?��T���<,-����UA�	<2�*���^�\��	4$=������?w�5��`�v.���N����N��V��B�E�B����S�l[��RE������`��=��E�!��_{X8��Dq�
OR���p=�,�������jP?zL�!p- [�B8'���	��oG����`�������2�r.�O����O�`���m��a���CvkQmz��CQ48�M>���y�����\���{I�5��T�T
�������C�|���)
��@A��+T;]�K��6��l���5����Or���.tP��n�@��������������"[W�"7�mj��R\��<H�C��?���QA��U���UJ�JJ�����Q.�1�J���Sk���R&xI��B�O��%]bW��kS����n������wR��D&j��.�Q��.�Z�����}!J�H�+4�Z�Al�d	���6b��{  C�=�V(~F�v��Q��f������0}�B]i���!���hy  �V��a��q(V;A&WHr�H�b�l!�Q>9�#s��2�CUF�������)a�����f�������P&���r���RX���A�%�����q��7F���G�����l����H�;� pB�)�����^T(��8q'`�8�M(��~FN	E�@�&�\����(�� u�B����j���h�^�<:��M(R�) ���]4����5��R ���M�`J?�&x��� p����%��@���m�����!�YLnQ��\<�[�H��Xun����V���@9<0	L���)�`^K����������?p#oZ��u��[�V�0�oZ� �:������I�5&���*<��q�y�GqM�C���B}8hvRS��n��nv��N��	����,��
��2B�"�20J/8C���z�$m��Vp���R���)�F1�<v5U)
��@A�d���<����3������������=�G�j�Q$����G�_6���4��_e�i��s�yt��y��B�h`E�O��70������
��_�4�S����y�])�3���M��
�����G+��>E�0�h�a�h�z�sh~��y�v�<:>�<zaq�g,��<Z�H��5M`�gA�V��������<����w1�.n�_h����"���������F��j �@A� P(?N.�@�
bM��w�
���u��d0��������L��](X^qy�5^A
U���t<�~n�q�zw����F��r��v���|������wJ9�u	wc������ ����zJ��n�D���MBa�������2��`� �����b(Cz���t���v��|�`����
�l�W���|���Pe����~��K���b�G7k���b�C��CGH
�:	���P�F(x�]����C%pd�t��k7���@�^����9��X�g���:��6�G���\!�a��uC
��fA����l��/���T����\~�	�<��YZ%�j����%W4�6��%G~�
�-6.y��E^�(9:$�j�x�6��sG ��C
Evp��� �"!��(�Zu!�(���|��p�"N��F���w2�\����
�(��8r��G�X�*2q�(�G����W�2>rs�"��n[3�=������*���������jt^>(a��"�u���m�w����Ec���)�w`e�ru:w��e^'f�2d�G�@;�+��)au��d�>)��S.R��:8���rV�a�����!��QA8,b�n��,o'�^m�����
#d���:���.H�+��&��4����pe_���� v!�^7���R���-
:�;c���u�����%[�*��D��x#�HE��r0)���@A� �*�@�n�����yt��y���yt�5�hE��d��<�W��j�y���y��.������b�����#0��h�]4�^�<Z��<z"���1���`��<�zg2��W���]��/�����-N �G��=�����y�O�����[�G�.�Go�E��#2����1�
��f=��y�&�h2��dMA�@
��@A� P�;�is�=�{�'�;{�S���3��K���@�}R(�\�B�
�����<z}p7���}������L��
��~���q��<�0���M�T�/��	z�����,�����"y���	���C�j'Q.��5��J�J������g�s(�C#���:�Q(�0=z�(�U�7��=��a�
N`#�!@�h��
X��&�T���)9=���c��.dOQs�Ch�@��M��l(���w'&'W�B2�|�fj~�y4PO�H�Z�(Og��
{vm�#yd6��Y�H��#HYt+gL:u"R�c��|�
\����g����"LA(������2����R(�`�87@��u��2��S���C 2J^�	l������#zV4�H���T%����A4�VU��� �"����i6E��������|@z������q:�LI�z��J�F'P���z��@��{�PC�Vpb�������������c��N����N��V����k�BA�R/�mK�XOq�qI�*j����hP�bN������O%��X�zr? �9����er��Z��q��`�X��k8�X�0��[�`r��{�"����V���4���IDAT�s�w
���@?9��n[ :���������'�D���Qm���O���K�������K2n�R1���pAj������ST� P(�-6	�4�Gk�����^	7pN)q�kI���T�y�v�<�,J�J0z�M,�S��E�����G�H4�Rap}�<�A����b�<�C�A�o�Q��h���y�z���\Rhu�#���1��ob]�Y���������<��G/q�[{�����y�O3��X|�-�����GN �G���4�{���5������������v�yt�*X�qy�	|#��'N �G��n�_��[��%IO�h]_�<Z�h�dR��$��2(��DA��?�����)h
 P(���~���o�4�����6������|����'9����;�E [����~�s�������]�	)>g��j
�;�x�u��8�-pTUt�$�E"�]�j�Ra^�6Lp�~��j���u��8L[	�7sa���������H���T����{��d�LVEy�}|���t�����88���8� ����P�7�N.�}!B��#�72Hr^!�����g�z�C��y�\�t'R��5����������Q��`�`BiUt������
}%R(�`{p���!��p�x4����1�=D��_����e�jp&%L-�<z�����l`[��
��jc�xz54Q?�T���'P���i�
�V'���Hz��"�}OL��$������W����	�0���@��b�P�&��M��y4��;$i�:��SU������>4�
z�PZR�,�kD��/-����j�����~xvsu�����yYW��^8Rd��OAo�5������'YE@C	�he^�����	�o��K0�;k�8A"{x�Y�����y��N ����P���8[�V����}��h�T���X���@{D� �����5`v�0�
������:��)�����V�c������,�>����~��"�3�/��&5OD+���	��;����{K�[���s��@���p'����,V�;���c��������}�mY��m��M�W��fL �=�e��!�����n������Z���j �@A� P(
����w���d����G�2�4�������2��X2�����|��.%�G��cm��2&��v�G��c����nt�VM�����G���_�����O3�����"�B��?�<zk@������d��y4��'X�O�d��yt���"�}������1�N,���O���<Z���=���'��������2���b�_�����W*Q�y��2T��QT� (
�3�	����t�.,e~�2�m��pl7�.����"?�	\JW��A-��9���Y2q`�x�-`���p������"[W�"74�k��R\�Y:��<l��� �N
�����5�iTc�@Ua`l�����y������v�F�Rb�cN��^J�LH���-���6��.�+��5������o�*u�a]w����4~��DmX�e(���(��[��(a"��4jQ0p���%H����-����he��[������G�����s�"{v�W��XEZ�nh���-�n1���_����H|1�\a��H�b�l!�"xd��_�Q`����x�N��~e��j�I�����I�c?�l��Pd�2q�V�3�V��b�<�
J/In������0��1�t�<�G����u�iP�:��f�<�v���=()	��f�u/*��	�����e���&�c?#��"e�m�Q.�DR(&B�T�,�(�(�Q)��a��>>��y�/���W7�H
��e�h��We)���s�&�~����O�`��`y��]'@�l�/K
=8�F1f��?�CZ������x��^����j:i�a`�@�;��	TH�K`�7�u�~0�%d����OM�����8��7-i�:X�-K+A��7-T�n��
�q
�����K�1��Z�Xt�A	�1Q��uT�	����>�k
�(w���@���b��p}�p�Q��;M�',
g����	 ���!#�-/�q��p������Q/[�tv�Y�������!R���)�-2�<v5U)
��@A�d���<z_�<Z�k�]4��7�h��n��y�������G�_6���4���<Z��<:r�<z{�yt�N��w\�<Z�y����oj}W
��y�r�<zC������7��6�F������hE����G�dM�����������G��y�����������G��Gs2�v�<Z��l�l��y4�@C@� P(o/���(�A�I{�.�aPOg����T�p'hI��y�����3��v�`ye���
s%l%��I��Uf5�>�~�"��?.<�Q&������5;�h���)���%�����Hy)4G���
����G���v�@j+�$L<��(c�VI�����<#���������+hW�;����
)��/��4J�<B��_��"M	�c2�o�)�VA�����a���1���A	��#t�\'!Q�pj�o��r^s��L����xMc3)�l;�<����d��!�u-rh
���"Q�u��m6��Wx��[]7�P=o�d��a�p�]�2�I�/���q �����"��P�ndiY� �U�"�\�<��B��5+���("!2Vy��P���P����=����	\��/�$��������$QoA
EBH����W���EA���Y8E�'pe�R@�;��Q�o�h�8
�'�0�Q*���L�?
E��9��Z����	�]�Dd��m��8c����<�@V�q0�����2"���JXzE�He��A��m[��~$F,;U
N`����+���9���?����i/����hX��������m������6N�H���	�����M^�����
�
�a1�w0�ey;�t�"pf~Yo�^�$I<�P|�U��a��pi��P�+�H P�����	�4�#s;+\������BF�n�e�-P��������HV�:Q[k#�}�V��I���(7��p�b�����Y����eM
�D� P(�J!�h�73�/���A4�9���j(����������>�`�}�<Z};�jf,�X�U���8�<��q]2���������Rt
f���s����>v�Q�>�G�P8�������k�G��rQ,My��O����h}�$8����������/�GG`-�<��y�Ui�q{�����<�<Z�D��u�y��}������I@�	�{����G������G�_0��h
f��w�G?YL ^%@�V#;�(���=����%�8)�_������B�`����M8U�~�������oc��B���G���M�W^���#_�bR�g};��������e���x��6����3
s����G�����O��[qQ�����{�g@�`�N�!%�C7��\N������ytg���G�1�^���/x�{���p�c!���1B���G?�����y������!�hM�@
��@A� P(^�S�b9���d����-�+����7{ns�y�_
��.7���U�<��H�O
E��+����hE�,�d|���*o?��*��u
6���}������L��
��~�� ���9������ ����e�;"A��o�
�]o��wCLI�7K��t����A
���D�v����PS	��������g�s(�k��a�D��(	�=z�a	�W,���`)��jC�}��;f�s����� ����d7/b@�U�
�"����������0�=E���K}�	��PE�&&��F&$��N��O��� Q�<���7��9���P����|V#eV���){�����N������9�7nZ�8;w���~!'pf�SAJ����f����'��Ba��a^�+���qd�}8�v�:���D� ��
^8�gE��k
E���BKV���^�}�-):Y7��rm���I����u�Fpx=���t]?���D[q$��q?(��@�u=^}r���-i't�|7��n�����������uo��wR^����Jw�puMR(h]����`��a��%c6.����e����{����x]�}x
�a���aQ�(��~@$�s|�O7Rh
��A��)bq"���c�����
^��s
�	��Pp�b������z���!Mw��������;��n���q�5j�
�[�;�A/�MO�����&�L����~�K�+�����������Wm������{\��A���������C�j��7��4�(
��������(h2�~}��l�^�l��yV��G��Td��<��b�V(RI�*�~�z������.	�-2�����Af�
V*Q�%�F���U O���K�������������i�"���j����d�.e}p�g>[��Ml}'0y+�h}!����T��{�y��@	�<Z�C
�
s�d����'����{
@�Zm�N7�����2�V9���G�C'0Ba[���FU^C�y�=�};��(��y��8��y:��r�y�S�9��n��O�@������q l��8��,M�4�_�	�]�
����G��I��T���%�K��7��|�V���T�T"�hg���.O���(
���^:c����IEND�B`�PK��1YMETA-INF/manifest.xml��Aj�0E�9���XJ�M��(��Tk����F��}����!��x��������l6��oY�m�o�����`�v��
��Lr.��;���a%�*�,Qy��:"�]��$������[xg�0����dBA����!�q������0)������j@J��S����c��]_�@[U�9B�T��v�!qB���ru�c����Xc�3��}I=��-��|����%a�x���<���$��*����6���yY���1��{RO�f ��??����T�eAs�#�wD�W=�q.��N����PKm��Q1,PK��1Y�l9�..mimetypePK��1YTConfigurations2/menubar/PK��1Y�Configurations2/progressbar/PK��1Y�Configurations2/popupmenu/PK��1Y�Configurations2/accelerator/PK��1Y6Configurations2/floater/PK��1YlConfigurations2/statusbar/PK��1Y�Configurations2/toolbar/PK��1Y�Configurations2/toolpanel/PK��1YConfigurations2/images/Bitmaps/PK��1YN�7o
=
Ostyles.xmlPK��1Y��h���manifest.rdfPK��1Yx���r0��+6content.xmlPK��1YZ�bA���>meta.xmlPK��1YhI}�
-�@settings.xmlPK��1Ytbr��3�3,GThumbnails/thumbnail.pngPK��1Ym��Q1,({META-INF/manifest.xmlPKe�|
lock-test.sh (application/x-shellscript)
lock-test.pdf (application/pdf)
#32Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#31)
Re: scalability bottlenecks with (many) partitions (and more)

Hi,

I've finally pushed this, after many rounds of careful testing to ensure
no regressions, and polishing. All changes since the version shared on
September 13 are only cosmetic - renaming a macro to keep it consistent
with the other ones, clarifying a couple comments etc. Nothing major.

I ended up squashing the two parts into a single commit. I thought about
keeping the two steps, but it seemed pointless - the first part inflated
the PGPROC struct, which I didn't want to commit, even if only as an
intermediate WIP state.

So far the buildfarm hasn't blown up, so let's hope it stays that way.

I just realized there's no CF entry for this - sorry about that :-( I
started the thread a year ago to discuss an experimental patch, and it
never made it to the CFA. But there was a discussion spanning a year, so
hopefully that's enough.

regards

--
Tomas Vondra

#33Ants Aasma
ants.aasma@cybertec.at
In reply to: Tomas Vondra (#32)
Re: scalability bottlenecks with (many) partitions (and more)

On Sat, 21 Sept 2024 at 21:33, Tomas Vondra <tomas@vondra.me> wrote:

I've finally pushed this, after many rounds of careful testing to ensure
no regressions, and polishing. All changes since the version shared on
September 13 are only cosmetic - renaming a macro to keep it consistent
with the other ones, clarifying a couple comments etc. Nothing major.

Great work on this! I have seen multiple customers hitting fast-path
capacity related LockManager contention. They will certainly be glad
to have a fix available when they eventually upgrade. Regretfully I
did not find the time to participate in this discussion during
development. But I did have some thoughts that I wanted to unload to
the list - not as criticism, but in case it turns out follow-up work
is needed.

Driving the array sizing from max_locks_per_transaction seems like a
good idea. The one major difference from the lock table is that while
the lock table is partitioned dynamically between backends, the fast
path array has a static size per backend. One case where that
distinction matters is when only a fraction of backends try to lock
large numbers of relations. This fraction will still fall back to main
lock tables, but at least the contention should be limited by virtue
of not having too many of those backends. The other case is when
max_connections is much higher than the number of backends actually
used. Then backends may be consuming well over
max_locks_per_transaction without running into lock table capacity
issues.

In both cases users will have the simple workaround of just increasing
the max_locks_per_transaction setting. Still, I'm sure they would be
happier if things just worked without any tuning. So I tried to figure
out some scheme to get dynamic allocation of fast path locks.

The best data structure I came up with was to have a shared fast path
lock array. Still partitioned as a 16-way associative cache, but
indexed by hash(BackendId, RelationId). fpLockBits can be stuffed into
the high byte of BackendId thanks to MAX_BACKENDS. Locking could be
handled by one lock per way, or at least on cursory glance it
shouldn't be too difficult to convert the whole fast path acquisition
to be lock free.
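
To make the idea concrete, here is a minimal standalone sketch of what
such a shared table could look like. All names, sizes, and the hash
function are invented for illustration - this is not actual PostgreSQL
code, and the real fpLockBits/BackendId packing would differ in detail:

#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of a shared fast-path lock table: 16-way
 * associative, indexed by hash(backend number, relation OID). In
 * PostgreSQL, MAX_BACKENDS fits in 18 bits, which is what leaves room
 * to pack per-slot lock-mode bits into the same word as the owner.
 */
#define FP_WAYS             16          /* associativity of each group */
#define FP_NGROUPS          1024        /* example size, a power of two */
#define FP_OWNER_MASK       0x3FFFFu    /* bits 0..17: owning backend */
#define FP_LOCKBITS_SHIFT   24          /* bits 24..26: weak lock modes */

typedef struct FpSlot
{
    uint32_t    relid;          /* relation OID, 0 means slot unused */
    uint32_t    owner_bits;     /* packed owner + lock-mode bits */
} FpSlot;

/* In a real implementation this array would live in shared memory. */
static FpSlot fp_table[FP_NGROUPS][FP_WAYS];

static inline uint32_t
fp_group(uint32_t backend, uint32_t relid)
{
    /* any reasonable mixing function; a Murmur-style finalizer shown */
    uint32_t    h = (backend * 0x9E3779B9u) ^ relid;

    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    return h & (FP_NGROUPS - 1);
}

/* Find the slot held by (backend, relid) in its group, or NULL. */
static FpSlot *
fp_lookup(uint32_t backend, uint32_t relid)
{
    FpSlot     *group = fp_table[fp_group(backend, relid)];

    for (int i = 0; i < FP_WAYS; i++)
    {
        if (group[i].relid == relid &&
            (group[i].owner_bits & FP_OWNER_MASK) == backend)
            return &group[i];
    }
    return NULL;                /* fall back to the main lock table */
}

Per-way locking, or lock-free CAS on owner_bits, would then arbitrate
concurrent updates - which is exactly where the cache-line concern
below comes in.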

Either way, it feels like structuring the array this way could result
in a large amount of false sharing of cache lines. Current static
allocation means that each process needs to touch only a small set of
cache lines only referenced by itself - quite probable to keep those
lines in CPU local L2 in exclusive mode. In a shared array a larger
number of cache lines are needed and they will be concurrently written
to by other backends - lots of invalidation messages and cache line
bouncing. I don't know how large this effect will be without doing a
prototype and running it on a large machine with high core-to-core
latencies.

It would be possible to create a hybrid approach of a small local FP
array servicing the majority of acquisitions with a larger shared
victim cache for exceptional cases. But it doesn't feel like it is
worth the complexity. At least not without seeing some example
workloads where it would help. And even then, maybe using hierarchical
locking to do less work is the better approach.

Being optimistic, perhaps the current patch was enough to resolve the issue.

--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com

#34Tomas Vondra
tomas@vondra.me
In reply to: Ants Aasma (#33)
Re: scalability bottlenecks with (many) partitions (and more)

On 9/22/24 10:50, Ants Aasma wrote:

On Sat, 21 Sept 2024 at 21:33, Tomas Vondra <tomas@vondra.me> wrote:

I've finally pushed this, after many rounds of careful testing to ensure
no regressions, and polishing. All changes since the version shared on
September 13 are only cosmetic - renaming a macro to keep it consistent
with the other ones, clarifying a couple comments etc. Nothing major.

Great work on this! I have seen multiple customers hitting fast-path
capacity related LockManager contention. They will certainly be glad
to have a fix available when they eventually upgrade. Regretfully I
did not find the time to participate in this discussion during
development. But I did have some thoughts that I wanted to unload to
the list - not as criticism, but in case it turns out follow-up work
is needed.

Driving the array sizing from max_locks_per_transaction seems like a
good idea. The one major difference from the lock table is that while
the lock table is partitioned dynamically between backends, the fast
path array has a static size per backend. One case where that
distinction matters is when only a fraction of backends try to lock
large numbers of relations. This fraction will still fall back to main
lock tables, but at least the contention should be limited by virtue
of not having too many of those backends. The other case is when
max_connections is much higher than the number of backends actually
used. Then backends may be consuming well over
max_locks_per_transaction without running into lock table capacity
issues.

I agree. I don't think the case with a couple lock-hungry backends
matters too much, because as you say there can't be too many of them, so
the contention should not be too bad. At least that was my reasoning.

Regarding the case with very high max_connections values - I doubt we
want to optimize for that very much. Extremely high max_connections
values are a clear anti-pattern (IMO), and if you choose to do that
anyway, you simply have to accept that connections have costs. The
memory for fast-path locking is one of those costs.

I'm not against improving that, ofc, but I think we should only do that
if it doesn't hurt the "reasonable" setups.

In both cases users will have the simple workaround of just increasing
the max_locks_per_transaction setting. Still, I'm sure they would be
happier if things just worked without any tuning. So I tried to figure
out some scheme to get dynamic allocation of fast path locks.

I agree with the premise that less tuning is better. Which is why we
tied this to max_locks_per_transaction.

The best data structure I came up with was to have a shared fast path
lock array. Still partitioned as a 16-way associative cache, but
indexed by hash(BackendId, RelationId). fpLockBits can be stuffed into
the high byte of BackendId thanks to MAX_BACKENDS. Locking could be
handled by one lock per way, or at least on cursory glance it
shouldn't be too difficult to convert the whole fast path acquisition
to be lock free.

Either way, it feels like structuring the array this way could result
in a large amount of false sharing of cache lines. Current static
allocation means that each process needs to touch only a small set of
cache lines only referenced by itself - quite probable to keep those
lines in CPU local L2 in exclusive mode. In a shared array a larger
number of cache lines are needed and they will be concurrently written
to by other backends - lots of invalidation messages and cache line
bouncing. I don't know how large this effect will be without doing a
prototype and running it on a large machine with high core-to-core
latencies.

I don't have a very good intuition regarding cachelines. Ideally, the
backends would access disjoint parts of the array, so there should not
be a lot of false sharing. But maybe I'm wrong - it's hard to say
without an experimental patch.

It would be possible to create a hybrid approach of a small local FP
array servicing the majority of acquisitions with a larger shared
victim cache for exceptional cases. But it doesn't feel like it is
worth the complexity. At least not without seeing some example
workloads where it would help. And even then, maybe using hierarchical
locking to do less work is the better approach.

Not sure. My intuition would be to keep this as simple as possible.
Having a shared lock table and also a separate fast-path cache is
already sufficiently complex; adding a cache for a cache seems a bit
too much to me.

Being optimistic, perhaps the current patch was enough to resolve the issue.

It's an improvement. But if you want to give the shared fast-path cache
a try, go ahead - if you write a patch, I promise to review it.

regards

--
Tomas Vondra

#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#32)
Re: scalability bottlenecks with (many) partitions (and more)

Tomas Vondra <tomas@vondra.me> writes:

I've finally pushed this, after many rounds of careful testing to ensure
no regressions, and polishing.

Coverity is not terribly happy with this. "Assert(fpPtr = fpEndPtr);"
is very clearly not doing what you presumably intended. The others
look like overaggressive assertion checking. If you don't want those
macros to assume that the argument is unsigned, you could force the
issue, say with

 #define FAST_PATH_GROUP(index)	\
-	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	(AssertMacro((uint32) (index) < FP_LOCK_SLOTS_PER_BACKEND), \
 	 ((index) / FP_LOCK_SLOTS_PER_GROUP))

________________________________________________________________________________________________________
*** CID 1619664: Incorrect expression (ASSERT_SIDE_EFFECT)
/srv/coverity/git/pgsql-git/postgresql/src/backend/storage/lmgr/proc.c: 325 in InitProcGlobal()
319 pg_atomic_init_u32(&(proc->procArrayGroupNext), INVALID_PROC_NUMBER);
320 pg_atomic_init_u32(&(proc->clogGroupNext), INVALID_PROC_NUMBER);
321 pg_atomic_init_u64(&(proc->waitStart), 0);
322 }
323
324 /* Should have consumed exactly the expected amount of fast-path memory. */

CID 1619664: Incorrect expression (ASSERT_SIDE_EFFECT)
Assignment "fpPtr = fpEndPtr" has a side effect. This code will work differently in a non-debug build.

325 Assert(fpPtr = fpEndPtr);
326
327 /*
328 * Save pointers to the blocks of PGPROC structures reserved for auxiliary
329 * processes and prepared transactions.
330 */

________________________________________________________________________________________________________
*** CID 1619662: Integer handling issues (NO_EFFECT)
/srv/coverity/git/pgsql-git/postgresql/src/backend/storage/lmgr/lock.c: 3731 in GetLockStatusData()
3725
3726 LWLockAcquire(&proc->fpInfoLock, LW_SHARED);
3727
3728 for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; ++f)
3729 {
3730 LockInstanceData *instance;

CID 1619662: Integer handling issues (NO_EFFECT)
This greater-than-or-equal-to-zero comparison of an unsigned value is always true. "f >= 0U".

3731 uint32 lockbits = FAST_PATH_GET_BITS(proc, f);
3732
3733 /* Skip unallocated slots. */
3734 if (!lockbits)
3735 continue;
3736

________________________________________________________________________________________________________
*** CID 1619661: Integer handling issues (NO_EFFECT)
/srv/coverity/git/pgsql-git/postgresql/src/backend/storage/lmgr/lock.c: 2696 in FastPathGrantRelationLock()
2690 uint32 group = FAST_PATH_REL_GROUP(relid);
2691
2692 /* Scan for existing entry for this relid, remembering empty slot. */
2693 for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
2694 {
2695 /* index into the whole per-backend array */

CID 1619661: Integer handling issues (NO_EFFECT)
This greater-than-or-equal-to-zero comparison of an unsigned value is always true. "group >= 0U".

2696 uint32 f = FAST_PATH_SLOT(group, i);
2697
2698 if (FAST_PATH_GET_BITS(MyProc, f) == 0)
2699 unused_slot = f;
2700 else if (MyProc->fpRelId[f] == relid)
2701 {

________________________________________________________________________________________________________
*** CID 1619660: Integer handling issues (NO_EFFECT)
/srv/coverity/git/pgsql-git/postgresql/src/backend/storage/lmgr/lock.c: 2813 in FastPathTransferRelationLocks()
2807
2808 for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
2809 {
2810 uint32 lockmode;
2811
2812 /* index into the whole per-backend array */

CID 1619660: Integer handling issues (NO_EFFECT)
This greater-than-or-equal-to-zero comparison of an unsigned value is always true. "group >= 0U".

2813 uint32 f = FAST_PATH_SLOT(group, j);
2814
2815 /* Look for an allocated slot matching the given relid. */
2816 if (relid != proc->fpRelId[f] || FAST_PATH_GET_BITS(proc, f) == 0)
2817 continue;
2818

________________________________________________________________________________________________________
*** CID 1619659: Integer handling issues (NO_EFFECT)
/srv/coverity/git/pgsql-git/postgresql/src/backend/storage/lmgr/lock.c: 3067 in GetLockConflicts()
3061
3062 for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
3063 {
3064 uint32 lockmask;
3065
3066 /* index into the whole per-backend array */

CID 1619659: Integer handling issues (NO_EFFECT)
This greater-than-or-equal-to-zero comparison of an unsigned value is always true. "group >= 0U".

3067 uint32 f = FAST_PATH_SLOT(group, j);
3068
3069 /* Look for an allocated slot matching the given relid. */
3070 if (relid != proc->fpRelId[f])
3071 continue;
3072 lockmask = FAST_PATH_GET_BITS(proc, f);

________________________________________________________________________________________________________
*** CID 1619658: Integer handling issues (NO_EFFECT)
/srv/coverity/git/pgsql-git/postgresql/src/backend/storage/lmgr/lock.c: 2739 in FastPathUnGrantRelationLock()
2733 uint32 group = FAST_PATH_REL_GROUP(relid);
2734
2735 FastPathLocalUseCounts[group] = 0;
2736 for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
2737 {
2738 /* index into the whole per-backend array */

CID 1619658: Integer handling issues (NO_EFFECT)
This greater-than-or-equal-to-zero comparison of an unsigned value is always true. "group >= 0U".

2739 uint32 f = FAST_PATH_SLOT(group, i);
2740
2741 if (MyProc->fpRelId[f] == relid
2742 && FAST_PATH_CHECK_LOCKMODE(MyProc, f, lockmode))
2743 {
2744 Assert(!result);

________________________________________________________________________________________________________
*** CID 1619657: Integer handling issues (NO_EFFECT)
/srv/coverity/git/pgsql-git/postgresql/src/backend/storage/lmgr/lock.c: 2878 in FastPathGetRelationLockEntry()
2872
2873 for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
2874 {
2875 uint32 lockmode;
2876
2877 /* index into the whole per-backend array */

CID 1619657: Integer handling issues (NO_EFFECT)
This greater-than-or-equal-to-zero comparison of an unsigned value is always true. "group >= 0U".

2878 uint32 f = FAST_PATH_SLOT(group, i);
2879
2880 /* Look for an allocated slot matching the given relid. */
2881 if (relid != MyProc->fpRelId[f] || FAST_PATH_GET_BITS(MyProc, f) == 0)
2882 continue;
2883

regards, tom lane

#36Tomas Vondra
tomas@vondra.me
In reply to: Tom Lane (#35)
Re: scalability bottlenecks with (many) partitions (and more)

On 9/22/24 17:45, Tom Lane wrote:

Tomas Vondra <tomas@vondra.me> writes:

I've finally pushed this, after many rounds of careful testing to ensure
no regressions, and polishing.

Coverity is not terribly happy with this. "Assert(fpPtr = fpEndPtr);"
is very clearly not doing what you presumably intended. The others
look like overaggressive assertion checking. If you don't want those
macros to assume that the argument is unsigned, you could force the
issue, say with

#define FAST_PATH_GROUP(index)	\
-	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	(AssertMacro((uint32) (index) < FP_LOCK_SLOTS_PER_BACKEND), \
((index) / FP_LOCK_SLOTS_PER_GROUP))

Ah, you're right. I'll fix those asserts tomorrow.

The first is clearly wrong, of course.

For the (x >= 0) asserts, doing it this way relies on negative values
wrapping to large positive ones, correct? AFAIK it's guaranteed to be a
very large value, so it can't accidentally be less than the slot count.

regards

--
Tomas Vondra

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#36)
Re: scalability bottlenecks with (many) partitions (and more)

Tomas Vondra <tomas@vondra.me> writes:

On 9/22/24 17:45, Tom Lane wrote:

#define FAST_PATH_GROUP(index)	\
-	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	(AssertMacro((uint32) (index) < FP_LOCK_SLOTS_PER_BACKEND), \
((index) / FP_LOCK_SLOTS_PER_GROUP))

For the (x >= 0) asserts, doing it this way relies on negative values
wrapping to large positive ones, correct? AFAIK it's guaranteed to be a
very large value, so it can't accidentally be less than the slot count.

Right, any negative value would wrap to something more than
INT32_MAX.
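
A tiny standalone program (just an illustration, not code from the
patch) demonstrates why the single unsigned comparison is enough:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SLOT_COUNT 1024u

int
main(void)
{
    int         index = -1;

    /*
     * Converting -1 to uint32_t yields 4294967295 (UINT32_MAX), far
     * above SLOT_COUNT, so "(uint32_t) index < SLOT_COUNT" rejects all
     * negative values as well as all too-large ones in one comparison.
     */
    printf("%u\n", (uint32_t) index);           /* prints 4294967295 */

    assert(!((uint32_t) index < SLOT_COUNT));   /* negative: rejected */
    assert((uint32_t) 5 < SLOT_COUNT);          /* valid index: passes */
    return 0;
}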

regards, tom lane

#38Jakub Wartak
jakub.wartak@enterprisedb.com
In reply to: Tomas Vondra (#30)
Re: scalability bottlenecks with (many) partitions (and more)

On Mon, Sep 16, 2024 at 4:19 PM Tomas Vondra <tomas@vondra.me> wrote:

On 9/16/24 15:11, Jakub Wartak wrote:

On Fri, Sep 13, 2024 at 1:45 AM Tomas Vondra <tomas@vondra.me> wrote:

[..]

Anyway, at this point I'm quite happy with this improvement. I didn't
have any clear plan when to commit this, but I'm considering doing so
sometime next week, unless someone objects or asks for some additional
benchmarks etc.

Thank you very much for working on this :)

The only fact that comes to my mind is that we could blow up L2
caches. Fun fact, so if we are growing PGPROC by 6.3x, that's going to
be like one or two 2MB huge pages more @ common max_connections=1000
x86_64 (830kB -> ~5.1MB), and indeed:

[..]

then maybe(?) one could observe further degradation of dTLB misses in
the perf-stat counter under some microbenchmark, but measuring that
requires isolated and physical hardware. Maybe that would be actually
noise due to overhead of context-switches itself. Just trying to think
out loud, what big PGPROC could cause here. But this is already an
unhealthy and non-steady state of the system, so IMHO we are good,
unless someone comes up with a better (more evil) idea.

I've been thinking about such cases too, but I don't think it can really
happen in practice, because:

- How likely is it that the sessions will need a lot of OIDs, but not
the same ones? Also, why would it matter that the OIDs are not the same,
I don't think it matters unless one of the sessions needs an exclusive
lock, at which point the optimization doesn't really matter.

- If having more fast-path slots means it doesn't fit into L2 cache,
would we fit into L2 without it? I don't think so - if there really are
that many locks, we'd have to add those into the shared lock table, and
there's a lot of extra stuff to keep in memory (relcaches, ...).

This is pretty much one of the cases I focused on in my benchmarking,
and I'm yet to see any regression.

Sorry for answering this so late. Just for context here: I was
imagining a scenario with high max_connections and e.g. schema-based
multi-tenancy but no partitioning (so all would be fine without this
$thread/commit; fewer than 16 (fast)locks would be taken). The OIDs
need to be different to avoid contention, so that futex() does not
actually end up in a syscall (just the user-space part). My theory was
that a much smaller PGPROC should be doing far fewer (data) cache-line
fetches than with the patch. That hash() % prime hits various parts of
a larger array - so without the patch it should be quicker, as it
wouldn't be randomly hitting some larger array[] - but it might be
noise, as you say. It was a theoretical attempt at crafting the worst
possible conditions for the patch, so feel free to disregard it, as it
already assumes an anti-pattern (big & all-active max_connections).
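
(For completeness: on isolated hardware the dTLB theory above could be
checked with something like "perf stat -e dTLB-loads,dTLB-load-misses
-p <backend pid> -- sleep 10" - exact event names vary by CPU and
kernel, so treat that as a sketch rather than a recipe.)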

Well, the only thing I could think of was to add to the
"max_locks_per_transaction" GUC in doc/src/sgml/config.sgml that "it
is also used as advisory for the number of groups used in the lock
manager's fast-path implementation" (that is, without going into
further discussion, as even the pg_locks discussion in
doc/src/sgml/system-views.sgml simply uses that term).

Thanks, I'll consider mentioning this in max_locks_per_transaction.
Also, I think there's a place calculating the amount of per-connection
memory, so maybe that needs to be updated too.

I couldn't find it in current versions, but maybe that's helpful/reaffirming:
- up to 9.2 there were exact formulas used, see "(1800 + 270 *
max_locks_per_transaction) * max_connections" [1] (quick arithmetic
below), but that's long gone now.
- if anything then Andres might want to improve a little his blog
entry [2] (my take is that it seems to be the most accurate and
authoritative technical information that we have online)
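
(For a concrete sense of that old formula, using the defaults
max_connections=100 and max_locks_per_transaction=64: (1800 + 270 *
64) * 100 = 1,908,000 bytes, i.e. roughly 1.9 MB of lock-related
shared memory.)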

-J.

[1]: https://www.postgresql.org/docs/9.2/kernel-resources.html
[2]: https://blog.anarazel.de/2020/10/07/measuring-the-memory-overhead-of-a-postgres-connection/

#39Tomas Vondra
tomas@vondra.me
In reply to: Tom Lane (#37)
Re: scalability bottlenecks with (many) partitions (and more)

On 9/23/24 01:06, Tom Lane wrote:

Tomas Vondra <tomas@vondra.me> writes:

On 9/22/24 17:45, Tom Lane wrote:

#define FAST_PATH_GROUP(index)	\
-	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	(AssertMacro((uint32) (index) < FP_LOCK_SLOTS_PER_BACKEND), \
((index) / FP_LOCK_SLOTS_PER_GROUP))

For the (x >= 0) asserts, doing it this way relies on negative values
wrapping to large positive ones, correct? AFAIK it's guaranteed to be a
very large value, so it can't accidentally be less than the slot count.

Right, any negative value would wrap to something more than
INT32_MAX.

Thanks. Pushed a fix for these issues, hopefully coverity will be happy.

BTW is the coverity report accessible somewhere? I know someone
mentioned that in the past, but I don't recall the details. Maybe we
should have a list of all these resources, useful for committers,
somewhere on the wiki?

regards

--
Tomas Vondra

#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#39)
Re: scalability bottlenecks with (many) partitions (and more)

Tomas Vondra <tomas@vondra.me> writes:

Thanks. Pushed a fix for these issues, hopefully coverity will be happy.

Thanks.

BTW is the coverity report accessible somewhere? I know someone
mentioned that in the past, but I don't recall the details. Maybe we
should have a list of all these resources, useful for committers,
somewhere on the wiki?

Currently those reports only go to the security team. Perhaps
we should rethink that?

regards, tom lane

#41Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Tomas Vondra (#21)
4 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

On Wed, 4 Sept 2024 at 17:32, Tomas Vondra <tomas@vondra.me> wrote:

On 9/4/24 16:25, Matthias van de Meent wrote:

On Tue, 3 Sept 2024 at 18:20, Tomas Vondra <tomas@vondra.me> wrote:

FWIW the actual cost is somewhat higher, because we seem to need ~400B
for every lock (not just the 150B for the LOCK struct).

We do indeed allocate two PROCLOCKs for every LOCK, and allocate those
inside dynahash tables. That amounts to (152+2*64+3*16=) 328 bytes in
dynahash elements, and (3 * 8-16) = 24-48 bytes for the dynahash
buckets/segments, resulting in 352-376 bytes * NLOCKENTS() being
used[^1]. Does that align with your usage numbers, or are they
significantly larger?

I see more like ~470B per lock. If I patch CalculateShmemSize to log the
shmem allocated, I get this:

max_connections=100 max_locks_per_transaction=1000 => 194264001
max_connections=100 max_locks_per_transaction=2000 => 241756967

and (((241756967-194264001)/100/1000)) = 474

Could be alignment of structs or something, not sure.

NLOCKENTS is calculated based off of MaxBackends, which is the sum of
MaxConnections + autovacuum_max_workers + 1 +
max_worker_processes + max_wal_senders; by default that adds 22
more slots.

After adjusting for that, we get ~388 bytes/lock, which is
approximately in line with the calculation.
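
(As a back-of-envelope check: (241756967 - 194264001) / ((100 + 22) *
1000) = 47492966 / 122000, or about 389 bytes per lock entry - in
agreement with the ~388 figure to within rounding.)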

At least based on a quick experiment. (Seems a bit high, right?).

Yeah, that does seem high, thanks for nerd-sniping me.

[...]

Altogether that'd save 40 bytes/lock entry on size, and ~35
bytes/lock on "safety margin", for a saving of (up to) 19% of our
current allocation. I'm not sure if these tricks would benefit
performance or even be a demerit, apart from smaller structs usually
fitting better in CPU caches.

Not sure either, but it seems worth exploring. If you do an experimental
patch for the LOCK size reduction, I can get some numbers.

It took me some time to get back to this, and a few hours to
experiment, but here's that experimental patch. Attached 4 patches,
which together reduce the size of the shared lock tables by about 34%
on my 64-bit system.

1/4 implements the MAX_LOCKMODES changes to LOCK I mentioned before,
saving 16 bytes.
2/4 packs the LOCK struct more tightly, for another 8 bytes saved.
3/4 reduces the PROCLOCK struct size by 8 bytes with a PGPROC* ->
ProcNumber substitution, allowing packing with fields previously
reduced in size in patch 2/4.
4/4 reduces the size of the PROCLOCK table by limiting the average
number of per-backend locks to max_locks_per_transaction (rather than
the current 2*max_locks_per_transaction when getting locks that other
backends also requested), and makes the shared lock tables fully
pre-allocated.

1-3 together save 11% on the lock tables in 64-bit builds, and 4/4
saves another ~25%, for a total of ~34% on per-lockentry shared memory
usage; from ~360 bytes to ~240 bytes.

Note that this doesn't include the ~4.5 bytes added per PGPROC entry
per mlpxid for fastpath locking; I've ignored those for now.

Not implemented, but technically possible: the PROCLOCK table _could_
be further reduced in size by acknowledging that each of those structs
is always stored after a dynahash HASHELEMENT, which has 4 bytes of
padding on 64-bit systems. By changing PROCLOCKTAG's myProc to
ProcNumber, one could pack that field into the padding of the hash
element header, reducing the effective size of the hash table's
entries by 8 bytes, and thus the total size of the tables by another
few %. I don't think that trade-off is worth it though, given the
complexity and trickery required to get that to work well.
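
For illustration, a minimal model of where those 4 padding bytes come
from - this mirrors dynahash's HASHELEMENT (a link pointer plus a
uint32 hash value) but is not the actual PostgreSQL struct:

#include <stdint.h>
#include <stdio.h>

typedef struct HashElemModel
{
    struct HashElemModel *link;     /* 8 bytes on LP64 */
    uint32_t    hashvalue;          /* 4 bytes + 4 bytes tail padding */
} HashElemModel;

int
main(void)
{
    /* Prints 16 on common 64-bit ABIs: alignment rounds 12 up to 16,
     * leaving 4 bytes that a following 4-byte field could occupy. */
    printf("%zu\n", sizeof(HashElemModel));
    return 0;
}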

I'm not sure about the safety margins. 10% sure seems like quite a bit
of memory (it might not have been in the past, but as instances grow,
that has probably changed).

I have not yet touched this safety margin.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachments:

v0-0002-Reduce-size-of-LOCK-by-8-more-bytes.patchapplication/octet-stream; name=v0-0002-Reduce-size-of-LOCK-by-8-more-bytes.patchDownload
From 98d69e37db722dde4cd4dceda1123f6fa0fb8d8b Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Wed, 20 Nov 2024 03:46:16 +0100
Subject: [PATCH v0 2/4] Reduce size of LOCK by 8 more bytes

LOCKMASK will only use bits [1..8], and thus always fits in uint16.  By
changing the type from int to uint16, and moving the grant/wait masks in
LOCK to the padding space of dclist_head, we save 8 bytes on the struct
when the binary is compiled for a 64-bit architecture.
---
 src/include/storage/lock.h          | 18 +++++++++--
 src/include/storage/lockdefs.h      |  2 +-
 src/backend/storage/lmgr/deadlock.c | 12 ++++----
 src/backend/storage/lmgr/lock.c     | 48 ++++++++++++++---------------
 src/backend/storage/lmgr/proc.c     |  8 ++---
 5 files changed, 50 insertions(+), 38 deletions(-)

diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index b2523bf79d..345ded934f 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -289,6 +289,13 @@ typedef struct LOCKTAG
 	 (locktag).locktag_type = LOCKTAG_APPLY_TRANSACTION, \
 	 (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)
 
+/*
+ * On 64-bit architectures there are 4 bytes of padding in dclist_head. We
+ * reuse those 4 padding bytes to store some values.
+ */
+#define SIZEOF_PACKED_DCLIST_HEAD	\
+	(offsetof(dclist_head, count) + sizeof(uint32))
+
 /*
  * Per-locked-object lock information:
  *
@@ -313,10 +320,15 @@ typedef struct LOCK
 	LOCKTAG		tag;			/* unique identifier of lockable object */
 
 	/* data */
-	LOCKMASK	grantMask;		/* bitmask for lock types already granted */
-	LOCKMASK	waitMask;		/* bitmask for lock types awaited */
 	dlist_head	procLocks;		/* list of PROCLOCK objects assoc. with lock */
-	dclist_head waitProcs;		/* list of PGPROC objects waiting on lock */
+	union {
+		dclist_head waitProcs;		/* list of PGPROC objects waiting on lock */
+		struct {
+			char		pad[SIZEOF_PACKED_DCLIST_HEAD];
+			LOCKMASK	grantMask;	/* bitmask for lock types already granted */
+			LOCKMASK	waitMask;	/* bitmask for lock types awaited */
+		} masks;
+	} packed;
 	int			requested[MAX_LOCKMODES];	/* counts of requested locks */
 	int			nRequested;		/* total of requested[] array */
 	int			granted[MAX_LOCKMODES]; /* counts of granted locks */
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 810b297edf..c75b98960b 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -22,7 +22,7 @@
  * mask indicating a set of held or requested lock types (the bit 1<<mode
  * corresponds to a particular lock mode).
  */
-typedef int LOCKMASK;
+typedef uint16 LOCKMASK;
 typedef int LOCKMODE;
 
 /*
diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index fcb874d234..72ba141a53 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -248,7 +248,7 @@ DeadLockCheck(PGPROC *proc)
 		LOCK	   *lock = waitOrders[i].lock;
 		PGPROC	  **procs = waitOrders[i].procs;
 		int			nProcs = waitOrders[i].nProcs;
-		dclist_head *waitQueue = &lock->waitProcs;
+		dclist_head *waitQueue = &lock->packed.waitProcs;
 
 		Assert(nProcs == dclist_count(waitQueue));
 
@@ -697,7 +697,7 @@ FindLockCycleRecurseMember(PGPROC *checkProc,
 		dclist_head *waitQueue;
 
 		/* Use the true lock wait queue order */
-		waitQueue = &lock->waitProcs;
+		waitQueue = &lock->packed.waitProcs;
 
 		/*
 		 * Find the last member of the lock group that is present in the wait
@@ -813,8 +813,8 @@ ExpandConstraints(EDGE *constraints,
 		/* No, so allocate a new list */
 		waitOrders[nWaitOrders].lock = lock;
 		waitOrders[nWaitOrders].procs = waitOrderProcs + nWaitOrderProcs;
-		waitOrders[nWaitOrders].nProcs = dclist_count(&lock->waitProcs);
-		nWaitOrderProcs += dclist_count(&lock->waitProcs);
+		waitOrders[nWaitOrders].nProcs = dclist_count(&lock->packed.waitProcs);
+		nWaitOrderProcs += dclist_count(&lock->packed.waitProcs);
 		Assert(nWaitOrderProcs <= MaxBackends);
 
 		/*
@@ -861,7 +861,7 @@ TopoSort(LOCK *lock,
 		 int nConstraints,
 		 PGPROC **ordering)		/* output argument */
 {
-	dclist_head *waitQueue = &lock->waitProcs;
+	dclist_head *waitQueue = &lock->packed.waitProcs;
 	int			queue_size = dclist_count(waitQueue);
 	PGPROC	   *proc;
 	int			i,
@@ -1049,7 +1049,7 @@ TopoSort(LOCK *lock,
 static void
 PrintLockQueue(LOCK *lock, const char *info)
 {
-	dclist_head *waitQueue = &lock->waitProcs;
+	dclist_head *waitQueue = &lock->packed.waitProcs;
 	dlist_iter	proc_iter;
 
 	printf("%s lock %p queue ", info, lock);
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 9bf6fbf976..65349b1196 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -373,14 +373,14 @@ LOCK_PRINT(const char *where, const LOCK *lock, LOCKMODE type)
 			 lock->tag.locktag_field1, lock->tag.locktag_field2,
 			 lock->tag.locktag_field3, lock->tag.locktag_field4,
 			 lock->tag.locktag_type, lock->tag.locktag_lockmethodid,
-			 lock->grantMask,
+			 lock->packed.masks.grantMask,
 			 lock->requested[0], lock->requested[1], lock->requested[2],
 			 lock->requested[3], lock->requested[4], lock->requested[5],
 			 lock->requested[6], lock->requested[7], lock->nRequested,
 			 lock->granted[0], lock->granted[1], lock->granted[2],
 			 lock->granted[3], lock->granted[4], lock->granted[5],
 			 lock->granted[6], lock->granted[7], lock->nGranted,
-			 dclist_count(&lock->waitProcs),
+			 dclist_count(&lock->packed.waitProcs),
 			 LockMethods[LOCK_LOCKMETHOD(*lock)]->lockModeNames[type]);
 }
 
@@ -768,7 +768,7 @@ LockHasWaiters(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	/*
 	 * Do the checking.
 	 */
-	if ((lockMethodTable->conflictTab[lockmode] & lock->waitMask) != 0)
+	if ((lockMethodTable->conflictTab[lockmode] & lock->packed.masks.waitMask) != 0)
 		hasWaiters = true;
 
 	LWLockRelease(partitionLock);
@@ -1085,7 +1085,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 * wait queue.  Otherwise, check for conflict with already-held locks.
 	 * (That's last because most complex check.)
 	 */
-	if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
+	if (lockMethodTable->conflictTab[lockmode] & lock->packed.masks.waitMask)
 		found_conflict = true;
 	else
 		found_conflict = LockCheckConflicts(lockMethodTable, lockmode,
@@ -1259,10 +1259,10 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 	 */
 	if (!found)
 	{
-		lock->grantMask = 0;
-		lock->waitMask = 0;
+		lock->packed.masks.grantMask = 0;
+		lock->packed.masks.waitMask = 0;
 		dlist_init(&lock->procLocks);
-		dclist_init(&lock->waitProcs);
+		dclist_init(&lock->packed.waitProcs);
 		lock->nRequested = 0;
 		lock->nGranted = 0;
 		MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
@@ -1344,7 +1344,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 	else
 	{
 		PROCLOCK_PRINT("LockAcquire: found", proclock);
-		Assert((proclock->holdMask & ~lock->grantMask) == 0);
+		Assert((proclock->holdMask & ~lock->packed.masks.grantMask) == 0);
 
 #ifdef CHECK_DEADLOCK_RISK
 
@@ -1497,12 +1497,12 @@ LockCheckConflicts(LockMethod lockMethodTable,
 	 * first check for global conflicts: If no locks conflict with my request,
 	 * then I get the lock.
 	 *
-	 * Checking for conflict: lock->grantMask represents the types of
+	 * Checking for conflict: lock->packed.masks.grantMask represents the types of
 	 * currently held locks.  conflictTable[lockmode] has a bit set for each
 	 * type of lock that conflicts with request.   Bitwise compare tells if
 	 * there is a conflict.
 	 */
-	if (!(conflictMask & lock->grantMask))
+	if (!(conflictMask & lock->packed.masks.grantMask))
 	{
 		PROCLOCK_PRINT("LockCheckConflicts: no conflict", proclock);
 		return false;
@@ -1615,9 +1615,9 @@ GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode)
 
 	lock->nGranted++;
 	lock->granted[lockmode - 1]++;
-	lock->grantMask |= LOCKBIT_ON(lockmode);
+	lock->packed.masks.grantMask |= LOCKBIT_ON(lockmode);
 	if (lock->granted[lockmode - 1] == lock->requested[lockmode - 1])
-		lock->waitMask &= LOCKBIT_OFF(lockmode);
+		lock->packed.masks.waitMask &= LOCKBIT_OFF(lockmode);
 	proclock->holdMask |= LOCKBIT_ON(lockmode);
 	LOCK_PRINT("GrantLock", lock, lockmode);
 	Assert((lock->nGranted > 0) && (lock->granted[lockmode - 1] > 0));
@@ -1655,7 +1655,7 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 	if (lock->granted[lockmode - 1] == 0)
 	{
 		/* change the conflict mask.  No more of this lock type. */
-		lock->grantMask &= LOCKBIT_OFF(lockmode);
+		lock->packed.masks.grantMask &= LOCKBIT_OFF(lockmode);
 	}
 
 	LOCK_PRINT("UnGrantLock: updated", lock, lockmode);
@@ -1669,7 +1669,7 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 	 * some waiter, who could now be awakened because he doesn't conflict with
 	 * his own locks.
 	 */
-	if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
+	if (lockMethodTable->conflictTab[lockmode] & lock->packed.masks.waitMask)
 		wakeupNeeded = true;
 
 	/*
@@ -1971,11 +1971,11 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
 	Assert(proc->waitStatus == PROC_WAIT_STATUS_WAITING);
 	Assert(proc->links.next != NULL);
 	Assert(waitLock);
-	Assert(!dclist_is_empty(&waitLock->waitProcs));
+	Assert(!dclist_is_empty(&waitLock->packed.waitProcs));
 	Assert(0 < lockmethodid && lockmethodid < lengthof(LockMethods));
 
 	/* Remove proc from lock's wait queue */
-	dclist_delete_from_thoroughly(&waitLock->waitProcs, &proc->links);
+	dclist_delete_from_thoroughly(&waitLock->packed.waitProcs, &proc->links);
 
 	/* Undo increments of request counts by waiting process */
 	Assert(waitLock->nRequested > 0);
@@ -1985,7 +1985,7 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
 	waitLock->requested[lockmode - 1]--;
 	/* don't forget to clear waitMask bit if appropriate */
 	if (waitLock->granted[lockmode - 1] == waitLock->requested[lockmode - 1])
-		waitLock->waitMask &= LOCKBIT_OFF(lockmode);
+		waitLock->packed.masks.waitMask &= LOCKBIT_OFF(lockmode);
 
 	/* Clean up the proc's own state, and pass it the ok/fail signal */
 	proc->waitLock = NULL;
@@ -2459,7 +2459,7 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 			Assert(lock->nRequested >= 0);
 			Assert(lock->nGranted >= 0);
 			Assert(lock->nGranted <= lock->nRequested);
-			Assert((proclock->holdMask & ~lock->grantMask) == 0);
+			Assert((proclock->holdMask & ~lock->packed.masks.grantMask) == 0);
 
 			/*
 			 * Release the previously-marked lock modes
@@ -3605,7 +3605,7 @@ PostPrepare_Locks(TransactionId xid)
 			Assert(lock->nRequested >= 0);
 			Assert(lock->nGranted >= 0);
 			Assert(lock->nGranted <= lock->nRequested);
-			Assert((proclock->holdMask & ~lock->grantMask) == 0);
+			Assert((proclock->holdMask & ~lock->packed.masks.grantMask) == 0);
 
 			/* Ignore it if nothing to release (must be a session lock) */
 			if (proclock->releaseMask == 0)
@@ -4046,7 +4046,7 @@ GetSingleProcBlockerStatusData(PGPROC *blocked_proc, BlockedProcsData *data)
 	}
 
 	/* Enlarge waiter_pids[] if it's too small to hold all wait queue PIDs */
-	waitQueue = &(theLock->waitProcs);
+	waitQueue = &(theLock->packed.waitProcs);
 	queue_size = dclist_count(waitQueue);
 
 	if (queue_size > data->maxpids - data->npids)
@@ -4328,10 +4328,10 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 	 */
 	if (!found)
 	{
-		lock->grantMask = 0;
-		lock->waitMask = 0;
+		lock->packed.masks.grantMask = 0;
+		lock->packed.masks.waitMask = 0;
 		dlist_init(&lock->procLocks);
-		dclist_init(&lock->waitProcs);
+		dclist_init(&lock->packed.waitProcs);
 		lock->nRequested = 0;
 		lock->nGranted = 0;
 		MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
@@ -4406,7 +4406,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 	else
 	{
 		PROCLOCK_PRINT("lock_twophase_recover: found", proclock);
-		Assert((proclock->holdMask & ~lock->grantMask) == 0);
+		Assert((proclock->holdMask & ~lock->packed.masks.grantMask) == 0);
 	}
 
 	/*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 720ef99ee8..a359c0be21 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1088,7 +1088,7 @@ JoinWaitQueue(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	PROCLOCK   *proclock = locallock->proclock;
 	uint32		hashcode = locallock->hashcode;
 	LWLock	   *partitionLock PG_USED_FOR_ASSERTS_ONLY = LockHashPartitionLock(hashcode);
-	dclist_head *waitQueue = &lock->waitProcs;
+	dclist_head *waitQueue = &lock->packed.waitProcs;
 	PGPROC	   *insert_before = NULL;
 	LOCKMASK	myProcHeldLocks;
 	LOCKMASK	myHeldLocks;
@@ -1223,7 +1223,7 @@ JoinWaitQueue(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	else
 		dclist_push_tail(waitQueue, &MyProc->links);
 
-	lock->waitMask |= LOCKBIT_ON(lockmode);
+	lock->packed.masks.waitMask |= LOCKBIT_ON(lockmode);
 
 	/* Set up wait information in PGPROC object, too */
 	MyProc->heldLocks = myProcHeldLocks;
@@ -1708,7 +1708,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
 	Assert(proc->waitStatus == PROC_WAIT_STATUS_WAITING);
 
 	/* Remove process from wait queue */
-	dclist_delete_from_thoroughly(&proc->waitLock->waitProcs, &proc->links);
+	dclist_delete_from_thoroughly(&proc->waitLock->packed.waitProcs, &proc->links);
 
 	/* Clean up process' state and pass it the ok/fail signal */
 	proc->waitLock = NULL;
@@ -1730,7 +1730,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
 void
 ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 {
-	dclist_head *waitQueue = &lock->waitProcs;
+	dclist_head *waitQueue = &lock->packed.waitProcs;
 	LOCKMASK	aheadRequests = 0;
 	dlist_mutable_iter miter;
 
-- 
2.45.2

v0-0001-Reduce-size-of-LOCK-by-16-bytes.patchapplication/octet-stream; name=v0-0001-Reduce-size-of-LOCK-by-16-bytes.patchDownload
From 1abaf5c17ddb92a14bd20afc0e43ad3fc21a8475 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Wed, 20 Nov 2024 03:27:32 +0100
Subject: [PATCH v0 1/4] Reduce size of LOCK by 16 bytes

We only ever need 8 lockmodes, rather than 10, as NoLock is never registered,
and MaxLockmode is AccessExclusiveLock (8).

Also adjust various locations where we assume MAX_LOCKMODES > MaxLockMode.
---
 src/include/storage/lock.h           |  4 +-
 src/backend/access/common/relation.c |  6 +--
 src/backend/access/index/indexam.c   |  2 +-
 src/backend/storage/lmgr/README      | 15 +++---
 src/backend/storage/lmgr/lock.c      | 76 ++++++++++++++++------------
 src/backend/utils/adt/lockfuncs.c    |  2 +-
 6 files changed, 59 insertions(+), 46 deletions(-)

diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 787f3db06a..b2523bf79d 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -79,11 +79,13 @@ typedef struct
 		 (vxid_dst).localTransactionId = (proc).vxid.lxid)
 
 /* MAX_LOCKMODES cannot be larger than the # of bits in LOCKMASK */
-#define MAX_LOCKMODES		10
+#define MAX_LOCKMODES		MaxLockMode
 
 #define LOCKBIT_ON(lockmode) (1 << (lockmode))
 #define LOCKBIT_OFF(lockmode) (~(1 << (lockmode)))
 
+#define LOCK_VALID_LOCKMODE(lockmode) \
+	((lockmode) > NoLock && (lockmode) <= MaxLockMode)
 
 /*
  * This data structure defines the locking semantics associated with a
diff --git a/src/backend/access/common/relation.c b/src/backend/access/common/relation.c
index d8a313a2c9..78e58d13ce 100644
--- a/src/backend/access/common/relation.c
+++ b/src/backend/access/common/relation.c
@@ -48,7 +48,7 @@ relation_open(Oid relationId, LOCKMODE lockmode)
 {
 	Relation	r;
 
-	Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);
+	Assert(lockmode >= NoLock && lockmode <= MAX_LOCKMODES);
 
 	/* Get the lock before trying to open the relcache entry */
 	if (lockmode != NoLock)
@@ -89,7 +89,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
 {
 	Relation	r;
 
-	Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);
+	Assert(lockmode >= NoLock && lockmode <= MAX_LOCKMODES);
 
 	/* Get the lock first */
 	if (lockmode != NoLock)
@@ -206,7 +206,7 @@ relation_close(Relation relation, LOCKMODE lockmode)
 {
 	LockRelId	relid = relation->rd_lockInfo.lockRelId;
 
-	Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);
+	Assert(lockmode >= NoLock && lockmode <= MAX_LOCKMODES);
 
 	/* The relcache does the real work... */
 	RelationClose(relation);
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 1859be614c..70b9ac120e 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -178,7 +178,7 @@ index_close(Relation relation, LOCKMODE lockmode)
 {
 	LockRelId	relid = relation->rd_lockInfo.lockRelId;
 
-	Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);
+	Assert(lockmode >= NoLock && lockmode <= MAX_LOCKMODES);
 
 	/* The relcache does the real work... */
 	RelationClose(relation);
diff --git a/src/backend/storage/lmgr/README b/src/backend/storage/lmgr/README
index 45de0fd2bd..f85a0d6020 100644
--- a/src/backend/storage/lmgr/README
+++ b/src/backend/storage/lmgr/README
@@ -105,7 +105,7 @@ grantMask -
     table) to determine if a new lock request will conflict with existing
     lock types held.  Conflicts are determined by bitwise AND operations
     between the grantMask and the conflict table entry for the requested
-    lock type.  Bit i of grantMask is 1 if and only if granted[i] > 0.
+    lock type.  Bit i of grantMask is 1 if and only if granted[i - 1] > 0.
 
 waitMask -
     This bitmask shows the types of locks being waited for.  Bit i of waitMask
@@ -133,10 +133,10 @@ nRequested -
     only in the backend's LOCALLOCK structure.)
 
 requested -
-    Keeps a count of how many locks of each type have been attempted.  Only
-    elements 1 through MAX_LOCKMODES-1 are used as they correspond to the lock
-    type defined constants.  Summing the values of requested[] should come out
-    equal to nRequested.
+    Keeps a count of how many locks of each type have been attempted.
+    Elements 0 through MAX_LOCKMODES (inclusive) are used, and correspond to
+    lock type definded constants with value elem+1.  Summing the values of
+    requested[] should come out equal to nRequested.
 
 nGranted -
     Keeps count of how many times this lock has been successfully acquired.
@@ -145,9 +145,8 @@ nGranted -
 
 granted -
     Keeps count of how many locks of each type are currently held.  Once again
-    only elements 1 through MAX_LOCKMODES-1 are used (0 is not).  Also, like
-    requested[], summing the values of granted[] should total to the value
-    of nGranted.
+    only elements 0 through MAX_LOCKMODES are used.  Also, like requested[],
+    summing the values of granted[] should total to the value of nGranted.
 
 We should always have 0 <= nGranted <= nRequested, and
 0 <= granted[i] <= requested[i] for each i.  When all the request counts
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index edc5020c6a..9bf6fbf976 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -367,19 +367,19 @@ LOCK_PRINT(const char *where, const LOCK *lock, LOCKMODE type)
 	if (LOCK_DEBUG_ENABLED(&lock->tag))
 		elog(LOG,
 			 "%s: lock(%p) id(%u,%u,%u,%u,%u,%u) grantMask(%x) "
-			 "req(%d,%d,%d,%d,%d,%d,%d)=%d "
-			 "grant(%d,%d,%d,%d,%d,%d,%d)=%d wait(%d) type(%s)",
+			 "req(%d,%d,%d,%d,%d,%d,%d,%d)=%d "
+			 "grant(%d,%d,%d,%d,%d,%d,%d,%d)=%d wait(%d) type(%s)",
 			 where, lock,
 			 lock->tag.locktag_field1, lock->tag.locktag_field2,
 			 lock->tag.locktag_field3, lock->tag.locktag_field4,
 			 lock->tag.locktag_type, lock->tag.locktag_lockmethodid,
 			 lock->grantMask,
-			 lock->requested[1], lock->requested[2], lock->requested[3],
-			 lock->requested[4], lock->requested[5], lock->requested[6],
-			 lock->requested[7], lock->nRequested,
-			 lock->granted[1], lock->granted[2], lock->granted[3],
-			 lock->granted[4], lock->granted[5], lock->granted[6],
-			 lock->granted[7], lock->nGranted,
+			 lock->requested[0], lock->requested[1], lock->requested[2],
+			 lock->requested[3], lock->requested[4], lock->requested[5],
+			 lock->requested[6], lock->requested[7], lock->nRequested,
+			 lock->granted[0], lock->granted[1], lock->granted[2],
+			 lock->granted[3], lock->granted[4], lock->granted[5],
+			 lock->granted[6], lock->granted[7], lock->nGranted,
 			 dclist_count(&lock->waitProcs),
 			 LockMethods[LOCK_LOCKMETHOD(*lock)]->lockModeNames[type]);
 }
@@ -704,6 +704,8 @@ LockHasWaiters(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	if (lockmode <= 0 || lockmode > lockMethodTable->numLockModes)
 		elog(ERROR, "unrecognized lock mode: %d", lockmode);
 
+	Assert(LOCK_VALID_LOCKMODE(lockmode));
+
 #ifdef LOCK_DEBUG
 	if (LOCK_DEBUG_ENABLED(locktag))
 		elog(LOG, "LockHasWaiters: lock [%u,%u] %s",
@@ -851,6 +853,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	if (lockmode <= 0 || lockmode > lockMethodTable->numLockModes)
 		elog(ERROR, "unrecognized lock mode: %d", lockmode);
 
+	Assert(LOCK_VALID_LOCKMODE(lockmode));
+
 	if (RecoveryInProgress() && !InRecovery &&
 		(locktag->locktag_type == LOCKTAG_OBJECT ||
 		 locktag->locktag_type == LOCKTAG_RELATION) &&
@@ -1133,11 +1137,11 @@ LockAcquireExtended(const LOCKTAG *locktag,
 		else
 			PROCLOCK_PRINT("LockAcquire: did not join wait queue", proclock);
 		lock->nRequested--;
-		lock->requested[lockmode]--;
+		lock->requested[lockmode - 1]--;
 		LOCK_PRINT("LockAcquire: did not join wait queue",
 				   lock, lockmode);
 		Assert((lock->nRequested > 0) &&
-			   (lock->requested[lockmode] >= 0));
+			   (lock->requested[lockmode - 1] >= 0));
 		Assert(lock->nGranted <= lock->nRequested);
 		LWLockRelease(partitionLock);
 		if (locallock->nLocks == 0)
@@ -1237,6 +1241,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 	PROCLOCKTAG proclocktag;
 	uint32		proclock_hashcode;
 	bool		found;
+	Assert(LOCK_VALID_LOCKMODE(lockmode));
 
 	/*
 	 * Find or create a lock with this tag.
@@ -1267,8 +1272,8 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 	else
 	{
 		LOCK_PRINT("LockAcquire: found", lock, lockmode);
-		Assert((lock->nRequested >= 0) && (lock->requested[lockmode] >= 0));
-		Assert((lock->nGranted >= 0) && (lock->granted[lockmode] >= 0));
+		Assert((lock->nRequested >= 0) && (lock->requested[lockmode - 1] >= 0));
+		Assert((lock->nGranted >= 0) && (lock->granted[lockmode - 1] >= 0));
 		Assert(lock->nGranted <= lock->nRequested);
 	}
 
@@ -1385,8 +1390,8 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 	 * The other counts don't increment till we get the lock.
 	 */
 	lock->nRequested++;
-	lock->requested[lockmode]++;
-	Assert((lock->nRequested > 0) && (lock->requested[lockmode] > 0));
+	lock->requested[lockmode - 1]++;
+	Assert((lock->nRequested > 0) && (lock->requested[lockmode - 1] > 0));
 
 	/*
 	 * We shouldn't already hold the desired lock; else locallock table is
@@ -1483,7 +1488,7 @@ LockCheckConflicts(LockMethod lockMethodTable,
 	int			numLockModes = lockMethodTable->numLockModes;
 	LOCKMASK	myLocks;
 	int			conflictMask = lockMethodTable->conflictTab[lockmode];
-	int			conflictsRemaining[MAX_LOCKMODES];
+	int			conflictsRemaining[MAX_LOCKMODES + 1];
 	int			totalConflictsRemaining = 0;
 	dlist_iter	proclock_iter;
 	int			i;
@@ -1516,7 +1521,7 @@ LockCheckConflicts(LockMethod lockMethodTable,
 			conflictsRemaining[i] = 0;
 			continue;
 		}
-		conflictsRemaining[i] = lock->granted[i];
+		conflictsRemaining[i] = lock->granted[i - 1];
 		if (myLocks & LOCKBIT_ON(i))
 			--conflictsRemaining[i];
 		totalConflictsRemaining += conflictsRemaining[i];
@@ -1606,14 +1611,16 @@ LockCheckConflicts(LockMethod lockMethodTable,
 void
 GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode)
 {
+	Assert(LOCK_VALID_LOCKMODE(lockmode));
+
 	lock->nGranted++;
-	lock->granted[lockmode]++;
+	lock->granted[lockmode - 1]++;
 	lock->grantMask |= LOCKBIT_ON(lockmode);
-	if (lock->granted[lockmode] == lock->requested[lockmode])
+	if (lock->granted[lockmode - 1] == lock->requested[lockmode - 1])
 		lock->waitMask &= LOCKBIT_OFF(lockmode);
 	proclock->holdMask |= LOCKBIT_ON(lockmode);
 	LOCK_PRINT("GrantLock", lock, lockmode);
-	Assert((lock->nGranted > 0) && (lock->granted[lockmode] > 0));
+	Assert((lock->nGranted > 0) && (lock->granted[lockmode - 1] > 0));
 	Assert(lock->nGranted <= lock->nRequested);
 }
 
@@ -1631,20 +1638,21 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 			PROCLOCK *proclock, LockMethod lockMethodTable)
 {
 	bool		wakeupNeeded = false;
+	Assert(LOCK_VALID_LOCKMODE(lockmode));
 
-	Assert((lock->nRequested > 0) && (lock->requested[lockmode] > 0));
-	Assert((lock->nGranted > 0) && (lock->granted[lockmode] > 0));
+	Assert((lock->nRequested > 0) && (lock->requested[lockmode - 1] > 0));
+	Assert((lock->nGranted > 0) && (lock->granted[lockmode - 1] > 0));
 	Assert(lock->nGranted <= lock->nRequested);
 
 	/*
 	 * fix the general lock stats
 	 */
 	lock->nRequested--;
-	lock->requested[lockmode]--;
+	lock->requested[lockmode - 1]--;
 	lock->nGranted--;
-	lock->granted[lockmode]--;
+	lock->granted[lockmode - 1]--;
 
-	if (lock->granted[lockmode] == 0)
+	if (lock->granted[lockmode - 1] == 0)
 	{
 		/* change the conflict mask.  No more of this lock type. */
 		lock->grantMask &= LOCKBIT_OFF(lockmode);
@@ -1656,7 +1664,7 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 	 * We need only run ProcLockWakeup if the released lock conflicts with at
 	 * least one of the lock types requested by waiter(s).  Otherwise whatever
 	 * conflict made them wait must still exist.  NOTE: before MVCC, we could
-	 * skip wakeup if lock->granted[lockmode] was still positive. But that's
+	 * skip wakeup if lock->granted[lockmode - 1] was still positive. But that's
 	 * not true anymore, because the remaining granted locks might belong to
 	 * some waiter, who could now be awakened because he doesn't conflict with
 	 * his own locks.
@@ -1973,10 +1981,10 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
 	Assert(waitLock->nRequested > 0);
 	Assert(waitLock->nRequested > proc->waitLock->nGranted);
 	waitLock->nRequested--;
-	Assert(waitLock->requested[lockmode] > 0);
-	waitLock->requested[lockmode]--;
+	Assert(waitLock->requested[lockmode - 1] > 0);
+	waitLock->requested[lockmode - 1]--;
 	/* don't forget to clear waitMask bit if appropriate */
-	if (waitLock->granted[lockmode] == waitLock->requested[lockmode])
+	if (waitLock->granted[lockmode - 1] == waitLock->requested[lockmode - 1])
 		waitLock->waitMask &= LOCKBIT_OFF(lockmode);
 
 	/* Clean up the proc's own state, and pass it the ok/fail signal */
@@ -2025,6 +2033,8 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	if (lockmode <= 0 || lockmode > lockMethodTable->numLockModes)
 		elog(ERROR, "unrecognized lock mode: %d", lockmode);
 
+	Assert(LOCK_VALID_LOCKMODE(lockmode));
+
 #ifdef LOCK_DEBUG
 	if (LOCK_DEBUG_ENABLED(locktag))
 		elog(LOG, "LockRelease: lock [%u,%u] %s",
@@ -4311,6 +4321,8 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 				 errhint("You might need to increase \"%s\".", "max_locks_per_transaction")));
 	}
 
+	Assert(LOCK_VALID_LOCKMODE(lockmode));
+
 	/*
 	 * if it's a new lock object, initialize it
 	 */
@@ -4329,8 +4341,8 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 	else
 	{
 		LOCK_PRINT("lock_twophase_recover: found", lock, lockmode);
-		Assert((lock->nRequested >= 0) && (lock->requested[lockmode] >= 0));
-		Assert((lock->nGranted >= 0) && (lock->granted[lockmode] >= 0));
+		Assert((lock->nRequested >= 0) && (lock->requested[lockmode - 1] >= 0));
+		Assert((lock->nGranted >= 0) && (lock->granted[lockmode - 1] >= 0));
 		Assert(lock->nGranted <= lock->nRequested);
 	}
 
@@ -4402,8 +4414,8 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 	 * requests, whether granted or waiting, so increment those immediately.
 	 */
 	lock->nRequested++;
-	lock->requested[lockmode]++;
-	Assert((lock->nRequested > 0) && (lock->requested[lockmode] > 0));
+	lock->requested[lockmode - 1]++;
+	Assert((lock->nRequested > 0) && (lock->requested[lockmode - 1] > 0));
 
 	/*
 	 * We shouldn't already hold the desired lock.
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index e790f856ab..f21f348464 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -189,7 +189,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
 		granted = false;
 		if (instance->holdMask)
 		{
-			for (mode = 0; mode < MAX_LOCKMODES; mode++)
+			for (mode = 1; mode <= MAX_LOCKMODES; mode++)
 			{
 				if (instance->holdMask & LOCKBIT_ON(mode))
 				{
-- 
2.45.2

v0-0003-Reduce-size-of-PROCLOCK-by-8-bytes-on-64-bit-syst.patchapplication/octet-stream; name=v0-0003-Reduce-size-of-PROCLOCK-by-8-bytes-on-64-bit-syst.patchDownload
From f75f69379809f796bd8959d9d7e7d38277cacdd3 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Wed, 20 Nov 2024 04:07:28 +0100
Subject: [PATCH v0 3/4] Reduce size of PROCLOCK by 8 bytes on 64-bit systems

---
 src/include/storage/lock.h      |  2 +-
 src/backend/storage/lmgr/lock.c | 16 +++++++++-------
 src/backend/storage/lmgr/proc.c |  5 ++++-
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 345ded934f..b1d4f18402 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -386,7 +386,7 @@ typedef struct PROCLOCK
 	PROCLOCKTAG tag;			/* unique identifier of proclock object */
 
 	/* data */
-	PGPROC	   *groupLeader;	/* proc's lock group leader, or proc itself */
+	ProcNumber	groupLeader;	/* proc's lock group leader, or proc itself */
 	LOCKMASK	holdMask;		/* bitmask for lock types currently held */
 	LOCKMASK	releaseMask;	/* bitmask for lock types to be released */
 	dlist_node	lockLink;		/* list link in LOCK's list of proclocks */
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 65349b1196..15e1512e75 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1321,6 +1321,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 	if (!found)
 	{
 		uint32		partition = LockHashPartition(hashcode);
+		PGPROC	   *leaderProc;
 
 		/*
 		 * It might seem unsafe to access proclock->groupLeader without a
@@ -1332,8 +1333,9 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 		 * lock group leader without first releasing all of its locks (and in
 		 * particular the one we are currently transferring).
 		 */
-		proclock->groupLeader = proc->lockGroupLeader != NULL ?
+		leaderProc = proc->lockGroupLeader != NULL ?
 			proc->lockGroupLeader : proc;
+		proclock->groupLeader = GetNumberFromPGProc(leaderProc);
 		proclock->holdMask = 0;
 		proclock->releaseMask = 0;
 		/* Add proclock to appropriate lists */
@@ -1535,7 +1537,7 @@ LockCheckConflicts(LockMethod lockMethodTable,
 	}
 
 	/* If no group locking, it's definitely a conflict. */
-	if (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)
+	if (proclock->groupLeader == MyProcNumber && MyProc->lockGroupLeader == NULL)
 	{
 		Assert(proclock->tag.myProc == MyProc);
 		PROCLOCK_PRINT("LockCheckConflicts: conflicting (simple)",
@@ -3640,8 +3642,8 @@ PostPrepare_Locks(TransactionId xid)
 			 * Update groupLeader pointer to point to the new proc.  (We'd
 			 * better not be a member of somebody else's lock group!)
 			 */
-			Assert(proclock->groupLeader == proclock->tag.myProc);
-			proclock->groupLeader = newproc;
+			Assert(proclock->groupLeader == GetNumberFromPGProc(proclock->tag.myProc));
+			proclock->groupLeader = GetNumberFromPGProc(newproc);
 
 			/*
 			 * Update the proclock.  We should not find any existing entry for
@@ -3864,7 +3866,7 @@ GetLockStatusData(void)
 		instance->vxid.procNumber = proc->vxid.procNumber;
 		instance->vxid.localTransactionId = proc->vxid.lxid;
 		instance->pid = proc->pid;
-		instance->leaderPid = proclock->groupLeader->pid;
+		instance->leaderPid = GetPGProcByNumber(proclock->groupLeader)->pid;
 		instance->fastpath = false;
 		instance->waitStart = (TimestampTz) pg_atomic_read_u64(&proc->waitStart);
 
@@ -4040,7 +4042,7 @@ GetSingleProcBlockerStatusData(PGPROC *blocked_proc, BlockedProcsData *data)
 		instance->vxid.procNumber = proc->vxid.procNumber;
 		instance->vxid.localTransactionId = proc->vxid.lxid;
 		instance->pid = proc->pid;
-		instance->leaderPid = proclock->groupLeader->pid;
+		instance->leaderPid = GetPGProcByNumber(proclock->groupLeader)->pid;
 		instance->fastpath = false;
 		data->nlocks++;
 	}
@@ -4394,7 +4396,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 	if (!found)
 	{
 		Assert(proc->lockGroupLeader == NULL);
-		proclock->groupLeader = proc;
+		proclock->groupLeader = GetNumberFromPGProc(proc);
 		proclock->holdMask = 0;
 		proclock->releaseMask = 0;
 		/* Add proclock to appropriate lists */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index a359c0be21..43e79e07e2 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1119,6 +1119,9 @@ JoinWaitQueue(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	if (leader != NULL)
 	{
 		dlist_iter	iter;
+		ProcNumber	leaderNo;
+
+		leaderNo = GetNumberFromPGProc(leader);
 
 		dlist_foreach(iter, &lock->procLocks)
 		{
@@ -1126,7 +1129,7 @@ JoinWaitQueue(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 
 			otherproclock = dlist_container(PROCLOCK, lockLink, iter.cur);
 
-			if (otherproclock->groupLeader == leader)
+			if (otherproclock->groupLeader == leaderNo)
 				myHeldLocks |= otherproclock->holdMask;
 		}
 	}
-- 
2.45.2

v0-0004-Reduce-PROCLOCK-hash-table-size.patchapplication/octet-stream; name=v0-0004-Reduce-PROCLOCK-hash-table-size.patchDownload
From 118cb90d4273f6cf84b98a4f9c9f66325bd827e2 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Wed, 20 Nov 2024 17:31:26 +0100
Subject: [PATCH v0 4/4] Reduce PROCLOCK hash table size

This reduces the memory usage of the heavyweight lock mechanism by 25%.

Because the main LOCK table is already sized for max_locks_per_transaction
for every backend, further allocation of more PROCLOCK entries doesn't make
sense, as those would let the average number of locked objects per backend
to increase past max_locks_per_transaction.
---
 src/backend/storage/lmgr/lock.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 15e1512e75..bd3e4d0f20 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -438,16 +438,18 @@ void
 LockManagerShmemInit(void)
 {
 	HASHCTL		info;
-	long		init_table_size,
-				max_table_size;
+	long		max_table_size;
 	bool		found;
+	Size		allocated;
+	char	   *start;
+	char	   *end;
 
+	start = ShmemAllocNoError(0);
 	/*
 	 * Compute init/max size to request for lock hashtables.  Note these
 	 * calculations must agree with LockManagerShmemSize!
 	 */
 	max_table_size = NLOCKENTS();
-	init_table_size = max_table_size / 2;
 
 	/*
 	 * Allocate hash table for LOCK structs.  This stores per-locked-object
@@ -458,14 +460,17 @@ LockManagerShmemInit(void)
 	info.num_partitions = NUM_LOCK_PARTITIONS;
 
 	LockMethodLockHash = ShmemInitHash("LOCK hash",
-									   init_table_size,
+									   max_table_size,
 									   max_table_size,
 									   &info,
 									   HASH_ELEM | HASH_BLOBS | HASH_PARTITION);
 
-	/* Assume an average of 2 holders per lock */
-	max_table_size *= 2;
-	init_table_size *= 2;
+	/*
+	 * Assume every proc has max_locks_per_transaction locks. We don't
+	 * need more PROCLOCK entries after that, because we can't acquire more
+	 * locks after that. This is also more consistent with the advertised
+	 * behaviour of max_locks_per_transaction.
+	 */
 
 	/*
 	 * Allocate hash table for PROCLOCK structs.  This stores
@@ -477,7 +482,7 @@ LockManagerShmemInit(void)
 	info.num_partitions = NUM_LOCK_PARTITIONS;
 
 	LockMethodProcLockHash = ShmemInitHash("PROCLOCK hash",
-										   init_table_size,
+										   max_table_size,
 										   max_table_size,
 										   &info,
 										   HASH_ELEM | HASH_FUNCTION | HASH_PARTITION);
@@ -490,6 +495,13 @@ LockManagerShmemInit(void)
 						sizeof(FastPathStrongRelationLockData), &found);
 	if (!found)
 		SpinLockInit(&FastPathStrongRelationLocks->mutex);
+	end = ShmemAllocNoError(0);
+
+	allocated = end - start;
+	elog(LOG, "Lock ShMem: Allocated %lu for NLOCKENTS=%ld (= %d mlpxid * (%d m_c + %d m_p_xids))",
+		 (unsigned long) allocated, max_table_size,
+		 max_locks_per_xact, MaxBackends, max_prepared_xacts
+	);
 }
 
 /*
@@ -3682,7 +3694,6 @@ LockManagerShmemSize(void)
 	size = add_size(size, hash_estimate_size(max_table_size, sizeof(LOCK)));
 
 	/* proclock hash table */
-	max_table_size *= 2;
 	size = add_size(size, hash_estimate_size(max_table_size, sizeof(PROCLOCK)));
 
 	/*
-- 
2.45.2

#42Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#32)
Re: scalability bottlenecks with (many) partitions (and more)

Hi,

On 2024-09-21 20:33:49 +0200, Tomas Vondra wrote:

I've finally pushed this, after many rounds of careful testing to ensure
no regressions, and polishing.

One minor nit: I don't like that FP_LOCK_SLOTS_PER_BACKEND is now non-constant
while looking like a constant:

#define FP_LOCK_SLOTS_PER_BACKEND (FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)

I don't think it's a good idea to have non-function-like #defines that
reference variables that can change from run to run.
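
To spell out the concern with a toy example (illustrative names, not
the actual PostgreSQL definitions):

#include <stdio.h>

static int groups_per_backend = 4;  /* imagine: set once at startup */

#define SLOTS_PER_GROUP     16
/* object-like: reads as a compile-time constant at call sites */
#define SLOTS_PER_BACKEND   (SLOTS_PER_GROUP * groups_per_backend)
/* function-like: at least signals that the value is computed */
#define SlotsPerBackend()   (SLOTS_PER_GROUP * groups_per_backend)

int
main(void)
{
    printf("%d %d\n", SLOTS_PER_BACKEND, SlotsPerBackend());  /* 64 64 */
    return 0;
}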

Greetings,

Andres Freund

#43Tomas Vondra
tomas@vondra.me
In reply to: Andres Freund (#42)
Re: scalability bottlenecks with (many) partitions (and more)

On 3/3/25 19:10, Andres Freund wrote:

Hi,

On 2024-09-21 20:33:49 +0200, Tomas Vondra wrote:

I've finally pushed this, after many rounds of careful testing to ensure
no regressions, and polishing.

One minor nit: I don't like that FP_LOCK_SLOTS_PER_BACKEND is now non-constant
while looking like a constant:

#define FP_LOCK_SLOTS_PER_BACKEND (FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)

I don't think it's a good idea to have non-function-like #defines that
reference variables that can change from run to run.

Fair point, although it can't change "run to run" - not without a
restart. It's not a proper constant, of course, but it seemed close
enough. Yes, it might confuse people into thinking it's a constant, or
is there some additional impact?

The one fix I can think of is making it look more like a function,
possibly just like this:

#define FastPathLockSlotsPerBackend() \
(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)

Or do you have another suggestion?

regards

--
Tomas Vondra

#44Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#43)
Re: scalability bottlenecks with (many) partitions (and more)

Hi,

On 2025-03-03 21:31:42 +0100, Tomas Vondra wrote:

On 3/3/25 19:10, Andres Freund wrote:

On 2024-09-21 20:33:49 +0200, Tomas Vondra wrote:

I've finally pushed this, after many rounds of careful testing to ensure
no regressions, and polishing.

One minor nit: I don't like that FP_LOCK_SLOTS_PER_BACKEND is now non-constant
while looking like a constant:

#define FP_LOCK_SLOTS_PER_BACKEND (FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)

I don't think it's a good idea to have non-function-like #defines that
reference variables that can change from run to run.

Fair point, although it can't change "run to run" - not without a
restart.

That's what I meant with "run to run".

It's not a proper constant, of course, but it seemed close
enough. Yes, it might confuse people into thinking it's a constant, or
is there some additional impact?

That seems plenty. I just looked at the shmem sizing function and was
confused because I didn't see where max_locks_per_transaction affects
the allocation size.

The one fix I can think of is making it look more like a function,
possibly just like this:

#define FastPathLockSlotsPerBackend() \
(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)

Or do you have another suggestion?

That'd work for me.

Greetings,

Andres Freund

#45Tomas Vondra
tomas@vondra.me
In reply to: Andres Freund (#44)
1 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

On 3/3/25 21:52, Andres Freund wrote:

Hi,

On 2025-03-03 21:31:42 +0100, Tomas Vondra wrote:

On 3/3/25 19:10, Andres Freund wrote:

On 2024-09-21 20:33:49 +0200, Tomas Vondra wrote:

I've finally pushed this, after many rounds of careful testing to ensure
no regressions, and polishing.

One minor nit: I don't like that FP_LOCK_SLOTS_PER_BACKEND is now non-constant
while looking like a constant:

#define FP_LOCK_SLOTS_PER_BACKEND (FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)

I don't think it's a good idea to have non-function-like #defines that
reference variables that can change from run to run.

Fair point, although it can't change "run to run" - not without a
restart.

That's what I meant with "run to run".

OK.

It's not a proper constant, of course, but it seemed close
enough. Yes, it might confuse people into thinking it's a constant, or
is there some additional impact?

That seems plenty. I just looked at the shmem sizing function and was
confused because I didn't see where max_locks_per_transaction affects
the allocation size.

But the shmem sizing doesn't use FP_LOCK_SLOTS_PER_BACKEND at all, both
proc.c and postinit.c use the "full" formula, not the macro

FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP

so why would the macro make this bit less obvious?

The one fix I can think of is making it look more like a function,
possibly just like this:

#define FastPathLockSlotsPerBackend() \
(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)

Or do you have another suggestion?

That'd work for me.

Attached is a patch doing this, but considering it has nothing to do
with the shmem sizing, I wonder if it's worth it.

regards

--
Tomas Vondra

Attachments:

fast-path-macro-fix.patchtext/x-patch; charset=UTF-8; name=fast-path-macro-fix.patchDownload
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 11b4d1085bb..ccfe6b69bf5 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -226,10 +226,10 @@ int			FastPathLockGroupsPerBackend = 0;
  * the FAST_PATH_SLOT macro, split it into group and index (in the group).
  */
 #define FAST_PATH_GROUP(index)	\
-	(AssertMacro((uint32) (index) < FP_LOCK_SLOTS_PER_BACKEND), \
+	(AssertMacro((uint32) (index) < FastPathLockSlotsPerBackend()), \
 	 ((index) / FP_LOCK_SLOTS_PER_GROUP))
 #define FAST_PATH_INDEX(index)	\
-	(AssertMacro((uint32) (index) < FP_LOCK_SLOTS_PER_BACKEND), \
+	(AssertMacro((uint32) (index) < FastPathLockSlotsPerBackend()), \
 	 ((index) % FP_LOCK_SLOTS_PER_GROUP))
 
 /* Macros for manipulating proc->fpLockBits */
@@ -242,7 +242,7 @@ int			FastPathLockGroupsPerBackend = 0;
 #define FAST_PATH_BIT_POSITION(n, l) \
 	(AssertMacro((l) >= FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((l) < FAST_PATH_BITS_PER_SLOT+FAST_PATH_LOCKNUMBER_OFFSET), \
-	 AssertMacro((n) < FP_LOCK_SLOTS_PER_BACKEND), \
+	 AssertMacro((n) < FastPathLockSlotsPerBackend()), \
 	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (FAST_PATH_INDEX(n))))
 #define FAST_PATH_SET_LOCKMODE(proc, n, l) \
 	 FAST_PATH_BITS(proc, n) |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
@@ -2691,7 +2691,7 @@ static bool
 FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
 	uint32		i;
-	uint32		unused_slot = FP_LOCK_SLOTS_PER_BACKEND;
+	uint32		unused_slot = FastPathLockSlotsPerBackend();
 
 	/* fast-path group the lock belongs to */
 	uint32		group = FAST_PATH_REL_GROUP(relid);
@@ -2713,7 +2713,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	}
 
 	/* If no existing entry, use any empty slot. */
-	if (unused_slot < FP_LOCK_SLOTS_PER_BACKEND)
+	if (unused_slot < FastPathLockSlotsPerBackend())
 	{
 		MyProc->fpRelId[unused_slot] = relid;
 		FAST_PATH_SET_LOCKMODE(MyProc, unused_slot, lockmode);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 20777f7d5ae..114eb1f8f76 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -88,7 +88,8 @@ extern PGDLLIMPORT int FastPathLockGroupsPerBackend;
 
 #define		FP_LOCK_GROUPS_PER_BACKEND_MAX	1024
 #define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
-#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
+#define		FastPathLockSlotsPerBackend() \
+	(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
 
 /*
  * Flags for PGPROC.delayChkptFlags
#46Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#45)
Re: scalability bottlenecks with (many) partitions (and more)

Hi,

On 2025-03-04 14:05:22 +0100, Tomas Vondra wrote:

On 3/3/25 21:52, Andres Freund wrote:

It's not a proper constant, of course, but it seemed close
enough. Yes, it might confuse people into thinking it's a constant, or
is there some additional impact?

That seems plenty. I just looked at the shmem sizing function and was
confused because I didn't see where max_locks_per_transaction affects
the allocation size.

But the shmem sizing doesn't use FP_LOCK_SLOTS_PER_BACKEND at all, both
proc.c and postinit.c use the "full" formula, not the macro

Not sure what I brainfarted there...

The one fix I can think of is making it look more like a function,
possibly just like this:

#define FastPathLockSlotsPerBackend() \
(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)

Or do you have another suggestion?

That'd work for me.

Attached is a patch doing this, but considering it has nothing to do
with the shmem sizing, I wonder if it's worth it.

Yes.

Greetings,

Andres Freund

#47Tomas Vondra
tomas@vondra.me
In reply to: Andres Freund (#46)
1 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

On 3/4/25 14:11, Andres Freund wrote:

Hi,

On 2025-03-04 14:05:22 +0100, Tomas Vondra wrote:

On 3/3/25 21:52, Andres Freund wrote:

It's not a proper constant, of course, but it seemed close
enough. Yes, it might confuse people into thinking it's a constant, or
is there some additional impact?

That seems plenty. I just looked at the shmem sizing function and was
confused because I didn't see where max_locks_per_transaction affects
the allocation size.

But the shmem sizing doesn't use FP_LOCK_SLOTS_PER_BACKEND at all, both
proc.c and postinit.c use the "full" formula, not the macro

Not sure what I brainfarted there...

This got me thinking - maybe it'd be better to use the new
FastPathLockSlotsPerBackend() in all places that need the number of
slots per backend, including those in proc.c etc.? Arguably, these
places should have used FP_LOCK_SLOTS_PER_BACKEND before.

The attached v2 patch does that.

The one fix I can think of is making it look more like a function,
possibly just like this:

#define FastPathLockSlotsPerBackend() \
(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)

Or do you have another suggestion?

That'd work for me.

Attached is a patch doing this, but considering it has nothing to do
with the shmem sizing, I wonder if it's worth it.

Yes.

OK, barring objections I'll push the v2.

regards

--
Tomas Vondra

Attachments:

fast-path-macro-fix-v2.patchtext/x-patch; charset=UTF-8; name=fast-path-macro-fix-v2.patchDownload
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 11b4d1085bb..ccfe6b69bf5 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -226,10 +226,10 @@ int			FastPathLockGroupsPerBackend = 0;
  * the FAST_PATH_SLOT macro, split it into group and index (in the group).
  */
 #define FAST_PATH_GROUP(index)	\
-	(AssertMacro((uint32) (index) < FP_LOCK_SLOTS_PER_BACKEND), \
+	(AssertMacro((uint32) (index) < FastPathLockSlotsPerBackend()), \
 	 ((index) / FP_LOCK_SLOTS_PER_GROUP))
 #define FAST_PATH_INDEX(index)	\
-	(AssertMacro((uint32) (index) < FP_LOCK_SLOTS_PER_BACKEND), \
+	(AssertMacro((uint32) (index) < FastPathLockSlotsPerBackend()), \
 	 ((index) % FP_LOCK_SLOTS_PER_GROUP))
 
 /* Macros for manipulating proc->fpLockBits */
@@ -242,7 +242,7 @@ int			FastPathLockGroupsPerBackend = 0;
 #define FAST_PATH_BIT_POSITION(n, l) \
 	(AssertMacro((l) >= FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((l) < FAST_PATH_BITS_PER_SLOT+FAST_PATH_LOCKNUMBER_OFFSET), \
-	 AssertMacro((n) < FP_LOCK_SLOTS_PER_BACKEND), \
+	 AssertMacro((n) < FastPathLockSlotsPerBackend()), \
 	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (FAST_PATH_INDEX(n))))
 #define FAST_PATH_SET_LOCKMODE(proc, n, l) \
 	 FAST_PATH_BITS(proc, n) |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
@@ -2691,7 +2691,7 @@ static bool
 FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
 	uint32		i;
-	uint32		unused_slot = FP_LOCK_SLOTS_PER_BACKEND;
+	uint32		unused_slot = FastPathLockSlotsPerBackend();
 
 	/* fast-path group the lock belongs to */
 	uint32		group = FAST_PATH_REL_GROUP(relid);
@@ -2713,7 +2713,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	}
 
 	/* If no existing entry, use any empty slot. */
-	if (unused_slot < FP_LOCK_SLOTS_PER_BACKEND)
+	if (unused_slot < FastPathLockSlotsPerBackend())
 	{
 		MyProc->fpRelId[unused_slot] = relid;
 		FAST_PATH_SET_LOCKMODE(MyProc, unused_slot, lockmode);
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 49204f91a20..749a79d48ef 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -116,7 +116,7 @@ ProcGlobalShmemSize(void)
 	 * nicely aligned in each backend.
 	 */
 	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
-	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+	fpRelIdSize = MAXALIGN(FastPathLockSlotsPerBackend() * sizeof(Oid));
 
 	size = add_size(size, mul_size(TotalProcs, (fpLockBitsSize + fpRelIdSize)));
 
@@ -231,7 +231,7 @@ InitProcGlobal(void)
 	 * shared memory and then divide that between backends.
 	 */
 	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
-	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+	fpRelIdSize = MAXALIGN(FastPathLockSlotsPerBackend() * sizeof(Oid));
 
 	fpPtr = ShmemAlloc(TotalProcs * (fpLockBitsSize + fpRelIdSize));
 	MemSet(fpPtr, 0, TotalProcs * (fpLockBitsSize + fpRelIdSize));
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 318600d6d02..763893eed8f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -586,7 +586,7 @@ InitializeFastPathLocks(void)
 	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
 	{
 		/* stop once we exceed max_locks_per_xact */
-		if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
+		if (FastPathLockSlotsPerBackend() >= max_locks_per_xact)
 			break;
 
 		FastPathLockGroupsPerBackend *= 2;
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 20777f7d5ae..114eb1f8f76 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -88,7 +88,8 @@ extern PGDLLIMPORT int FastPathLockGroupsPerBackend;
 
 #define		FP_LOCK_GROUPS_PER_BACKEND_MAX	1024
 #define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
-#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
+#define		FastPathLockSlotsPerBackend() \
+	(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
 
 /*
  * Flags for PGPROC.delayChkptFlags
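
As a quick cross-check of the arithmetic the patch touches, here is a
standalone C sketch (illustrative only, not PostgreSQL source; the
max_locks_per_xact value and the slot number are arbitrary examples):

#include <stdio.h>

/* Constants mirror proc.h as patched; the group count is computed below. */
#define FP_LOCK_GROUPS_PER_BACKEND_MAX	1024
#define FP_LOCK_SLOTS_PER_GROUP		16

static int	FastPathLockGroupsPerBackend = 1;

#define FastPathLockSlotsPerBackend() \
	(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)

int
main(void)
{
	int			max_locks_per_xact = 64;	/* arbitrary example value */
	unsigned int slot = 37;					/* arbitrary slot number */

	/*
	 * Sizing loop shaped like the one in InitializeFastPathLocks():
	 * double the group count until the slots cover max_locks_per_xact,
	 * or we hit the cap.
	 */
	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
	{
		if (FastPathLockSlotsPerBackend() >= max_locks_per_xact)
			break;
		FastPathLockGroupsPerBackend *= 2;
	}

	/* prints "groups = 4, slots = 64" for the value above */
	printf("groups = %d, slots = %d\n",
		   FastPathLockGroupsPerBackend, FastPathLockSlotsPerBackend());

	/* the FAST_PATH_GROUP / FAST_PATH_INDEX split: 37 -> group 2, index 5 */
	printf("slot %u -> group %u, index %u\n", slot,
		   slot / FP_LOCK_SLOTS_PER_GROUP,
		   slot % FP_LOCK_SLOTS_PER_GROUP);
	return 0;
}
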
#48Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#47)
Re: scalability bottlenecks with (many) partitions (and more)

On 3/4/25 15:38, Tomas Vondra wrote:

...

Attached is a patch doing this, but considering it has nothing to do
with the shmem sizing, I wonder if it's worth it.

Yes.

OK, barring objections I'll push the v2.

Pushed, with the tweaks to use FastPathLockSlotsPerBackend() in a couple
more places.

I noticed sifaka started failing right after I pushed this:

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=sifaka&br=master

But I have no idea why this cosmetic change would cause issues with LDAP
tests, so I'm assuming the failure is unrelated, and the timing is
accidental and not caused by the patch.

regards

--
Tomas Vondra

#49Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#48)
Re: scalability bottlenecks with (many) partitions (and more)

Hi,

On 2025-03-04 19:58:38 +0100, Tomas Vondra wrote:

Pushed, with the tweaks to use FastPathLockSlotsPerBackend() in a couple
more places.

Thanks!

I noticed sifaka started failing right after I pushed this:

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=sifaka&br=master

But I have no idea why this cosmetic change would cause issues with LDAP
tests, so I'm assuming the failure is unrelated, and the timing is
accidental and not caused by the patch.

The buildfarm was updated between those two runs.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sifaka&dt=2025-03-04%2015%3A01%3A42
has
'PGBuild::Log' => 'REL_18',
whereas the failing run
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sifaka&dt=2025-03-04%2017%3A35%3A40
has
'PGBuild::Log' => 'REL_19',

It's worth noting that
a) sifaka doesn't build with ldap support
b) the failure is in checkprep, not when running the tests
c) the buildfarm unfortunately doesn't archive install.log, so it's hard to
know what actually went wrong

Greetings,

Andres Freund

#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#49)
Re: scalability bottlenecks with (many) partitions (and more)

Andres Freund <andres@anarazel.de> writes:

On 2025-03-04 19:58:38 +0100, Tomas Vondra wrote:

I noticed sifaka started failing right after I pushed this:

It's worth noting that
a) sifaka doesn't build with ldap support
b) the failure is in checkprep, not when running the tests
c) the buildfarm unfortunately doesn't archive install.log, so it's hard to
know what actually went wrong

Yeah, I've been poking at that. It's not at all clear why the
animal is trying to run src/test/modules/ldap_password_func
now when it didn't before. I've been through the diffs between
BF client 18 and 19 multiple times and nothing jumps out at me.

What's clear though is that it *is* trying to do "make check"
in that directory, and the link step blows up with

ccache clang -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Werror=unguarded-availability-new -Wendif-labels -Wmissing-format-attribute -Wcast-function-type -Wformat-security -Wmissing-variable-declarations -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-unused-command-line-argument -Wno-compound-token-split-by-macro -Wno-cast-function-type-strict -g -O2 -fvisibility=hidden -bundle -o ldap_password_func.dylib ldap_password_func.o -L../../../../src/port -L../../../../src/common -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX15.2.sdk -L/opt/local/libexec/llvm-17/lib -L/opt/local/lib -Wl,-dead_strip_dylibs -fvisibility=hidden -bundle_loader ../../../../src/backend/postgres
Undefined symbols for architecture arm64:
"_ldap_password_hook", referenced from:
__PG_init in ldap_password_func.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

That happens because

(a) ldap_password_hook is not defined unless USE_LDAP;

(b) macOS's linker is persnickety and reports the missing symbol
at shlib link time, not shlib load time.
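
To sketch (a) and (b) in C: the two halves below stand in for the server
and the test module; the hook's signature and the exact guard placement
are assumptions for illustration, and only the symbol and file names
come from this thread.

/* Server side, per (a): the hook is only defined with LDAP support. */
#ifdef USE_LDAP
char	   *(*ldap_password_hook) (char *) = NULL;	/* signature assumed */
#endif

/* Module side (ldap_password_func.c): the reference is unconditional. */
extern char *(*ldap_password_hook) (char *);

static char *
my_password_func(char *passphrase)	/* hypothetical hook implementation */
{
	return passphrase;
}

void
_PG_init(void)
{
	/*
	 * Without USE_LDAP there is no definition to resolve this against.
	 * Per (b), macOS's ld reports the missing symbol when the shlib is
	 * linked, where a typical ELF platform would defer to load time.
	 */
	ldap_password_hook = my_password_func;
}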

Maybe we should rethink (a)? In the meantime I'm trying to hack
the script so it skips that test module, and finding out that
my Perl is rustier than I thought.

regards, tom lane

#51Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#50)
Re: scalability bottlenecks with (many) partitions (and more)

Hi,

On 2025-03-04 16:30:34 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2025-03-04 19:58:38 +0100, Tomas Vondra wrote:

I noticed sifaka started failing right after I pushed this:

It's worth noting that
a) sifaka doesn't build with ldap support
b) the failure is in checkprep, not when running the tests
c) the buildfarm unfortunately doesn't archive install.log, so it's hard to
know what actually went wrong

Yeah, I've been poking at that. It's not at all clear why the
animal is trying to run src/test/modules/ldap_password_func
now when it didn't before.

It did do so before as well, afaict:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=sifaka&dt=2025-03-04%2015%3A01%3A42&stg=module-ldap_password_func-check

It seems to me that the difference is that now checkprep is run, whereas
previously it wasn't.

Before:
/Library/Developer/CommandLineTools/usr/bin/make -C adt jsonpath_gram.h
make[3]: `jsonpath_gram.h' is up to date.
echo "# +++ tap check in src/test/modules/ldap_password_func +++" && rm -rf '/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/ldap_password_func'/tmp_check && /bin/sh ../../../../config/install-sh -c -d '/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/ldap_password_func'/tmp_check && cd . && TESTLOGDIR='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/ldap_password_func/tmp_check/log' TESTDATADIR='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/ldap_password_func/tmp_check' PATH="/Users/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/Users/buildfarm/bf-data/HEAD/inst/bin:/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/ldap_password_func:$PATH" DYLD_LIBRARY_PATH="/Users/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/Users/buildfarm/bf-data/HEAD/inst/lib:$DYLD_LIBRARY_PATH" INITDB_TEMPLATE='/Users/buildfarm/bf-data/HEAD/pgsql.build'/tmp_install/initdb-template PGPORT='65678' top_builddir='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/ldap_password_func/../../../..' PG_REGRESS='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/ldap_password_func/../../../../src/test/regress/pg_regress' share_contrib_dir='/Users/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/Users/buildfarm/bf-data/HEAD/inst/share/postgresql/contrib' /usr/bin/prove -I ../../../../src/test/perl/ -I . --timer t/*.pl
# +++ tap check in src/test/modules/ldap_password_func +++
[10:08:59] t/001_mutated_bindpasswd.pl .. skipped: LDAP not supported by this build
[10:08:59]

Now:
/Library/Developer/CommandLineTools/usr/bin/make -C adt jsonpath_gram.h
make[3]: `jsonpath_gram.h' is up to date.
rm -rf '/Users/buildfarm/bf-data/HEAD/pgsql.build'/tmp_install
/bin/sh ../../../../config/install-sh -c -d '/Users/buildfarm/bf-data/HEAD/pgsql.build'/tmp_install/log
/Library/Developer/CommandLineTools/usr/bin/make -C '../../../..' DESTDIR='/Users/buildfarm/bf-data/HEAD/pgsql.build'/tmp_install install >'/Users/buildfarm/bf-data/HEAD/pgsql.build'/tmp_install/log/install.log 2>&1
/Library/Developer/CommandLineTools/usr/bin/make -j1 checkprep >>'/Users/buildfarm/bf-data/HEAD/pgsql.build'/tmp_install/log/install.log 2>&1
make: *** [temp-install] Error 2
log files for step module-ldap_password_funcCheck:

Note that during a normal build, ldap_password_func shouldn't be entered:
# Test runs an LDAP server, so only run if ldap is in PG_TEST_EXTRA
ifeq ($(with_ldap),yes)
ifneq (,$(filter ldap,$(PG_TEST_EXTRA)))
SUBDIRS += ldap_password_func
else
ALWAYS_SUBDIRS += ldap_password_func
endif
else
ALWAYS_SUBDIRS += ldap_password_func
endif

That leads me to suspect the difference might be related to
NO_TEMP_INSTALL not being set while it previously was, which then
triggers the module being built, whereas it previously wasn't.

Of course, relying on NO_TEMP_INSTALL to prevent this from being built
isn't exactly reliable...

Greetings,

Andres Freund

#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#51)
Re: scalability bottlenecks with (many) partitions (and more)

Andres Freund <andres@anarazel.de> writes:

On 2025-03-04 16:30:34 -0500, Tom Lane wrote:

Yeah, I've been poking at that. It's not at all clear why the
animal is trying to run src/test/modules/ldap_password_func
now when it didn't before.

It did do so before as well, afaict:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=sifaka&dt=2025-03-04%2015%3A01%3A42&stg=module-ldap_password_func-check

It seems to me that the difference is that now checkprep is run, whereas
previously it wasn't.

Maybe, but still I don't see any changes in the BF client that'd
explain it. The animal's configuration hasn't changed either;
the only non-comment diff in its buildfarm.conf is

@@ -374,7 +376,7 @@

base_port => 5678,

-       modules => [qw(TestUpgrade TestDecoding)],
+       modules => [qw(TestUpgrade)],

# settings used by run_branches.pl
global => {

which I changed to follow the lead of build-farm.conf.sample.
But surely that wouldn't affect this!?

regards, tom lane

#53Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#52)
Re: scalability bottlenecks with (many) partitions (and more)

On 2025-03-04 Tu 5:01 PM, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2025-03-04 16:30:34 -0500, Tom Lane wrote:

Yeah, I've been poking at that. It's not at all clear why the
animal is trying to run src/test/modules/ldap_password_func
now when it didn't before.

It did do so before as well, afaict:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=sifaka&dt=2025-03-04%2015%3A01%3A42&stg=module-ldap_password_func-check
It seems to me that the difference is that now checkprep is run, whereas
previously it wasn't.

Maybe, but still I don't see any changes in the BF client that'd
explain it. The animal's configuration hasn't changed either;
the only non-comment diff in its buildfarm.conf is

@@ -374,7 +376,7 @@

base_port => 5678,

-       modules => [qw(TestUpgrade TestDecoding)],
+       modules => [qw(TestUpgrade)],

# settings used by run_branches.pl
global => {

which I changed to follow the lead of build-farm.conf.sample.
But surely that wouldn't affect this!?

I think I found a logic bug. Testing.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#54Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#53)
1 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

Andrew Dunstan <andrew@dunslane.net> writes:

I think I found a logic bug. Testing.

Not sure what you are looking at, but I was trying to fix it
by making the loop over test modules skip unbuilt modules,
borrowing the test you added in v19 to skip unbuilt contrib
modules. It's a little more complicated for the other modules
because some of them have no .c files to be built, and I could
not get that to work. I eventually concluded that there's
something wrong with the "scalar glob()" idiom you used.
A bit of googling suggested "grep -e, glob()" instead, and
that seems to work for me. sifaka seems happy with the
attached patch.

regards, tom lane

Attachments:

run-build-fix.patch (text/x-diff)
--- run_build.pl~	2025-03-04 16:34:04.082252563 -0500
+++ run_build.pl	2025-03-04 16:35:25.967357487 -0500
@@ -2483,6 +2483,11 @@ sub run_misc_tests
 		my $testname = basename($testdir);
 		next if $testname =~ /ssl/ && !$using_ssl;
 		next unless -d "$testdir/t";
+
+		# can't test it if we haven't built it
+		next unless grep -e, glob("$testdir/*.o $testdir/*.obj")
+			or not grep -e, glob("$testdir/*.c");
+
 		next if $using_msvc && $testname eq 'pg_bsd_indent';
 		next unless step_wanted("module-$testname");
 		print time_str(), "running misc test module-$testname ...\n"
@@ -2496,7 +2501,7 @@ sub run_misc_tests
 		my $testname = basename($testdir);
 
 		# can't test it if we haven't built it
-		next unless scalar glob("$testdir/*.o $testdir/*.obj");
+		next unless grep -e, glob("$testdir/*.o $testdir/*.obj");
 
 		# skip sepgsql unless it's marked for testing
 		next if $testname eq 'sepgsql' && $ENV{PG_TEST_EXTRA} !~ /\bsepgsql\b/;
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#54)
Re: scalability bottlenecks with (many) partitions (and more)

Andrew Dunstan <andrew@dunslane.net> writes:

I think I found a logic bug. Testing.

Oh! I bet you are looking at this 18-to-19 diff:

@@ -416,7 +416,8 @@ sub check_install_is_complete
 	{
 		$tmp_loc = "$tmp_loc/$install_dir";
 		$bindir = "$tmp_loc/bin";
-		$libdir = "$tmp_loc/lib/postgresql";
+		$libdir = "$tmp_loc/lib";
+		$libdir .= '/postgresql' unless $libdir =~ /postgres|pgsql/;
 		return (-d $bindir && -d $libdir);
 	}
 	elsif (-e "$build_dir/src/Makefile.global")    # i.e. not msvc
@@ -427,7 +428,8 @@ sub check_install_is_complete
 		chomp $suffix;
 		$tmp_loc = "$tmp_loc/$install_dir";
 		$bindir = "$tmp_loc/bin";
-		$libdir = "$tmp_loc/lib/postgresql";
+		$libdir = "$tmp_loc/lib";
+		$libdir .= '/postgresql' unless $libdir =~ /postgres|pgsql/;
 	}

I'd dismissed that because sifaka isn't running in a directory
that has "postgres" or "pgsql" in its path, but just now I looked
at the logs of one of these steps, and guess where it's installing:

/usr/bin/make -C '../../../..' DESTDIR='/Users/buildfarm/bf-data/HEAD/pgsql.build'/tmp_install install >'/Users/buildfarm/bf-data/HEAD/pgsql.build'/tmp_install/log/install.log 2>&1

I bet the "pgsql.build" name is confusing it into doing extra
installs. This'd explain the impression I had that the test steps
were running a bit slower than they ought to. If you check
sifaka's just-posted green run against its history, that run took
13:48 versus recent times of 10:35 or thereabouts, so we're definitely
eating a good deal of time someplace...
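
To make that concrete, a hedged Perl sketch with the paths involved (the
install dir is inferred from the PATH setting in the earlier log; the
buggy line is the one added in the 18-to-19 diff, the fixed line mirrors
the patch that follows):

my $tmp_loc     = '/Users/buildfarm/bf-data/HEAD/pgsql.build/tmp_install';
my $install_dir = '/Users/buildfarm/bf-data/HEAD/inst';

# Buggy: the full path contains "pgsql" (from pgsql.build), so the
# '/postgresql' suffix is skipped and the completeness check looks for
# libraries in the wrong place, forcing a fresh temp install each run.
my $libdir = "$tmp_loc/$install_dir/lib";
$libdir .= '/postgresql' unless $libdir =~ /postgres|pgsql/;

# Fixed: key the suffix on the configured install dir only, matching
# what the server's own install rules do.
my $libdir2 = "$tmp_loc/$install_dir/lib";
$libdir2 .= '/postgresql' unless $install_dir =~ /postgres|pgsql/;

print "buggy: $libdir\nfixed: $libdir2\n";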

regards, tom lane

#56Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#54)
1 attachment(s)
Re: scalability bottlenecks with (many) partitions (and more)

On 2025-03-04 Tu 5:28 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

I think I found a logic bug. Testing.

Not sure what you are looking at, but I was trying to fix it
by making the loop over test modules skip unbuilt modules,
borrowing the test you added in v19 to skip unbuilt contrib
modules. It's a little more complicated for the other modules
because some of them have no .c files to be built, and I could
not get that to work. I eventually concluded that there's
something wrong with the "scalar glob()" idiom you used.
A bit of googling suggested "grep -e, glob()" instead, and
that seems to work for me. sifaka seems happy with the
attached patch.

I'm looking at something else, namely the attached.

Will check your patch out too.

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachments:

bfdirfix.patch (text/x-patch)
diff --git a/PGBuild/Utils.pm b/PGBuild/Utils.pm
index b97de92..d96dcec 100644
--- a/PGBuild/Utils.pm
+++ b/PGBuild/Utils.pm
@@ -417,7 +417,7 @@ sub check_install_is_complete
 		$tmp_loc = "$tmp_loc/$install_dir";
 		$bindir = "$tmp_loc/bin";
 		$libdir = "$tmp_loc/lib";
-		$libdir .= '/postgresql' unless $libdir =~ /postgres|pgsql/;
+		$libdir .= '/postgresql' unless $install_dir =~ /postgres|pgsql/;
 		return (-d $bindir && -d $libdir);
 	}
 	elsif (-e "$build_dir/src/Makefile.global")    # i.e. not msvc
@@ -429,7 +429,7 @@ sub check_install_is_complete
 		$tmp_loc = "$tmp_loc/$install_dir";
 		$bindir = "$tmp_loc/bin";
 		$libdir = "$tmp_loc/lib";
-		$libdir .= '/postgresql' unless $libdir =~ /postgres|pgsql/;
+		$libdir .= '/postgresql' unless $install_dir =~ /postgres|pgsql/;
 	}
 
 	# these files should be present if we've temp_installed everything,
#57Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#56)
Re: scalability bottlenecks with (many) partitions (and more)

Andrew Dunstan <andrew@dunslane.net> writes:

Will check your patch out too.

Comparing previous run against current, I now see that my patch
caused it to skip these steps:

module-ldap_password_func-check
module-pg_bsd_indent-check
contrib-sepgsql-check

Skipping the ldap and sepgsql tests is desirable, but it shouldn't
have skipped pg_bsd_indent. I think the cause of that is that
src/tools/pg_bsd_indent isn't built in any of the previous build
steps. Up to now it got built as a side-effect of invoking the
tests, which isn't great because any build errors/warnings disappear
into the install log which the script doesn't capture. I agree
with not capturing the install log, because that's generally
uninteresting once we get past make-install; but we have to be sure
that everything gets built before that.

regards, tom lane

#58Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#54)
Re: scalability bottlenecks with (many) partitions (and more)

On 2025-03-04 Tu 5:28 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

I think I found a logic bug. Testing.

Not sure what you are looking at, but I was trying to fix it
by making the loop over test modules skip unbuilt modules,
borrowing the test you added in v19 to skip unbuilt contrib
modules. It's a little more complicated for the other modules
because some of them have no .c files to be built, and I could
not get that to work. I eventually concluded that there's
something wrong with the "scalar glob()" idiom you used.
A bit of googling suggested "grep -e, glob()" instead, and
that seems to work for me. sifaka seems happy with the
attached patch.

Well, in scalar context it should give us back the first item found, or
undef if nothing is found, AIUI.

But you're right, it might read better if I use a different formulation.

I didn't much like this, though:

+
+        # can't test it if we haven't built it
+        next unless grep -e, glob("$testdir/*.o $testdir/*.obj")
+            or not grep -e, glob("$testdir/*.c");
+

Too many negatives make my head hurt.

I also note you said in a later email there were issues.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#59Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#56)
Re: scalability bottlenecks with (many) partitions (and more)

Andrew Dunstan <andrew@dunslane.net> writes:

I'm looking at something else, namely the attached.

Yeah, that avoids the extra installs and brings sifaka's
runtime back to about what it had been.

regards, tom lane

#60Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#57)
Re: scalability bottlenecks with (many) partitions (and more)

On 2025-03-04 Tu 6:04 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Will check your patch out too.

Comparing previous run against current, I now see that my patch
caused it to skip these steps:

module-ldap_password_func-check
module-pg_bsd_indent-check
contrib-sepgsql-check

Skipping the ldap and sepgsql tests is desirable, but it shouldn't
have skipped pg_bsd_indent. I think the cause of that is that
src/tools/pg_bsd_indent isn't built in any of the previous build
steps. Up to now it got built as a side-effect of invoking the
tests, which isn't great because any build errors/warnings disappear
into the install log which the script doesn't capture. I agree
with not capturing the install log, because that's generally
uninteresting once we get past make-install; but we have to be sure
that everything gets built before that.

Yeah ... I think an easy fix is to put this in make_testmodules():

+
+       # build pg_bsd_indent at the same time
+       # this doesn't really belong here, but it's convenient
+       if (-d "$pgsql/src/tools/pg_bsd_indent" && !$status)
+       {
+               my @indentout = run_log("cd $pgsql/src/tools/pg_bsd_indent && $make_cmd");
+               $status = $? >> 8;
+               push(@makeout,@indentout);
+       }

A lot of this special processing goes away when we're building with meson.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#61Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#58)
Re: scalability bottlenecks with (many) partitions (and more)

Andrew Dunstan <andrew@dunslane.net> writes:

On 2025-03-04 Tu 5:28 PM, Tom Lane wrote:

... I eventually concluded that there's
something wrong with the "scalar glob()" idiom you used.

Well, in scalar context it should give us back the first item found, or
undef if nothing is found, AIUI.

That's what I would have thought too, but it didn't seem to work that
way when I was testing the logic standalone: the script processed or
skipped directories according to no rule that I could figure out.
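
A plausible mechanism, sketched from documented Perl semantics rather
than from the buildfarm code itself: a glob op evaluates its argument
only when starting a new list (perlop, "I/O Operators"), and in scalar
context each glob call site keeps a private iterator across executions.
The module directories below are hypothetical:

#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path);

# Throwaway layout: each of two module dirs contains exactly one .o file.
for my $d (qw(modA modB))
{
	make_path($d);
	open my $fh, '>', "$d/x.o" or die $!;
	close $fh;
}

for my $testdir (qw(modA modB))
{
	# In scalar context this glob op keeps its iterator across loop
	# iterations and only re-reads the pattern once the previous list
	# is exhausted, so the modB pass returns undef here even though
	# modB/x.o exists -- directories get skipped with no visible rule.
	my $first = scalar glob("$testdir/*.o");
	printf "%s: scalar glob -> %s\n", $testdir, $first // 'undef';
}

for my $testdir (qw(modA modB))
{
	# List context re-expands the pattern every time; grep -e counts
	# the entries that actually exist, which is the formulation
	# adopted upthread.
	my $n = grep -e, glob("$testdir/*.o");
	printf "%s: grep -e, glob -> %d file(s)\n", $testdir, $n;
}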

Anyway, for the moment I think we're all right with just the
directory path fix.

regards, tom lane