SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Started by Dilip Kumar over 2 years ago · 131 messages
#1 Dilip Kumar
dilipbalaut@gmail.com
3 attachment(s)

The small size of the SLRU buffer pools can sometimes become a
performance problem because it’s not difficult to have a workload
where the number of buffers actively in use is larger than the
fixed-size buffer pool. However, just increasing the size of the
buffer pool doesn’t necessarily help, because the linear search that
we use for buffer replacement doesn’t scale, and also because
contention on the single centralized lock limits scalability.

A couple of patches have been proposed in the past to address this problem by
increasing the buffer pool size. One of them [1], proposed by Thomas Munro,
makes the size of the buffer pool configurable, and, in order to cope with the
linear search in a larger buffer pool, divides the SLRU buffer pool into
associative banks so that searching does not get more expensive as the pool
grows. This works well for workloads that are mainly hurt by frequent buffer
replacement, but it still does not help workloads where the centralized
control lock is the bottleneck.
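
To make the bank idea concrete, the page lookup in the attached v1-0001 boils
down to the condensed sketch below (SlruLookupSlotInBank is just an
illustrative name; in the patch the loop is inline in
SimpleLruReadPage_ReadOnly() and SlruSelectLRUPage()):

#define SLRU_BANK_SIZE	8		/* from v1-0001 */

/* Condensed sketch of the bank-limited buffer lookup in v1-0001. */
static int
SlruLookupSlotInBank(SlruCtl ctl, int pageno)
{
	SlruShared	shared = ctl->shared;
	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
	int			bankend = bankstart + SLRU_BANK_SIZE;

	for (int slotno = bankstart; slotno < bankend; slotno++)
	{
		if (shared->page_number[slotno] == pageno &&
			shared->page_status[slotno] != SLRU_PAGE_EMPTY)
			return slotno;		/* page is already resident in this bank */
	}

	return -1;					/* not found; the victim is also chosen from this bank */
}

Because a page can live only in its own bank, both the hit lookup and the
victim search touch at most SLRU_BANK_SIZE slots, no matter how large the pool
is configured.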

So I have taken this patch as my base patch (v1-0001) and added two more
improvements on top of it: 1) in v1-0002, instead of a centralized control
lock for the SLRU, I have introduced a bank-wise control lock, and 2) in
v1-0003, I have removed the global LRU counter and introduced a bank-wise
counter. The second change (v1-0003) is meant to avoid the CPU/OS cache
invalidation caused by frequent updates of a single shared variable; later, in
the performance section, I show how much we gain from these two changes.
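
Conceptually, the bank-wise lock of v1-0002 just maps a page to the lock of
its bank. The function name SimpleLruPageGetSLRULock and the
bank_mask/bank_locks fields are from the patch; the one-line body below is my
paraphrase of what it has to do:

/* Sketch: return the lock of the bank that owns this page (v1-0002). */
LWLock *
SimpleLruPageGetSLRULock(SlruCtl ctl, int pageno)
{
	int			bankno = pageno & ctl->bank_mask;

	return &ctl->shared->bank_locks[bankno].lock;
}

Callers such as TransactionIdSetPageStatus() then acquire only that bank's
lock instead of the old XactSLRULock, as can be seen in the attached 0002
patch.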

Note: This is going to be a long email, but the main idea is summarised above.
Below I discuss the internals in more detail to show that the design idea is
valid, and I present two performance tests: one specific to contention on the
centralized lock and one where the contention is mainly due to frequent buffer
replacement in the SLRU buffer pool. We are getting ~2x TPS compared to head
with these patches, and in later sections I discuss this in more detail, i.e.
the exact performance numbers and the analysis of why we see the gain.

I faced some problems while converting this centralized control lock to a
bank-wise lock, mainly because the lock is (mis)used for different purposes.
Its main purpose, as I understand it, is to protect in-memory access
(read/write) to the buffers in the SLRU buffer pool.

Here is the list of some problems and their analysis:

1) In some SLRUs, we use this lock to protect members of the control structure
specific to that SLRU layer, e.g. the SerialControlData members are protected
by SerialSLRULock. I don't think that is the right use of this lock, so I have
introduced another lock called SerialControlLock for this specific purpose (a
small sketch follows this list). Based on my analysis there is no reason to
protect these members and the SLRU buffer access with the same lock.
2) The member 'latest_page_number' inside SlruSharedData is also protected by
the SLRU lock. I would not say this usage is wrong, but since it is a common
variable and not a per-bank variable, it can no longer be protected by a
bank-wise lock. Its only use is to track the latest page in an SLRU so that we
do not evict the latest page during victim page selection, so I have converted
it to an atomic variable (see the sketch after this list), as it is completely
independent of the SLRU buffer access.
3) SlruScanDirectory() is called under the SLRU control lock only from
DeactivateCommitTs(); from all other places it is called without the lock,
because those callers run in contexts that are never executed concurrently
(i.e. startup, checkpoint). DeactivateCommitTs() is also called only during
startup, so there doesn't seem to be any real need to call it under the SLRU
control lock. For now I have called it under an all-bank lock (sketched
below), because this is not a performance path and it keeps the behaviour
consistent with the current logic; but if others also think that we do not
need a lock at this place, then we can remove it, and then we don't need this
all-bank lock anywhere.
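
To make the three points above concrete, here is a small illustration. Only
the names SerialControlLock, latest_page_number, SimpleLruAcquireAllBankLock,
bank_locks and SLRU_BANK_SIZE come from the attached patches; the bodies and
helper names below are my sketches, not the exact patch code:

/* 1) SerialControlData fields take a dedicated lock, not the SLRU lock. */
static void
serial_control_update_sketch(int targetPage, TransactionId xid)
{
	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
	serialControl->headPage = targetPage;	/* fields as in predicate.c */
	serialControl->headXid = xid;
	LWLockRelease(SerialControlLock);
}

/* 2) latest_page_number becomes an atomic; readers and writers need no lock. */
static void
set_latest_page_sketch(SlruShared shared, int pageno)
{
	pg_atomic_write_u32(&shared->latest_page_number, (uint32) pageno);
}

static bool
is_latest_page_sketch(SlruShared shared, int slotno)
{
	/* victim selection skips the slot holding the latest page */
	return (uint32) shared->page_number[slotno] ==
		pg_atomic_read_u32(&shared->latest_page_number);
}

/* 3) Assumed shape of the all-bank lock helper used around
 *    SlruScanDirectory() in DeactivateCommitTs(). */
void
SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
{
	int			nbanks = ctl->shared->num_slots / SLRU_BANK_SIZE;

	for (int bankno = 0; bankno < nbanks; bankno++)
		LWLockAcquire(&ctl->shared->bank_locks[bankno].lock, mode);
}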

There are some other uses of this lock where one might think a bank-wise split
would be a problem, but it is not, and I give my analysis for each below.

1) SimpleLruTruncate: One might worry that with a bank-wise lock this becomes
an issue, because we would have to release and acquire different locks as we
scan different banks. But as per my analysis this is not a problem, because
a) even the current code releases and re-acquires the centralized lock
multiple times in order to perform I/O on a buffer slot, so the behaviour does
not really change, and, more importantly, b) all SLRU layers take care that
this function is not accessed concurrently; I have verified all callers of
this function and it is true, and the function's header comment also says the
same. A sketch of the bank-wise scan appears after this list. So this is not
an issue as per my analysis.

2) Extending or adding a new page to an SLRU: I have noticed that this is
always protected either by some other exclusive lock or by being done only
during startup. So in short, the SLRU lock is only used to protect access to
the buffers in the buffer pool; it is not what guarantees exclusive execution
of these functions, because that is taken care of in some other way.

3) Another thing I noticed while writing this, which seems worth noting: the
CLOG group update of xid status. There, if we cannot get the control lock on
the SLRU, we add ourselves to a group and the group leader does the job for
all members of the group. One might think that different pages in the group
could belong to different SLRU banks, so the leader might have to
acquire/release the lock as it processes each request in the group. Yes, that
is true, and it is taken care of (see the TransactionGroupUpdateXidStatus hunk
in the attached v1-0002), but we don't need to worry much about this case: the
group-update implementation tries to put members with the same page request
into one group, and only as an exception can there be members with different
page requests. So with a bank-wise lock we handle that exception correctly,
but it is not the regular case that we have to acquire/release the lock
multiple times. Design-wise we are good, and performance-wise there should not
be any problem, because most of the time we will be updating pages from the
same bank; and if in some cases we have updates for old pages because of
long-running transactions, then we should actually do better by not having a
centralized lock.
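
As referenced in point 1) above, here is an illustrative sketch (not the exact
patch code) of how a whole-pool scan such as SimpleLruTruncate can proceed
with bank-wise locks: the lock is swapped only when the scan crosses a bank
boundary, which is safe because truncation of a given SLRU is never run
concurrently.

/* Sketch only: walk every slot, switching bank locks at bank boundaries. */
static void
slru_scan_all_banks_sketch(SlruShared shared, int cutoffPage)
{
	int			prevbankno = -1;

	for (int slotno = 0; slotno < shared->num_slots; slotno++)
	{
		int			bankno = slotno / SLRU_BANK_SIZE;

		if (bankno != prevbankno)
		{
			if (prevbankno >= 0)
				LWLockRelease(&shared->bank_locks[prevbankno].lock);
			LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
			prevbankno = bankno;
		}

		/*
		 * ... compare shared->page_number[slotno] against cutoffPage and
		 * write out or invalidate the slot, as the real code does ...
		 */
	}

	if (prevbankno >= 0)
		LWLockRelease(&shared->bank_locks[prevbankno].lock);
}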

Performance Test:
Exp1: Show problems due to CPU/OS cache invalidation due to frequent
updates of the centralized lock and a common LRU counter. So here we
are running a parallel transaction to pgbench script which frequently
creates subtransaction overflow and that forces the visibility-check
mechanism to access the subtrans SLRU.
Test machine: 8 CPUs / 64 cores / 128 threads with HT / 512 GB RAM / SSD
scale factor: 300
shared_buffers=20GB
checkpoint_timeout=40min
max_wal_size=20GB
max_connections=200

Workload: run these two scripts in parallel:
./pgbench -c $ -j $ -T 600 -P5 -M prepared postgres
./pgbench -c 1 -j 1 -T 600 -f savepoint.sql postgres

savepoint.sql (create subtransaction overflow)
BEGIN;
SAVEPOINT S1;
INSERT INTO test VALUES(1);
← repeat 70 times →
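-- more than 64 subtransactions (PGPROC_MAX_CACHED_SUBXIDS) overflow the backend's
-- subxid cache, forcing visibility checks to look up pg_subtrans (the subtrans SLRU)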
SELECT pg_sleep(1);
COMMIT;

Code under test:
Head: PostgreSQL head code
SlruBank: The first patch applied to convert the SLRU buffer pool into
the bank (0001)
SlruBank+BankwiseLockAndLru: Applied 0001+0002+0003

Results:
Clients Head SlruBank SlruBank+BankwiseLockAndLru
1 457 491 475
8 3753 3819 3782
32 14594 14328 17028
64 15600 16243 25944
128 15957 16272 31731

So we can see that at 128 clients we get ~2x TPS (with SlruBank + bank-wise
lock and bank-wise LRU counter) compared to HEAD. One might wonder why we do
not see much gain with the SlruBank patch alone. The reason is that in this
particular test case there is not much load from buffer replacement. In fact,
the wait events do not show contention on any lock; instead the main load
comes from frequently modifying common variables, namely the centralized
control lock and the centralized LRU counter. That is evident in the perf data
shown below.

+  74.72%   0.06% postgres postgres  [.] XidInMVCCSnapshot
+  74.08%   0.02% postgres postgres  [.] SubTransGetTopmostTransaction
+  74.04%   0.07% postgres postgres  [.] SubTransGetParent
+  57.66%   0.04% postgres postgres  [.] LWLockAcquire
+  57.64%   0.26% postgres postgres  [.] SimpleLruReadPage_ReadOnly
……
+  16.53%   0.07% postgres postgres  [.] LWLockRelease
+  16.36%   0.04% postgres postgres  [.] pg_atomic_sub_fetch_u32
+  16.31%  16.24% postgres postgres  [.] pg_atomic_fetch_sub_u32_impl

We can notice that the main load is on the atomic variable inside
LWLockAcquire and LWLockRelease. Once we apply the bank-wise lock patch
(v1-0002), the same problem shows up on the cur_lru_count update in the
SlruRecentlyUsed macro [2] (not shown here, but it was visible in my perf
report). And that is resolved by implementing a bank-wise counter.

[2]:
#define SlruRecentlyUsed(shared, slotno) \
do { \
..
(shared)->cur_lru_count = ++new_lru_count; \
..
} \
} while (0)

Exp2: This test shows the load of frequent SLRU buffer replacement. Here we
run a pgbench-style script that frequently generates multixact IDs, and in
parallel we repeatedly start and commit a long-running transaction so that the
multixact IDs are not immediately cleaned up by vacuum, creating contention on
the SLRU buffer pool. I am not leaving the long-running transaction open
forever, as that would start to show another problem, bloat, and we would lose
sight of what I am trying to demonstrate here.

Note: the test configuration is the same as in Exp1; only the workload
differs (we run the two scripts below). In addition, the new config parameter
slru_buffers_size_scale=4 (added in v1-0001) is set, which means
NUM_MULTIXACTOFFSET_BUFFERS becomes 64 (16 on head) and
NUM_MULTIXACTMEMBER_BUFFERS becomes 128 (32 on head).

./pgbench -c $ -j $ -T 600 -P5 -M prepared -f multixact.sql postgres
./pgbench -c 1 -j 1 -T 600 -f longrunning.sql postgres

cat > multixact.sql <<EOF
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
SELECT FROM pgbench_accounts WHERE aid = :aid FOR UPDATE;
SAVEPOINT S1;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES
(:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
EOF

cat > longrunning.sql << EOF
BEGIN;
INSERT INTO pgbench_test VALUES(1);
select pg_sleep(10);
COMMIT;
EOF

Results:
Clients Head SlruBank SlruBank+BankwiseLock
1 528 513 531
8 3870 4239 4157
32 13945 14470 14556
64 10086 19034 24482
128 6909 15627 18161

Here we can see a good improvement from the SlruBank patch alone, thanks to
the larger SLRU buffer pool, because in this workload there is a lot of
contention due to buffer replacement. The wait-event counts below show heavy
load on MultiXactOffsetSLRU as well as on MultiXactOffsetBuffer, which
indicates frequent buffer evictions in this workload. Increasing the SLRU
buffer pool size helps a lot, and dividing the SLRU lock into bank-wise locks
gives a further gain. In total we see ~2.5x TPS at 64 and 128 clients compared
to head.

3401 LWLock | MultiXactOffsetSLRU
2031 LWLock | MultiXactOffsetBuffer
687 |
427 LWLock | BufferContent

Credits:
- The base patch v1-0001 is authored by Thomas Munro and I have just rebased it.
- 0002 and 0003 are new patches written by me, based on design ideas from
Robert and myself.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v1-0003-Introduce-bank-wise-LRU-counter.patch (application/octet-stream)
From 2fe09c749e7fbca1998f7964ab8341df466023c3 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Wed, 11 Oct 2023 15:41:34 +0530
Subject: [PATCH v1 3/3] Introduce bank-wise LRU counter

Since we have already divided buffer pool in banks and victim
buffer search is also done at the bank level so there is no need
to have a centralized lru counter.  And this will also improve
the performance by reducing the frequent cpu cache invalidation by
not updating the common variable.

Dilip Kumar based on design idea from Robert Haas
---
 src/backend/access/transam/slru.c | 23 +++++++++++++++--------
 src/include/access/slru.h         | 28 +++++++++++++++++-----------
 2 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index c06e4eddd1..fd44ad7d47 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -110,13 +110,13 @@ typedef struct SlruWriteAllData *SlruWriteAll;
  *
  * The reason for the if-test is that there are often many consecutive
  * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
  * pages' counts will "wrap around" and make them appear recently used.
  *
  * We allow this code to be executed concurrently by multiple processes within
  * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
  * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
+ * However, it is possible for either bank_cur_lru_count or individual
  * page_lru_count entries to be "reset" to lower values than they should have,
  * in case a process is delayed while it executes this macro.  With care in
  * SlruSelectLRUPage(), this does little harm, and in any case the absolute
@@ -125,9 +125,10 @@ typedef struct SlruWriteAllData *SlruWriteAll;
  */
 #define SlruRecentlyUsed(shared, slotno)	\
 	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
+		int		bankno = slotno / SLRU_BANK_SIZE; \
+		int		new_lru_count = (shared)->bank_cur_lru_count[bankno]; \
 		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
+			(shared)->bank_cur_lru_count[bankno] = ++new_lru_count; \
 			(shared)->page_lru_count[slotno] = new_lru_count; \
 		} \
 	} while (0)
@@ -200,6 +201,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
 	sz += MAXALIGN((bankmask + 1) * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN((bankmask + 1) * sizeof(int));   /* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -276,8 +278,6 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -300,6 +300,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
 		shared->bank_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -321,8 +323,11 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		}
 		/* initialize bank locks for each buffer bank */
 		for (bankno = 0; bankno < nbanks; bankno++)
+		{
 			LWLockInitialize(&shared->bank_locks[bankno].lock,
 							 slru_tranche_id);
+			shared->bank_cur_lru_count[bankno] = 0;
+		}
 
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
@@ -1112,9 +1117,11 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
 
 		/* See if page already has a buffer assigned */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
+
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
@@ -1149,7 +1156,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index eec7a568dc..fea12cdfb3 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -73,6 +73,23 @@ typedef struct SlruSharedData
 	 */
 	LWLockPadded *bank_locks;
 
+	/*----------
+	 * Instead of global counter we maintain a bank-wise lru counter because
+	 * a) we are doing the victim buffer selection as bank level so there is
+	 * no point of having a global counter b) manipulating a global counter
+	 * will have frequent cpu cache invalidation and that will affect the
+	 * performance.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page is therefore the one with the highest value of
+	 *		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int			 *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -84,17 +101,6 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
-- 
2.39.2 (Apple Git-143)

v1-0001-Divide-SLRU-buffers-into-banks.patch (application/octet-stream)
From 0d05d2a043ab393df797ba2ab67d8471398a9260 Mon Sep 17 00:00:00 2001
From: Dilip kumar <dilipkumar@dkmac.local>
Date: Fri, 8 Sep 2023 15:08:32 +0530
Subject: [PATCH v1 1/3] Divide SLRU buffers into banks

We want to eliminate linear search within SLRU buffers.
To do so we divide SLRU buffers into banks. Each bank holds
approximately 8 buffers. Each SLRU pageno may reside only in one bank.
Adjacent pagenos reside in different banks.

Also invent slru_buffers_size_scale to control SLRU buffers.

patch by Thomas Munro
---
 doc/src/sgml/config.sgml                      | 31 +++++++++++
 src/backend/access/transam/clog.c             | 28 ++--------
 src/backend/access/transam/commit_ts.c        | 19 ++-----
 src/backend/access/transam/slru.c             | 54 +++++++++++++++++--
 src/backend/access/transam/subtrans.c         |  1 +
 src/backend/utils/init/globals.c              |  2 +
 src/backend/utils/misc/guc_tables.c           | 10 ++++
 src/backend/utils/misc/postgresql.conf.sample |  3 ++
 src/include/access/multixact.h                |  4 +-
 src/include/access/slru.h                     |  5 ++
 src/include/access/subtrans.h                 |  2 +-
 src/include/commands/async.h                  |  2 +-
 src/include/miscadmin.h                       |  2 +
 src/include/storage/predicate.h               |  2 +-
 14 files changed, 117 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 924309af26..416d979b54 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,37 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-slru-buffers-size-scale" xreflabel="slru_buffers_size_scale">
+     <term><varname>slru_buffers_size_scale</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>slru_buffers_size_scale</varname> configuration parameter</primary>
+     </indexterm>
+     </term>
+     <listitem>
+      <para>
+       Specifies power 2 scale for all SLRU shared memory buffers sizes. Buffers sizes depends on
+       both <literal>guc_slru_buffers_size_scale</literal> and <literal>shared_buffers</literal> params.
+      </para>
+      <para>
+       This affects on buffers in the list below (see also <xref linkend="pgdata-contents-table"/>):
+        <itemizedlist>
+         <listitem><para><literal>NUM_MULTIXACTOFFSET_BUFFERS = Min(32 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_MULTIXACTMEMBER_BUFFERS = Min(64 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_SUBTRANS_BUFFERS = Min(64 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_NOTIFY_BUFFERS = Min(32 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_SERIAL_BUFFERS = Min(32 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_CLOG_BUFFERS = Min(128 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_COMMIT_TS_BUFFERS = Min(128 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+        </itemizedlist>
+      </para>
+      <para>
+       Value is in <literal>0..7</literal> bounds.
+       The default value is <literal>2</literal>.
+       This parameter can only be set at server start.
+      </para>
+     </listitem>
+    </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 4a431d5876..29d58f1eb3 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -74,6 +74,8 @@
 #define GetLSNIndex(slotno, xid)	((slotno) * CLOG_LSNS_PER_PAGE + \
 	((xid) % (TransactionId) CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_LSN_GROUP)
 
+#define NUM_CLOG_BUFFERS 	(128 << slru_buffers_size_scale)
+
 /*
  * The number of subtransactions below which we consider to apply clog group
  * update optimization.  Testing reveals that the number higher than this can
@@ -660,42 +662,20 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	return status;
 }
 
-/*
- * Number of shared CLOG buffers.
- *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
- */
-Size
-CLOGShmemBuffers(void)
-{
-	return Min(128, Max(4, NBuffers / 512));
-}
-
 /*
  * Initialization of shared memory for CLOG
  */
 Size
 CLOGShmemSize(void)
 {
-	return SimpleLruShmemSize(CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE);
+	return SimpleLruShmemSize(NUM_CLOG_BUFFERS, CLOG_LSNS_PER_PAGE);
 }
 
 void
 CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
-	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
+	SimpleLruInit(XactCtl, "Xact", NUM_CLOG_BUFFERS, CLOG_LSNS_PER_PAGE,
 				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
 				  SYNC_HANDLER_CLOG);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index b897fabc70..54422f2780 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -70,6 +70,8 @@ typedef struct CommitTimestampEntry
 #define TransactionIdToCTsEntry(xid)	\
 	((xid) % (TransactionId) COMMIT_TS_XACTS_PER_PAGE)
 
+#define NUM_COMMIT_TS_BUFFERS	(128 << slru_buffers_size_scale)
+
 /*
  * Link to shared-memory data structures for CommitTs control
  */
@@ -487,26 +489,13 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
 }
 
-/*
- * Number of shared CommitTS buffers.
- *
- * We use a very similar logic as for the number of CLOG buffers (except we
- * scale up twice as fast with shared buffers, and the maximum is twice as
- * high); see comments in CLOGShmemBuffers.
- */
-Size
-CommitTsShmemBuffers(void)
-{
-	return Min(256, Max(4, NBuffers / 256));
-}
-
 /*
  * Shared memory sizing for CommitTs
  */
 Size
 CommitTsShmemSize(void)
 {
-	return SimpleLruShmemSize(CommitTsShmemBuffers(), 0) +
+	return SimpleLruShmemSize(NUM_COMMIT_TS_BUFFERS, 0) +
 		sizeof(CommitTimestampShared);
 }
 
@@ -520,7 +509,7 @@ CommitTsShmemInit(void)
 	bool		found;
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
-	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
+	SimpleLruInit(CommitTsCtl, "CommitTs", NUM_COMMIT_TS_BUFFERS, 0,
 				  CommitTsSLRULock, "pg_commit_ts",
 				  LWTRANCHE_COMMITTS_BUFFER,
 				  SYNC_HANDLER_COMMIT_TS);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 71ac70fb40..57889b72bd 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "port/pg_bitutils.h"
 
 #define SlruFileName(ctl, path, seg) \
 	snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
@@ -71,6 +72,17 @@
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
+
+/*
+ * SLRU bank size for slotno hash banks
+ */
+#define SLRU_BANK_SIZE 8
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -134,7 +146,7 @@ typedef enum
 static SlruErrorCause slru_errcause;
 static int	slru_errno;
 
-
+static void SlruAdjustNSlots(int *nslots, int *bankmask);
 static void SimpleLruZeroLSNs(SlruCtl ctl, int slotno);
 static void SimpleLruWaitIO(SlruCtl ctl, int slotno);
 static void SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata);
@@ -148,6 +160,25 @@ static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
 
+/*
+ * Pick number of slots and bank size optimal for hashed associative SLRU buffers.
+ * We declare SLRU nslots is always power of 2.
+ * We split SLRU to 8-sized hash banks, after some performance benchmarks.
+ * We hash pageno to banks by pageno masked by 3 upper bits.
+ */
+static void
+SlruAdjustNSlots(int *nslots, int *bankmask)
+{
+	Assert(*nslots > 0);
+	Assert(*nslots <= SLRU_MAX_ALLOWED_BUFFERS);
+
+	*nslots = (int) pg_nextpower2_32(Max(SLRU_BANK_SIZE, Min(*nslots, NBuffers / 256)));
+
+	*bankmask = *nslots / SLRU_BANK_SIZE - 1;
+
+	elog(DEBUG5, "nslots %d banksize %d nbanks %d bankmask %x", *nslots, SLRU_BANK_SIZE, *nslots / SLRU_BANK_SIZE, *bankmask);
+}
+
 /*
  * Initialization of shared memory
  */
@@ -156,6 +187,9 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			bankmask_ignore;
+
+	SlruAdjustNSlots(&nslots, &bankmask_ignore);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -191,6 +225,9 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 {
 	SlruShared	shared;
 	bool		found;
+	int			bankmask;
+
+	SlruAdjustNSlots(&nslots, &bankmask);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -258,7 +295,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -266,6 +306,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	 */
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
+	ctl->bank_mask = bankmask;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -497,12 +538,14 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
 	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1031,7 +1074,10 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
 
 		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
+
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1066,7 +1112,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int			this_page_number;
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 62bb610167..125273e235 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 011ec18015..61b12d1056 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -154,3 +154,5 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			slru_buffers_size_scale = 2;	/* power 2 scale for SLRU buffers */
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 16ec6c5ef0..4a182225b7 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2277,6 +2277,16 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"slru_buffers_size_scale", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("SLRU buffers size scale of power 2"),
+			NULL
+		},
+		&slru_buffers_size_scale,
+		2, 0, 7,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d08d55c3fe..136ea5f48c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -157,6 +157,9 @@
 					#   mmap
 					# (change requires restart)
 #min_dynamic_shared_memory = 0MB	# (change requires restart)
+#slru_buffers_size_scale = 2		# SLRU buffers size scale of power 2, range 0..7
+					# (change requires restart)
+
 #vacuum_buffer_usage_limit = 256kB	# size of vacuum and analyze buffer access strategy ring;
 					# 0 to disable vacuum buffer access strategy;
 					# range 128kB to 16GB
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 246f757f6a..6a2c914d48 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -30,8 +30,8 @@
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
 /* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
+#define NUM_MULTIXACTOFFSET_BUFFERS		(16 << slru_buffers_size_scale)
+#define NUM_MULTIXACTMEMBER_BUFFERS		(32 << slru_buffers_size_scale)
 
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index a8a424d92d..f5f2b5b8b5 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -134,6 +134,11 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * mask for slotno hash bank
+	 */
+	Size		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..0dad287550 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -12,7 +12,7 @@
 #define SUBTRANS_H
 
 /* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
+#define NUM_SUBTRANS_BUFFERS	(32 << slru_buffers_size_scale)
 
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 02da6ba7e1..b1d59472b1 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -18,7 +18,7 @@
 /*
  * The number of SLRU page buffers we use for the notification queue.
  */
-#define NUM_NOTIFY_BUFFERS	8
+#define NUM_NOTIFY_BUFFERS	(16 << slru_buffers_size_scale)
 
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 14bd574fc2..f2cec02a2f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,7 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int slru_buffers_size_scale;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
@@ -262,6 +263,7 @@ extern PGDLLIMPORT int work_mem;
 extern PGDLLIMPORT double hash_mem_multiplier;
 extern PGDLLIMPORT int maintenance_work_mem;
 extern PGDLLIMPORT int max_parallel_maintenance_workers;
+extern PGDLLIMPORT int slru_buffers_size_scale;
 
 /*
  * Upper and lower hard limits for the buffer access strategy ring size
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..794ecd8169 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -28,7 +28,7 @@ extern PGDLLIMPORT int max_predicate_locks_per_page;
 
 
 /* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
+#define NUM_SERIAL_BUFFERS	(16 << slru_buffers_size_scale)
 
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
-- 
2.39.2 (Apple Git-143)

v1-0002-bank-wise-slru-locks.patch (application/octet-stream)
From 4823d95c86ee696b2df57526ffba93aea83054bf Mon Sep 17 00:00:00 2001
From: Dilip kumar <dilipkumar@dkmac.local>
Date: Sat, 9 Sep 2023 12:56:10 +0530
Subject: [PATCH v1 2/3] bank wise slru locks

The previous patch has divided SLRU buffer pool into associative
banks.  And this patch is further optimizing it by introducing
bank wise slru locks instead of a common centralized lock this
will reduce the contention on the slru control lock.

Dilip Kumar based on some design suggestions from Robert Haas
---
 src/backend/access/transam/clog.c        | 108 +++++++++-----
 src/backend/access/transam/commit_ts.c   |  43 +++---
 src/backend/access/transam/multixact.c   | 179 ++++++++++++++++-------
 src/backend/access/transam/slru.c        | 134 +++++++++++++----
 src/backend/access/transam/subtrans.c    |  27 ++--
 src/backend/commands/async.c             |  30 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  32 ++--
 src/include/access/slru.h                |  33 ++++-
 src/include/storage/lwlock.h             |   8 +
 src/test/modules/test_slru/test_slru.c   |  28 ++--
 12 files changed, 452 insertions(+), 198 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 29d58f1eb3..938806532d 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -276,14 +276,19 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	*lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get SLRU lock w.r.t. the SLRU page we are going to access. */
+	lock = SimpleLruPageGetSLRULock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
+	 * updates for multiple backends so that the number of times the SLRU lock
 	 * needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
@@ -302,17 +307,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire SLRULock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -325,10 +330,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -347,7 +352,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruPageGetSLRULock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -398,14 +404,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * SLRU lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention around
+ * SLRULock when many processes are trying to commit at once, since the lock
+ * need not be repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -419,6 +424,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -499,11 +506,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
-	 * Now that we've got the lock, clear the list of processes waiting for
+	 * We are leader so clear the list of processes waiting for
 	 * group XID status update, saving a pointer to the head of the list.
 	 * Trying to pop elements one at a time could lead to an ABA problem.
 	 */
@@ -513,10 +517,38 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	/* Remember head of list so we can perform wakeups after dropping lock. */
 	wakeidx = nextidx;
 
+	/* Acquire the SLRU bank lock w.r.t. the first page in the group. */
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruPageGetSLRULock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	/* Walk the list and update the status of all XIDs. */
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * Although we are trying our best to keep same page in a group, there
+		 * are cases where we might get different pages as well for detail
+		 * refer comment in above while loop where we are adding this process
+		 * for group update.  So if the current page we are going to access is
+		 * not in the same slru bank in which we updated the last page then we
+		 * need to release the lock on the previous bank and acquire lock on
+		 * the bank w.r.t. the page we are going to update now.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	*lock = SimpleLruPageGetSLRULock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -536,7 +568,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -565,10 +598,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with slot specific SLRU bank's lock held
  */
 static void
-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+						  int slotno)
 {
 	int			byteno = TransactionIdToByte(xid);
 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
@@ -657,7 +691,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruPageGetSLRULock(XactCtl, pageno));
 
 	return status;
 }
@@ -676,7 +710,7 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", NUM_CLOG_BUFFERS, CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  "pg_xact", LWTRANCHE_XACT_BUFFER, LWTRANCHE_XACT_SLRU,
 				  SYNC_HANDLER_CLOG);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
@@ -691,8 +725,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruPageGetSLRULock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -701,7 +736,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -736,14 +771,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u32(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -754,8 +785,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruPageGetSLRULock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -787,7 +819,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -819,6 +851,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -829,13 +862,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruPageGetSLRULock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -973,16 +1007,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruPageGetSLRULock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 54422f2780..0c7f5bae86 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -220,8 +220,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruPageGetSLRULock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -231,13 +232,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with slot specific SLRU bank's Lock held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -338,7 +339,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruPageGetSLRULock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -510,9 +511,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", NUM_COMMIT_TS_BUFFERS, 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
-				  SYNC_HANDLER_COMMIT_TS);
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU, SYNC_HANDLER_COMMIT_TS);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
@@ -668,9 +668,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -697,12 +695,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruPageGetSLRULock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -751,9 +750,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -785,6 +784,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -805,12 +805,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruPageGetSLRULock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -964,16 +966,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
+		lock = SimpleLruPageGetSLRULock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -985,7 +989,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index abb022e067..e63bd4cf71 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -192,10 +192,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use SLRU bank's lock of MultiXactOffset and
+ * MultiXactMember to guard accesses to the two sets of SLRU buffers.  For
+ * concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -870,12 +870,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruPageGetSLRULock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -891,10 +894,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -916,6 +917,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * MultiXactMember SLRU page is changed so check if this new page
+			 * fall into the different SLRU bank then release the old bank's
+			 * lock and acquire lock on the new bank.
+			 */
+			lock = SimpleLruPageGetSLRULock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -936,7 +951,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1239,6 +1255,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1342,11 +1360,23 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If the page is in a different SLRU bank, release the lock on the
+	 * previous bank (if we are already holding one) and acquire the lock on
+	 * the new bank.
+	 */
+	lock = SimpleLruPageGetSLRULock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1379,7 +1409,22 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * The SLRU pageno has changed, so check whether this page falls in
+			 * a different SLRU bank from the one whose lock we are already
+			 * holding; if so, release the lock on the old bank and acquire the
+			 * lock on the new bank.
+			 */
+			lock = SimpleLruPageGetSLRULock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1388,7 +1433,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1397,13 +1443,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1419,6 +1463,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether this
+			 * new page falls into a different SLRU bank; if so, release the
+			 * old bank's lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruPageGetSLRULock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1442,7 +1500,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1852,15 +1911,14 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
-				  SYNC_HANDLER_MULTIXACT_MEMBER);
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU, SYNC_HANDLER_MULTIXACT_MEMBER);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
 
 	/* Initialize our shared state struct */
@@ -1894,8 +1952,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruPageGetSLRULock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1904,9 +1964,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruPageGetSLRULock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1915,7 +1976,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1975,10 +2036,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruPageGetSLRULock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1993,7 +2056,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2015,13 +2078,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2037,7 +2102,6 @@ TrimMultiXact(void)
 	int			pageno;
 	int			entryno;
 	int			flagsoff;
-
 	LWLockAcquire(MultiXactGenLock, LW_SHARED);
 	nextMXact = MultiXactState->nextMXact;
 	offset = MultiXactState->nextOffset;
@@ -2046,13 +2110,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2067,7 +2131,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock   *lock = SimpleLruPageGetSLRULock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2075,18 +2141,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2098,7 +2163,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruPageGetSLRULock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2113,10 +2180,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2404,6 +2470,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2414,13 +2481,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruPageGetSLRULock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2453,15 +2521,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int			pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruPageGetSLRULock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2759,7 +2829,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruPageGetSLRULock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3241,31 +3311,33 @@ multixact_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruPageGetSLRULock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruPageGetSLRULock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3331,7 +3403,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 57889b72bd..c06e4eddd1 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -187,9 +187,9 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
-	int			bankmask_ignore;
+	int			bankmask;
 
-	SlruAdjustNSlots(&nslots, &bankmask_ignore);
+	SlruAdjustNSlots(&nslots, &bankmask);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -199,6 +199,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN((bankmask + 1) * sizeof(LWLockPadded));	/* bank_locks[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -206,6 +207,32 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	return BUFFERALIGN(sz) + BLCKSZ * nslots;
 }
 
+/*
+ * Acquire all the bank locks of the given SlruCtl.
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			bankno;
+
+	for (bankno = 0; bankno <= ctl->bank_mask; bankno++)
+		LWLockAcquire(&shared->bank_locks[bankno].lock, mode);
+}
+
+/*
+ * Release all the bank locks of the given SlruCtl.
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			bankno;
+
+	for (bankno = 0; bankno <= ctl->bank_mask; bankno++)
+		LWLockRelease(&shared->bank_locks[bankno].lock);
+}
+
 /*
  * Initialize, or attach to, a simple LRU cache in shared memory.
  *
@@ -220,7 +247,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int tranche_id, int slru_tranche_id,
 			  SyncRequestHandler sync_handler)
 {
 	SlruShared	shared;
@@ -239,13 +266,13 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			nbanks = bankmask + 1;
+		int			bankno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
@@ -271,6 +298,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
 
 		if (nlsns > 0)
 		{
@@ -290,6 +319,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			shared->page_lru_count[slotno] = 0;
 			ptr += BLCKSZ;
 		}
+		/* initialize bank locks for each buffer bank */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			LWLockInitialize(&shared->bank_locks[bankno].lock,
+							 slru_tranche_id);
 
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
@@ -344,7 +377,7 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -383,12 +416,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			bankno = slotno / SLRU_BANK_SIZE;
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -443,6 +477,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 	for (;;)
 	{
 		int			slotno;
+		int			bankno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -485,9 +520,10 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		bankno = slotno / SLRU_BANK_SIZE;
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[bankno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -496,7 +532,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -538,11 +574,12 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
-	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankno = pageno & ctl->bank_mask;
+	int			bankstart = bankno * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_SHARED);
 
 	/* See if page is already in a buffer */
 	for (slotno = bankstart; slotno < bankend; slotno++)
@@ -562,8 +599,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -585,6 +622,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int			pageno = shared->page_number[slotno];
 	bool		ok;
+	int			bankno = slotno / SLRU_BANK_SIZE;
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -613,7 +651,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -628,7 +666,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1133,7 +1171,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number == pg_atomic_read_u32(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1207,6 +1245,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int			pageno = 0;
 	int			i;
+	int			lastbankno = 0;
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1217,10 +1256,19 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[0].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbankno = slotno / SLRU_BANK_SIZE;
+
+		if (curbankno != lastbankno)
+		{
+			LWLockRelease(&shared->bank_locks[lastbankno].lock);
+			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
+			lastbankno = curbankno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1234,7 +1282,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[lastbankno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1274,6 +1322,7 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevbankno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1284,25 +1333,38 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u32(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevbankno = 0;
+	LWLockAcquire(&shared->bank_locks[prevbankno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbankno = slotno / SLRU_BANK_SIZE;
+
+		/*
+		 * If curbankno differs from prevbankno, release the lock on the
+		 * previous bank and acquire the lock on the current bank.
+		 */
+		if (curbankno != prevbankno)
+		{
+			LWLockRelease(&shared->bank_locks[prevbankno].lock);
+			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
+			prevbankno = curbankno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1332,10 +1394,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevbankno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbankno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1376,15 +1440,31 @@ SlruDeleteSegment(SlruCtl ctl, int segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevbankno = 0;
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevbankno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curbankno;
+
+		curbankno = slotno / SLRU_BANK_SIZE;
+
+		/*
+		 * If curbankno differs from prevbankno, release the lock on the
+		 * previous bank and acquire the lock on the current bank.
+		 */
+		if (curbankno != prevbankno)
+		{
+			LWLockRelease(&shared->bank_locks[prevbankno].lock);
+			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
+			prevbankno = curbankno;
+		}
 
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1418,7 +1498,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbankno].lock);
 }
 
 /*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 125273e235..2b0afa8a15 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -77,12 +77,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruPageGetSLRULock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -100,7 +102,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -130,7 +132,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruPageGetSLRULock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -193,8 +195,8 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -212,8 +214,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruPageGetSLRULock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -222,7 +225,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -259,7 +262,7 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(SubTransCtl, LW_EXCLUSIVE);
 
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = ShmemVariableCache->nextXid;
@@ -275,7 +278,7 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	}
 	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	SimpleLruReleaseAllBankLock(SubTransCtl);
 }
 
 /*
@@ -309,6 +312,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -320,12 +324,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruPageGetSLRULock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index d148d10850..7088fe15ea 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -267,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the bank-wise SLRU lock is
+ * used as the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -570,8 +571,8 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
-				  SYNC_HANDLER_NONE);
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  LWTRANCHE_NOTIFY_SLRU, SYNC_HANDLER_NONE);
 
 	if (!found)
 	{
@@ -1402,7 +1403,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1412,9 +1413,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int			pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1438,6 +1437,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * wrapped around, but re-zeroing the page is harmless in that case.)
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	lock = SimpleLruPageGetSLRULock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1509,7 +1513,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(lock);
 
 	return nextNotify;
 }
@@ -1988,7 +1992,7 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
+			 * holding the SLRU bank lock while we are examining the entries
 			 * and possibly transmitting them to our frontend.  Copy only the
 			 * part of the page we will actually inspect.
 			 */
@@ -2010,7 +2014,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruPageGetSLRULock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -2051,7 +2055,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2204,7 +2208,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for each SLRU bank lock but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+# 26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+# 38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 1af41213b4..fe00148956 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,8 +808,8 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
+				  NUM_SERIAL_BUFFERS, 0, "pg_serial", LWTRANCHE_SERIAL_BUFFER,
+				  LWTRANCHE_SERIAL_SLRU, SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -846,12 +846,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int			firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruPageGetSLRULock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -901,7 +903,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -919,10 +921,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -934,13 +936,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding the SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruPageGetSLRULock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -953,7 +955,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -965,7 +967,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -983,7 +985,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -992,7 +994,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1006,12 +1008,12 @@ CheckPointPredicate(void)
 {
 	int			tailPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1055,7 +1057,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, tailPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index f5f2b5b8b5..eec7a568dc 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -52,8 +52,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -68,6 +66,13 @@ typedef struct SlruSharedData
 	int		   *page_lru_count;
 	LWLockPadded *buffer_locks;
 
+	/*
+	 * Locks to protect the buffer slot access within each SLRU bank.  The
+	 * buffer_locks protect the I/O on each buffer slot, whereas these locks
+	 * protect the in-memory operations on the buffers within one SLRU bank.
+	 */
+	LWLockPadded *bank_locks;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -95,7 +100,7 @@ typedef struct SlruSharedData
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int			latest_page_number;
+	pg_atomic_uint32	latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -143,11 +148,25 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock needs to be acquired in order to access the SLRU buffer slots in
+ * the respective bank.  For more details, see the comments in SlruSharedData.
+ */
+static inline LWLock *
+SimpleLruPageGetSLRULock(SlruCtl ctl, int pageno)
+{
+	int			bankno = (pageno & ctl->bank_mask);
+
+	/* Compute which bank this page falls in and return that bank's lock */
+	return &(ctl->shared->bank_locks[bankno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler);
+						  const char *subdir, int tranche_id,
+						  int slru_tranche_id, SyncRequestHandler sync_handler);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 							  TransactionId xid);
@@ -175,5 +194,7 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
-
+extern LWLock *SimpleLruPageGetSLRULock(SlruCtl ctl, int pageno);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index d77410bdea..09d2efe8ca 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,14 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
+
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index ae21444c47..7b2eb4ae50 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -63,9 +63,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruPageGetSLRULock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +80,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +99,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruPageGetSLRULock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +117,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruPageGetSLRULock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +135,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int			pageno = PG_GETARG_INT32(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruPageGetSLRULock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -215,6 +218,7 @@ test_slru_shmem_startup(void)
 {
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -228,11 +232,13 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
+				  NUM_TEST_BUFFERS, 0, slru_dir_name, test_buffer_tranche_id,
 				  test_tranche_id, SYNC_HANDLER_NONE);
 }
 
-- 
2.39.2 (Apple Git-143)

#2Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#1)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Oct 11, 2023 at 4:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

The small size of the SLRU buffer pools can sometimes become a
performance problem because it’s not difficult to have a workload
where the number of buffers actively in use is larger than the
fixed-size buffer pool. However, just increasing the size of the
buffer pool doesn’t necessarily help, because the linear search that
we use for buffer replacement doesn’t scale, and also because
contention on the single centralized lock limits scalability.

There is a couple of patches proposed in the past to address the
problem of increasing the buffer pool size, one of the patch [1] was
proposed by Thomas Munro where we make the size of the buffer pool
configurable.

In my last email, I forgot to give the link from which I took the base
patch for dividing the buffer pool into banks, so I am giving it here [1].
Looking at this again, it seems that the idea for that patch came from
Andrey M. Borodin, and the idea of the SLRU scale factor was introduced by
Yura Sokolov and Ivan Lazarev. Apologies for missing that in the first
email.

[1]: https://commitfest.postgresql.org/43/2627/

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#3Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#2)
3 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Oct 11, 2023 at 5:57 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Oct 11, 2023 at 4:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

In my last email, I forgot to give the link from which I took the base
patch for dividing the buffer pool into banks, so I am giving it here [1].
Looking at this again, it seems that the idea for that patch came from
Andrey M. Borodin, and the idea of the SLRU scale factor was introduced by
Yura Sokolov and Ivan Lazarev. Apologies for missing that in the first
email.

[1] https://commitfest.postgresql.org/43/2627/

In my last email I had only rebased the base patch. While reading through
that patch again, I realized that some refactoring was needed and that
there were some unused functions, so I have removed those and added some
comments. I also did some refactoring of my own patches, so I am reposting
the patch series.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v2-0002-bank-wise-slru-locks.patchapplication/octet-stream; name=v2-0002-bank-wise-slru-locks.patchDownload
From 72f6610cbdcdcfdd3a0efe3e83031852c56e0bd9 Mon Sep 17 00:00:00 2001
From: Dilip kumar <dilipkumar@dkmac.local>
Date: Sat, 9 Sep 2023 12:56:10 +0530
Subject: [PATCH v2 2/3] bank wise slru locks

The previous patch divided the SLRU buffer pool into associative
banks.  This patch optimizes it further by introducing bank-wise
SLRU locks instead of a common centralized lock, which reduces
contention on the SLRU control lock.
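
To illustrate the idea (a simplified sketch only, not the exact patch code;
in the patch the mapping is exposed as SimpleLruGetSLRUBankLock(), and the
helper name below is hypothetical), a page number is masked down to a bank
number and each bank carries its own padded LWLock:

    #include "access/slru.h"      /* SlruCtl, bank_mask, bank_locks */
    #include "storage/lwlock.h"   /* LWLock, LWLockPadded */

    /*
     * Illustrative sketch only: map an SLRU page to the lock of the bank
     * that owns it.  Backends touching pages that fall into different
     * banks acquire different locks, so they no longer contend on a
     * single centralized control lock.
     */
    static inline LWLock *
    slru_bank_lock_for_page(SlruCtl ctl, int pageno)
    {
        int     bankno = pageno & ctl->bank_mask;

        return &ctl->shared->bank_locks[bankno].lock;
    }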

Dilip Kumar based on some design suggestions from Robert Haas
---
 src/backend/access/transam/clog.c        | 108 +++++++++-----
 src/backend/access/transam/commit_ts.c   |  43 +++---
 src/backend/access/transam/multixact.c   | 179 ++++++++++++++++-------
 src/backend/access/transam/slru.c        | 139 ++++++++++++++----
 src/backend/access/transam/subtrans.c    |  57 ++++++--
 src/backend/commands/async.c             |  30 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  32 ++--
 src/include/access/slru.h                |  32 +++-
 src/include/storage/lwlock.h             |   8 +
 src/test/modules/test_slru/test_slru.c   |  32 ++--
 12 files changed, 482 insertions(+), 206 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index d4ac85e052..929d89a187 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -277,14 +277,19 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	*lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock w.r.t. the page we are going to access. */
+	lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
+	 * updates for multiple backends so that the number of times the SLRU lock
 	 * needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
@@ -303,17 +308,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire the SLRU lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -326,10 +331,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -348,7 +353,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetSLRUBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -399,14 +405,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -420,6 +425,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -500,11 +507,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
-	 * Now that we've got the lock, clear the list of processes waiting for
+	 * We are the leader, so clear the list of processes waiting for
 	 * group XID status update, saving a pointer to the head of the list.
 	 * Trying to pop elements one at a time could lead to an ABA problem.
 	 */
@@ -514,10 +518,38 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	/* Remember head of list so we can perform wakeups after dropping lock. */
 	wakeidx = nextidx;
 
+	/* Acquire the SLRU bank lock w.r.t. the first page in the group. */
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetSLRUBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	/* Walk the list and update the status of all XIDs. */
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * Although we try our best to keep all members of a group on the same
+		 * page, there are cases where we may get different pages as well; for
+		 * details, refer to the comment in the while loop above where this
+		 * process is added for the group update.  So if the page we are about
+		 * to access does not fall in the same SLRU bank as the last page we
+		 * updated, release the lock on the previous bank and acquire the lock
+		 * on the bank of the page we are going to update now.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	*lock = SimpleLruGetSLRUBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -537,7 +569,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -566,10 +599,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+						  int slotno)
 {
 	int			byteno = TransactionIdToByte(xid);
 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
@@ -658,7 +692,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -677,7 +711,7 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", NUM_CLOG_BUFFERS, CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  "pg_xact", LWTRANCHE_XACT_BUFFER, LWTRANCHE_XACT_SLRU,
 				  SYNC_HANDLER_CLOG);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
@@ -692,8 +726,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -702,7 +737,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -737,14 +772,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u32(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -755,8 +786,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -788,7 +820,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -820,6 +852,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -830,13 +863,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -974,16 +1008,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 26614d5ceb..645a11d1ab 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -221,8 +221,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -232,13 +233,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with the lock of the SLRU bank containing the slot held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -339,7 +340,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -511,9 +512,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", NUM_COMMIT_TS_BUFFERS, 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
-				  SYNC_HANDLER_COMMIT_TS);
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU, SYNC_HANDLER_COMMIT_TS);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
@@ -669,9 +669,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -698,12 +696,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -752,9 +751,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -786,6 +785,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -806,12 +806,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -965,16 +967,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
+		lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -986,7 +990,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index abb022e067..804e3c603c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -192,10 +192,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the SLRU bank locks of MultiXactOffset
+ * and MultiXactMember to guard accesses to the two sets of SLRU buffers.
+ * For concurrency's sake, we avoid holding more than one of these locks at
+ * a time.)
  */
 typedef struct MultiXactStateData
 {
@@ -870,12 +870,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -891,10 +894,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -916,6 +917,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether the
+			 * new page falls into a different SLRU bank; if so, release the
+			 * old bank's lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -936,7 +951,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1239,6 +1255,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1342,11 +1360,23 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If the page is in a different SLRU bank, release the lock on the
+	 * previous bank (if we are already holding one) and acquire the lock on
+	 * the new bank.
+	 */
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1379,7 +1409,22 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * The SLRU pageno has changed, so check whether this page falls
+			 * in a different SLRU bank than the one whose lock we already
+			 * hold; if so, release the lock on the old bank and acquire the
+			 * lock on the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1388,7 +1433,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1397,13 +1443,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1419,6 +1463,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether the
+			 * new page falls into a different SLRU bank; if so, release the
+			 * old bank's lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1442,7 +1500,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1852,15 +1911,14 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
-				  SYNC_HANDLER_MULTIXACT_MEMBER);
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU, SYNC_HANDLER_MULTIXACT_MEMBER);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
 
 	/* Initialize our shared state struct */
@@ -1894,8 +1952,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1904,9 +1964,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1915,7 +1976,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1975,10 +2036,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1993,7 +2056,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2015,13 +2078,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2037,7 +2102,6 @@ TrimMultiXact(void)
 	int			pageno;
 	int			entryno;
 	int			flagsoff;
-
 	LWLockAcquire(MultiXactGenLock, LW_SHARED);
 	nextMXact = MultiXactState->nextMXact;
 	offset = MultiXactState->nextOffset;
@@ -2046,13 +2110,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2067,7 +2131,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock   *lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2075,18 +2141,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2098,7 +2163,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2113,10 +2180,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2404,6 +2470,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2414,13 +2481,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2453,15 +2521,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int			pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2759,7 +2829,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3241,31 +3311,33 @@ multixact_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3331,7 +3403,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 57889b72bd..d0931308f8 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -187,9 +187,9 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
-	int			bankmask_ignore;
+	int			bankmask;
 
-	SlruAdjustNSlots(&nslots, &bankmask_ignore);
+	SlruAdjustNSlots(&nslots, &bankmask);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -199,6 +199,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN((bankmask + 1) * sizeof(LWLockPadded));	/* bank_locks[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -206,6 +207,32 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	return BUFFERALIGN(sz) + BLCKSZ * nslots;
 }
 
+/*
+ * Acquire all the bank locks of the given SlruCtl.
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			bankno;
+
+	for (bankno = 0; bankno <= ctl->bank_mask; bankno++)
+		LWLockAcquire(&shared->bank_locks[bankno].lock, mode);
+}
+
+/*
+ * Release all the bank locks of the given SlruCtl.
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			bankno;
+
+	for (bankno = 0; bankno <= ctl->bank_mask; bankno++)
+		LWLockRelease(&shared->bank_locks[bankno].lock);
+}
+
 /*
  * Initialize, or attach to, a simple LRU cache in shared memory.
  *
@@ -215,12 +242,13 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the SLRU's per-bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler)
 {
 	SlruShared	shared;
@@ -239,13 +267,13 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			nbanks = bankmask + 1;
+		int			bankno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
@@ -271,6 +299,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
 
 		if (nlsns > 0)
 		{
@@ -282,7 +312,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -290,6 +320,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			shared->page_lru_count[slotno] = 0;
 			ptr += BLCKSZ;
 		}
+		/* Initialize bank locks for each buffer bank. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			LWLockInitialize(&shared->bank_locks[bankno].lock,
+							 bank_tranche_id);
 
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
@@ -344,7 +378,7 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -383,12 +417,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			bankno = slotno / SLRU_BANK_SIZE;
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -443,6 +478,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 	for (;;)
 	{
 		int			slotno;
+		int			bankno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -485,9 +521,10 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		bankno = slotno / SLRU_BANK_SIZE;
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[bankno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -496,7 +533,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -538,11 +575,12 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
-	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankno = pageno & ctl->bank_mask;
+	int			bankstart = bankno * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_SHARED);
 
 	/* See if page is already in a buffer */
 	for (slotno = bankstart; slotno < bankend; slotno++)
@@ -562,8 +600,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -585,6 +623,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int			pageno = shared->page_number[slotno];
 	bool		ok;
+	int			bankno = slotno / SLRU_BANK_SIZE;
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -613,7 +652,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -628,7 +667,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1133,7 +1172,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number == pg_atomic_read_u32(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1207,6 +1246,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int			pageno = 0;
 	int			i;
+	int			lastbankno = 0;
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1217,10 +1257,19 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[0].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbankno = slotno / SLRU_BANK_SIZE;
+
+		if (curbankno != lastbankno)
+		{
+			LWLockRelease(&shared->bank_locks[lastbankno].lock);
+			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
+			lastbankno = curbankno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1234,7 +1283,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[lastbankno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1274,6 +1323,7 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevbankno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1284,25 +1334,38 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u32(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevbankno = 0;
+	LWLockAcquire(&shared->bank_locks[prevbankno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbankno = slotno / SLRU_BANK_SIZE;
+
+		/*
+		 * If curbankno is not the same as prevbankno, release the lock on
+		 * the previous bank and acquire the lock on the current one.
+		 */
+		if (curbankno != prevbankno)
+		{
+			LWLockRelease(&shared->bank_locks[prevbankno].lock);
+			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
+			prevbankno = curbankno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1332,10 +1395,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevbankno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbankno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1376,15 +1441,31 @@ SlruDeleteSegment(SlruCtl ctl, int segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevbankno = 0;
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevbankno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curbankno;
+
+		curbankno = slotno / SLRU_BANK_SIZE;
+
+		/*
+		 * If curbankno is not the same as prevbankno, release the lock on
+		 * the previous bank and acquire the lock on the current one.
+		 */
+		if (curbankno != prevbankno)
+		{
+			LWLockRelease(&shared->bank_locks[prevbankno].lock);
+			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
+			prevbankno = curbankno;
+		}
 
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1418,7 +1499,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbankno].lock);
 }
 
 /*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 125273e235..48f22a5fcd 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -77,12 +77,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -100,7 +102,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -130,7 +132,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -193,8 +195,8 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -212,8 +214,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -222,7 +225,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -252,6 +255,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int			startPage;
 	int			endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -259,23 +264,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = ShmemVariableCache->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If the new page belongs to a different bank, release the lock on
+		 * the old bank and acquire the lock on the new one.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the new page belongs to a different bank, release the lock on
+	 * the old bank and acquire the lock on the new one.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -309,6 +338,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -320,12 +350,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index d148d10850..2fc230ca51 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -267,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the per-bank SLRU lock is
+ * used as the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -570,8 +571,8 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
-				  SYNC_HANDLER_NONE);
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  LWTRANCHE_NOTIFY_SLRU, SYNC_HANDLER_NONE);
 
 	if (!found)
 	{
@@ -1402,7 +1403,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1412,9 +1413,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int			pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1438,6 +1437,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * wrapped around, but re-zeroing the page is harmless in that case.)
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	lock = SimpleLruGetSLRUBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1509,7 +1513,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(lock);
 
 	return nextNotify;
 }
@@ -1988,7 +1992,7 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
+			 * holding the SLRU lock while we are examining the entries
 			 * and possibly transmitting them to our frontend.  Copy only the
 			 * part of the page we will actually inspect.
 			 */
@@ -2010,7 +2014,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetSLRUBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -2051,7 +2055,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2204,7 +2208,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+#38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 1af41213b4..e771aaa82b 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,8 +808,8 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
+				  NUM_SERIAL_BUFFERS, 0, "pg_serial", LWTRANCHE_SERIAL_BUFFER,
+				  LWTRANCHE_SERIAL_SLRU, SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -846,12 +846,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int			firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetSLRUBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -901,7 +903,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -919,10 +921,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -934,13 +936,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding the SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -953,7 +955,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -965,7 +967,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -983,7 +985,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -992,7 +994,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1006,12 +1008,12 @@ CheckPointPredicate(void)
 {
 	int			tailPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1055,7 +1057,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, tailPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index f5f2b5b8b5..8844853a57 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -52,8 +52,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -68,6 +66,13 @@ typedef struct SlruSharedData
 	int		   *page_lru_count;
 	LWLockPadded *buffer_locks;
 
+	/*
+	 * Locks to protect in-memory access to the buffer slots, one per SLRU
+	 * bank.  The buffer_locks protect the I/O on each buffer slot, whereas
+	 * these locks protect the in-memory operations within one SLRU bank.
+	 */
+	LWLockPadded *bank_locks;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -95,7 +100,7 @@ typedef struct SlruSharedData
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int			latest_page_number;
+	pg_atomic_uint32	latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -143,11 +148,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock must be acquired in order to access the SLRU buffer slots in the
+ * respective bank.  For more details, refer to the comments in SlruSharedData.
+ */
+static inline LWLock *
+SimpleLruGetSLRUBankLock(SlruCtl ctl, int pageno)
+{
+	int			bankno = (pageno & ctl->bank_mask);
+
+	return &(ctl->shared->bank_locks[bankno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler);
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 							  TransactionId xid);
@@ -175,5 +193,7 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
-
+extern LWLock *SimpleLruGetSLRUBankLock(SlruCtl ctl, int pageno);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index d77410bdea..09d2efe8ca 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,14 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
+
 	LWTRANCHE_FIRST_USER_DEFINED
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index ae21444c47..9a02f33933 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int			pageno = PG_GETARG_INT32(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -215,6 +214,7 @@ test_slru_shmem_startup(void)
 {
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -228,11 +228,13 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
+				  NUM_TEST_BUFFERS, 0, slru_dir_name, test_buffer_tranche_id,
 				  test_tranche_id, SYNC_HANDLER_NONE);
 }
 
-- 
2.39.2 (Apple Git-143)
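
To make the locking convention of the patch above easier to follow, here is a
small standalone sketch (not part of the patch) of how a caller is expected to
use SimpleLruGetSLRUBankLock(): derive the bank lock from the page number, hold
it in exclusive mode across SimpleLruReadPage(), and swap locks only when
moving to a page that maps to a different bank.  The function name
read_two_pages and the use of write_ok = true with InvalidTransactionId are
illustrative choices, mirroring how test_slru.c drives the API.

#include "postgres.h"

#include "access/slru.h"
#include "access/transam.h"
#include "storage/lwlock.h"

/*
 * Illustrative only: read two pages of some SLRU while following the
 * bank-wise locking pattern introduced by the patch above.
 */
static void
read_two_pages(SlruCtl ctl, int pageno1, int pageno2)
{
	LWLock	   *prevlock;
	LWLock	   *lock;
	int			slotno;

	/* Derive the bank lock from the first page number and acquire it. */
	prevlock = SimpleLruGetSLRUBankLock(ctl, pageno1);
	LWLockAcquire(prevlock, LW_EXCLUSIVE);
	slotno = SimpleLruReadPage(ctl, pageno1, true, InvalidTransactionId);
	/* ... use ctl->shared->page_buffer[slotno] while the lock is held ... */

	/* Moving to another page: switch locks only if its bank differs. */
	lock = SimpleLruGetSLRUBankLock(ctl, pageno2);
	if (lock != prevlock)
	{
		LWLockRelease(prevlock);
		LWLockAcquire(lock, LW_EXCLUSIVE);
		prevlock = lock;
	}
	slotno = SimpleLruReadPage(ctl, pageno2, true, InvalidTransactionId);
	/* ... use the second page here ... */

	LWLockRelease(prevlock);
}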

v2-0003-Introduce-bank-wise-LRU-counter.patch
From 9c8528913575edd9dd8a095e9cd7dd648fed0c5f Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 12 Oct 2023 16:04:14 +0530
Subject: [PATCH v2 3/3] Introduce bank-wise LRU counter

Since we have already divided the buffer pool into banks and the
victim buffer search is also done at the bank level, there is no need
for a centralized LRU counter.  This also improves performance by
reducing the frequent CPU cache invalidation caused by updating a
single common variable.

Dilip Kumar based on design idea from Robert Haas
---
 src/backend/access/transam/slru.c | 23 +++++++++++++++--------
 src/include/access/slru.h         | 28 +++++++++++++++++-----------
 2 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index d0931308f8..318d9ea3fa 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -110,13 +110,13 @@ typedef struct SlruWriteAllData *SlruWriteAll;
  *
  * The reason for the if-test is that there are often many consecutive
  * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
  * pages' counts will "wrap around" and make them appear recently used.
  *
  * We allow this code to be executed concurrently by multiple processes within
  * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
  * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
+ * However, it is possible for either bank_cur_lru_count or individual
  * page_lru_count entries to be "reset" to lower values than they should have,
  * in case a process is delayed while it executes this macro.  With care in
  * SlruSelectLRUPage(), this does little harm, and in any case the absolute
@@ -125,9 +125,10 @@ typedef struct SlruWriteAllData *SlruWriteAll;
  */
 #define SlruRecentlyUsed(shared, slotno)	\
 	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
+		int		slrubankno = (slotno) / SLRU_BANK_SIZE; \
+		int		new_lru_count = (shared)->bank_cur_lru_count[slrubankno]; \
 		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
+			(shared)->bank_cur_lru_count[slrubankno] = ++new_lru_count; \
 			(shared)->page_lru_count[slotno] = new_lru_count; \
 		} \
 	} while (0)
@@ -200,6 +201,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
 	sz += MAXALIGN((bankmask + 1) * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN((bankmask + 1) * sizeof(int));   /* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -277,8 +279,6 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -301,6 +301,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
 		shared->bank_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -322,8 +324,11 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		}
 		/* Initialize bank locks for each buffer bank. */
 		for (bankno = 0; bankno < nbanks; bankno++)
+		{
 			LWLockInitialize(&shared->bank_locks[bankno].lock,
 							 bank_tranche_id);
+			shared->bank_cur_lru_count[bankno] = 0;
+		}
 
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
@@ -1113,9 +1118,11 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
 
 		/* See if page already has a buffer assigned */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
+
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
@@ -1150,7 +1157,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 8844853a57..9be6d26d78 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -73,6 +73,23 @@ typedef struct SlruSharedData
 	 */
 	LWLockPadded *bank_locks;
 
+	/*----------
+	 * Instead of global counter we maintain a bank-wise lru counter because
+	 * a) we are doing the victim buffer selection as bank level so there is
+	 * no point of having a global counter b) manipulating a global counter
+	 * will have frequent cpu cache invalidation and that will affect the
+	 * performance.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page is therefore the one with the highest value of
+	 *		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int			 *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -84,17 +101,6 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
-- 
2.39.2 (Apple Git-143)

Attachment: v2-0001-Divide-SLRU-buffers-into-banks.patch (application/octet-stream)
From 5fa38ace34f0c460c9af8889ea922c2d5c4d0b38 Mon Sep 17 00:00:00 2001
From: Dilip kumar <dilipkumar@dkmac.local>
Date: Fri, 8 Sep 2023 15:08:32 +0530
Subject: [PATCH v2 1/3] Divide SLRU buffers into banks

We want to eliminate the linear search within SLRU buffers.
To do so we divide the SLRU buffers into banks. Each bank holds
approximately 8 buffers. Each SLRU pageno may reside in only one bank.
Adjacent pagenos reside in different banks.

Also introduce slru_buffers_size_scale to control the SLRU buffer sizes.

Andrey M. Borodin, Yura Sokolov, Ivan Lazarev and minor refactoring by Dilip Kumar
---
 doc/src/sgml/config.sgml                      | 31 +++++++++++
 src/backend/access/transam/clog.c             | 29 ++--------
 src/backend/access/transam/commit_ts.c        | 20 ++-----
 src/backend/access/transam/slru.c             | 54 +++++++++++++++++--
 src/backend/access/transam/subtrans.c         |  1 +
 src/backend/utils/init/globals.c              |  2 +
 src/backend/utils/misc/guc_tables.c           | 10 ++++
 src/backend/utils/misc/postgresql.conf.sample |  3 ++
 src/include/access/clog.h                     |  1 -
 src/include/access/commit_ts.h                |  1 -
 src/include/access/multixact.h                |  4 +-
 src/include/access/slru.h                     |  5 ++
 src/include/access/subtrans.h                 |  2 +-
 src/include/commands/async.h                  |  2 +-
 src/include/miscadmin.h                       |  2 +
 src/include/storage/predicate.h               |  2 +-
 16 files changed, 119 insertions(+), 50 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 924309af26..416d979b54 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,37 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-slru-buffers-size-scale" xreflabel="slru_buffers_size_scale">
+     <term><varname>slru_buffers_size_scale</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>slru_buffers_size_scale</varname> configuration parameter</primary>
+     </indexterm>
+     </term>
+     <listitem>
+      <para>
+       Specifies the power-of-2 scale for all SLRU shared memory buffer sizes. Buffer sizes depend on
+       both the <literal>slru_buffers_size_scale</literal> and <literal>shared_buffers</literal> parameters.
+      </para>
+      <para>
+       This affects the buffers in the list below (see also <xref linkend="pgdata-contents-table"/>):
+        <itemizedlist>
+         <listitem><para><literal>NUM_MULTIXACTOFFSET_BUFFERS = Min(32 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_MULTIXACTMEMBER_BUFFERS = Min(64 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_SUBTRANS_BUFFERS = Min(64 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_NOTIFY_BUFFERS = Min(32 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_SERIAL_BUFFERS = Min(32 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_CLOG_BUFFERS = Min(128 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+         <listitem><para><literal>NUM_COMMIT_TS_BUFFERS = Min(128 &lt;&lt; slru_buffers_size_scale, shared_buffers/256)</literal></para></listitem>
+        </itemizedlist>
+      </para>
+      <para>
+       Value is in <literal>0..7</literal> bounds.
+       The default value is <literal>2</literal>.
+       This parameter can only be set at server start.
+      </para>
+     </listitem>
+    </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 4a431d5876..d4ac85e052 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -74,6 +74,9 @@
 #define GetLSNIndex(slotno, xid)	((slotno) * CLOG_LSNS_PER_PAGE + \
 	((xid) % (TransactionId) CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_LSN_GROUP)
 
+/* Number of SLRU buffers to use for clog */
+#define NUM_CLOG_BUFFERS 	(128 << slru_buffers_size_scale)
+
 /*
  * The number of subtransactions below which we consider to apply clog group
  * update optimization.  Testing reveals that the number higher than this can
@@ -660,42 +663,20 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	return status;
 }
 
-/*
- * Number of shared CLOG buffers.
- *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
- */
-Size
-CLOGShmemBuffers(void)
-{
-	return Min(128, Max(4, NBuffers / 512));
-}
-
 /*
  * Initialization of shared memory for CLOG
  */
 Size
 CLOGShmemSize(void)
 {
-	return SimpleLruShmemSize(CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE);
+	return SimpleLruShmemSize(NUM_CLOG_BUFFERS, CLOG_LSNS_PER_PAGE);
 }
 
 void
 CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
-	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
+	SimpleLruInit(XactCtl, "Xact", NUM_CLOG_BUFFERS, CLOG_LSNS_PER_PAGE,
 				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
 				  SYNC_HANDLER_CLOG);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index b897fabc70..26614d5ceb 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -70,6 +70,9 @@ typedef struct CommitTimestampEntry
 #define TransactionIdToCTsEntry(xid)	\
 	((xid) % (TransactionId) COMMIT_TS_XACTS_PER_PAGE)
 
+/* Number of SLRU buffers to use for commit_ts */
+#define NUM_COMMIT_TS_BUFFERS	(128 << slru_buffers_size_scale)
+
 /*
  * Link to shared-memory data structures for CommitTs control
  */
@@ -487,26 +490,13 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
 }
 
-/*
- * Number of shared CommitTS buffers.
- *
- * We use a very similar logic as for the number of CLOG buffers (except we
- * scale up twice as fast with shared buffers, and the maximum is twice as
- * high); see comments in CLOGShmemBuffers.
- */
-Size
-CommitTsShmemBuffers(void)
-{
-	return Min(256, Max(4, NBuffers / 256));
-}
-
 /*
  * Shared memory sizing for CommitTs
  */
 Size
 CommitTsShmemSize(void)
 {
-	return SimpleLruShmemSize(CommitTsShmemBuffers(), 0) +
+	return SimpleLruShmemSize(NUM_COMMIT_TS_BUFFERS, 0) +
 		sizeof(CommitTimestampShared);
 }
 
@@ -520,7 +510,7 @@ CommitTsShmemInit(void)
 	bool		found;
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
-	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
+	SimpleLruInit(CommitTsCtl, "CommitTs", NUM_COMMIT_TS_BUFFERS, 0,
 				  CommitTsSLRULock, "pg_commit_ts",
 				  LWTRANCHE_COMMITTS_BUFFER,
 				  SYNC_HANDLER_COMMIT_TS);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 71ac70fb40..57889b72bd 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "port/pg_bitutils.h"
 
 #define SlruFileName(ctl, path, seg) \
 	snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
@@ -71,6 +72,17 @@
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
+
+/*
+ * SLRU bank size for slotno hash banks
+ */
+#define SLRU_BANK_SIZE 8
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -134,7 +146,7 @@ typedef enum
 static SlruErrorCause slru_errcause;
 static int	slru_errno;
 
-
+static void SlruAdjustNSlots(int *nslots, int *bankmask);
 static void SimpleLruZeroLSNs(SlruCtl ctl, int slotno);
 static void SimpleLruWaitIO(SlruCtl ctl, int slotno);
 static void SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata);
@@ -148,6 +160,25 @@ static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
 
+/*
+ * Pick number of slots and bank size optimal for hashed associative SLRU buffers.
+ * We declare SLRU nslots is always power of 2.
+ * We split SLRU to 8-sized hash banks, after some performance benchmarks.
+ * We hash pageno to banks by pageno masked by 3 upper bits.
+ */
+static void
+SlruAdjustNSlots(int *nslots, int *bankmask)
+{
+	Assert(*nslots > 0);
+	Assert(*nslots <= SLRU_MAX_ALLOWED_BUFFERS);
+
+	*nslots = (int) pg_nextpower2_32(Max(SLRU_BANK_SIZE, Min(*nslots, NBuffers / 256)));
+
+	*bankmask = *nslots / SLRU_BANK_SIZE - 1;
+
+	elog(DEBUG5, "nslots %d banksize %d nbanks %d bankmask %x", *nslots, SLRU_BANK_SIZE, *nslots / SLRU_BANK_SIZE, *bankmask);
+}
+
 /*
  * Initialization of shared memory
  */
@@ -156,6 +187,9 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			bankmask_ignore;
+
+	SlruAdjustNSlots(&nslots, &bankmask_ignore);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -191,6 +225,9 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 {
 	SlruShared	shared;
 	bool		found;
+	int			bankmask;
+
+	SlruAdjustNSlots(&nslots, &bankmask);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -258,7 +295,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -266,6 +306,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	 */
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
+	ctl->bank_mask = bankmask;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -497,12 +538,14 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
 	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1031,7 +1074,10 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
 
 		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
+
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1066,7 +1112,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int			this_page_number;
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 62bb610167..125273e235 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 011ec18015..61b12d1056 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -154,3 +154,5 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			slru_buffers_size_scale = 2;	/* power 2 scale for SLRU buffers */
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 16ec6c5ef0..4a182225b7 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2277,6 +2277,16 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"slru_buffers_size_scale", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("SLRU buffers size scale of power 2"),
+			NULL
+		},
+		&slru_buffers_size_scale,
+		2, 0, 7,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d08d55c3fe..136ea5f48c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -157,6 +157,9 @@
 					#   mmap
 					# (change requires restart)
 #min_dynamic_shared_memory = 0MB	# (change requires restart)
+#slru_buffers_size_scale = 2		# SLRU buffers size scale of power 2, range 0..7
+					# (change requires restart)
+
 #vacuum_buffer_usage_limit = 256kB	# size of vacuum and analyze buffer access strategy ring;
 					# 0 to disable vacuum buffer access strategy;
 					# range 128kB to 16GB
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index d99444f073..cee7e19b3f 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -40,7 +40,6 @@ extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
 									   TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
 extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
 
-extern Size CLOGShmemBuffers(void);
 extern Size CLOGShmemSize(void);
 extern void CLOGShmemInit(void);
 extern void BootStrapCLOG(void);
diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 5087cdce51..155e82eb4f 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -27,7 +27,6 @@ extern bool TransactionIdGetCommitTsData(TransactionId xid,
 extern TransactionId GetLatestCommitTsData(TimestampTz *ts,
 										   RepOriginId *nodeid);
 
-extern Size CommitTsShmemBuffers(void);
 extern Size CommitTsShmemSize(void);
 extern void CommitTsShmemInit(void);
 extern void BootStrapCommitTs(void);
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 246f757f6a..6a2c914d48 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -30,8 +30,8 @@
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
 /* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
+#define NUM_MULTIXACTOFFSET_BUFFERS		(16 << slru_buffers_size_scale)
+#define NUM_MULTIXACTMEMBER_BUFFERS		(32 << slru_buffers_size_scale)
 
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index a8a424d92d..f5f2b5b8b5 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -134,6 +134,11 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * mask for slotno hash bank
+	 */
+	Size		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..0dad287550 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -12,7 +12,7 @@
 #define SUBTRANS_H
 
 /* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
+#define NUM_SUBTRANS_BUFFERS	(32 << slru_buffers_size_scale)
 
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 02da6ba7e1..b1d59472b1 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -18,7 +18,7 @@
 /*
  * The number of SLRU page buffers we use for the notification queue.
  */
-#define NUM_NOTIFY_BUFFERS	8
+#define NUM_NOTIFY_BUFFERS	(16 << slru_buffers_size_scale)
 
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 14bd574fc2..f2cec02a2f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,7 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int slru_buffers_size_scale;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
@@ -262,6 +263,7 @@ extern PGDLLIMPORT int work_mem;
 extern PGDLLIMPORT double hash_mem_multiplier;
 extern PGDLLIMPORT int maintenance_work_mem;
 extern PGDLLIMPORT int max_parallel_maintenance_workers;
+extern PGDLLIMPORT int slru_buffers_size_scale;
 
 /*
  * Upper and lower hard limits for the buffer access strategy ring size
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..794ecd8169 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -28,7 +28,7 @@ extern PGDLLIMPORT int max_predicate_locks_per_page;
 
 
 /* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
+#define NUM_SERIAL_BUFFERS	(16 << slru_buffers_size_scale)
 
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
-- 
2.39.2 (Apple Git-143)

#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#1)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Oct 11, 2023 at 4:35 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

The small size of the SLRU buffer pools can sometimes become a
performance problem because it’s not difficult to have a workload
where the number of buffers actively in use is larger than the
fixed-size buffer pool. However, just increasing the size of the
buffer pool doesn’t necessarily help, because the linear search that
we use for buffer replacement doesn’t scale, and also because
contention on the single centralized lock limits scalability.

There is a couple of patches proposed in the past to address the
problem of increasing the buffer pool size, one of the patch [1] was
proposed by Thomas Munro where we make the size of the buffer pool
configurable. And, in order to deal with the linear search in the
large buffer pool, we divide the SLRU buffer pool into associative
banks so that searching in the buffer pool doesn’t get affected by the
large size of the buffer pool. This does well for the workloads which
are mainly impacted by the frequent buffer replacement but this still
doesn’t stand well with the workloads where the centralized control
lock is the bottleneck.

So I have taken this patch as my base patch (v1-0001) and further
added 2 more improvements to this 1) In v1-0002, Instead of a
centralized control lock for the SLRU I have introduced a bank-wise
control lock 2)In v1-0003, I have removed the global LRU counter and
introduced a bank-wise counter. The second change (v1-0003) is in
order to avoid the CPU/OS cache invalidation due to frequent updates
of the single variable, later in my performance test I will show how
much gain we have gotten because of these 2 changes.

Note: This is going to be a long email but I have summarised the main
idea above this point and now I am going to discuss more internal
information in order to show that the design idea is valid and also
going to show 2 performance tests where one is specific to the
contention on the centralized lock and other is mainly contention due
to frequent buffer replacement in SLRU buffer pool. We are getting ~2x
TPS compared to the head by these patches and in later sections, I am
going discuss this in more detail i.e. exact performance numbers and
analysis of why we are seeing the gain.

...

Performance Test:
Exp1: Show problems due to CPU/OS cache invalidation due to frequent
updates of the centralized lock and a common LRU counter. So here we
are running a parallel transaction to pgbench script which frequently
creates subtransaction overflow and that forces the visibility-check
mechanism to access the subtrans SLRU.
Test machine: 8 CPU/ 64 core/ 128 with HT/ 512 MB RAM / SSD
scale factor: 300
shared_buffers=20GB
checkpoint_timeout=40min
max_wal_size=20GB
max_connections=200

Workload: Run these 2 scripts parallelly:
./pgbench -c $ -j $ -T 600 -P5 -M prepared postgres
./pgbench -c 1 -j 1 -T 600 -f savepoint.sql postgres

savepoint.sql (create subtransaction overflow)
BEGIN;
SAVEPOINT S1;
INSERT INTO test VALUES(1)
← repeat 70 times →
SELECT pg_sleep(1);
COMMIT;

Code under test:
Head: PostgreSQL head code
SlruBank: The first patch applied to convert the SLRU buffer pool into
the bank (0001)
SlruBank+BankwiseLockAndLru: Applied 0001+0002+0003

Results:
Clients   Head    SlruBank   SlruBank+BankwiseLockAndLru
1         457     491        475
8         3753    3819       3782
32        14594   14328      17028
64        15600   16243      25944
128       15957   16272      31731

So we can see that at 128 clients, we get ~2x TPS(with SlruBank +
BankwiseLock and bankwise LRU counter) as compared to HEAD.

This and other results shared by you look promising. Will there be any
improvement in workloads related to clog buffer usage? BTW, I remember
that there was also a discussion of moving SLRU into a regular buffer
pool [1]. You have not provided any explanation as to whether that
approach will have any merits after we do this or whether that
approach is not worth pursuing at all.

[1]: https://commitfest.postgresql.org/43/3514/

--
With Regards,
Amit Kapila.

#5Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#4)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Sat, Oct 14, 2023 at 9:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

This and other results shared by you look promising. Will there be any
improvement in workloads related to clog buffer usage?

I did not understand this question; can you explain it a bit? In
short, if it is regarding performance, then we will see it for all
the SLRUs, as the control lock is not centralized anymore; instead,
it is a bank-wise lock.

BTW, I remember

that there was also a discussion of moving SLRU into a regular buffer
pool [1]. You have not provided any explanation as to whether that
approach will have any merits after we do this or whether that
approach is not worth pursuing at all.

[1] - https://commitfest.postgresql.org/43/3514/

Yeah, I haven't read that thread in detail about the performance
numbers and all. But both of these cannot coexist, because this patch
improves the SLRU buffer pool access/configurable size and also the
lock contention. If we move SLRU to the main buffer pool then we might
not have a similar problem; instead there might be other problems, like
SLRU buffers getting swapped out because of other relation buffers.
OTOH, the advantage of that approach would be that we can just use a
bigger buffer pool and SLRU can also take advantage of that. But in
my opinion, most of the time we have limited page access in SLRU, and
the SLRU buffer access pattern is also quite different from the
relation page access pattern, so keeping them under the same buffer
pool and comparing against relation pages for victim buffer selection
might cause different problems. But anyway, I would discuss those
points in that thread.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#6Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#2)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2023-Oct-11, Dilip Kumar wrote:

In my last email, I forgot to give the link from where I have taken
the base path for dividing the buffer pool in banks so giving the same
here[1]. And looking at this again it seems that the idea of that
patch was from Andrey M. Borodin and the idea of the SLRU scale factor
were introduced by Yura Sokolov and Ivan Lazarev. Apologies for
missing that in the first email.

You mean [1].
[1]: /messages/by-id/452d01f7e331458f56ad79bef537c31b@postgrespro.ru

I don't like this idea very much, because of the magic numbers that act
as ratios for numbers of buffers on each SLRU compared to other SLRUs.
These values, which I took from the documentation part of the patch,
appear to have been selected by throwing darts at the wall:

NUM_CLOG_BUFFERS = Min(128 << slru_buffers_size_scale, shared_buffers/256)
NUM_COMMIT_TS_BUFFERS = Min(128 << slru_buffers_size_scale, shared_buffers/256)
NUM_SUBTRANS_BUFFERS = Min(64 << slru_buffers_size_scale, shared_buffers/256)
NUM_NOTIFY_BUFFERS = Min(32 << slru_buffers_size_scale, shared_buffers/256)
NUM_SERIAL_BUFFERS = Min(32 << slru_buffers_size_scale, shared_buffers/256)
NUM_MULTIXACTOFFSET_BUFFERS = Min(32 << slru_buffers_size_scale, shared_buffers/256)
NUM_MULTIXACTMEMBER_BUFFERS = Min(64 << slru_buffers_size_scale, shared_buffers/256)

... which look pretty random already, if similar enough to the current
hardcoded values. In reality, the code implements different values than
what the documentation says.

I don't see why would CLOG have the same number as COMMIT_TS, when the
size for elements of the latter is like 32 times bigger -- however, the
frequency of reads for COMMIT_TS is like 1000x smaller than for CLOG.
SUBTRANS is half of CLOG, yet it is 16 times larger, and it covers the
same range. The MULTIXACT ones appear to keep the current ratio among
them (8/16 gets changed to 32/64).

... and this whole mess is scaled exponentially without regard to the
size that each SLRU requires. This is just betting that enough memory
can be wasted across all SLRUs up to the point where the one that is
actually contended has sufficient memory. This doesn't sound sensible
to me.

Like everybody else, I like having less GUCs to configure, but going
this far to avoid them looks rather disastrous to me. IMO we should
just use Munro's older patches that gave one GUC per SLRU, and users
only need to increase the one that shows up in pg_wait_event sampling.
Someday we will get the (much more complicated) patches to move these
buffers to steal memory from shared buffers, and that'll hopefully let
use get rid of all this complexity.

I'm inclined to use Borodin's patch last posted here [2] instead of your
proposed 0001.
[2]: /messages/by-id/93236D36-B91C-4DFA-AF03-99C083840378@yandex-team.ru

I did skim patches 0002 and 0003 without going into too much detail;
they look reasonable ideas. I have not tried to reproduce the claimed
performance benefits. I think measuring this patch set with the tests
posted by Shawn Debnath in [3] is important, too.
[3]: /messages/by-id/YemDdpMrsoJFQJnU@f01898859afd.ant.amazon.com

On the other hand, here's a somewhat crazy idea. What if, instead of
stealing buffers from shared_buffers (which causes a lot of complexity),
we allocate a common pool for all SLRUs to use? We provide a single
knob -- say, non_relational_buffers=32MB as default -- and we use a LRU
algorithm (or something) to distribute that memory across all the SLRUs.
So the ratio to use for this SLRU or that one would depend on the nature
of the workload: maybe more for multixact in this server here, but more
for subtrans in that server there; it's just the total amount that the
user would have to configure, side by side with shared_buffers (and
perhaps scale with it like wal_buffers), and the LRU would handle the
rest. The "only" problem here is finding a distribution algorithm that
doesn't further degrade performance, of course ...
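
A very rough sketch of just the accounting side of such a pool, to make
the single-knob idea concrete (every name below, including
non_relational_buffers, SlruCommonPool and SlruPoolGrant, is invented
for illustration only; the interesting part, the LRU-based
redistribution across SLRUs, is not shown):

#include "postgres.h"
#include "storage/spin.h"

/* One pool, sized by a single GUC, from which every SLRU asks for buffers. */
typedef struct SlruCommonPool
{
	int			total_buffers;	/* non_relational_buffers / BLCKSZ */
	int			buffers_in_use; /* buffers currently handed out */
	slock_t		mutex;			/* protects the two counters above */
} SlruCommonPool;

/* Hand at most 'nwanted' buffers to one SLRU; return how many it got. */
static int
SlruPoolGrant(SlruCommonPool *pool, int nwanted)
{
	int			granted;

	SpinLockAcquire(&pool->mutex);
	granted = Min(nwanted, pool->total_buffers - pool->buffers_in_use);
	pool->buffers_in_use += granted;
	SpinLockRelease(&pool->mutex);

	return granted;
}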

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
-- Paul Graham, http://www.paulgraham.com/opensource.html

#7Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#6)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Tue, Oct 24, 2023 at 06:04:13PM +0200, Alvaro Herrera wrote:

Like everybody else, I like having less GUCs to configure, but going
this far to avoid them looks rather disastrous to me. IMO we should
just use Munro's older patches that gave one GUC per SLRU, and users
only need to increase the one that shows up in pg_wait_event sampling.
Someday we will get the (much more complicated) patches to move these
buffers to steal memory from shared buffers, and that'll hopefully let
use get rid of all this complexity.

+1

On the other hand, here's a somewhat crazy idea. What if, instead of
stealing buffers from shared_buffers (which causes a lot of complexity),
we allocate a common pool for all SLRUs to use? We provide a single
knob -- say, non_relational_buffers=32MB as default -- and we use a LRU
algorithm (or something) to distribute that memory across all the SLRUs.
So the ratio to use for this SLRU or that one would depend on the nature
of the workload: maybe more for multixact in this server here, but more
for subtrans in that server there; it's just the total amount that the
user would have to configure, side by side with shared_buffers (and
perhaps scale with it like wal_buffers), and the LRU would handle the
rest. The "only" problem here is finding a distribution algorithm that
doesn't further degrade performance, of course ...

I think it's worth a try. It does seem simpler, and it might allow us to
sidestep some concerns about scaling when the SLRU pages are in
shared_buffers [0].

[0]: /messages/by-id/ZPsaEGRvllitxB3v@tamriel.snowman.net

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#8Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#6)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Tue, Oct 24, 2023 at 9:34 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Oct-11, Dilip Kumar wrote:

In my last email, I forgot to give the link from where I have taken
the base path for dividing the buffer pool in banks so giving the same
here[1]. And looking at this again it seems that the idea of that
patch was from Andrey M. Borodin and the idea of the SLRU scale factor
were introduced by Yura Sokolov and Ivan Lazarev. Apologies for
missing that in the first email.

You mean [1].
[1] /messages/by-id/452d01f7e331458f56ad79bef537c31b@postgrespro.ru
I don't like this idea very much, because of the magic numbers that act
as ratios for numbers of buffers on each SLRU compared to other SLRUs.
These values, which I took from the documentation part of the patch,
appear to have been selected by throwing darts at the wall:

NUM_CLOG_BUFFERS = Min(128 << slru_buffers_size_scale, shared_buffers/256)
NUM_COMMIT_TS_BUFFERS = Min(128 << slru_buffers_size_scale, shared_buffers/256)
NUM_SUBTRANS_BUFFERS = Min(64 << slru_buffers_size_scale, shared_buffers/256)
NUM_NOTIFY_BUFFERS = Min(32 << slru_buffers_size_scale, shared_buffers/256)
NUM_SERIAL_BUFFERS = Min(32 << slru_buffers_size_scale, shared_buffers/256)
NUM_MULTIXACTOFFSET_BUFFERS = Min(32 << slru_buffers_size_scale, shared_buffers/256)
NUM_MULTIXACTMEMBER_BUFFERS = Min(64 << slru_buffers_size_scale, shared_buffers/256)

... which look pretty random already, if similar enough to the current
hardcoded values. In reality, the code implements different values than
what the documentation says.

I don't see why would CLOG have the same number as COMMIT_TS, when the
size for elements of the latter is like 32 times bigger -- however, the
frequency of reads for COMMIT_TS is like 1000x smaller than for CLOG.
SUBTRANS is half of CLOG, yet it is 16 times larger, and it covers the
same range. The MULTIXACT ones appear to keep the current ratio among
them (8/16 gets changed to 32/64).

... and this whole mess is scaled exponentially without regard to the
size that each SLRU requires. This is just betting that enough memory
can be wasted across all SLRUs up to the point where the one that is
actually contended has sufficient memory. This doesn't sound sensible
to me.

Like everybody else, I like having less GUCs to configure, but going
this far to avoid them looks rather disastrous to me. IMO we should
just use Munro's older patches that gave one GUC per SLRU, and users
only need to increase the one that shows up in pg_wait_event sampling.
Someday we will get the (much more complicated) patches to move these
buffers to steal memory from shared buffers, and that'll hopefully let
use get rid of all this complexity.

Overall I agree with your comments; actually, I haven't put that much
thought into the GUC part and how it scales the SLRU buffers w.r.t.
this single configurable parameter. Yeah, so I think it is better
that we take the older patch version, where we have a separate GUC per
SLRU, as our base patch.

I'm inclined to use Borodin's patch last posted here [2] instead of your
proposed 0001.
[2] /messages/by-id/93236D36-B91C-4DFA-AF03-99C083840378@yandex-team.ru

I will rebase my patches on top of this.

I did skim patches 0002 and 0003 without going into too much detail;
they look reasonable ideas. I have not tried to reproduce the claimed
performance benefits. I think measuring this patch set with the tests
posted by Shawn Debnath in [3] is important, too.
[3] /messages/by-id/YemDdpMrsoJFQJnU@f01898859afd.ant.amazon.com

Thanks for taking a look.

On the other hand, here's a somewhat crazy idea. What if, instead of
stealing buffers from shared_buffers (which causes a lot of complexity),

Currently, we do not steal buffers from shared_buffers, though the
computation does depend upon NBuffers. I mean, for each SLRU we are
computing separate memory that is in addition to shared_buffers, no?

we allocate a common pool for all SLRUs to use? We provide a single
knob -- say, non_relational_buffers=32MB as default -- and we use a LRU
algorithm (or something) to distribute that memory across all the SLRUs.
So the ratio to use for this SLRU or that one would depend on the nature
of the workload: maybe more for multixact in this server here, but more
for subtrans in that server there; it's just the total amount that the
user would have to configure, side by side with shared_buffers (and
perhaps scale with it like wal_buffers), and the LRU would handle the
rest. The "only" problem here is finding a distribution algorithm that
doesn't further degrade performance, of course ...

Yeah, this could be an idea, but are you saying that all the SLRUs
will share a single buffer pool, and based on the LRU algorithm it will
be decided which page stays in the buffer pool and which goes out? But
wouldn't that create another issue of different SLRUs starting to
contend on the same lock if we have a common buffer pool for all the
SLRUs? Or am I missing something? Or are you saying that although
there is a common buffer pool, each SLRU will have its own boundaries
in it, protected by a separate lock, and based on the workload those
boundaries can change dynamically? I haven't put much thought into how
practical the idea is; I am just trying to understand what you have in
mind.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#5)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Fri, Oct 20, 2023 at 9:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Oct 14, 2023 at 9:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

This and other results shared by you look promising. Will there be any
improvement in workloads related to clog buffer usage?

I did not understand this question can you explain this a bit?

I meant to ask about the impact of this patch on accessing transaction
status via TransactionIdGetStatus(). Shouldn't we expect some
improvement in accessing CLOG buffers?

--
With Regards,
Amit Kapila.

#10Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#9)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Oct 25, 2023 at 5:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 20, 2023 at 9:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Oct 14, 2023 at 9:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

This and other results shared by you look promising. Will there be any
improvement in workloads related to clog buffer usage?

I did not understand this question can you explain this a bit?

I meant to ask about the impact of this patch on accessing transaction
status via TransactionIdGetStatus(). Shouldn't we expect some
improvement in accessing CLOG buffers?

Yes, there should be, because 1) there is no common lock anymore, so
contention on a centralized control lock will be reduced when we are
accessing the transaction status from pages falling in different SLRU
banks, and 2) the buffer size is configurable, so if the workload is
accessing transaction status over a wider range of pages then it would
help with the frequent buffer eviction, though this might not be the
most common case.
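
To put 1) concretely, with the patches on this thread the control lock
is derived from the page number, roughly like the sketch below
(simplified; the actual helper in the patch set is
SimpleLruGetSLRUBankLock(), and the bank_mask/bank_locks fields are the
ones added by the bank patches). Two backends reading transaction
status from pages that fall into different banks therefore acquire
different locks.

/* Simplified sketch of how a page number maps to its bank's lock. */
static inline LWLock *
SlruBankLockSketch(SlruCtl ctl, int pageno)
{
	int			bankno = pageno & ctl->bank_mask;

	/* each bank of buffers has its own control lock */
	return &ctl->shared->bank_locks[bankno].lock;
}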

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#11Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#8)
5 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Oct 25, 2023 at 10:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Oct 24, 2023 at 9:34 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Overall I agree with your comments, actually, I haven't put that much
thought into the GUC part and how it scales the SLRU buffers w.r.t.
this single configurable parameter. Yeah, so I think it is better
that we take the older patch version as our base patch where we have
separate GUC per SLRU.

I'm inclined to use Borodin's patch last posted here [2] instead of your
proposed 0001.
[2] /messages/by-id/93236D36-B91C-4DFA-AF03-99C083840378@yandex-team.ru

I will rebase my patches on top of this.

I have taken 0001 and 0002 from [1], done some bug fixes in 0001, and
changed the logic of SlruAdjustNSlots() in 0002, such that now it
starts with the next power-of-2 value of the configured slots and
keeps doubling the number of banks until either we reach the maximum
number of banks, SLRU_MAX_BANKS (128), or doubling again would make the
bank size smaller than SLRU_MIN_BANK_SIZE (8). By doing so, we ensure
that we neither have too many banks nor very large banks. There was
also a patch 0003 in that thread, but I haven't taken it, as it is
another optimization (merging some structure members); I will analyze
its performance characteristics and try to add it on top of the
complete patch series.
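
To make that sizing rule easier to follow, here is a rough sketch of
the logic described above (the real code is SlruAdjustNSlots() in 0002;
the exact boundary conditions there may differ, and the helper name
here is made up):

#include "postgres.h"
#include "port/pg_bitutils.h"

#define SLRU_MIN_BANK_SIZE	8
#define SLRU_MAX_BANKS		128

/*
 * Round nslots up to a power of 2, then keep doubling the number of
 * banks while that stays under the bank limit and keeps the per-bank
 * size at or above the minimum.
 */
static void
AdjustNSlotsSketch(int *nslots, int *nbanks)
{
	*nslots = (int) pg_nextpower2_32(Max(SLRU_MIN_BANK_SIZE, *nslots));

	*nbanks = 1;
	while (*nbanks < SLRU_MAX_BANKS &&
		   *nslots / (*nbanks * 2) >= SLRU_MIN_BANK_SIZE)
		*nbanks *= 2;
}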

Patch details:
0001 - GUC parameter for each SLRU
0002 - Divide the SLRU pool into banks
(The above 2 are taken from [1] with some modification and rebasing by me)
0003 - Implement bank-wise SLRU lock as described in the first email
of this thread
0004 - Implement bank-wise LRU counter as described in the first email
of this thread
0005 - Some other optimization suggested offlist by Alvaro, i.e.
merging buffer locks and bank locks in the same array so that the
bank-wise LRU counter does not fetch the next cache line in a hot
function SlruRecentlyUsed()
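
(To illustrate what 0005 changes in the lock layout: the per-buffer
locks and the per-bank locks end up in one LWLockPadded array, with the
bank locks stored after the buffer locks, roughly as in the sketch
below; the helper name is invented, see the attached v3-0005 for the
real code.)

/* Sketch only: bank locks follow the per-buffer locks in one array. */
static inline LWLock *
SlruMergedBankLock(SlruShared shared, int bankno)
{
	return &shared->locks[shared->num_slots + bankno].lock;
}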

Note: I think 0003, 0004 and 0005 can be merged together, but I have
kept them separate so that we can review them independently and see how
useful each of them is.

[1]: /messages/by-id/93236D36-B91C-4DFA-AF03-99C083840378@yandex-team.ru

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v3-0005-Merge-bank-locks-array-with-buffer-locks-array.patch (application/octet-stream)
From c80516008f76a8a4b68ff5cab9ada952373ee6ff Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Sat, 28 Oct 2023 16:24:04 +0530
Subject: [PATCH v3 5/5] Merge bank locks array with buffer locks array

This will help us get bank_cur_lru_count, which is frequently
accessed in SlruRecentlyUsed(), into the same cache line.
---
 src/backend/access/transam/slru.c | 123 ++++++++++++++++--------------
 src/include/access/slru.h         |  15 ++--
 2 files changed, 72 insertions(+), 66 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 6c8c21f215..3728c02607 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -156,8 +156,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(bool));	/* page_dirty[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
-	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
-	sz += MAXALIGN(nbanks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN((nslots + nslots) * sizeof(LWLockPadded));	/* locks[] */
 	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
@@ -229,10 +228,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		offset += MAXALIGN(nslots * sizeof(int));
 
 		/* Initialize LWLocks */
-		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
-		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
-		shared->bank_locks = (LWLockPadded *) (ptr + offset);
-		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
+		shared->locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN((nslots + nbanks) * sizeof(LWLockPadded));
 		shared->bank_cur_lru_count = (int *) (ptr + offset);
 		offset += MAXALIGN(nbanks * sizeof(int));
 
@@ -245,8 +242,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		ptr += BUFFERALIGN(offset);
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
-			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 buffer_tranche_id);
+			LWLockInitialize(&shared->locks[slotno].lock, buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -257,7 +253,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize bank locks for each buffer bank. */
 		for (bankno = 0; bankno < nbanks; bankno++)
 		{
-			LWLockInitialize(&shared->bank_locks[bankno].lock,
+			LWLockInitialize(&shared->locks[nslots + bankno].lock,
 							 bank_tranche_id);
 			shared->bank_cur_lru_count[bankno] = 0;
 		}
@@ -356,12 +352,13 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
 	int			bankno = slotno / ctl->bank_size;
+	int			banklockoffset = shared->num_slots + bankno;
 
 	/* See notes at top of file */
-	LWLockRelease(&shared->bank_locks[bankno].lock);
-	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
-	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->locks[banklockoffset].lock);
+	LWLockAcquire(&shared->locks[slotno].lock, LW_SHARED);
+	LWLockRelease(&shared->locks[slotno].lock);
+	LWLockAcquire(&shared->locks[banklockoffset].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -374,7 +371,7 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
 	if (shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS ||
 		shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS)
 	{
-		if (LWLockConditionalAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED))
+		if (LWLockConditionalAcquire(&shared->locks[slotno].lock, LW_SHARED))
 		{
 			/* indeed, the I/O must have failed */
 			if (shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS)
@@ -384,7 +381,7 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
 				shared->page_status[slotno] = SLRU_PAGE_VALID;
 				shared->page_dirty[slotno] = true;
 			}
-			LWLockRelease(&shared->buffer_locks[slotno].lock);
+			LWLockRelease(&shared->locks[slotno].lock);
 		}
 	}
 }
@@ -417,6 +414,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 	{
 		int			slotno;
 		int			bankno;
+		int			banklockoffset;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -458,11 +456,12 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		shared->page_dirty[slotno] = false;
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
-		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->locks[slotno].lock, LW_EXCLUSIVE);
 		bankno = slotno / ctl->bank_size;
+		banklockoffset = shared->num_slots + bankno;
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(&shared->bank_locks[bankno].lock);
+		LWLockRelease(&shared->locks[banklockoffset].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -471,7 +470,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->locks[banklockoffset].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -479,7 +478,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 
 		shared->page_status[slotno] = ok ? SLRU_PAGE_VALID : SLRU_PAGE_EMPTY;
 
-		LWLockRelease(&shared->buffer_locks[slotno].lock);
+		LWLockRelease(&shared->locks[slotno].lock);
 
 		/* Now it's okay to ereport if we failed */
 		if (!ok)
@@ -516,9 +515,10 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	int			bankno = pageno & ctl->bank_mask;
 	int			bankstart = bankno * ctl->bank_size;
 	int			bankend = bankstart + ctl->bank_size;
+	int			banklockoffset = shared->num_slots + bankno;
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_SHARED);
+	LWLockAcquire(&shared->locks[banklockoffset].lock, LW_SHARED);
 
 	/* See if page is already in a buffer */
 	for (slotno = bankstart; slotno < bankend; slotno++)
@@ -538,8 +538,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(&shared->bank_locks[bankno].lock);
-	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->locks[banklockoffset].lock);
+	LWLockAcquire(&shared->locks[banklockoffset].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -562,6 +562,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	int			pageno = shared->page_number[slotno];
 	bool		ok;
 	int			bankno = slotno / ctl->bank_size;
+	int			banklockoffset = shared->num_slots + bankno;
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -587,10 +588,10 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	shared->page_dirty[slotno] = false;
 
 	/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
-	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(&shared->bank_locks[bankno].lock);
+	LWLockRelease(&shared->locks[banklockoffset].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -605,7 +606,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->locks[banklockoffset].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -616,7 +617,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 
 	shared->page_status[slotno] = SLRU_PAGE_VALID;
 
-	LWLockRelease(&shared->buffer_locks[slotno].lock);
+	LWLockRelease(&shared->locks[slotno].lock);
 
 	/* Now it's okay to ereport if we failed */
 	if (!ok)
@@ -1185,7 +1186,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int			pageno = 0;
 	int			i;
-	int			lastbankno = 0;
+	int			prevlockoffset = shared->num_slots;
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1196,17 +1197,17 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(&shared->bank_locks[0].lock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->locks[prevlockoffset].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			curbankno = slotno / ctl->bank_size;
+		int			curlockoffset = shared->num_slots + slotno / ctl->bank_size;
 
-		if (curbankno != lastbankno)
+		if (curlockoffset != prevlockoffset)
 		{
-			LWLockRelease(&shared->bank_locks[lastbankno].lock);
-			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
-			lastbankno = curbankno;
+			LWLockRelease(&shared->locks[prevlockoffset].lock);
+			LWLockAcquire(&shared->locks[curlockoffset].lock, LW_EXCLUSIVE);
+			prevlockoffset = curlockoffset;
 		}
 
 		SlruInternalWritePage(ctl, slotno, &fdata);
@@ -1222,7 +1223,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(&shared->bank_locks[lastbankno].lock);
+	LWLockRelease(&shared->locks[prevlockoffset].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1262,7 +1263,8 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
-	int			prevbankno;
+	int			nslots = shared->num_slots;
+	int			prevlockoffset;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1288,21 +1290,21 @@ restart:
 		return;
 	}
 
-	prevbankno = 0;
-	LWLockAcquire(&shared->bank_locks[prevbankno].lock, LW_EXCLUSIVE);
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	prevlockoffset = nslots;
+	LWLockAcquire(&shared->locks[prevlockoffset].lock, LW_EXCLUSIVE);
+	for (slotno = 0; slotno < nslots; slotno++)
 	{
-		int			curbankno = slotno / ctl->bank_size;
+		int			curlockoffset = nslots + (slotno / ctl->bank_size);
 
 		/*
 		 * If the curbankno is not same as prevbankno then release the lock on
 		 * the prevbankno and acquire the lock on the curbankno.
 		 */
-		if (curbankno != prevbankno)
+		if (curlockoffset != prevlockoffset)
 		{
-			LWLockRelease(&shared->bank_locks[prevbankno].lock);
-			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
-			prevbankno = curbankno;
+			LWLockRelease(&shared->locks[prevlockoffset].lock);
+			LWLockAcquire(&shared->locks[curlockoffset].lock, LW_EXCLUSIVE);
+			prevlockoffset = curlockoffset;
 		}
 
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
@@ -1335,11 +1337,11 @@ restart:
 		else
 			SimpleLruWaitIO(ctl, slotno);
 
-		LWLockRelease(&shared->bank_locks[prevbankno].lock);
+		LWLockRelease(&shared->locks[prevlockoffset].lock);
 		goto restart;
 	}
 
-	LWLockRelease(&shared->bank_locks[prevbankno].lock);
+	LWLockRelease(&shared->locks[prevlockoffset].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1380,28 +1382,29 @@ SlruDeleteSegment(SlruCtl ctl, int segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
-	int			prevbankno = 0;
+	int			nslots = shared->num_slots;
+	int			prevlockoffset = nslots;
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(&shared->bank_locks[prevbankno].lock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->locks[prevlockoffset].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	for (slotno = 0; slotno < nslots; slotno++)
 	{
 		int			pagesegno;
-		int			curbankno;
+		int			curlockoffset;
 
-		curbankno = slotno / ctl->bank_size;
+		curlockoffset = nslots + (slotno / ctl->bank_size);
 
 		/*
 		 * If the curbankno is not same as prevbankno then release the lock on
 		 * the prevbankno and acquire the lock on the curbankno.
 		 */
-		if (curbankno != prevbankno)
+		if (curlockoffset != prevlockoffset)
 		{
-			LWLockRelease(&shared->bank_locks[prevbankno].lock);
-			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
-			prevbankno = curbankno;
+			LWLockRelease(&shared->locks[prevlockoffset].lock);
+			LWLockAcquire(&shared->locks[curlockoffset].lock, LW_EXCLUSIVE);
+			prevlockoffset = curlockoffset;
 		}
 
 		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
@@ -1438,7 +1441,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(&shared->bank_locks[prevbankno].lock);
+	LWLockRelease(&shared->locks[prevlockoffset].lock);
 }
 
 /*
@@ -1756,10 +1759,11 @@ SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
 {
 	SlruShared	shared = ctl->shared;
 	int			bankno;
-	int			nbanks = shared->num_slots / ctl->bank_size;
+	int			nslots = shared->num_slots;
+	int			nbanks = nslots / ctl->bank_size;
 
 	for (bankno = 0; bankno < nbanks; bankno++)
-		LWLockAcquire(&shared->bank_locks[bankno].lock, mode);
+		LWLockAcquire(&shared->locks[nslots + bankno].lock, mode);
 }
 
 /*
@@ -1770,8 +1774,9 @@ SimpleLruReleaseAllBankLock(SlruCtl ctl)
 {
 	SlruShared	shared = ctl->shared;
 	int			bankno;
-	int			nbanks = shared->num_slots / ctl->bank_size;
+	int			nslots = shared->num_slots;
+	int			nbanks = nslots / ctl->bank_size;
 
 	for (bankno = 0; bankno < nbanks; bankno++)
-		LWLockRelease(&shared->bank_locks[bankno].lock);
+		LWLockRelease(&shared->locks[nslots + bankno].lock);
 }
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index a18b07f5d0..6759c900f3 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -69,14 +69,14 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int		   *page_number;
 	int		   *page_lru_count;
-	LWLockPadded *buffer_locks;
 
 	/*
-	 * Locks to protect the in memory buffer slot access in per SLRU bank. The
-	 * buffer_locks protects the I/O on each buffer slots whereas this lock
-	 * protect the in memory operation on the buffer within one SLRU bank.
+	 * This array holds nslots buffer locks followed by nbanks bank locks.
+	 * The buffer locks protect the I/O on each buffer slot, whereas the
+	 * bank locks protect the in-memory operations on the buffers within
+	 * one SLRU bank.
 	 */
-	LWLockPadded *bank_locks;
+	LWLockPadded *locks;
 
 	/*----------
 	 * Instead of a global counter we maintain a bank-wise LRU counter because
@@ -169,9 +169,10 @@ typedef SlruCtlData *SlruCtl;
 static inline LWLock *
 SimpleLruGetSLRUBankLock(SlruCtl ctl, int pageno)
 {
-	int			bankno = (pageno & ctl->bank_mask);
+	int			banklockoffset =
+		ctl->shared->num_slots + (pageno & ctl->bank_mask);
 
-	return &(ctl->shared->bank_locks[bankno].lock);
+	return &(ctl->shared->locks[banklockoffset].lock);
 }
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
-- 
2.39.2 (Apple Git-143)
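
As a side note on the combined locks[] layout used above: the buffer locks occupy
indexes 0..num_slots-1 and the bank locks follow at num_slots..num_slots+nbanks-1,
which is why the code computes banklockoffset = shared->num_slots + bankno.  A
minimal standalone sketch of that indexing (the demo_* names are hypothetical and
not part of the patch):

#include <stdio.h>

/* Buffer lock for slot i lives at locks[i]. */
static int
demo_buffer_lock_index(int slotno)
{
	return slotno;
}

/* The lock for bank b follows all buffer locks, at locks[num_slots + b]. */
static int
demo_bank_lock_index(int num_slots, int bankno)
{
	return num_slots + bankno;
}

int
main(void)
{
	int			num_slots = 128;
	int			bank_size = 8;
	int			slotno = 42;

	printf("buffer lock index: %d\n", demo_buffer_lock_index(slotno));
	printf("bank lock index:   %d\n",
		   demo_bank_lock_index(num_slots, slotno / bank_size));
	return 0;
}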

Attachment: v3-0002-Divide-SLRU-buffers-into-banks.patch (application/octet-stream)
From 0fbd91533ad3f1ee3a4931aafeb7b9aebf40d839 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Wed, 25 Oct 2023 16:51:34 +0530
Subject: [PATCH v3 2/5] Divide SLRU buffers into banks

We want to eliminate linear search within SLRU buffers.
To do so we divide SLRU buffers into banks. Each bank holds
approximately 8 buffers. Each SLRU pageno may reside only in one bank.
Adjacent pagenos reside in different banks.

Andrey M. Borodin, with some modifications by Dilip Kumar
based on feedback by Alvaro Herrera
---
 src/backend/access/transam/slru.c | 73 +++++++++++++++++++++++++++++--
 src/include/access/slru.h         |  6 +++
 2 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ed24e1185..c339e0a7e4 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "port/pg_bitutils.h"
 
 #define SlruFileName(ctl, path, seg) \
 	snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
@@ -71,6 +72,18 @@
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
+
+/*
+ * SLRU bank size for slotno hash banks
+ */
+#define SLRU_MIN_BANK_SIZE	8
+#define SLRU_MAX_BANKS		128
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -134,7 +147,6 @@ typedef enum
 static SlruErrorCause slru_errcause;
 static int	slru_errno;
 
-
 static void SimpleLruZeroLSNs(SlruCtl ctl, int slotno);
 static void SimpleLruWaitIO(SlruCtl ctl, int slotno);
 static void SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata);
@@ -147,6 +159,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
+static void SlruAdjustNSlots(int *nslots, int *banksize, int *bankmask);
 
 /*
  * Initialization of shared memory
@@ -156,6 +169,10 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			bankmask_ignore;
+	int			banksize_ignore;
+
+	SlruAdjustNSlots(&nslots, &banksize_ignore, &bankmask_ignore);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -191,6 +208,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 {
 	SlruShared	shared;
 	bool		found;
+	int			bankmask;
+	int			banksize;
+
+	SlruAdjustNSlots(&nslots, &banksize, &bankmask);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -258,7 +279,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -266,6 +290,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	 */
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
+	ctl->bank_size = banksize;
+	ctl->bank_mask = bankmask;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -497,12 +523,14 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * ctl->bank_size;
+	int			bankend = bankstart + ctl->bank_size;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
 	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1031,7 +1059,10 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
 
 		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		int			bankstart = (pageno & ctl->bank_mask) * ctl->bank_size;
+		int			bankend = bankstart + ctl->bank_size;
+
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1066,7 +1097,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int			this_page_number;
@@ -1613,3 +1644,37 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Pick a bank size optimal for N-associative SLRU buffers.
+ *
+ * We expect the bank number to be picked from the lowest bits of the requested
+ * pageno.  Thus we want the number of banks to be a power of 2.
+ */
+static void
+SlruAdjustNSlots(int *nslots, int *banksize, int *bankmask)
+{
+	int			nbanks = 1;
+
+	*nslots = (int) pg_nextpower2_32(Max(SLRU_MIN_BANK_SIZE, *nslots));
+	*banksize = *nslots;
+
+	/*
+	 * Adjust the number of banks and the per-bank size.  Start with one bank,
+	 * then keep doubling the bank count (halving the bank size) until we reach
+	 * SLRU_MAX_BANKS or the bank size drops to SLRU_MIN_BANK_SIZE.  This way
+	 * we end up with neither too many banks nor very large banks.
+	 */
+	while (nbanks < SLRU_MAX_BANKS && *banksize > SLRU_MIN_BANK_SIZE)
+	{
+		if ((*banksize & 1) != 0)
+			*banksize += 1;
+		*banksize /= 2;
+		nbanks *= 2;
+	}
+
+	elog(DEBUG5, "nslots %d banksize %d nbanks %d ", *nslots, *banksize, nbanks);
+
+	*nslots = *banksize * nbanks;
+	*bankmask = (*nslots / *banksize) - 1;
+}
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index c0d37e3eb3..c3fd58185a 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -139,6 +139,12 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * mask and size for slotno banks
+	 */
+	int			bank_size;
+	Size		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
-- 
2.39.2 (Apple Git-143)
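
To make the bank mapping concrete, here is a minimal standalone sketch (the
DemoSlruCtl and demo_bank_start names are hypothetical, not part of the patch)
of how a pageno is confined to a single bank of slots, using the same
bank_mask/bank_size arithmetic that SlruAdjustNSlots sets up:

#include <stdio.h>

/* Hypothetical, simplified stand-in for the fields added to SlruCtlData. */
typedef struct
{
	int			bank_size;		/* slots per bank, e.g. 8 */
	int			bank_mask;		/* nbanks - 1, nbanks being a power of two */
} DemoSlruCtl;

/* First slot of the only bank that may hold this pageno. */
static int
demo_bank_start(const DemoSlruCtl *ctl, int pageno)
{
	/* The low bits of the pageno pick the bank, so adjacent pages spread out. */
	int			bankno = pageno & ctl->bank_mask;

	return bankno * ctl->bank_size;
}

int
main(void)
{
	DemoSlruCtl ctl = {8, 15};	/* 16 banks of 8 slots = 128 slots */
	int			pageno;

	for (pageno = 100; pageno < 104; pageno++)
		printf("pageno %d -> slots [%d, %d)\n", pageno,
			   demo_bank_start(&ctl, pageno),
			   demo_bank_start(&ctl, pageno) + ctl.bank_size);
	return 0;
}

A buffer lookup or replacement then only ever scans the bank_size slots of that
one bank, rather than the whole pool.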

Attachment: v3-0004-Introduce-bank-wise-LRU-counter.patch (application/octet-stream)
From 2ea0f9c9dad8482275eab2e77cc4d128ba2d5196 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Sat, 28 Oct 2023 13:48:44 +0530
Subject: [PATCH v3 4/5] Introduce bank-wise LRU counter

Since we have already divided the buffer pool into banks and the victim
buffer search is done at the bank level, there is no need for a
centralized LRU counter.  This also improves performance by avoiding
the frequent CPU cache invalidation caused by updating a single shared
variable.

Dilip Kumar based on design idea from Robert Haas
---
 src/backend/access/transam/slru.c | 83 +++++++++++++++++--------------
 src/include/access/slru.h         | 28 +++++++----
 2 files changed, 64 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index cf215627ea..6c8c21f215 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -105,34 +105,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -159,6 +131,8 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno,
+									int banksize);
 static int	SlruAdjustNSlots(int *nslots, int *banksize, int *bankmask);
 
 /*
@@ -184,6 +158,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
 	sz += MAXALIGN(nbanks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -236,8 +211,6 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -260,6 +233,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
 		shared->bank_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -281,8 +256,11 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		}
 		/* Initialize bank locks for each buffer bank. */
 		for (bankno = 0; bankno < nbanks; bankno++)
+		{
 			LWLockInitialize(&shared->bank_locks[bankno].lock,
 							 bank_tranche_id);
+			shared->bank_cur_lru_count[bankno] = 0;
+		}
 
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
@@ -329,7 +307,7 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 	shared->page_number[slotno] = pageno;
 	shared->page_status[slotno] = SLRU_PAGE_VALID;
 	shared->page_dirty[slotno] = true;
-	SlruRecentlyUsed(shared, slotno);
+	SlruRecentlyUsed(shared, slotno, ctl->bank_size);
 
 	/* Set the buffer to zeroes */
 	MemSet(shared->page_buffer[slotno], 0, BLCKSZ);
@@ -461,7 +439,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 				continue;
 			}
 			/* Otherwise, it's ready to use */
-			SlruRecentlyUsed(shared, slotno);
+			SlruRecentlyUsed(shared, slotno, ctl->bank_size);
 
 			/* update the stats counter of pages found in the SLRU */
 			pgstat_count_slru_page_hit(shared->slru_stats_idx);
@@ -507,7 +485,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		if (!ok)
 			SlruReportIOError(ctl, pageno, xid);
 
-		SlruRecentlyUsed(shared, slotno);
+		SlruRecentlyUsed(shared, slotno, ctl->bank_size);
 
 		/* update the stats counter of pages not found in SLRU */
 		pgstat_count_slru_page_read(shared->slru_stats_idx);
@@ -550,7 +528,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 			shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
 		{
 			/* See comments for SlruRecentlyUsed macro */
-			SlruRecentlyUsed(shared, slotno);
+			SlruRecentlyUsed(shared, slotno, ctl->bank_size);
 
 			/* update the stats counter of pages found in the SLRU */
 			pgstat_count_slru_page_hit(shared->slru_stats_idx);
@@ -1073,7 +1051,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
 
 		/* See if page already has a buffer assigned */
-		int			bankstart = (pageno & ctl->bank_mask) * ctl->bank_size;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * ctl->bank_size;
 		int			bankend = bankstart + ctl->bank_size;
 
 		for (slotno = bankstart; slotno < bankend; slotno++)
@@ -1110,7 +1089,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
@@ -1701,6 +1680,38 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	return result;
 }
 
+/*
+ * Function to mark a buffer slot "most recently used", i.e. bump its
+ * page_lru_count to the bank's current LRU counter.
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno, int banksize)
+{
+	int			slrubankno = slotno / banksize;
+	int			new_lru_count = shared->bank_cur_lru_count[slrubankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[slrubankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
 /*
  * Pick a bank size optimal for N-associative SLRU buffers.
  *
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index f3545d5f5d..a18b07f5d0 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -78,6 +78,23 @@ typedef struct SlruSharedData
 	 */
 	LWLockPadded *bank_locks;
 
+	/*----------
+	 * Instead of a global counter we maintain a bank-wise LRU counter because
+	 * a) victim buffer selection is done at the bank level, so there is no
+	 * point in having a global counter, and b) manipulating a global counter
+	 * causes frequent CPU cache invalidation, which would hurt
+	 * performance.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page is therefore the one with the highest value of
+	 *		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -89,17 +106,6 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
-- 
2.39.2 (Apple Git-143)
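
For illustration, here is a minimal sketch of the bank-local LRU arithmetic
(demo_pick_victim is a hypothetical name; this is not the SlruSelectLRUPage
code itself): the victim is the slot whose page_lru_count lags the bank's
counter by the most.

#include <stdio.h>

/*
 * Pick the least recently used slot within one bank.  As the SLRU comments
 * note, the subtraction stays meaningful across counter wraparound as long
 * as no page's age exceeds INT_MAX counts.
 */
static int
demo_pick_victim(const int *page_lru_count, int bank_cur_lru_count,
				 int bankstart, int bank_size)
{
	int			best_slot = bankstart;
	int			best_delta = -1;
	int			slotno;

	for (slotno = bankstart; slotno < bankstart + bank_size; slotno++)
	{
		int			delta = bank_cur_lru_count - page_lru_count[slotno];

		if (delta > best_delta)
		{
			best_delta = delta;
			best_slot = slotno;
		}
	}
	return best_slot;
}

int
main(void)
{
	/* page_lru_count values for the 8 slots of one bank */
	int			lru[8] = {40, 41, 37, 42, 39, 38, 41, 36};

	/* Slot 7 has the smallest count, hence the largest age: it gets evicted. */
	printf("victim slot: %d\n", demo_pick_victim(lru, 42, 0, 8));
	return 0;
}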

Attachment: v3-0001-Make-all-SLRU-buffer-sizes-configurable.patch (application/octet-stream)
From c5d594053a2ad3056bde425bd52f589e3c102e02 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Wed, 25 Oct 2023 14:45:00 +0530
Subject: [PATCH v3 1/5] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Remove the limits on xact_buffers and commit_ts_buffers.  The default
sizes for those caches are ~0.2% and ~0.1% of shared_buffers, as before,
but now there is no cap at 128 and 16 buffers respectively (unless
track_commit_timestamp is disabled, in which case we might as well keep
it tiny).  Sizes much larger than the old limits have been
shown to be useful on modern systems, and an earlier commit replaced a
linear search with a hash table to avoid problems with extreme cases.

Patch by Andrey M. Borodin with some bug fixes by Dilip Kumar.
Reviewed by Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro and Dilip Kumar
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  23 ++-
 src/backend/access/transam/commit_ts.c        |   5 +
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  19 +++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/clog.h                     |  10 ++
 src/include/access/commit_ts.h                |   1 -
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 20 files changed, 298 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 985cabfc0b..0584bcdc51 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>8</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>8</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>8</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 4 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 1024, but not fewer than 4 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 4a431d5876..6ef9aacb0e 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -58,8 +58,8 @@
 
 /* We need two bits per xact, so four xacts fit in a byte */
 #define CLOG_BITS_PER_XACT	2
-#define CLOG_XACTS_PER_BYTE 4
-#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
+StaticAssertDecl((CLOG_BITS_PER_XACT * CLOG_XACTS_PER_BYTE) == BITS_PER_BYTE,
+				 "CLOG_BITS_PER_XACT and CLOG_XACTS_PER_BYTE are inconsistent");
 #define CLOG_XACT_BITMASK	((1 << CLOG_BITS_PER_XACT) - 1)
 
 #define TransactionIdToPage(xid)	((xid) / (TransactionId) CLOG_XACTS_PER_PAGE)
@@ -663,23 +663,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB for every 1GB of shared buffers, up to the
+ * theoretical maximum useful value, but always at least 4 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(4, xact_buffers);
+	return Min(CLOG_MAX_ALLOWED_BUFFERS, Max(4, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index b897fabc70..48826672ea 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -493,10 +493,15 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 1MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 4 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(4, commit_ts_buffers);
 	return Min(256, Max(4, NBuffers / 256));
 }
 
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 57ed34c0a8..62709fcd07 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,13 +1851,13 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 62bb610167..0dd48f40f3 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -184,14 +185,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 38ddae08b8..4bdbbe5cc0 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -117,7 +117,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by slru.c's wraparound limit; see QUEUE_MAX_PAGE below.
@@ -235,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -521,7 +521,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -569,7 +569,7 @@ AsyncShmemInit(void)
 	 * Set up SLRU management of the pg_notify data.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index a88cf5f118..ee25aa0656 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,23 @@ show_timezone(void)
 	return "unknown";
 }
 
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index a794546db3..18ea18316d 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
@@ -1347,7 +1347,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..82acdf4226 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 8;
+int			multixact_members_buffers = 16;
+int			subtrans_buffers = 32;
+int			notify_buffers = 8;
+int			serial_buffers = 16;
+int			xact_buffers = 0;
+int			commit_ts_buffers = 0;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 7605eff9b9..83acff7037 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2287,6 +2288,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		8, 2, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		16, 2, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		32, 2, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		8, 2, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		16, 2, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		0, 0, CLOG_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d08d55c3fe..c21d6468ed 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 32			# memory for pg_subtrans
+#multixact_offsets_buffers = 8		# memory for pg_multixact/offsets
+#multixact_members_buffers = 16		# memory for pg_multixact/members
+#notify_buffers = 8			# memory for pg_notify
+#serial_buffers = 16			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index d99444f073..a9cd65db36 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -15,6 +15,16 @@
 #include "storage/sync.h"
 #include "lib/stringinfo.h"
 
+/*
+ * Don't allow xact_buffers to be set higher than could possibly be useful or
+ * SLRU would allow.
+ */
+#define CLOG_XACTS_PER_BYTE 4
+#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
+#define CLOG_MAX_ALLOWED_BUFFERS \
+	Min(SLRU_MAX_ALLOWED_BUFFERS, \
+		(((MaxTransactionId / 2) + (CLOG_XACTS_PER_PAGE - 1)) / CLOG_XACTS_PER_PAGE))
+
 /*
  * Possible transaction statuses --- note that all-zeroes is the initial
  * state.
diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 5087cdce51..78d017ad85 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -16,7 +16,6 @@
 #include "replication/origin.h"
 #include "storage/sync.h"
 
-
 extern PGDLLIMPORT bool track_commit_timestamp;
 
 extern void TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids,
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 0be1355892..18d7ba4ca9 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 552cc19e68..c0d37e3eb3 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..147dc4acc3 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 02da6ba7e1..b3e6815ee4 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f0cc651435..e2473f41de 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..7b68c8f1c7 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 2a191830a8..8597e430de 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -161,4 +161,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)
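
For completeness, a hypothetical postgresql.conf fragment using the GUCs this
patch introduces; the values below are illustrative only (chosen for a system
with heavy SLRU traffic), and all of these settings require a server restart:

xact_buffers = 256			# pg_xact (0 = scale with shared_buffers)
commit_ts_buffers = 128			# pg_commit_ts (0 = scale with shared_buffers)
subtrans_buffers = 64			# pg_subtrans
multixact_offsets_buffers = 32		# pg_multixact/offsets
multixact_members_buffers = 64		# pg_multixact/members
notify_buffers = 16			# pg_notify
serial_buffers = 32			# pg_serial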

Attachment: v3-0003-Bank-wise-slru-locks.patch (application/octet-stream)
From 6b2f662dfe0794dce613c33a21f8f740cf8229e3 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Mon, 30 Oct 2023 11:06:12 +0530
Subject: [PATCH v3 3/5] Bank wise slru locks

The previous patch divided the SLRU buffer pool into associative
banks.  This patch further optimizes it by introducing bank-wise SLRU
locks instead of a single centralized lock, which reduces contention
on the SLRU control lock.

Dilip Kumar, with some design input from Robert Haas
and review by Alvaro Herrera
---
 src/backend/access/transam/clog.c        | 114 ++++++++++-----
 src/backend/access/transam/commit_ts.c   |  43 +++---
 src/backend/access/transam/multixact.c   | 177 ++++++++++++++++-------
 src/backend/access/transam/slru.c        | 148 +++++++++++++++----
 src/backend/access/transam/subtrans.c    |  58 ++++++--
 src/backend/commands/async.c             |  32 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  33 +++--
 src/include/access/slru.h                |  32 +++-
 src/include/storage/lwlock.h             |   7 +
 src/test/modules/test_slru/test_slru.c   |  32 ++--
 12 files changed, 494 insertions(+), 210 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 6ef9aacb0e..830d8bcdf5 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -274,14 +274,19 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock w.r.t. the page we are going to access. */
+	lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on the SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
+	 * updates for multiple backends so that the number of times the SLRU lock
 	 * needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
@@ -300,17 +305,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire SLRU lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -323,10 +328,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -345,7 +350,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetSLRUBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -396,14 +402,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -417,6 +422,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -497,13 +504,10 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
-	 * Now that we've got the lock, clear the list of processes waiting for
-	 * group XID status update, saving a pointer to the head of the list.
-	 * Trying to pop elements one at a time could lead to an ABA problem.
+	 * We are the leader, so clear the list of processes waiting for a group
+	 * XID status update, saving a pointer to the head of the list.  Trying to
+	 * pop elements one at a time could lead to an ABA problem.
 	 */
 	nextidx = pg_atomic_exchange_u32(&procglobal->clogGroupFirst,
 									 INVALID_PGPROCNO);
@@ -511,10 +515,38 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	/* Remember head of list so we can perform wakeups after dropping lock. */
 	wakeidx = nextidx;
 
+	/* Acquire the SLRU bank lock for the first page in the group. */
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetSLRUBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	/* Walk the list and update the status of all XIDs. */
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * Although we try our best to keep all members of a group on the same
+		 * page, that is not always guaranteed; for details, see the comment in
+		 * the while loop above where this process is added for the group
+		 * update.  So if the page we are about to update does not fall in the
+		 * same SLRU bank as the page we updated last, we need to release the
+		 * lock on the previous bank and acquire the lock on the bank of the
+		 * page we are going to update now.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetSLRUBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -534,7 +566,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -563,10 +596,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+						  int slotno)
 {
 	int			byteno = TransactionIdToByte(xid);
 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
@@ -655,7 +689,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -689,8 +723,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -704,8 +738,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -714,7 +749,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -749,14 +784,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u32(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -767,8 +798,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -800,7 +832,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -832,6 +864,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -842,13 +875,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -986,16 +1020,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 48826672ea..204341da53 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -218,8 +218,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -229,13 +230,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -336,7 +337,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -526,9 +527,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
-				  SYNC_HANDLER_COMMIT_TS);
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU, SYNC_HANDLER_COMMIT_TS);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
@@ -684,9 +684,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -713,12 +711,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -767,9 +766,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -801,6 +800,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -821,12 +821,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -980,16 +982,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
+		lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1001,7 +1005,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
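Since latest_page_number is no longer covered by a single centralized control
lock, it becomes a pg_atomic_uint32 that is read and written without holding
any SLRU lock at all.  A minimal sketch of the access pattern, using the
standard pg_atomic primitives:

	/* writer side, e.g. when a new page is zeroed */
	pg_atomic_write_u32(&shared->latest_page_number, pageno);

	/* reader side, e.g. victim selection, which must not evict the latest page */
	if (shared->page_number[slotno] ==
		pg_atomic_read_u32(&shared->latest_page_number))
		continue;

During startup the value is set with pg_atomic_init_u32 before any concurrent
access is possible, as in StartupCLOG above.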
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 62709fcd07..3284900e02 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -192,10 +192,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the SLRU bank locks of MultiXactOffset
+ * and MultiXactMember to guard accesses to the two sets of SLRU buffers.
+ * For concurrency's sake, we avoid holding more than one of these locks at
+ * a time.)
  */
 typedef struct MultiXactStateData
 {
@@ -870,12 +870,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -891,10 +894,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release the MultiXactOffset SLRU bank lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -916,6 +917,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether this
+			 * new page falls in a different SLRU bank; if so, release the old
+			 * bank's lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -936,7 +951,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1239,6 +1255,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1342,11 +1360,23 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If the page falls in a different SLRU bank, release the lock on the
+	 * previous bank (if we are already holding one) and acquire the lock on
+	 * the new bank.
+	 */
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1379,7 +1409,22 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * The SLRU pageno has changed, so check whether this page falls in
+			 * a different SLRU bank than the one whose lock we are already
+			 * holding; if so, release the lock on the old bank and acquire the
+			 * lock on the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1388,7 +1433,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1397,13 +1443,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1419,6 +1463,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether this
+			 * new page falls in a different SLRU bank; if so, release the old
+			 * bank's lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1442,7 +1500,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1852,14 +1911,14 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
 
@@ -1894,8 +1953,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1904,9 +1965,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1915,7 +1977,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1975,10 +2037,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1993,7 +2057,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2015,13 +2079,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2046,13 +2112,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2067,7 +2133,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2075,18 +2143,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2098,7 +2165,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2113,10 +2182,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2404,6 +2472,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2414,13 +2483,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2453,15 +2523,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int			pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2759,7 +2831,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3241,31 +3313,33 @@ multixact_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3331,7 +3405,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index c339e0a7e4..cf215627ea 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -159,7 +159,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
-static void SlruAdjustNSlots(int *nslots, int *banksize, int *bankmask);
+static int	SlruAdjustNSlots(int *nslots, int *banksize, int *bankmask);
 
 /*
  * Initialization of shared memory
@@ -171,8 +171,9 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	Size		sz;
 	int			bankmask_ignore;
 	int			banksize_ignore;
+	int			nbanks;
 
-	SlruAdjustNSlots(&nslots, &banksize_ignore, &bankmask_ignore);
+	nbanks = SlruAdjustNSlots(&nslots, &banksize_ignore, &bankmask_ignore);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -182,6 +183,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(LWLockPadded));	/* bank_locks[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -198,20 +200,22 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the SLRU's per-bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler)
 {
 	SlruShared	shared;
 	bool		found;
 	int			bankmask;
 	int			banksize;
+	int			nbanks;
 
-	SlruAdjustNSlots(&nslots, &banksize, &bankmask);
+	nbanks = SlruAdjustNSlots(&nslots, &banksize, &bankmask);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -223,13 +227,12 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
@@ -255,6 +258,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
 
 		if (nlsns > 0)
 		{
@@ -266,7 +271,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -274,6 +279,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			shared->page_lru_count[slotno] = 0;
 			ptr += BLCKSZ;
 		}
+		/* Initialize bank locks for each buffer bank. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			LWLockInitialize(&shared->bank_locks[bankno].lock,
+							 bank_tranche_id);
 
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
@@ -329,7 +338,7 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -368,12 +377,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			bankno = slotno / ctl->bank_size;
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -428,6 +438,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 	for (;;)
 	{
 		int			slotno;
+		int			bankno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -470,9 +481,10 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		bankno = slotno / ctl->bank_size;
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[bankno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -481,7 +493,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -523,11 +535,12 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
-	int			bankstart = (pageno & ctl->bank_mask) * ctl->bank_size;
+	int			bankno = pageno & ctl->bank_mask;
+	int			bankstart = bankno * ctl->bank_size;
 	int			bankend = bankstart + ctl->bank_size;
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_SHARED);
 
 	/* See if page is already in a buffer */
 	for (slotno = bankstart; slotno < bankend; slotno++)
@@ -547,8 +560,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -570,6 +583,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int			pageno = shared->page_number[slotno];
 	bool		ok;
+	int			bankno = slotno / ctl->bank_size;
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -598,7 +612,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -613,7 +627,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1118,7 +1132,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number == pg_atomic_read_u32(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1192,6 +1206,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int			pageno = 0;
 	int			i;
+	int			lastbankno = 0;
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1202,10 +1217,19 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[0].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbankno = slotno / ctl->bank_size;
+
+		if (curbankno != lastbankno)
+		{
+			LWLockRelease(&shared->bank_locks[lastbankno].lock);
+			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
+			lastbankno = curbankno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1219,7 +1243,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[lastbankno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1259,6 +1283,7 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevbankno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1269,25 +1294,38 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u32(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevbankno = 0;
+	LWLockAcquire(&shared->bank_locks[prevbankno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbankno = slotno / ctl->bank_size;
+
+		/*
+		 * If curbankno differs from prevbankno, release the lock on the
+		 * previous bank and acquire the lock on the current bank.
+		 */
+		if (curbankno != prevbankno)
+		{
+			LWLockRelease(&shared->bank_locks[prevbankno].lock);
+			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
+			prevbankno = curbankno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1317,10 +1355,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevbankno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbankno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1361,15 +1401,31 @@ SlruDeleteSegment(SlruCtl ctl, int segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevbankno = 0;
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevbankno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curbankno;
+
+		curbankno = slotno / ctl->bank_size;
+
+		/*
+		 * If curbankno differs from prevbankno, release the lock on the
+		 * previous bank and acquire the lock on the current bank.
+		 */
+		if (curbankno != prevbankno)
+		{
+			LWLockRelease(&shared->bank_locks[prevbankno].lock);
+			LWLockAcquire(&shared->bank_locks[curbankno].lock, LW_EXCLUSIVE);
+			prevbankno = curbankno;
+		}
 
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1403,7 +1459,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbankno].lock);
 }
 
 /*
@@ -1651,7 +1707,7 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
  * We expect the bank number to be picked from the lowest bits of the requested
  * pageno. Thus we want the number of banks to be the power of 2.
  */
-static void
+static int
 SlruAdjustNSlots(int *nslots, int *banksize, int *bankmask)
 {
 	int			nbanks = 1;
@@ -1677,4 +1733,34 @@ SlruAdjustNSlots(int *nslots, int *banksize, int *bankmask)
 
 	*nslots = *banksize * nbanks;
 	*bankmask = (*nslots / *banksize) - 1;
+
+	return nbanks;
+}
+
+/*
+ * Acquire all bank locks of the given SlruCtl.
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			bankno;
+	int			nbanks = shared->num_slots / ctl->bank_size;
+
+	for (bankno = 0; bankno < nbanks; bankno++)
+		LWLockAcquire(&shared->bank_locks[bankno].lock, mode);
+}
+
+/*
+ * Release all bank locks of the given SlruCtl.
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			bankno;
+	int			nbanks = shared->num_slots / ctl->bank_size;
+
+	for (bankno = 0; bankno < nbanks; bankno++)
+		LWLockRelease(&shared->bank_locks[bankno].lock);
 }
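For reference, the way a page maps to a bank (and hence to a bank lock)
follows from SlruAdjustNSlots and SimpleLruGetSLRUBankLock: the slot count is
rounded so that it is bank_size * nbanks with nbanks a power of two, and the
low bits of the page number select the bank.  A sketch of the derived values,
assuming ctl->bank_size and ctl->bank_mask are set up as in the patch:

	int			bankno = pageno & ctl->bank_mask;		/* bank owning this page */
	int			bankstart = bankno * ctl->bank_size;	/* first slot of the bank */
	int			bankend = bankstart + ctl->bank_size;	/* one past the last slot */
	LWLock	   *banklock = &ctl->shared->bank_locks[bankno].lock;

	/* any lookup or replacement for this page is confined to [bankstart, bankend) */

So both the linear search during buffer lookup/replacement and the lock
contention are limited to a single bank rather than the whole buffer pool.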
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0dd48f40f3..4e3fc5fc51 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -77,12 +77,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -100,7 +102,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -130,7 +132,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -193,8 +195,9 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU,
+				  SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -212,8 +215,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -222,7 +226,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -252,6 +256,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int			startPage;
 	int			endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -259,23 +265,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = ShmemVariableCache->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If this page falls in a different bank, release the lock on the old
+		 * bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If this page falls in a different bank, release the lock on the old
+	 * bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -309,6 +339,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -320,12 +351,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bdbbe5cc0..9f14faed78 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -267,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the per-bank SLRU locks
+ * serve as the control locks for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -570,7 +571,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE);
 
 	if (!found)
@@ -1402,7 +1403,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1412,9 +1413,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int			pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1438,6 +1437,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * wrapped around, but re-zeroing the page is harmless in that case.)
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	lock = SimpleLruGetSLRUBankLock(NotifyCtl, pageno);
+
+	/* We hold NotifyQueueLock and the SLRU bank lock during this operation */
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1509,7 +1513,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(lock);
 
 	return nextNotify;
 }
@@ -1988,9 +1992,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -2010,7 +2014,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetSLRUBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -2051,7 +2055,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2204,7 +2208,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for the SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+# 26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+# 38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 18ea18316d..4098a056e5 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,8 +808,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -846,12 +847,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int			firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetSLRUBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -901,7 +904,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -919,10 +922,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -934,13 +937,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding the SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -953,7 +956,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -965,7 +968,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -983,7 +986,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -992,7 +995,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1006,12 +1009,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1071,7 +1074,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index c3fd58185a..f3545d5f5d 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -57,8 +57,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -73,6 +71,13 @@ typedef struct SlruSharedData
 	int		   *page_lru_count;
 	LWLockPadded *buffer_locks;
 
+	/*
+	 * Locks to protect in-memory access to the buffer slots within each SLRU
+	 * bank.  buffer_locks protects the I/O on each individual buffer slot,
+	 * whereas these locks protect the in-memory operations on the buffers
+	 * within one SLRU bank.
+	 */
+	LWLockPadded *bank_locks;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -100,7 +105,7 @@ typedef struct SlruSharedData
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int			latest_page_number;
+	pg_atomic_uint32 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -149,11 +154,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock must be acquired in order to access the SLRU buffer slots in the
+ * respective bank.  For more details, refer to the comments in SlruSharedData.
+ */
+static inline LWLock *
+SimpleLruGetSLRUBankLock(SlruCtl ctl, int pageno)
+{
+	int			bankno = (pageno & ctl->bank_mask);
+
+	return &(ctl->shared->bank_locks[bankno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler);
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 							  TransactionId xid);
@@ -181,5 +199,7 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
-
+extern LWLock *SimpleLruGetSLRUBankLock(SlruCtl ctl, int pageno);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index b038e599c0..87cb812b84 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index ae21444c47..9a02f33933 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int			pageno = PG_GETARG_INT32(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -215,6 +214,7 @@ test_slru_shmem_startup(void)
 {
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -228,11 +228,13 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
+				  NUM_TEST_BUFFERS, 0, slru_dir_name, test_buffer_tranche_id,
 				  test_tranche_id, SYNC_HANDLER_NONE);
 }
 
-- 
2.39.2 (Apple Git-143)

#12Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#11)
5 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Oct 30, 2023 at 11:50 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Based on some offlist discussions with Alvaro and Robert (in separate
conversations), Alvaro and I came to the same conclusion: if a user
sets a very high value for the number of slots (say 1GB, i.e. 131072
slots with the default 8kB block size), then the number of slots in
each bank will be 1024 (considering the maximum of 128 banks), and if
we continue the sequential search for finding the buffer for a page,
that could be costly in such cases. But later, in one of the
conversations with Robert, I realized that we can have this bank-wise
lock approach along with a partitioned hash table.

So the idea is that we will use a buffer mapping hash table, something
like what Thomas used in one of his patches [1], but instead of a
normal hash table we will use a partitioned hash table. The SLRU
buffer pool is still divided as in the bank-wise approach, and there
is a separate lock for each slot range. So now we get the benefit of
both approaches: 1) by having a mapping hash we can avoid the
sequential search; 2) by dividing the buffer pool into banks and
keeping the victim-buffer search within those banks, we avoid locking
all the partitions during the victim-buffer search; 3) and we can also
maintain a bank-wise LRU counter so that we avoid contention on a
single variable, as discussed in my first email of this thread.
Please find the updated patch set details and patches attached to the
email; an illustrative caller pattern is sketched right after the
patch list below.

[1]: 0001-Make-all-SLRU-buffer-sizes-configurable: This is the same
patch as the previous patch set
[2]: 0002-Add-a-buffer-mapping-table-for-SLRUs: Patch to introduce a
buffer mapping hash table
[3]: 0003-Partition-wise-slru-locks: Partition the hash table and also
introduce partition-wise locks: this is a merge of 0003 and 0004 from
the previous patch set, but instead of bank-wise locks it has
partition-wise locks and an LRU counter.
[4]: 0004-Merge-partition-locks-array-with-buffer-locks-array: merging
buffer locks and bank locks into the same array so that the bank-wise
LRU counter does not fetch the next cache line in the hot function
SlruRecentlyUsed() (same as 0005 from the previous patch set)
[5]: 0005-Ensure-slru-buffer-slots-are-in-multiple-of-number-of: Ensure
that the number of slots is a multiple of the number of banks
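
To make the shape of the change easier to see before reading the diffs,
here is an illustrative caller pattern under the partition-wise design.
This is not code copied from the patches (the wrapper function and the
xid/pageno arguments are only for illustration), but the
SimpleLruGetPartitionLock() call comes from the attached 0003 patch,
SimpleLruReadPage() is the existing SLRU API, and the clog.c and
commit_ts.c hunks below follow exactly this shape:

static void
set_status_in_page(TransactionId xid, int pageno)	/* illustrative only */
{
	LWLock	   *lock = SimpleLruGetPartitionLock(XactCtl, pageno);
	int			slotno;

	/* Map the page to its partition's lock and hold only that lock. */
	LWLockAcquire(lock, LW_EXCLUSIVE);
	slotno = SimpleLruReadPage(XactCtl, pageno, true, xid);
	/* ... read or modify XactCtl->shared->page_buffer[slotno] ... */
	XactCtl->shared->page_dirty[slotno] = true;
	LWLockRelease(lock);
}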

With this approach, I have also made the number of banks constant
(i.e. 8) so that some of the computations are easy. I think that with
a buffer mapping hash table we should not have much of a problem
keeping this fixed: even with a very extreme configuration and a very
high number of slots there is no performance problem from lookups,
because the buffer mapping hash avoids the sequential search, and if
the number of slots is set that high then the victim-buffer search
should not be frequent either, so we need not worry about the
sequential search within a bank for the victim buffer. I have also
changed the default value of the number of slots to 64 and the minimum
value to 16. I think this is a reasonable default because the existing
values are too low considering modern hardware, and since these
parameters are configurable a user can still set them to a low value
when running with very little memory. A small sketch of the bank/slot
arithmetic with these defaults is below.
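
As a quick sanity check of the arithmetic with these defaults, here is
a small standalone illustration (not patch code; it assumes pages map
to partitions by simple modulo, the way the earlier bank-wise patch
derived the bank number from the page number):

#include <stdio.h>

#define SLRU_NUM_PARTITIONS	8	/* fixed number of partitions/banks */
#define NUM_SLOTS			64	/* proposed default, a multiple of 8 */

int
main(void)
{
	int			slots_per_part = NUM_SLOTS / SLRU_NUM_PARTITIONS;	/* 8 */
	int			pagenos[] = {0, 7, 8, 1024, 131071};

	for (int i = 0; i < 5; i++)
	{
		int			pageno = pagenos[i];
		int			partno = pageno % SLRU_NUM_PARTITIONS;
		int			first = partno * slots_per_part;

		/* The victim-buffer search stays within this slot range. */
		printf("page %6d -> partition %d, victim search in slots %d..%d\n",
			   pageno, partno, first, first + slots_per_part - 1);
	}
	return 0;
}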

[1]: /messages/by-id/CA+hUKGLCLDtgDj2Xsf0uBk5WXDCeHxBDDJPsyY7m65Fde-=pyg@mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v4-0005-Ensure-slru-buffer-slots-are-in-multiple-of-numbe.patch
From 51be79b5c580a794760cf1baf4e040c55443adc6 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 2 Nov 2023 11:40:08 +0530
Subject: [PATCH v4 5/5] Ensure slru buffer slots are a multiple of the number
 of partitions

---
 src/backend/access/transam/clog.c      | 10 ++++++++++
 src/backend/access/transam/commit_ts.c | 10 ++++++++++
 src/backend/access/transam/multixact.c | 19 +++++++++++++++++++
 src/backend/access/transam/slru.c      | 18 ++++++++++++++++++
 src/backend/access/transam/subtrans.c  | 10 ++++++++++
 src/backend/commands/async.c           | 10 ++++++++++
 src/backend/storage/lmgr/predicate.c   | 10 ++++++++++
 src/backend/utils/misc/guc_tables.c    | 14 +++++++-------
 src/include/access/slru.h              |  1 +
 src/include/utils/guc_hooks.h          | 11 +++++++++++
 10 files changed, 106 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index ab453cd171..17e08792d4 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -1056,3 +1057,12 @@ clogsyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(XactCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for xact_buffers
+ */
+bool
+check_xact_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("xact_buffers", newval);
+}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 58314e3885..4fd01c5ce8 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -1022,3 +1023,12 @@ committssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(CommitTsCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for commit_ts_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_ts_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index aa4f11fd3b..d0ce4e28d2 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -3494,3 +3495,21 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 8b89a86a10..bac6bf1d42 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc.h"
 #include "utils/hsearch.h"
 
 #define SlruFileName(ctl, path, seg) \
@@ -1850,3 +1851,20 @@ SimpleLruUnLockAllPartitions(SlruCtl ctl)
 	for (partno = 0; partno < SLRU_NUM_PARTITIONS; partno++)
 		LWLockRelease(&shared->locks[nslots + partno].lock);
 }
+
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_NUM_PARTITIONS.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Valid values are multiples of SLRU_NUM_PARTITIONS */
+	if (*newval % SLRU_NUM_PARTITIONS == 0)
+		return true;
+
+	/* Not a multiple of the number of SLRU partitions */
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_NUM_PARTITIONS);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index e4da6e28ae..16a26a2ca5 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -406,3 +407,12 @@ SubTransPagePrecedes(int page1, int page2)
 	return (TransactionIdPrecedes(xid1, xid2) &&
 			TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
 }
+
+/*
+ * GUC check_hook for subtrans_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtrans_buffers", newval);
+}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 81fdca410b..0ea6880764 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -149,6 +149,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -2462,3 +2463,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 6b7c1aa00e..40089a606d 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -5014,3 +5015,12 @@ AttachSerializableXact(SerializableXactHandle handle)
 	if (MySerializableXact != InvalidSerializableXact)
 		CreateLocalPredicateLockHash();
 }
+
+/*
+ * GUC check_hook for serial_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serial_buffers", newval);
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c82635943b..7c85d2126e 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2296,7 +2296,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_offsets_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_offsets_buffers, NULL, NULL
 	},
 
 	{
@@ -2307,7 +2307,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_members_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_members_buffers, NULL, NULL
 	},
 
 	{
@@ -2318,7 +2318,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&subtrans_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_subtrans_buffers, NULL, NULL
 	},
 	{
 		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
@@ -2328,7 +2328,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&notify_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_notify_buffers, NULL, NULL
 	},
 
 	{
@@ -2339,7 +2339,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&serial_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_serial_buffers, NULL, NULL
 	},
 
 	{
@@ -2350,7 +2350,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&xact_buffers,
 		64, 0, CLOG_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_xact_buffers
+		check_xact_buffers, NULL, show_xact_buffers
 	},
 
 	{
@@ -2361,7 +2361,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&commit_ts_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_commit_ts_buffers
+		check_commit_ts_buffers, NULL, show_commit_ts_buffers
 	},
 
 	{
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index ac1227f29f..fef23d30f5 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -198,4 +198,5 @@ extern LWLock *SimpleLruGetPartitionLock(SlruCtl ctl, int pageno);
 extern void SimpleLruLockAllPartitions(SlruCtl ctl, LWLockMode mode);
 extern void SimpleLruUnLockAllPartitions(SlruCtl ctl);
 extern LWLock *SimpleLruGetPartitionLock(SlruCtl ctl, int pageno);
+extern bool check_slru_buffers(const char *name, int *newval);
 #endif							/* SLRU_H */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 8597e430de..7dd96a2059 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -128,6 +128,17 @@ extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
+extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
 extern void assign_synchronous_commit(int newval, void *extra);
 extern void assign_syslog_facility(int newval, void *extra);
-- 
2.39.2 (Apple Git-143)

v4-0001-Make-all-SLRU-buffer-sizes-configurable.patch
From acfdf8c7bc64026d51c7f187080294843e805617 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Wed, 25 Oct 2023 14:45:00 +0530
Subject: [PATCH v4 1/5] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Remove the limits on xact_buffers and commit_ts_buffers.  The default
sizes for those caches are ~0.2% and ~0.1% of shared_buffers, as before,
but now there is no cap at 128 and 16 buffers respectively (unless
track_commit_timestamp is disabled, in which case we might as well keep
it tiny).  Sizes much larger than the old limits have been shown to be
useful on modern systems.

Patch by Andrey M. Borodin with some bug fixes by Dilip Kumar
Reviewed by Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro and Dilip Kumar
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  23 ++-
 src/backend/access/transam/commit_ts.c        |   7 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  19 +++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/clog.h                     |  10 ++
 src/include/access/commit_ts.h                |   1 -
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 20 files changed, 299 insertions(+), 45 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 985cabfc0b..eeb21efdd4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>8</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>8</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>8</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 1024, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 4a431d5876..7979bbd00f 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -58,8 +58,8 @@
 
 /* We need two bits per xact, so four xacts fit in a byte */
 #define CLOG_BITS_PER_XACT	2
-#define CLOG_XACTS_PER_BYTE 4
-#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
+StaticAssertDecl((CLOG_BITS_PER_XACT * CLOG_XACTS_PER_BYTE) == BITS_PER_BYTE,
+				 "CLOG_BITS_PER_XACT and CLOG_XACTS_PER_BYTE are inconsistent");
 #define CLOG_XACT_BITMASK	((1 << CLOG_BITS_PER_XACT) - 1)
 
 #define TransactionIdToPage(xid)	((xid) / (TransactionId) CLOG_XACTS_PER_PAGE)
@@ -663,23 +663,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB for every 1GB of shared buffers, up to the
+ * theoretical maximum useful value, but always at least 16 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(16, xact_buffers);
+	return Min(CLOG_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index b897fabc70..47a1c9f0e5 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -493,11 +493,16 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 1MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(16, commit_ts_buffers);
+	return Min(256, Max(16, NBuffers / 256));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 57ed34c0a8..62709fcd07 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,13 +1851,13 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 62bb610167..0dd48f40f3 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -184,14 +185,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 38ddae08b8..4bdbbe5cc0 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -117,7 +117,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by slru.c's wraparound limit; see QUEUE_MAX_PAGE below.
@@ -235,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -521,7 +521,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -569,7 +569,7 @@ AsyncShmemInit(void)
 	 * Set up SLRU management of the pg_notify data.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index a88cf5f118..ee25aa0656 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,23 @@ show_timezone(void)
 	return "unknown";
 }
 
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index a794546db3..18ea18316d 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
@@ -1347,7 +1347,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..96d480325b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 64;
+int			multixact_members_buffers = 64;
+int			subtrans_buffers = 64;
+int			notify_buffers = 64;
+int			serial_buffers = 64;
+int			xact_buffers = 64;
+int			commit_ts_buffers = 64;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 7605eff9b9..c82635943b 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2287,6 +2288,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		64, 0, CLOG_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d08d55c3fe..c21d6468ed 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 32			# memory for pg_subtrans
+#multixact_offsets_buffers = 8		# memory for pg_multixact/offsets
+#multixact_members_buffers = 16		# memory for pg_multixact/members
+#notify_buffers = 8			# memory for pg_notify
+#serial_buffers = 16			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index d99444f073..a9cd65db36 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -15,6 +15,16 @@
 #include "storage/sync.h"
 #include "lib/stringinfo.h"
 
+/*
+ * Don't allow xact_buffers to be set higher than could possibly be useful or
+ * SLRU would allow.
+ */
+#define CLOG_XACTS_PER_BYTE 4
+#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
+#define CLOG_MAX_ALLOWED_BUFFERS \
+	Min(SLRU_MAX_ALLOWED_BUFFERS, \
+		(((MaxTransactionId / 2) + (CLOG_XACTS_PER_PAGE - 1)) / CLOG_XACTS_PER_PAGE))
+
 /*
  * Possible transaction statuses --- note that all-zeroes is the initial
  * state.
diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 5087cdce51..78d017ad85 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -16,7 +16,6 @@
 #include "replication/origin.h"
 #include "storage/sync.h"
 
-
 extern PGDLLIMPORT bool track_commit_timestamp;
 
 extern void TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids,
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 0be1355892..18d7ba4ca9 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 552cc19e68..c0d37e3eb3 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..147dc4acc3 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 02da6ba7e1..b3e6815ee4 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f0cc651435..e2473f41de 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..7b68c8f1c7 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 2a191830a8..8597e430de 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -161,4 +161,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)

v4-0003-Partition-wise-slru-locks.patch
From ee74be845bbaff6d4db6add978f016292d90de10 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 2 Nov 2023 14:02:37 +0530
Subject: [PATCH v4 3/5] Partition wise slru locks

The previous patch implemented a buffer mapping hash table.
This patch optimizes it further by partitioning the hash table
and introducing partition-wise locks instead of a common
centralized lock, which reduces the contention on the SLRU
control lock. We also limit the victim buffer search to the
slots covered by a single partition.

Dilip Kumar with design input from Robert Haas
---
 src/backend/access/transam/clog.c        | 115 ++++++----
 src/backend/access/transam/commit_ts.c   |  43 ++--
 src/backend/access/transam/multixact.c   | 177 ++++++++++-----
 src/backend/access/transam/slru.c        | 261 +++++++++++++++++------
 src/backend/access/transam/subtrans.c    |  59 +++--
 src/backend/commands/async.c             |  46 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  35 +--
 src/include/access/slru.h                |  52 +++--
 src/include/storage/lwlock.h             |   7 +
 src/test/modules/test_slru/test_slru.c   |  32 +--
 12 files changed, 601 insertions(+), 254 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 7979bbd00f..ab453cd171 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -274,14 +274,19 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU partition lock w.r.t. the page we are going to access. */
+	lock = SimpleLruGetPartitionLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on the SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
+	 * updates for multiple backends so that the number of times the SLRU lock
 	 * needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
@@ -300,17 +305,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire the SLRU lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -323,10 +328,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -345,7 +350,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetPartitionLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -396,14 +402,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire the SLRU partition lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -417,6 +422,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -497,13 +504,10 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
-	 * Now that we've got the lock, clear the list of processes waiting for
-	 * group XID status update, saving a pointer to the head of the list.
-	 * Trying to pop elements one at a time could lead to an ABA problem.
+	 * We are the leader, so clear the list of processes waiting for group XID
+	 * status update, saving a pointer to the head of the list. Trying to pop
+	 * elements one at a time could lead to an ABA problem.
 	 */
 	nextidx = pg_atomic_exchange_u32(&procglobal->clogGroupFirst,
 									 INVALID_PGPROCNO);
@@ -511,10 +515,39 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	/* Remember head of list so we can perform wakeups after dropping lock. */
 	wakeidx = nextidx;
 
+	/* Acquire the SLRU partition lock w.r.t. the first page in the group. */
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetPartitionLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	/* Walk the list and update the status of all XIDs. */
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * Although we try our best to keep all members of a group on the
+		 * same page, there are cases where they may end up on different
+		 * pages; for details, refer to the comment in the while loop above
+		 * where we add this process to the group update.  So if the page we
+		 * are going to access is not in the same SLRU partition as the last
+		 * page we updated, we need to release the lock on the previous
+		 * partition and acquire the lock on the partition covering the page
+		 * we are going to update now.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetPartitionLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -534,7 +567,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -563,10 +597,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with the slot-specific SLRU partition lock held
  */
 static void
-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+						  int slotno)
 {
 	int			byteno = TransactionIdToByte(xid);
 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
@@ -655,7 +690,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetPartitionLock(XactCtl, pageno));
 
 	return status;
 }
@@ -689,8 +724,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -704,8 +739,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetPartitionLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -714,7 +750,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -749,14 +785,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u32(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -767,8 +799,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetPartitionLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -800,7 +833,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -832,6 +865,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -842,13 +876,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetPartitionLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -986,16 +1021,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetPartitionLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
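
To make the locking pattern above easier to follow (it repeats in multixact.c,
subtrans.c and async.c below), here is a small self-contained sketch, not part
of the patch: it walks a sequence of pages and exchanges the partition lock
only when a page maps to a different partition than the previous one.  All the
names in it are illustrative stand-ins -- NUM_PARTS for SLRU_NUM_PARTITIONS,
part_of() for SlruMappingPartNo(), the mutex array for the shared part_locks[]
-- and pthread mutexes stand in for LWLocks.

#include <pthread.h>
#include <stdio.h>

#define NUM_PARTS 8

static pthread_mutex_t part_locks[NUM_PARTS];

/* Illustrative stand-in for SlruMappingPartNo(): map a page to a partition. */
static int
part_of(int pageno)
{
	return pageno % NUM_PARTS;	/* the patch hashes the page number instead */
}

/* Update a run of pages, exchanging the lock only when the partition changes. */
static void
update_pages(const int *pages, int npages)
{
	pthread_mutex_t *prevlock = NULL;

	for (int i = 0; i < npages; i++)
	{
		pthread_mutex_t *lock = &part_locks[part_of(pages[i])];

		if (lock != prevlock)
		{
			if (prevlock != NULL)
				pthread_mutex_unlock(prevlock);
			pthread_mutex_lock(lock);
			prevlock = lock;
		}
		printf("page %d updated under partition %d\n",
			   pages[i], part_of(pages[i]));
	}
	if (prevlock != NULL)
		pthread_mutex_unlock(prevlock);
}

int
main(void)
{
	int			pages[] = {1, 1, 9, 2, 2, 10};

	for (int i = 0; i < NUM_PARTS; i++)
		pthread_mutex_init(&part_locks[i], NULL);

	update_pages(pages, 6);
	return 0;
}

Keeping prevlock around avoids releasing and re-acquiring the same partition
lock when consecutive pages hash to the same partition, which is the common
case for the latest page.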
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 47a1c9f0e5..58314e3885 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -218,8 +218,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetPartitionLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -229,13 +230,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with the slot's SLRU partition lock held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -336,7 +337,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetPartitionLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -526,9 +527,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
-				  SYNC_HANDLER_COMMIT_TS);
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU, SYNC_HANDLER_COMMIT_TS);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
@@ -684,9 +684,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -713,12 +711,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetPartitionLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -767,9 +766,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruLockAllPartitions(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruUnLockAllPartitions(CommitTsCtl);
 }
 
 /*
@@ -801,6 +800,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -821,12 +821,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetPartitionLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -980,16 +982,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
+		lock = SimpleLruGetPartitionLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1001,7 +1005,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 62709fcd07..aa4f11fd3b 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -192,10 +192,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the MultiXactOffset and MultiXactMember
+ * SLRU partition locks to guard accesses to the two sets of SLRU buffers.
+ * For concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -870,12 +870,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetPartitionLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -891,10 +894,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -916,6 +917,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether the
+			 * new page falls in a different SLRU partition; if so, release
+			 * the old partition's lock and acquire the lock on the new
+			 * partition.
+			 */
+			lock = SimpleLruGetPartitionLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -936,7 +951,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1239,6 +1255,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1342,11 +1360,23 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If the page falls in a different SLRU partition, release the lock on
+	 * the previous partition (if we are holding one) and acquire the lock on
+	 * the new partition.
+	 */
+	lock = SimpleLruGetPartitionLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1379,7 +1409,22 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * The SLRU pageno has changed, so check whether this page falls
+			 * in a different SLRU partition from the one whose lock we are
+			 * already holding; if so, release the lock on the old partition
+			 * and acquire the lock on the new one.
+			 */
+			lock = SimpleLruGetPartitionLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1388,7 +1433,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1397,13 +1443,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1419,6 +1463,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether the
+			 * new page falls in a different SLRU partition; if so, release
+			 * the old partition's lock and acquire the lock on the new
+			 * partition.
+			 */
+			lock = SimpleLruGetPartitionLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1442,7 +1500,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1852,14 +1911,14 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
 
@@ -1894,8 +1953,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetPartitionLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1904,9 +1965,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetPartitionLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1915,7 +1977,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1975,10 +2037,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetPartitionLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1993,7 +2057,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2015,13 +2079,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2046,13 +2112,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2067,7 +2133,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetPartitionLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2075,18 +2143,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2098,7 +2165,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetPartitionLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2113,10 +2182,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2404,6 +2472,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2414,13 +2483,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetPartitionLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2453,15 +2523,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int			pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetPartitionLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2759,7 +2831,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetPartitionLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3241,31 +3313,33 @@ multixact_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetPartitionLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetPartitionLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3331,7 +3405,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
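
A note on latest_page_number in the hunks above: since there is no longer a
single centralized lock under which it could be read and written, the patch
makes it an atomic variable accessed with pg_atomic_init_u32(),
pg_atomic_read_u32() and pg_atomic_write_u32().  The following stand-alone
sketch (C11 atomics, illustrative only) shows the idea -- the field is just a
hint, so readers such as the victim-selection loop only need an atomic load,
not mutual exclusion:

#include <stdatomic.h>
#include <stdio.h>

/* Illustrative stand-in for SlruSharedData.latest_page_number. */
static _Atomic unsigned int latest_page_number;

/* Writer side: e.g. when a new page is zeroed or at startup/trim. */
static void
advance_latest_page(unsigned int pageno)
{
	atomic_store(&latest_page_number, pageno);
}

/* Reader side: victim selection skips the latest page without any lock. */
static int
is_latest_page(unsigned int pageno)
{
	return pageno == atomic_load(&latest_page_number);
}

int
main(void)
{
	advance_latest_page(42);
	printf("page 42 is latest? %d\n", is_latest_page(42));
	return 0;
}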
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index ac23076def..ab7cd276ce 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -71,6 +71,7 @@
  * to SimpleLruWriteAll().  This data structure remembers which files are open.
  */
 #define MAX_WRITEALL_BUFFERS	16
+#define SLRU_NUM_PARTITIONS		8
 
 typedef struct SlruWriteAllData
 {
@@ -102,34 +103,6 @@ typedef struct SlruMappingTableEntry
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -160,6 +133,9 @@ static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
 static void SlruMappingAdd(SlruCtl ctl, int pageno, int slotno);
 static void SlruMappingRemove(SlruCtl ctl, int pageno);
 static int	SlruMappingFind(SlruCtl ctl, int pageno);
+static inline int SlruMappingPartNo(SlruCtl ctl, int pageno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno,
+									int partsize);
 
 /*
  * Helper function of SimpleLruShmemSize to compute the SlruSharedData size.
@@ -177,6 +153,8 @@ SimpleLruStructSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(SLRU_NUM_PARTITIONS * sizeof(LWLockPadded));	/* part_locks[] */
+	sz += MAXALIGN(SLRU_NUM_PARTITIONS * sizeof(int));   /* part_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -207,7 +185,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int part_tranche_id,
 			  SyncRequestHandler sync_handler)
 {
 	char		mapping_table_name[SHMEM_INDEX_KEYSIZE];
@@ -226,18 +204,15 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			partno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -258,6 +233,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->part_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(SLRU_NUM_PARTITIONS * sizeof(LWLockPadded));
+		shared->part_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(SLRU_NUM_PARTITIONS * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -269,7 +248,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -277,6 +256,13 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			shared->page_lru_count[slotno] = 0;
 			ptr += BLCKSZ;
 		}
+		/* Initialize partition locks for each buffer partition. */
+		for (partno = 0; partno < SLRU_NUM_PARTITIONS; partno++)
+		{
+			LWLockInitialize(&shared->part_locks[partno].lock,
+							 part_tranche_id);
+			shared->part_cur_lru_count[partno] = 0;
+		}
 
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
@@ -288,10 +274,12 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	memset(&mapping_table_info, 0, sizeof(mapping_table_info));
 	mapping_table_info.keysize = sizeof(int);
 	mapping_table_info.entrysize = sizeof(SlruMappingTableEntry);
+	mapping_table_info.num_partitions = SLRU_NUM_PARTITIONS;
 	snprintf(mapping_table_name, sizeof(mapping_table_name),
 			 "%s Lookup Table", name);
 	mapping_table = ShmemInitHash(mapping_table_name, nslots, nslots,
-								  &mapping_table_info, HASH_ELEM | HASH_BLOBS);
+								  &mapping_table_info,
+								  HASH_ELEM | HASH_BLOBS | HASH_PARTITION);
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -300,6 +288,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->mapping_table = mapping_table;
 	ctl->sync_handler = sync_handler;
+	ctl->part_size = shared->num_slots / SLRU_NUM_PARTITIONS;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -331,7 +320,7 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 	shared->page_number[slotno] = pageno;
 	shared->page_status[slotno] = SLRU_PAGE_VALID;
 	shared->page_dirty[slotno] = true;
-	SlruRecentlyUsed(shared, slotno);
+	SlruRecentlyUsed(shared, slotno, ctl->part_size);
 
 	/* Set the buffer to zeroes */
 	MemSet(shared->page_buffer[slotno], 0, BLCKSZ);
@@ -340,7 +329,7 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -379,12 +368,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			partno = slotno / ctl->part_size;
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->part_locks[partno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->part_locks[partno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -442,6 +432,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 	for (;;)
 	{
 		int			slotno;
+		int			partno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -464,7 +455,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 				continue;
 			}
 			/* Otherwise, it's ready to use */
-			SlruRecentlyUsed(shared, slotno);
+			SlruRecentlyUsed(shared, slotno, ctl->part_size);
 
 			/* update the stats counter of pages found in the SLRU */
 			pgstat_count_slru_page_hit(shared->slru_stats_idx);
@@ -487,9 +478,10 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		partno = slotno / ctl->part_size;
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->part_locks[partno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -498,7 +490,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->part_locks[partno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -518,7 +510,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		if (!ok)
 			SlruReportIOError(ctl, pageno, xid);
 
-		SlruRecentlyUsed(shared, slotno);
+		SlruRecentlyUsed(shared, slotno, ctl->part_size);
 
 		/* update the stats counter of pages not found in SLRU */
 		pgstat_count_slru_page_read(shared->slru_stats_idx);
@@ -546,9 +538,13 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			partno;
+
+	/* Determine partition number for the page. */
+	partno = SlruMappingPartNo(ctl, pageno);
 
-	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	/* Try to find the page while holding only shared partition lock */
+	LWLockAcquire(&shared->part_locks[partno].lock, LW_SHARED);
 
 	/* See if page is already in a buffer */
 	slotno = SlruMappingFind(ctl, pageno);
@@ -559,7 +555,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 		Assert(shared->page_number[slotno] == pageno);
 
 		/* See comments for SlruRecentlyUsed macro */
-		SlruRecentlyUsed(shared, slotno);
+		SlruRecentlyUsed(shared, slotno, ctl->part_size);
 
 		/* update the stats counter of pages found in the SLRU */
 		pgstat_count_slru_page_hit(shared->slru_stats_idx);
@@ -568,8 +564,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->part_locks[partno].lock);
+	LWLockAcquire(&shared->part_locks[partno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -591,6 +587,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int			pageno = shared->page_number[slotno];
 	bool		ok;
+	int			partno = slotno / ctl->part_size;
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -619,7 +616,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->part_locks[partno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -634,7 +631,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->part_locks[partno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1078,6 +1075,9 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			partno;
+		int			partstart;
+		int			partend;
 
 		/* See if page already has a buffer assigned */
 		slotno = SlruMappingFind(ctl, pageno);
@@ -1088,6 +1088,14 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 			return slotno;
 		}
 
+		/*
+		 * Get the start and end slotno of the partition that this page maps
+		 * to.
+		 */
+		partno = SlruMappingPartNo(ctl, pageno);
+		partstart = partno * ctl->part_size;
+		partend = partstart + ctl->part_size;
+
 		/*
 		 * If we find any EMPTY slot, just select that one. Else choose a
 		 * victim page to replace.  We normally take the least recently used
@@ -1115,8 +1123,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		cur_count = (shared->part_cur_lru_count[partno])++;
+		for (slotno = partstart; slotno < partend; slotno++)
 		{
 			int			this_delta;
 			int			this_page_number;
@@ -1137,7 +1145,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number == pg_atomic_read_u32(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1211,6 +1219,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int			pageno = 0;
 	int			i;
+	int			lastpartno = 0;
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1221,10 +1230,19 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->part_locks[0].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curpartno = slotno / ctl->part_size;
+
+		if (curpartno != lastpartno)
+		{
+			LWLockRelease(&shared->part_locks[lastpartno].lock);
+			LWLockAcquire(&shared->part_locks[curpartno].lock, LW_EXCLUSIVE);
+			lastpartno = curpartno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1238,7 +1256,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->part_locks[lastpartno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1278,6 +1296,7 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevpartno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1288,25 +1307,38 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u32(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevpartno = 0;
+	LWLockAcquire(&shared->part_locks[prevpartno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curpartno = slotno / ctl->part_size;
+
+		/*
+		 * If curpartno differs from prevpartno, release the lock on the
+		 * previous partition and acquire the lock on the current one.
+		 */
+		if (curpartno != prevpartno)
+		{
+			LWLockRelease(&shared->part_locks[prevpartno].lock);
+			LWLockAcquire(&shared->part_locks[curpartno].lock, LW_EXCLUSIVE);
+			prevpartno = curpartno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1337,10 +1369,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->part_locks[prevpartno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->part_locks[prevpartno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1381,15 +1415,31 @@ SlruDeleteSegment(SlruCtl ctl, int segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevpartno = 0;
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->part_locks[prevpartno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curpartno;
+
+		curpartno = slotno / ctl->part_size;
 
+		/*
+		 * If curpartno differs from prevpartno, release the lock on the
+		 * previous partition and acquire the lock on the current one.
+		 */
+		if (curpartno != prevpartno)
+		{
+			LWLockRelease(&shared->part_locks[prevpartno].lock);
+			LWLockAcquire(&shared->part_locks[curpartno].lock, LW_EXCLUSIVE);
+			prevpartno = curpartno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1424,7 +1474,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->part_locks[prevpartno].lock);
 }
 
 /*
@@ -1636,6 +1686,38 @@ SlruScanDirectory(SlruCtl ctl, SlruScanCallback callback, void *data)
 	return retval;
 }
 
+/*
+ * Mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of part_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either part_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while executing this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno, int partsize)
+{
+	int			slrupartno = slotno / partsize;
+	int			new_lru_count = shared->part_cur_lru_count[slrupartno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->part_cur_lru_count[slrupartno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
 /*
  * Individual SLRUs (clog, ...) have to provide a sync.c handler function so
  * that they can provide the correct "SlruCtl" (otherwise we don't know how to
@@ -1709,3 +1791,56 @@ SlruMappingRemove(SlruCtl ctl, int pageno)
 
 	Assert(found);
 }
+
+/*
+ * The SLRU buffer mapping table is partitioned to reduce contention.  To
+ * determine which partition a given pageno maps to, compute the pageno's hash
+ * code with get_hash_value() and take it modulo SLRU_NUM_PARTITIONS; the
+ * corresponding partition lock is returned by SimpleLruGetPartitionLock().
+ */
+static inline int
+SlruMappingPartNo(SlruCtl ctl, int pageno)
+{
+	uint32 hashcode = get_hash_value(ctl->mapping_table, (void *) &pageno);
+
+	return hashcode % SLRU_NUM_PARTITIONS;
+}
+
+/*
+ * Get the SLRU partition lock for the given SlruCtl and pageno.
+ *
+ * This lock must be acquired in order to access the SLRU buffer slots in the
+ * respective partition.  For more details, refer to the comments in
+ * SlruSharedData.
+ */
+LWLock *
+SimpleLruGetPartitionLock(SlruCtl ctl, int pageno)
+{
+	int			partno = SlruMappingPartNo(ctl, pageno);
+
+	return &(ctl->shared->part_locks[partno].lock);
+}
+
+/*
+ * Acquire the locks of all partitions of the given SlruCtl.
+ */
+void
+SimpleLruLockAllPartitions(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			partno;
+
+	for (partno = 0; partno < SLRU_NUM_PARTITIONS; partno++)
+		LWLockAcquire(&shared->part_locks[partno].lock, mode);
+}
+
+/*
+ * Release the locks of all partitions of the given SlruCtl.
+ */
+void
+SimpleLruUnLockAllPartitions(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			partno;
+
+	for (partno = 0; partno < SLRU_NUM_PARTITIONS; partno++)
+		LWLockRelease(&shared->part_locks[partno].lock);
+}
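
To summarize the new slru.c structure, here is a self-contained model (not
PostgreSQL code) of how a page number is mapped to a partition and how victim
selection stays confined to that partition's slot range using a
partition-local LRU counter.  The multiplicative hash in part_of() is only a
stand-in for get_hash_value() on the partitioned mapping table, and the victim
loop omits the page-status and latest-page checks of the real
SlruSelectLRUPage():

#include <stdio.h>

#define NUM_PARTS	8
#define NSLOTS		64				/* must be a multiple of NUM_PARTS */
#define PART_SIZE	(NSLOTS / NUM_PARTS)

static int	page_lru_count[NSLOTS];
static int	part_cur_lru_count[NUM_PARTS];

/* Illustrative stand-in for SlruMappingPartNo(): hash the page number. */
static int
part_of(int pageno)
{
	return (int) (((unsigned int) pageno * 2654435761u) % NUM_PARTS);
}

/* Victim search confined to the page's partition, like SlruSelectLRUPage(). */
static int
select_victim(int pageno)
{
	int			partno = part_of(pageno);
	int			partstart = partno * PART_SIZE;
	int			partend = partstart + PART_SIZE;
	int			cur_count = part_cur_lru_count[partno]++;
	int			bestslot = partstart;
	int			best_delta = -1;

	for (int slot = partstart; slot < partend; slot++)
	{
		int			delta = cur_count - page_lru_count[slot];

		if (delta > best_delta)
		{
			best_delta = delta;
			bestslot = slot;
		}
	}
	return bestslot;
}

int
main(void)
{
	printf("victim slot for page 123: %d\n", select_victim(123));
	return 0;
}

Because PART_SIZE is num_slots / SLRU_NUM_PARTITIONS, enlarging the buffer
pool keeps the replacement scan bounded by the partition size instead of
scanning the whole pool.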
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0dd48f40f3..e4da6e28ae 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -77,12 +77,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetPartitionLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -100,7 +102,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -130,7 +132,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetPartitionLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -193,8 +195,9 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU,
+				  SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -212,8 +215,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetPartitionLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -222,7 +226,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -252,6 +256,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int			startPage;
 	int			endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -259,23 +265,48 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = ShmemVariableCache->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetPartitionLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetPartitionLock(SubTransCtl, startPage);
+
+		/*
+		 * If this page maps to a different partition, release the lock on the
+		 * old partition and acquire the lock on the new one.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetPartitionLock(SubTransCtl, startPage);
+
+	/*
+	 * If this page maps to a different partition, release the lock on the old
+	 * partition and acquire the lock on the new one.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -309,6 +340,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -320,12 +352,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetPartitionLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bdbbe5cc0..81fdca410b 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -267,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into partitions, and the per-partition SLRU
+ * locks are used as the control locks for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly SLRU partition lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -570,7 +571,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE);
 
 	if (!found)
@@ -1402,7 +1403,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU partition lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1412,9 +1413,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int			pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1438,6 +1437,14 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * wrapped around, but re-zeroing the page is harmless in that case.)
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetPartitionLock(NotifyCtl, pageno);
+
+	/*
+	 * We hold both NotifyQueueLock and the SLRU partition lock during this
+	 * operation.
+	 */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1483,6 +1490,8 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1491,6 +1500,15 @@ asyncQueueAddEntries(ListCell *nextNotify)
 			 * asyncQueueIsFull() ensured that there is room to create this
 			 * page without overrunning the queue.
 			 */
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetPartitionLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruZeroPage(NotifyCtl, QUEUE_POS_PAGE(queue_head));
 
 			/*
@@ -1509,7 +1527,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1988,9 +2006,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -2010,7 +2028,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetPartitionLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -2051,7 +2069,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU partition lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2204,7 +2222,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU partition locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+#38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 18ea18316d..6b7c1aa00e 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,8 +808,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -846,12 +847,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int			firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetPartitionLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -901,7 +904,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -919,10 +922,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -934,13 +937,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
-	 * but will return with that lock held, which must then be released.
+	 * The following function must be called without holding the SLRU
+	 * partition lock, but will return with that lock held, which must then be
+	 * released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetPartitionLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -953,7 +956,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -965,7 +968,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -983,7 +986,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -992,7 +995,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1006,12 +1009,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1071,7 +1074,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 9cd0899f1d..e6c54d5519 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -58,8 +58,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -75,33 +73,47 @@ typedef struct SlruSharedData
 	LWLockPadded *buffer_locks;
 
 	/*
-	 * Optional array of WAL flush LSNs associated with entries in the SLRU
-	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
-	 * for pg_xact, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
-	 * has lsn_groups_per_page entries per buffer slot, each containing the
-	 * highest LSN known for a contiguous group of SLRU entries on that slot's
-	 * page.
+	 * Locks to protect the in memory buffer slot access in per SLRU bank. The
+	 * buffer_locks protects the I/O on each buffer slots whereas this lock
+	 * protect the in memory operation on the buffer within one SLRU bank.
 	 */
-	XLogRecPtr *group_lsn;
-	int			lsn_groups_per_page;
+	LWLockPadded *part_locks;
 
 	/*----------
+	 * Instead of global counter we maintain a partition-wise lru counter
+	 * because
+	 * a) we are doing the victim buffer selection as partition level so there
+	 * is no point of having a global counter b) manipulating a global counter
+	 * will have frequent cpu cache invalidation and that will affect the
+	 * performance.
+	 *
 	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
+	 *		page_lru_count[slotno] = ++part_cur_lru_count[partno];
 	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
+	 *		part_cur_lru_count[partno] - page_lru_count[slotno]
 	 * The counts will eventually wrap around, but this calculation still
 	 * works as long as no page's age exceeds INT_MAX counts.
 	 *----------
 	 */
-	int			cur_lru_count;
+	int		   *part_cur_lru_count;
+
+	/*
+	 * Optional array of WAL flush LSNs associated with entries in the SLRU
+	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
+	 * for pg_xact, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
+	 * has lsn_groups_per_page entries per buffer slot, each containing the
+	 * highest LSN known for a contiguous group of SLRU entries on that slot's
+	 * page.
+	 */
+	XLogRecPtr *group_lsn;
+	int			lsn_groups_per_page;
 
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int			latest_page_number;
+	pg_atomic_uint32 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -143,6 +155,9 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/* Size of one slru buffer pool partition */
+	int			part_size;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
@@ -150,8 +165,8 @@ typedef SlruCtlData *SlruCtl;
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler);
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 							  TransactionId xid);
@@ -179,5 +194,8 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
-
+extern LWLock *SimpleLruGetPartitionLock(SlruCtl ctl, int pageno);
+extern void SimpleLruLockAllPartitions(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruUnLockAllPartitions(SlruCtl ctl);
+extern LWLock *SimpleLruGetPartitionLock(SlruCtl ctl, int pageno);
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index b038e599c0..87cb812b84 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index ae21444c47..b9178d0ee2 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetPartitionLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetPartitionLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetPartitionLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int			pageno = PG_GETARG_INT32(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetPartitionLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -215,6 +214,7 @@ test_slru_shmem_startup(void)
 {
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -228,11 +228,13 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
+				  NUM_TEST_BUFFERS, 0, slru_dir_name, test_buffer_tranche_id,
 				  test_tranche_id, SYNC_HANDLER_NONE);
 }
 
-- 
2.39.2 (Apple Git-143)

v4-0002-Add-a-buffer-mapping-table-for-SLRUs.patch
From cb46346ee896b4ea7778d0e0562e1a250e771bb6 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Tue, 31 Oct 2023 10:26:45 +0530
Subject: [PATCH v4 2/5] Add a buffer mapping table for SLRUs.

Instead of doing a linear search for the buffer holding a given page
number, use a hash table.  This will allow us to increase the size of
these caches.

Patch By: Thomas Munro and some adjustment by Dilip Kumar
Reviewed-by: Andrey M. Borodin and Dilip Kumar
---
 src/backend/access/transam/slru.c | 140 +++++++++++++++++++++++++-----
 src/include/access/slru.h         |   4 +
 src/tools/pgindent/typedefs.list  |   1 +
 3 files changed, 123 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ed24e1185..ac23076def 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/hsearch.h"
 
 #define SlruFileName(ctl, path, seg) \
 	snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
@@ -80,6 +81,15 @@ typedef struct SlruWriteAllData
 
 typedef struct SlruWriteAllData *SlruWriteAll;
 
+/*
+ * hash table entry for mapping from pageno to the slotno in SLRU buffer pool.
+ */
+typedef struct SlruMappingTableEntry
+{
+	int			pageno;
+	int			slotno;
+} SlruMappingTableEntry;
+
 /*
  * Populate a file tag describing a segment file.  We only use the segment
  * number, since we can derive everything else we need by having separate
@@ -147,13 +157,15 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
+static void SlruMappingAdd(SlruCtl ctl, int pageno, int slotno);
+static void SlruMappingRemove(SlruCtl ctl, int pageno);
+static int	SlruMappingFind(SlruCtl ctl, int pageno);
 
 /*
- * Initialization of shared memory
+ * Helper function of SimpleLruShmemSize to compute the SlruSharedData size.
  */
-
-Size
-SimpleLruShmemSize(int nslots, int nlsns)
+static Size
+SimpleLruStructSize(int nslots, int nlsns)
 {
 	Size		sz;
 
@@ -168,10 +180,19 @@ SimpleLruShmemSize(int nslots, int nlsns)
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
-
 	return BUFFERALIGN(sz) + BLCKSZ * nslots;
 }
 
+/*
+ * Initialization of shared memory.
+ */
+Size
+SimpleLruShmemSize(int nslots, int nlsns)
+{
+	return SimpleLruStructSize(nslots, nlsns) +
+		hash_estimate_size(nslots, sizeof(SlruMappingTableEntry));
+}
+
 /*
  * Initialize, or attach to, a simple LRU cache in shared memory.
  *
@@ -189,11 +210,14 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			  LWLock *ctllock, const char *subdir, int tranche_id,
 			  SyncRequestHandler sync_handler)
 {
+	char		mapping_table_name[SHMEM_INDEX_KEYSIZE];
+	HASHCTL		mapping_table_info;
+	HTAB	   *mapping_table;
 	SlruShared	shared;
 	bool		found;
 
 	shared = (SlruShared) ShmemInitStruct(name,
-										  SimpleLruShmemSize(nslots, nlsns),
+										  SimpleLruStructSize(nslots, nlsns),
 										  &found);
 
 	if (!IsUnderPostmaster)
@@ -260,11 +284,21 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	else
 		Assert(found);
 
+	/* Create or find the buffer mapping table. */
+	memset(&mapping_table_info, 0, sizeof(mapping_table_info));
+	mapping_table_info.keysize = sizeof(int);
+	mapping_table_info.entrysize = sizeof(SlruMappingTableEntry);
+	snprintf(mapping_table_name, sizeof(mapping_table_name),
+			 "%s Lookup Table", name);
+	mapping_table = ShmemInitHash(mapping_table_name, nslots, nslots,
+								  &mapping_table_info, HASH_ELEM | HASH_BLOBS);
+
 	/*
 	 * Initialize the unshared control struct, including directory path. We
 	 * assume caller set PagePrecedes.
 	 */
 	ctl->shared = shared;
+	ctl->mapping_table = mapping_table;
 	ctl->sync_handler = sync_handler;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
@@ -291,6 +325,9 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 		   shared->page_number[slotno] == pageno);
 
 	/* Mark the slot as containing this page */
+	if (shared->page_status[slotno] != SLRU_PAGE_EMPTY)
+		SlruMappingRemove(ctl, shared->page_number[slotno]);
+	SlruMappingAdd(ctl, pageno, slotno);
 	shared->page_number[slotno] = pageno;
 	shared->page_status[slotno] = SLRU_PAGE_VALID;
 	shared->page_dirty[slotno] = true;
@@ -364,7 +401,10 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
 		{
 			/* indeed, the I/O must have failed */
 			if (shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS)
+			{
+				SlruMappingRemove(ctl, shared->page_number[slotno]);
 				shared->page_status[slotno] = SLRU_PAGE_EMPTY;
+			}
 			else				/* write_in_progress */
 			{
 				shared->page_status[slotno] = SLRU_PAGE_VALID;
@@ -438,6 +478,9 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 				!shared->page_dirty[slotno]));
 
 		/* Mark the slot read-busy */
+		if (shared->page_status[slotno] != SLRU_PAGE_EMPTY)
+			SlruMappingRemove(ctl, shared->page_number[slotno]);
+		SlruMappingAdd(ctl, pageno, slotno);
 		shared->page_number[slotno] = pageno;
 		shared->page_status[slotno] = SLRU_PAGE_READ_IN_PROGRESS;
 		shared->page_dirty[slotno] = false;
@@ -461,7 +504,13 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
 			   !shared->page_dirty[slotno]);
 
-		shared->page_status[slotno] = ok ? SLRU_PAGE_VALID : SLRU_PAGE_EMPTY;
+		if (ok)
+			shared->page_status[slotno] = SLRU_PAGE_VALID;
+		else
+		{
+			SlruMappingRemove(ctl, pageno);
+			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
+		}
 
 		LWLockRelease(&shared->buffer_locks[slotno].lock);
 
@@ -502,20 +551,20 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
 	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	slotno = SlruMappingFind(ctl, pageno);
+	if (slotno >= 0 &&
+		shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
 	{
-		if (shared->page_number[slotno] == pageno &&
-			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
-			shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
-		{
-			/* See comments for SlruRecentlyUsed macro */
-			SlruRecentlyUsed(shared, slotno);
+		Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+		Assert(shared->page_number[slotno] == pageno);
 
-			/* update the stats counter of pages found in the SLRU */
-			pgstat_count_slru_page_hit(shared->slru_stats_idx);
+		/* See comments for SlruRecentlyUsed macro */
+		SlruRecentlyUsed(shared, slotno);
 
-			return slotno;
-		}
+		/* update the stats counter of pages found in the SLRU */
+		pgstat_count_slru_page_hit(shared->slru_stats_idx);
+
+		return slotno;
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
@@ -1031,11 +1080,12 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
 
 		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		slotno = SlruMappingFind(ctl, pageno);
+		if (slotno >= 0)
 		{
-			if (shared->page_number[slotno] == pageno &&
-				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
-				return slotno;
+			Assert(shared->page_number[slotno] == pageno);
+			Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+			return slotno;
 		}
 
 		/*
@@ -1268,6 +1318,7 @@ restart:
 		if (shared->page_status[slotno] == SLRU_PAGE_VALID &&
 			!shared->page_dirty[slotno])
 		{
+			SlruMappingRemove(ctl, shared->page_number[slotno]);
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
 			continue;
 		}
@@ -1350,6 +1401,7 @@ restart:
 		if (shared->page_status[slotno] == SLRU_PAGE_VALID &&
 			!shared->page_dirty[slotno])
 		{
+			SlruMappingRemove(ctl, shared->page_number[slotno]);
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
 			continue;
 		}
@@ -1613,3 +1665,47 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Lookup the given pageno entry; return buffer slotno, or -1 if not found.
+ */
+static int
+SlruMappingFind(SlruCtl ctl, int pageno)
+{
+	SlruMappingTableEntry *mapping;
+
+	mapping = hash_search(ctl->mapping_table, &pageno, HASH_FIND, NULL);
+	if (mapping)
+		return mapping->slotno;
+
+	return -1;
+}
+
+/*
+ * Insert a hashtable entry for given pageno and buffer slotno, unless an entry
+ * already exists for that pageno.
+ */
+static void
+SlruMappingAdd(SlruCtl ctl, int pageno, int slotno)
+{
+	SlruMappingTableEntry *mapping;
+	bool		found PG_USED_FOR_ASSERTS_ONLY;
+
+	mapping = hash_search(ctl->mapping_table, &pageno, HASH_ENTER, &found);
+	mapping->slotno = slotno;
+
+	Assert(!found);
+}
+
+/*
+ * Delete the hashtable entry for given tag (which must exist).
+ */
+static void
+SlruMappingRemove(SlruCtl ctl, int pageno)
+{
+	bool		found PG_USED_FOR_ASSERTS_ONLY;
+
+	hash_search(ctl->mapping_table, &pageno, HASH_REMOVE, &found);
+
+	Assert(found);
+}
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index c0d37e3eb3..9cd0899f1d 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -16,6 +16,7 @@
 #include "access/xlogdefs.h"
 #include "storage/lwlock.h"
 #include "storage/sync.h"
+#include "utils/hsearch.h"
 
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
@@ -116,6 +117,9 @@ typedef struct SlruCtlData
 {
 	SlruShared	shared;
 
+	/* Buffer mapping hash table over slru buffer pool */
+	HTAB	   *mapping_table;
+
 	/*
 	 * Which sync handler function to use when handing sync requests over to
 	 * the checkpointer.  SYNC_HANDLER_NONE to disable fsync (eg pg_notify).
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 87c1aee379..ec8957f12a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2568,6 +2568,7 @@ SlotNumber
 SlruCtl
 SlruCtlData
 SlruErrorCause
+SlruMappingTableEntry
 SlruPageStatus
 SlruScanCallback
 SlruShared
-- 
2.39.2 (Apple Git-143)

v4-0004-Merge-partition-locks-array-with-buffer-locks-arr.patch
From 30cc4cc9d7f2c65bfa072349ddd26aaa3b3ae0cd Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 2 Nov 2023 10:59:03 +0530
Subject: [PATCH v4 4/5] Merge partition locks array with buffer locks array

This will help us getting the part_cur_lru_count in same cacheline
which is frequently accessed in SlruRecentlyUsed.
---
 src/backend/access/transam/slru.c | 122 ++++++++++++++++--------------
 src/include/access/slru.h         |  10 +--
 2 files changed, 69 insertions(+), 63 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index ab7cd276ce..8b89a86a10 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -152,8 +152,7 @@ SimpleLruStructSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(bool));	/* page_dirty[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
-	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
-	sz += MAXALIGN(SLRU_NUM_PARTITIONS * sizeof(LWLockPadded));	/* part_locks[] */
+	sz += MAXALIGN((nslots + SLRU_NUM_PARTITIONS) * sizeof(LWLockPadded));	/* locks[] */
 	sz += MAXALIGN(SLRU_NUM_PARTITIONS * sizeof(int));   /* part_cur_lru_count[] */
 
 	if (nlsns > 0)
@@ -231,10 +230,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		offset += MAXALIGN(nslots * sizeof(int));
 
 		/* Initialize LWLocks */
-		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
-		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
-		shared->part_locks = (LWLockPadded *) (ptr + offset);
-		offset += MAXALIGN(SLRU_NUM_PARTITIONS * sizeof(LWLockPadded));
+		shared->locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN((nslots + SLRU_NUM_PARTITIONS) * sizeof(LWLockPadded));
 		shared->part_cur_lru_count = (int *) (ptr + offset);
 		offset += MAXALIGN(SLRU_NUM_PARTITIONS * sizeof(int));
 
@@ -247,8 +244,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		ptr += BUFFERALIGN(offset);
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
-			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 buffer_tranche_id);
+			LWLockInitialize(&shared->locks[slotno].lock, buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -259,7 +255,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize partition locks for each buffer partition. */
 		for (partno = 0; partno < SLRU_NUM_PARTITIONS; partno++)
 		{
-			LWLockInitialize(&shared->part_locks[partno].lock,
+			LWLockInitialize(&shared->locks[nslots + partno].lock,
 							 part_tranche_id);
 			shared->part_cur_lru_count[partno] = 0;
 		}
@@ -369,12 +365,13 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
 	int			partno = slotno / ctl->part_size;
+	int			partlockoffset = shared->num_slots + partno;
 
 	/* See notes at top of file */
-	LWLockRelease(&shared->part_locks[partno].lock);
-	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
-	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(&shared->part_locks[partno].lock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->locks[partlockoffset].lock);
+	LWLockAcquire(&shared->locks[slotno].lock, LW_SHARED);
+	LWLockRelease(&shared->locks[slotno].lock);
+	LWLockAcquire(&shared->locks[partlockoffset].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -387,7 +384,7 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
 	if (shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS ||
 		shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS)
 	{
-		if (LWLockConditionalAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED))
+		if (LWLockConditionalAcquire(&shared->locks[slotno].lock, LW_SHARED))
 		{
 			/* indeed, the I/O must have failed */
 			if (shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS)
@@ -400,7 +397,7 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
 				shared->page_status[slotno] = SLRU_PAGE_VALID;
 				shared->page_dirty[slotno] = true;
 			}
-			LWLockRelease(&shared->buffer_locks[slotno].lock);
+			LWLockRelease(&shared->locks[slotno].lock);
 		}
 	}
 }
@@ -433,6 +430,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 	{
 		int			slotno;
 		int			partno;
+		int			banklockoffset;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -477,11 +475,12 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		shared->page_dirty[slotno] = false;
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
-		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->locks[slotno].lock, LW_EXCLUSIVE);
 		partno = slotno / ctl->part_size;
+		banklockoffset = shared->num_slots + partno;
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(&shared->part_locks[partno].lock);
+		LWLockRelease(&shared->locks[banklockoffset].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -490,7 +489,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(&shared->part_locks[partno].lock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->locks[banklockoffset].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -504,7 +503,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
 		}
 
-		LWLockRelease(&shared->buffer_locks[slotno].lock);
+		LWLockRelease(&shared->locks[slotno].lock);
 
 		/* Now it's okay to ereport if we failed */
 		if (!ok)
@@ -539,12 +538,14 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	int			partno;
+	int			partlockoffset;
 
 	/* Determine partition number for the page. */
 	partno = SlruMappingPartNo(ctl, pageno);
+	partlockoffset = shared->num_slots + partno;
 
 	/* Try to find the page while holding only shared partition lock */
-	LWLockAcquire(&shared->part_locks[partno].lock, LW_SHARED);
+	LWLockAcquire(&shared->locks[partlockoffset].lock, LW_SHARED);
 
 	/* See if page is already in a buffer */
 	slotno = SlruMappingFind(ctl, pageno);
@@ -564,8 +565,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(&shared->part_locks[partno].lock);
-	LWLockAcquire(&shared->part_locks[partno].lock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->locks[partlockoffset].lock);
+	LWLockAcquire(&shared->locks[partlockoffset].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -588,6 +589,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	int			pageno = shared->page_number[slotno];
 	bool		ok;
 	int			partno = slotno / ctl->part_size;
+	int			partlockoffset = shared->num_slots + partno;
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -613,10 +615,10 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	shared->page_dirty[slotno] = false;
 
 	/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
-	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(&shared->part_locks[partno].lock);
+	LWLockRelease(&shared->locks[partlockoffset].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -631,7 +633,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(&shared->part_locks[partno].lock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->locks[partlockoffset].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -642,7 +644,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 
 	shared->page_status[slotno] = SLRU_PAGE_VALID;
 
-	LWLockRelease(&shared->buffer_locks[slotno].lock);
+	LWLockRelease(&shared->locks[slotno].lock);
 
 	/* Now it's okay to ereport if we failed */
 	if (!ok)
@@ -1219,7 +1221,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int			pageno = 0;
 	int			i;
-	int			lastpartno = 0;
+	int			prevlockoffset = shared->num_slots;
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1230,17 +1232,17 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(&shared->part_locks[0].lock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->locks[prevlockoffset].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			curpartno = slotno / ctl->part_size;
+		int			curlockoffset = shared->num_slots + slotno / ctl->part_size;
 
-		if (curpartno != lastpartno)
+		if (curlockoffset != prevlockoffset)
 		{
-			LWLockRelease(&shared->part_locks[lastpartno].lock);
-			LWLockAcquire(&shared->part_locks[curpartno].lock, LW_EXCLUSIVE);
-			lastpartno = curpartno;
+			LWLockRelease(&shared->locks[prevlockoffset].lock);
+			LWLockAcquire(&shared->locks[curlockoffset].lock, LW_EXCLUSIVE);
+			prevlockoffset = curlockoffset;
 		}
 
 		SlruInternalWritePage(ctl, slotno, &fdata);
@@ -1256,7 +1258,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(&shared->part_locks[lastpartno].lock);
+	LWLockRelease(&shared->locks[prevlockoffset].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1296,7 +1298,8 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
-	int			prevpartno;
+	int			nslots = shared->num_slots;
+	int			prevlockoffset;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1322,21 +1325,21 @@ restart:
 		return;
 	}
 
-	prevpartno = 0;
-	LWLockAcquire(&shared->part_locks[prevpartno].lock, LW_EXCLUSIVE);
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	prevlockoffset = nslots;
+	LWLockAcquire(&shared->locks[prevlockoffset].lock, LW_EXCLUSIVE);
+	for (slotno = 0; slotno < nslots; slotno++)
 	{
-		int			curpartno = slotno / ctl->part_size;
+		int			curlockoffset = nslots + (slotno / ctl->part_size);
 
 		/*
 		 * If the curpartno is not same as prevpartno then release the lock on
 		 * the prevpartno and acquire the lock on the curpartno.
 		 */
-		if (curpartno != prevpartno)
+		if (curlockoffset != prevlockoffset)
 		{
-			LWLockRelease(&shared->part_locks[prevpartno].lock);
-			LWLockAcquire(&shared->part_locks[curpartno].lock, LW_EXCLUSIVE);
-			prevpartno = curpartno;
+			LWLockRelease(&shared->locks[prevlockoffset].lock);
+			LWLockAcquire(&shared->locks[curlockoffset].lock, LW_EXCLUSIVE);
+			prevlockoffset = curlockoffset;
 		}
 
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
@@ -1370,11 +1373,11 @@ restart:
 		else
 			SimpleLruWaitIO(ctl, slotno);
 
-		LWLockRelease(&shared->part_locks[prevpartno].lock);
+		LWLockRelease(&shared->locks[prevlockoffset].lock);
 		goto restart;
 	}
 
-	LWLockRelease(&shared->part_locks[prevpartno].lock);
+	LWLockRelease(&shared->locks[prevlockoffset].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1415,28 +1418,29 @@ SlruDeleteSegment(SlruCtl ctl, int segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
-	int			prevpartno = 0;
+	int			nslots = shared->num_slots;
+	int			prevlockoffset = nslots;
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(&shared->part_locks[prevpartno].lock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->locks[prevlockoffset].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	for (slotno = 0; slotno < nslots; slotno++)
 	{
 		int			pagesegno;
-		int			curpartno;
+		int			curlockoffset;
 
-		curpartno = slotno / ctl->part_size;
+		curlockoffset = nslots + (slotno / ctl->part_size);
 
 		/*
 		 * If the curpartno is not same as prevpartno then release the lock on
 		 * the prevpartno and acquire the lock on the curpartno.
 		 */
-		if (curpartno != prevpartno)
+		if (curlockoffset != prevlockoffset)
 		{
-			LWLockRelease(&shared->part_locks[prevpartno].lock);
-			LWLockAcquire(&shared->part_locks[curpartno].lock, LW_EXCLUSIVE);
-			prevpartno = curpartno;
+			LWLockRelease(&shared->locks[prevlockoffset].lock);
+			LWLockAcquire(&shared->locks[curlockoffset].lock, LW_EXCLUSIVE);
+			prevlockoffset = curlockoffset;
 		}
 
 		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
@@ -1474,7 +1478,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(&shared->part_locks[prevpartno].lock);
+	LWLockRelease(&shared->locks[prevlockoffset].lock);
 }
 
 /*
@@ -1816,7 +1820,7 @@ SimpleLruGetPartitionLock(SlruCtl ctl, int pageno)
 {
 	int			partno = SlruMappingPartNo(ctl, pageno);
 
-	return &(ctl->shared->part_locks[partno].lock);
+	return &(ctl->shared->locks[ctl->shared->num_slots + partno].lock);
 }
 
 /*
@@ -1827,9 +1831,10 @@ SimpleLruLockAllPartitions(SlruCtl ctl, LWLockMode mode)
 {
 	SlruShared	shared = ctl->shared;
 	int			partno;
+	int			nslots = shared->num_slots;
 
 	for (partno = 0; partno < SLRU_NUM_PARTITIONS; partno++)
-		LWLockAcquire(&shared->part_locks[partno].lock, mode);
+		LWLockAcquire(&shared->locks[nslots + partno].lock, mode);
 }
 
 /*
@@ -1840,7 +1845,8 @@ SimpleLruUnLockAllPartitions(SlruCtl ctl)
 {
 	SlruShared	shared = ctl->shared;
 	int			partno;
+	int			nslots = shared->num_slots;
 
 	for (partno = 0; partno < SLRU_NUM_PARTITIONS; partno++)
-		LWLockRelease(&shared->part_locks[partno].lock);
+		LWLockRelease(&shared->locks[nslots + partno].lock);
 }
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index e6c54d5519..ac1227f29f 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -70,14 +70,14 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int		   *page_number;
 	int		   *page_lru_count;
-	LWLockPadded *buffer_locks;
 
 	/*
-	 * Locks to protect the in memory buffer slot access in per SLRU bank. The
-	 * buffer_locks protects the I/O on each buffer slots whereas this lock
-	 * protect the in memory operation on the buffer within one SLRU bank.
+	 * This contains nslots numbers of buffers locks and nparts numbers of
+	 * part locks.  The buffer locks protects the I/O on each buffer slots
+	 * whereas the part lock protect the in memory operation on the buffer
+	 * within one SLRU part.
 	 */
-	LWLockPadded *part_locks;
+	LWLockPadded *locks;
 
 	/*----------
 	 * Instead of global counter we maintain a partition-wise lru counter
-- 
2.39.2 (Apple Git-143)

#13Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Dilip Kumar (#11)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 30 Oct 2023, at 09:20, Dilip Kumar <dilipbalaut@gmail.com> wrote:

changed the logic of SlruAdjustNSlots() in 0002, such that now it
starts with the next power of 2 value of the configured slots and
keeps doubling the number of banks until we reach the number of banks
to the max SLRU_MAX_BANKS(128) and bank size is bigger than
SLRU_MIN_BANK_SIZE (8). By doing so, we will ensure we don't have too
many banks

There was nothing wrong with having too many banks. Until bank-wise locks and counters were added in later patchsets.
Having hashtable to find SLRU page in the buffer IMV is too slow. Some comments on this approach can be found here [0].
I'm OK with having HTAB for that if we are sure performance does not degrade significantly, but I really doubt this is the case.
I even think SLRU buffers used HTAB in some ancient times, but I could not find commit when it was changed to linear search.

Maybe we could decouple locks and counters from SLRU banks? Banks were meant to be small to exploit performance of local linear search. Lock partitions have to be bigger for sure.

On 30 Oct 2023, at 09:20, Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have taken 0001 and 0002 from [1], done some bug fixes in 0001

BTW can you please describe in more detail what kind of bugs?

Thanks for working on this!

Best regards, Andrey Borodin.

[0]: /messages/by-id/CA+hUKGKVqrxOp82zER1=XN=yPwV_-OCGAg=ez=1iz9rG+A7Smw@mail.gmail.com

#14Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andrey M. Borodin (#13)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Sun, Nov 5, 2023 at 1:37 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 30 Oct 2023, at 09:20, Dilip Kumar <dilipbalaut@gmail.com> wrote:

changed the logic of SlruAdjustNSlots() in 0002, such that now it
starts with the next power of 2 value of the configured slots and
keeps doubling the number of banks until we reach the number of banks
to the max SLRU_MAX_BANKS(128) and bank size is bigger than
SLRU_MIN_BANK_SIZE (8). By doing so, we will ensure we don't have too
many banks

There was nothing wrong with having too many banks. Until bank-wise locks and counters were added in later patchsets.

I agree with that, but I feel that with bank-wise locks we remove the
major contention on the centralized control lock, and we can see from
my first email how much benefit we can get in one of the simple test
cases where we create subtransaction overflow.

Having hashtable to find SLRU page in the buffer IMV is too slow. Some comments on this approach can be found here [0].
I'm OK with having HTAB for that if we are sure performance does not degrade significantly, but I really doubt this is the case.
I even think SLRU buffers used HTAB in some ancient times, but I could not find commit when it was changed to linear search.

The main intention of having this buffer mapping hash is to find the
SLRU page faster than a sequential search when banks are relatively big,
but if we find cases where having the hash creates more overhead than it
provides gain then I am fine with removing the hash, because the whole
purpose of adding the hash here is to make the lookup faster. So far in
my tests I did not find any slowness. Do you or anyone else have any
test case, based on the previous research, showing that it creates any
slowness?

Maybe we could decouple locks and counters from SLRU banks? Banks were meant to be small to exploit performance of local linear search. Lock partitions have to be bigger for sure.

Yeah, that could also be an idea if we plan to drop the hash. I mean,
the bank-wise counter is fine since we find a victim buffer within a
bank itself, but each lock could cover more slots than one bank size,
or in other words, it could protect multiple banks. Let's hear more
opinions on this.

On 30 Oct 2023, at 09:20, Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have taken 0001 and 0002 from [1], done some bug fixes in 0001

BTW can you please describe in more detail what kind of bugs?

Yeah, actually that patch was using the same GUC
(multixact_offsets_buffers) in SimpleLruInit for MultiXactOffsetCtl as
well as for MultiXactMemberCtl, see the below patch snippet from the
original patch.

@@ -1851,13 +1851,13 @@ MultiXactShmemInit(void)
MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;

  SimpleLruInit(MultiXactOffsetCtl,
-   "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+   "MultiXactOffset", multixact_offsets_buffers, 0,
    MultiXactOffsetSLRULock, "pg_multixact/offsets",
    LWTRANCHE_MULTIXACTOFFSET_BUFFER,
    SYNC_HANDLER_MULTIXACT_OFFSET);
  SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
  SimpleLruInit(MultiXactMemberCtl,
-   "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+   "MultiXactMember", multixact_offsets_buffers, 0,
    MultiXactMemberSLRULock, "pg_multixact/members",
    LWTRANCHE_MULTIXACTMEMBER_BUFFER,
    SYNC_HANDLER_MULTIXACT_MEMBER);

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#15Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Dilip Kumar (#14)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 6 Nov 2023, at 09:09, Dilip Kumar <dilipbalaut@gmail.com> wrote:

Having hashtable to find SLRU page in the buffer IMV is too slow. Some comments on this approach can be found here [0].
I'm OK with having HTAB for that if we are sure performance does not degrade significantly, but I really doubt this is the case.
I even think SLRU buffers used HTAB in some ancient times, but I could not find commit when it was changed to linear search.

The main intention of having this buffer mapping hash is to find the
SLRU page faster than a sequential search when banks are relatively big,
but if we find cases where having the hash creates more overhead than it
provides gain then I am fine with removing the hash, because the whole
purpose of adding the hash here is to make the lookup faster. So far in
my tests I did not find any slowness. Do you or anyone else have any
test case, based on the previous research, showing that it creates any
slowness?

PFA test benchmark_slru_page_readonly(). In this test we run SimpleLruReadPage_ReadOnly() (essential part of TransactionIdGetStatus())
before introducing HTAB for buffer mapping I get
Time: 14837.851 ms (00:14.838)
with buffer HTAB I get
Time: 22723.243 ms (00:22.723)

This hash table makes getting transaction status ~50% slower.

Benchmark script I used:
make -C $HOME/postgresMX -j 8 install && (pkill -9 postgres; rm -rf test; ./initdb test && echo "shared_preload_libraries = 'test_slru'">> test/postgresql.conf && ./pg_ctl -D test start && ./psql -c 'create extension test_slru' postgres && ./pg_ctl -D test restart && ./psql -c "SELECT count(test_slru_page_write(a, 'Test SLRU'))
FROM generate_series(12346, 12393, 1) as a;" -c '\timing' -c "SELECT benchmark_slru_page_readonly(12377);" postgres)

Maybe we could decouple locks and counters from SLRU banks? Banks were meant to be small to exploit performance of local linear search. Lock partitions have to be bigger for sure.

Yeah, that could also be an idea if we plan to drop the hash. I mean,
the bank-wise counter is fine since we find a victim buffer within a
bank itself, but each lock could cover more slots than one bank size,
or in other words, it could protect multiple banks. Let's hear more
opinions on this.

+1

On 30 Oct 2023, at 09:20, Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have taken 0001 and 0002 from [1], done some bug fixes in 0001

BTW can you please describe in more detail what kind of bugs?

Yeah, actually that patch was using the same GUC
(multixact_offsets_buffers) in SimpleLruInit for MultiXactOffsetCtl as
well as for MultiXactMemberCtl, see the below patch snippet from the
original patch.

Ouch. We were running this for several years with this bug... Thanks!

Best regards, Andrey Borodin.

Attachments:

0001-Implement-benchmark_slru_page_readonly-to-assess-SLR.patch
From 4888ae7664224c5a63e2edb598e658afe0e19f87 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@172.25.72.30-ekb.dhcp.yndx.net>
Date: Mon, 6 Nov 2023 11:55:38 +0500
Subject: [PATCH] Implement benchmark_slru_page_readonly() to assess SLRU
 perfromance

---
 src/test/modules/test_slru/test_slru--1.0.sql |  2 ++
 src/test/modules/test_slru/test_slru.c        | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/src/test/modules/test_slru/test_slru--1.0.sql b/src/test/modules/test_slru/test_slru--1.0.sql
index 8635e7df01..3db6ef1029 100644
--- a/src/test/modules/test_slru/test_slru--1.0.sql
+++ b/src/test/modules/test_slru/test_slru--1.0.sql
@@ -11,6 +11,8 @@ CREATE OR REPLACE FUNCTION test_slru_page_read(int, bool DEFAULT true) RETURNS t
   AS 'MODULE_PATHNAME', 'test_slru_page_read' LANGUAGE C;
 CREATE OR REPLACE FUNCTION test_slru_page_readonly(int) RETURNS text
   AS 'MODULE_PATHNAME', 'test_slru_page_readonly' LANGUAGE C;
+CREATE OR REPLACE FUNCTION benchmark_slru_page_readonly(int) RETURNS void
+  AS 'MODULE_PATHNAME', 'benchmark_slru_page_readonly' LANGUAGE C;
 CREATE OR REPLACE FUNCTION test_slru_page_exists(int) RETURNS bool
   AS 'MODULE_PATHNAME', 'test_slru_page_exists' LANGUAGE C;
 CREATE OR REPLACE FUNCTION test_slru_page_delete(int) RETURNS VOID
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index ae21444c47..8a1e67a910 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -31,6 +31,7 @@ PG_FUNCTION_INFO_V1(test_slru_page_write);
 PG_FUNCTION_INFO_V1(test_slru_page_writeall);
 PG_FUNCTION_INFO_V1(test_slru_page_read);
 PG_FUNCTION_INFO_V1(test_slru_page_readonly);
+PG_FUNCTION_INFO_V1(benchmark_slru_page_readonly);
 PG_FUNCTION_INFO_V1(test_slru_page_exists);
 PG_FUNCTION_INFO_V1(test_slru_page_sync);
 PG_FUNCTION_INFO_V1(test_slru_page_delete);
@@ -128,6 +129,23 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
 
+Datum
+benchmark_slru_page_readonly(PG_FUNCTION_ARGS)
+{
+	int			pageno = PG_GETARG_INT32(0);
+
+	for (int i = 0; i < 1000000000; i++)
+	{
+		SimpleLruReadPage_ReadOnly(TestSlruCtl,
+										pageno,
+										InvalidTransactionId);
+		Assert(LWLockHeldByMe(TestSLRULock));
+		LWLockRelease(TestSLRULock);
+	}
+
+	PG_RETURN_VOID();
+}
+
 Datum
 test_slru_page_exists(PG_FUNCTION_ARGS)
 {
-- 
2.37.1 (Apple Git-137.1)

#16Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andrey M. Borodin (#15)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Nov 6, 2023 at 1:05 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 6 Nov 2023, at 09:09, Dilip Kumar <dilipbalaut@gmail.com> wrote:

Having hashtable to find SLRU page in the buffer IMV is too slow. Some comments on this approach can be found here [0].
I'm OK with having HTAB for that if we are sure performance does not degrade significantly, but I really doubt this is the case.
I even think SLRU buffers used HTAB in some ancient times, but I could not find commit when it was changed to linear search.

The main intention of having this buffer mapping hash is to find the
SLRU page faster than a sequential search when banks are relatively big,
but if we find cases where having the hash creates more overhead than it
provides gain then I am fine with removing the hash, because the whole
purpose of adding the hash here is to make the lookup faster. So far in
my tests I did not find any slowness. Do you or anyone else have any
test case, based on the previous research, showing that it creates any
slowness?

PFA test benchmark_slru_page_readonly(). In this test we run SimpleLruReadPage_ReadOnly() (essential part of TransactionIdGetStatus())
before introducing HTAB for buffer mapping I get
Time: 14837.851 ms (00:14.838)
with buffer HTAB I get
Time: 22723.243 ms (00:22.723)

This hash table makes getting transaction status ~50% slower.

Benchmark script I used:
make -C $HOME/postgresMX -j 8 install && (pkill -9 postgres; rm -rf test; ./initdb test && echo "shared_preload_libraries = 'test_slru'">> test/postgresql.conf && ./pg_ctl -D test start && ./psql -c 'create extension test_slru' postgres && ./pg_ctl -D test restart && ./psql -c "SELECT count(test_slru_page_write(a, 'Test SLRU'))
FROM generate_series(12346, 12393, 1) as a;" -c '\timing' -c "SELECT benchmark_slru_page_readonly(12377);" postgres)

With this test, I got the numbers below:

nslots   no-hash   hash
     8       10s    13s
    16       10s    13s
    32       15s    13s
    64       17s    13s

Yeah, so we can see that with a small bank size (<=16 slots) fetching a
page with the hash is ~30% slower than the sequential search, but beyond
32 slots the sequential search becomes slower as the number of slots
grows, whereas the hash stays constant, as expected. But, as you said,
if we keep the lock partition range different from the bank size then we
might not have any problem with having more banks, and with that we can
keep the bank size small, like 16. Let me put some more thought into
this and get back. Any other opinions on this?
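
To make that idea concrete, here is a rough sketch (the names, sizes,
and helper functions are made up for illustration and are not taken
from the attached patches) of how one partition lock could cover
several small banks while the victim-buffer search still stays within a
single bank:

#include "access/slru.h"
#include "storage/lwlock.h"

#define BANK_SIZE       16      /* small bank, cheap linear scan */
#define BANKS_PER_LOCK   4      /* one lock protects 4 banks (64 slots) */

/* Bank that a page maps to (illustrative only). */
static inline int
slru_page_bankno(int pageno, int nbanks)
{
    return pageno % nbanks;
}

/* Partition lock covering that bank (part_locks[] as in v4-0003). */
static inline LWLock *
slru_page_partition_lock(SlruShared shared, int pageno, int nbanks)
{
    int     partno = slru_page_bankno(pageno, nbanks) / BANKS_PER_LOCK;

    return &shared->part_locks[partno].lock;
}

With something like this the bank size stays small for the linear scan,
while the number of locks can be tuned independently.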

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#17Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#16)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2023-Nov-06, Dilip Kumar wrote:

Yeah, so we can see that with a small bank size (<=16 slots) fetching a
page with the hash is ~30% slower than the sequential search, but beyond
32 slots the sequential search becomes slower as the number of slots
grows, whereas the hash stays constant, as expected. But, as you said,
if we keep the lock partition range different from the bank size then we
might not have any problem with having more banks, and with that we can
keep the bank size small, like 16. Let me put some more thought into
this and get back. Any other opinions on this?

dynahash is notoriously slow, which is why we have simplehash.h since
commit b30d3ea824c5. Maybe we could use that instead.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)

#18Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Alvaro Herrera (#17)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 6 Nov 2023, at 14:31, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

dynahash is notoriously slow, which is why we have simplehash.h since
commit b30d3ea824c5. Maybe we could use that instead.

Dynahash has lock partitioning. Simplehash has not, AFAIK.
The thing is we do not really need a hash function - pageno is already a best hash function itself. And we do not need to cope with collisions much - we can evict a collided buffer.

Given this we do not need a hashtable at all. That’s the exact reasoning behind how banks emerged: I started implementing a dynahash patch in April 2021 and found out that the “banks” approach is cleaner. However, the term “bank” is not common in software; it’s taken from hardware caches.
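
To illustrate (a minimal sketch with made-up names, not code from any of
the posted patches): with fixed-size banks the page number itself picks
the bank, and the lookup is just a short linear scan inside that bank:

#include "access/slru.h"

#define BANK_SIZE 16            /* illustrative bank size */

/* Find pageno within its bank; return the slotno, or -1 on a miss. */
static int
slru_bank_lookup(SlruShared shared, int nbanks, int pageno)
{
    int     bankstart = (pageno % nbanks) * BANK_SIZE;

    for (int slotno = bankstart; slotno < bankstart + BANK_SIZE; slotno++)
    {
        if (shared->page_number[slotno] == pageno &&
            shared->page_status[slotno] != SLRU_PAGE_EMPTY)
            return slotno;      /* hit */
    }

    return -1;                  /* miss: evict a victim from this same bank */
}

On a miss we simply evict the least recently used slot of that same
bank, so no separate mapping structure is needed.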

Best regards, Andrey Borodin.

#19Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andrey M. Borodin (#18)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Nov 6, 2023 at 4:44 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 6 Nov 2023, at 14:31, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

dynahash is notoriously slow, which is why we have simplehash.h since
commit b30d3ea824c5. Maybe we could use that instead.

Dynahash has lock partitioning. Simplehash has not, AFAIK.

Yeah, Simplehash doesn't have partitioning so with simple hash we will
be stuck with the centralized control lock that is one of the main
problems trying to solve here.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#20Amul Sul
sulamul@gmail.com
In reply to: Andrey M. Borodin (#18)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Nov 6, 2023 at 4:44 PM Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:

On 6 Nov 2023, at 14:31, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

dynahash is notoriously slow, which is why we have simplehash.h since
commit b30d3ea824c5. Maybe we could use that instead.

Dynahash has lock partitioning. Simplehash has not, AFAIK.
The thing is we do not really need a hash function - pageno is already a
best hash function itself. And we do not need to cope with collisions much
- we can evict a collided buffer.

Given this we do not need a hashtable at all. That’s the exact reasoning
behind how banks emerged: I started implementing a dynahash patch in April
2021 and found out that the “banks” approach is cleaner. However, the term
“bank” is not common in software; it’s taken from hardware caches.

I agree that we don't need a hash function to generate a hash value out
of the pageno, which is itself sufficient, but I don't understand how we
can get rid of the hash table itself -- how would we map the pageno to
the slot number? Is that mapping not needed at all?

Regards,
Amul

#21Amul Sul
sulamul@gmail.com
In reply to: Dilip Kumar (#12)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Fri, Nov 3, 2023 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Oct 30, 2023 at 11:50 AM Dilip Kumar <dilipbalaut@gmail.com>
wrote:

[...]

[1] 0001-Make-all-SLRU-buffer-sizes-configurable: This is the same
patch as the previous patch set
[2] 0002-Add-a-buffer-mapping-table-for-SLRUs: Patch to introduce
buffer mapping hash table
[3] 0003-Partition-wise-slru-locks: Partition the hash table and also
introduce partition-wise locks: this is a merge of 0003 and 0004 from
the previous patch set but instead of bank-wise locks it has
partition-wise locks and LRU counter.
[4] 0004-Merge-partition-locks-array-with-buffer-locks-array: merging
buffer locks and bank locks in the same array so that the bank-wise
LRU counter does not fetch the next cache line in a hot function
SlruRecentlyUsed()(same as 0005 from the previous patch set)
[5] 0005-Ensure-slru-buffer-slots-are-in-multiple-of-number-of: Ensure
that the number of slots is in multiple of the number of banks
[...]

Here are some minor comments:

+ * By default, we'll use 1MB of for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-   return Min(256, Max(4, NBuffers / 256));
+   /* Use configured value if provided. */
+   if (commit_ts_buffers > 0)
+       return Max(16, commit_ts_buffers);
+   return Min(256, Max(16, NBuffers / 256));

Do you mean "4MB of for every 1GB" in the comment?
--

diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 5087cdce51..78d017ad85 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -16,7 +16,6 @@
 #include "replication/origin.h"
 #include "storage/sync.h"

-
extern PGDLLIMPORT bool track_commit_timestamp;

A spurious change.
--

@@ -168,10 +180,19 @@ SimpleLruShmemSize(int nslots, int nlsns)

if (nlsns > 0)
sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr)); /*
group_lsn[] */
-
return BUFFERALIGN(sz) + BLCKSZ * nslots;
}

Another spurious change in 0002 patch.
--

+/*
+ * The slru buffer mapping table is partitioned to reduce contention. To
+ * determine which partition lock a given pageno requires, compute the
pageno's
+ * hash code with SlruBufTableHashCode(), then apply SlruPartitionLock().
+ */

I didn't see SlruBufTableHashCode() & SlruPartitionLock() functions
anywhere in
your patches, is that outdated comment?
--

-   sz += MAXALIGN(nslots * sizeof(LWLockPadded));  /* buffer_locks[] */
-   sz += MAXALIGN(SLRU_NUM_PARTITIONS * sizeof(LWLockPadded)); /*
part_locks[] */
+   sz += MAXALIGN((nslots + SLRU_NUM_PARTITIONS) * sizeof(LWLockPadded));
 /* locks[] */

I am a bit uncomfortable with these changes; merging the part and buffer
locks makes it harder to understand the code. I'm not sure what we are
getting out of this.
--

Subject: [PATCH v4 5/5] Ensure slru buffer slots are in multiple of numbe of
partitions

I think the 0005 patch can be merged to 0001.

Regards,
Amul

#22Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amul Sul (#21)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Nov 8, 2023 at 10:52 AM Amul Sul <sulamul@gmail.com> wrote:

Thanks for the review, Amul.

Here are some minor comments:

+ * By default, we'll use 1MB of for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
*/
Size
CommitTsShmemBuffers(void)
{
-   return Min(256, Max(4, NBuffers / 256));
+   /* Use configured value if provided. */
+   if (commit_ts_buffers > 0)
+       return Max(16, commit_ts_buffers);
+   return Min(256, Max(16, NBuffers / 256));

Do you mean "4MB of for every 1GB" in the comment?

You are right

--

diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 5087cdce51..78d017ad85 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -16,7 +16,6 @@
#include "replication/origin.h"
#include "storage/sync.h"

-
extern PGDLLIMPORT bool track_commit_timestamp;

A spurious change.

Will fix

--

@@ -168,10 +180,19 @@ SimpleLruShmemSize(int nslots, int nlsns)

if (nlsns > 0)
sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr)); /* group_lsn[] */
-
return BUFFERALIGN(sz) + BLCKSZ * nslots;
}

Another spurious change in 0002 patch.

Will fix

--

+/*
+ * The slru buffer mapping table is partitioned to reduce contention. To
+ * determine which partition lock a given pageno requires, compute the pageno's
+ * hash code with SlruBufTableHashCode(), then apply SlruPartitionLock().
+ */

I didn't see SlruBufTableHashCode() & SlruPartitionLock() functions anywhere in
your patches; is that an outdated comment?

Yes, I will fix it; actually, there are some major design changes to this.

--

-   sz += MAXALIGN(nslots * sizeof(LWLockPadded));  /* buffer_locks[] */
-   sz += MAXALIGN(SLRU_NUM_PARTITIONS * sizeof(LWLockPadded)); /* part_locks[] */
+   sz += MAXALIGN((nslots + SLRU_NUM_PARTITIONS) * sizeof(LWLockPadded));  /* locks[] */

I am a bit uncomfortable with these changes; merging partition and buffer locks
makes it hard to understand the code. Not sure what we are getting out of this?

Yes, I do not like this much either because it is confusing. But the
advantage is that we use a single pointer for the locks, which means
the next variable, the LRU counter, falls in the same cache line, so
frequent updates of the LRU counter benefit from this. Although I
don't have any numbers that prove this. Currently, I want to focus on
the base patches and keep this patch as an add-on; later, if we find
it useful and want to pursue it, we will see how to make it more
readable.
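
Just to illustrate the idea with a toy layout (illustrative only; the
field names and the 48-byte stand-in below are made up, this is not the
actual SlruSharedData):

#include <stddef.h>
#include <stdio.h>

typedef struct LWLockPadded LWLockPadded;	/* opaque here */

/* two lock pointers: the LRU counter pointer starts a new cache line */
typedef struct
{
    char          other[48];            /* stand-in for the earlier hot fields */
    LWLockPadded *buffer_locks;         /* offset 48 on a 64-bit build */
    LWLockPadded *part_locks;           /* offset 56 */
    int          *bank_cur_lru_count;   /* offset 64: second cache line */
} TwoPointerLayout;

/* single lock pointer: the LRU counter pointer stays in the first cache line */
typedef struct
{
    char          other[48];
    LWLockPadded *locks;                /* buffer locks and bank locks together */
    int          *bank_cur_lru_count;   /* offset 56: same cache line */
} OnePointerLayout;

int
main(void)
{
    printf("two pointers: lru counter pointer at offset %zu\n",
           offsetof(TwoPointerLayout, bank_cur_lru_count));
    printf("one pointer:  lru counter pointer at offset %zu\n",
           offsetof(OnePointerLayout, bank_cur_lru_count));
    return 0;
}

On a typical 64-bit build this prints 64 and 56, which is the cache-line
effect I am referring to.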

Subject: [PATCH v4 5/5] Ensure slru buffer slots are in multiple of number of
partitions

I think the 0005 patch can be merged to 0001.

Yeah, in the next version it is done that way. Planning to post by the end of the day.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#23Ants Aasma
ants@cybertec.at
In reply to: Andrey M. Borodin (#13)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Sat, 4 Nov 2023 at 22:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 30 Oct 2023, at 09:20, Dilip Kumar <dilipbalaut@gmail.com> wrote:

changed the logic of SlruAdjustNSlots() in 0002, such that now it
starts with the next power of 2 value of the configured slots and
keeps doubling the number of banks until we reach the number of banks
to the max SLRU_MAX_BANKS(128) and bank size is bigger than
SLRU_MIN_BANK_SIZE (8). By doing so, we will ensure we don't have too
many banks

There was nothing wrong with having too many banks. Until bank-wise locks
and counters were added in later patchsets.
Having hashtable to find SLRU page in the buffer IMV is too slow. Some
comments on this approach can be found here [0].
I'm OK with having HTAB for that if we are sure performance does not
degrade significantly, but I really doubt this is the case.
I even think SLRU buffers used HTAB in some ancient times, but I could not
find commit when it was changed to linear search.

Maybe we could decouple locks and counters from SLRU banks? Banks were
meant to be small to exploit performance of local linear search. Lock
partitions have to be bigger for sure.

Is there a particular reason why lock partitions need to be bigger? We have
one lock per buffer anyway; bank-wise locks will increase the number of
locks by less than 10%.

I am working on trying out a SIMD based LRU mechanism that uses a 16 entry
bank. The data layout is:

struct CacheBank {
    int  page_numbers[16];
    char access_age[16];
};

The first part uses up one cache line, and the second line has 48 bytes of
space left over that could fit an LWLock and the page_status and page_dirty arrays.

Lookup + LRU maintenance has 20 instructions/14 cycle latency and the only
branch is for found/not found. Hoping to have a working prototype of SLRU
on top in the next couple of days.
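
For illustration, here is a rough, untested SSE2 sketch of the kind of
lookup + LRU update I have in mind; the field layout matches the struct
above, but the function name and the exact intrinsics are just an
example, not from any posted patch:

#include <emmintrin.h>

typedef struct CacheBank
{
    int  page_numbers[16];    /* one 64-byte cache line */
    char access_age[16];      /* kept as a permutation of 0..15, 0 = newest */
} CacheBank;

/* Returns the slot index within the bank, or -1 if the page is not cached. */
static int
bank_lookup(CacheBank *bank, int pageno)
{
    __m128i  key = _mm_set1_epi32(pageno);
    unsigned mask = 0;
    int      slot;

    /* Compare all 16 page numbers against the key, four at a time. */
    for (int i = 0; i < 4; i++)
    {
        __m128i v  = _mm_loadu_si128((const __m128i *) &bank->page_numbers[i * 4]);
        __m128i eq = _mm_cmpeq_epi32(v, key);

        mask |= (unsigned) _mm_movemask_ps(_mm_castsi128_ps(eq)) << (i * 4);
    }

    if (mask == 0)
        return -1;              /* miss: caller evicts the slot whose age is 15 */

    slot = __builtin_ctz(mask); /* position of the (unique) match */

    /* LRU update: age every entry younger than the hit, then reset the hit. */
    {
        __m128i ages    = _mm_loadu_si128((const __m128i *) bank->access_age);
        __m128i hit_age = _mm_set1_epi8(bank->access_age[slot]);
        __m128i younger = _mm_cmplt_epi8(ages, hit_age);

        /* subtracting the 0/-1 compare mask adds one to every younger entry */
        ages = _mm_sub_epi8(ages, younger);
        _mm_storeu_si128((__m128i *) bank->access_age, ages);
        bank->access_age[slot] = 0;
    }

    return slot;
}

The branchless age update keeps access_age an exact LRU ordering within
the bank, so the only data-dependent branch is the found/not-found check.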

Regards,
Ants Aasma

#24Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Ants Aasma (#23)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 8 Nov 2023, at 14:17, Ants Aasma <ants@cybertec.at> wrote:

Is there a particular reason why lock partitions need to be bigger? We have one lock per buffer anyway; bank-wise locks will increase the number of locks by less than 10%.

The problem was not attracting much attention for some years, so my reasoning was that the solution should not have any cost at all. The initial patchset with banks did not add any memory footprint.

On 8 Nov 2023, at 14:17, Ants Aasma <ants@cybertec.at> wrote:

I am working on trying out a SIMD based LRU mechanism that uses a 16 entry bank.

FWIW I tried to pack struct parts together to minimize cache lines touched, see step 3 in [0]. So far I could not prove any performance benefits of this approach. But maybe your implementation will be more efficient.

Thanks!

Best regards, Andrey Borodin.

[0]: /messages/by-id/93236D36-B91C-4DFA-AF03-99C083840378@yandex-team.ru

#25Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#14)
3 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Nov 6, 2023 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sun, Nov 5, 2023 at 1:37 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Maybe we could decouple locks and counters from SLRU banks? Banks were meant to be small to exploit performance of local linear search. Lock partitions have to be bigger for sure.

Yeah, that could also be an idea if we plan to drop the hash. I mean
bank-wise counter is fine as we are finding a victim buffer within a
bank itself, but each lock could cover more slots than one bank size
or in other words, it can protect multiple banks. Let's hear more
opinion on this.

Here is the updated version of the patch. Here I have taken the
approach suggested by Andrey, and I discussed the same with Alvaro
offlist and he also agrees with it. So the idea is that we will keep
the bank size fixed at 16 buffers per bank, and the allowed GUC value
for each SLRU buffer pool must be a multiple of the bank size. We
have removed the centralized lock, but instead of one lock per bank
we have kept a maximum limit of 128 on the number of bank locks. We
kept the max limit at 128 because, in one of the operations (i.e.
ActivateCommitTs), we need to acquire all the bank locks (but this is
not a performance path at all), and at a time we can acquire a max of
200 LWLocks, so we think this limit of 128 is good. So now if the
number of banks is <= 128 then we will be using one lock per bank,
otherwise one lock may protect access to buffers in multiple banks.
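
For example, just to illustrate the mapping: with xact_buffers = 4096 there
would be 4096 / 16 = 256 banks, so each of the 128 locks covers two banks
(bank 0 and bank 128 map to lock 0 via bankno % 128, and so on), whereas
with 2048 or fewer buffers every bank gets its own lock.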

One might argue that we should keep the max number of locks lower than
128, i.e. 64 or 32, and I am open to that; we can do more experiments
with a very large buffer pool and a very heavy workload to see whether
having up to 128 locks is helpful or not.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v5-0002-Divide-SLRU-buffers-into-banks.patchapplication/octet-stream; name=v5-0002-Divide-SLRU-buffers-into-banks.patchDownload
From ca083bb571a927202200f94909bcb280a39055ed Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Wed, 25 Oct 2023 16:51:34 +0530
Subject: [PATCH v5 2/3] Divide SLRU buffers into banks

As we have made the SLRU buffer pool configurable, we want to
eliminate the linear search within the whole SLRU buffer pool.  To
do so we divide the SLRU buffers into banks.  Each bank holds 16
buffers.  Each SLRU pageno may reside in only one bank.
Adjacent pagenos reside in different banks.  Along with this,
also ensure that the number of SLRU buffers is given in
multiples of the bank size.

Andrey M. Borodin and Dilip Kumar, based on feedback by Alvaro Herrera
---
 src/backend/access/transam/clog.c      | 10 ++++++++
 src/backend/access/transam/commit_ts.c | 10 ++++++++
 src/backend/access/transam/multixact.c | 19 ++++++++++++++
 src/backend/access/transam/slru.c      | 34 +++++++++++++++++++++++---
 src/backend/access/transam/subtrans.c  | 10 ++++++++
 src/backend/commands/async.c           | 10 ++++++++
 src/backend/storage/lmgr/predicate.c   | 10 ++++++++
 src/backend/utils/misc/guc_tables.c    | 14 +++++------
 src/include/access/slru.h              | 12 ++++++++-
 src/include/utils/guc_hooks.h          | 11 +++++++++
 10 files changed, 128 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 7979bbd00f..ab3893cf4f 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -1019,3 +1020,12 @@ clogsyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(XactCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for xact_buffers
+ */
+bool
+check_xact_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("xact_buffers", newval);
+}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 9ba5ae6534..96810959ab 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -1017,3 +1018,12 @@ committssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(CommitTsCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for commit_ts_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_ts_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 62709fcd07..77511c6342 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -3419,3 +3420,21 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ed24e1185..8697a27555 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 #define SlruFileName(ctl, path, seg) \
 	snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
@@ -134,7 +135,6 @@ typedef enum
 static SlruErrorCause slru_errcause;
 static int	slru_errno;
 
-
 static void SimpleLruZeroLSNs(SlruCtl ctl, int slotno);
 static void SimpleLruWaitIO(SlruCtl ctl, int slotno);
 static void SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata);
@@ -258,7 +258,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -266,6 +269,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	 */
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -497,12 +501,14 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
 	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1031,7 +1037,10 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
 
 		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
+
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1066,7 +1075,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int			this_page_number;
@@ -1613,3 +1622,20 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Value upper and lower hard limits are inclusive */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	/* Value does not fall within any allowable range */
+	GUC_check_errdetail("\"%s\" must be in multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0dd48f40f3..923e706535 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -373,3 +374,12 @@ SubTransPagePrecedes(int page1, int page2)
 	return (TransactionIdPrecedes(xid1, xid2) &&
 			TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
 }
+
+/*
+ * GUC check_hook for subtrans_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtrans_buffers", newval);
+}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bdbbe5cc0..98449cbdde 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -149,6 +149,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -2444,3 +2445,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 18ea18316d..e4903c67ec 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -5011,3 +5012,12 @@ AttachSerializableXact(SerializableXactHandle handle)
 	if (MySerializableXact != InvalidSerializableXact)
 		CreateLocalPredicateLockHash();
 }
+
+/*
+ * GUC check_hook for serial_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serial_buffers", newval);
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c82635943b..7c85d2126e 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2296,7 +2296,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_offsets_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_offsets_buffers, NULL, NULL
 	},
 
 	{
@@ -2307,7 +2307,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_members_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_members_buffers, NULL, NULL
 	},
 
 	{
@@ -2318,7 +2318,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&subtrans_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_subtrans_buffers, NULL, NULL
 	},
 	{
 		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
@@ -2328,7 +2328,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&notify_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_notify_buffers, NULL, NULL
 	},
 
 	{
@@ -2339,7 +2339,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&serial_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_serial_buffers, NULL, NULL
 	},
 
 	{
@@ -2350,7 +2350,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&xact_buffers,
 		64, 0, CLOG_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_xact_buffers
+		check_xact_buffers, NULL, show_xact_buffers
 	},
 
 	{
@@ -2361,7 +2361,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&commit_ts_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_commit_ts_buffers
+		check_commit_ts_buffers, NULL, show_commit_ts_buffers
 	},
 
 	{
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index c0d37e3eb3..51c5762b9f 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks
+ */
+#define SLRU_BANK_SIZE		16
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -139,6 +144,11 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for slotno banks
+	 */
+	Size		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
@@ -175,5 +185,5 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
-
+extern bool check_slru_buffers(const char *name, int *newval);
 #endif							/* SLRU_H */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 8597e430de..7dd96a2059 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -128,6 +128,17 @@ extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
+extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
 extern void assign_synchronous_commit(int newval, void *extra);
 extern void assign_syslog_facility(int newval, void *extra);
-- 
2.39.2 (Apple Git-143)

v5-0003-Remove-the-centralized-control-lock-and-LRU-count.patchapplication/octet-stream; name=v5-0003-Remove-the-centralized-control-lock-and-LRU-count.patchDownload
From 263f0bb133d8214bced70ba9f0df0b2981974bdf Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Tue, 7 Nov 2023 09:51:37 +0530
Subject: [PATCH v5 3/3] Remove the centralized control lock and LRU counter

The previous patch divided the SLRU buffer pool into associative
banks.  This patch further optimizes it by introducing multiple
SLRU locks instead of a common centralized lock, which reduces
contention on the SLRU control lock.  Basically, we will have at
most 128 bank locks; if the number of banks is <= 128 then each
lock will cover exactly one bank, otherwise a lock will cover
multiple banks, and we find the bank-to-lock mapping by
(bankno % 128).  This patch also removes the centralized LRU
counter; we now have bank-wise LRU counters, which helps avoid
the frequent cache invalidation caused by modifying this counter.

Dilip Kumar based on design inputs from Andrey M. Borodin,
Robert Haas, and Alvaro Herrera
---
 src/backend/access/transam/clog.c        | 114 +++++++----
 src/backend/access/transam/commit_ts.c   |  43 ++--
 src/backend/access/transam/multixact.c   | 177 ++++++++++++-----
 src/backend/access/transam/slru.c        | 238 +++++++++++++++++------
 src/backend/access/transam/subtrans.c    |  58 ++++--
 src/backend/commands/async.c             |  43 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  33 ++--
 src/include/access/slru.h                |  63 ++++--
 src/include/storage/lwlock.h             |   7 +
 src/test/modules/test_slru/test_slru.c   |  32 +--
 12 files changed, 589 insertions(+), 247 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index ab3893cf4f..7b546cab3c 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -275,14 +275,19 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock w.r.t. the page we are going to access. */
+	lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
+	 * updates for multiple backends so that the number of times the SLRU lock
 	 * needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
@@ -301,17 +306,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire SLRU lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -324,10 +329,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -346,7 +351,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetSLRUBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -397,14 +403,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -418,6 +423,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -498,13 +505,10 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
-	 * Now that we've got the lock, clear the list of processes waiting for
-	 * group XID status update, saving a pointer to the head of the list.
-	 * Trying to pop elements one at a time could lead to an ABA problem.
+	 * We are leader so clear the list of processes waiting for group XID
+	 * status update, saving a pointer to the head of the list. Trying to pop
+	 * elements one at a time could lead to an ABA problem.
 	 */
 	nextidx = pg_atomic_exchange_u32(&procglobal->clogGroupFirst,
 									 INVALID_PGPROCNO);
@@ -512,10 +516,38 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	/* Remember head of list so we can perform wakeups after dropping lock. */
 	wakeidx = nextidx;
 
+	/* Acquire the SLRU bank lock w.r.t. the first page in the group. */
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetSLRUBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	/* Walk the list and update the status of all XIDs. */
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * Although we are trying our best to keep same page in a group, there
+		 * are cases where we might get different pages as well for detail
+		 * refer comment in above while loop where we are adding this process
+		 * for group update.  So if the current page we are going to access is
+		 * not in the same slru bank in which we updated the last page then we
+		 * need to release the lock on the previous bank and acquire lock on
+		 * the bank w.r.t. the page we are going to update now.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetSLRUBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -535,7 +567,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -564,10 +597,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with slot specific SLRU bank's lock held
  */
 static void
-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+						  int slotno)
 {
 	int			byteno = TransactionIdToByte(xid);
 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
@@ -656,7 +690,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -690,8 +724,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -705,8 +739,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -715,7 +750,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -750,14 +785,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u32(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -768,8 +799,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -801,7 +833,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -833,6 +865,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -843,13 +876,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -987,16 +1021,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 96810959ab..ae1badd295 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -219,8 +219,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -230,13 +231,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with slot specific SLRU bank's Lock held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -337,7 +338,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -527,9 +528,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
-				  SYNC_HANDLER_COMMIT_TS);
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU, SYNC_HANDLER_COMMIT_TS);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
@@ -685,9 +685,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -714,12 +712,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -768,9 +767,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -802,6 +801,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -822,12 +822,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -981,16 +983,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
+		lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1002,7 +1006,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 77511c6342..ad31b2017b 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -193,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use SLRU bank's lock of MultiXactOffset and
+ * MultiXactMember to guard accesses to the two sets of SLRU buffers.  For
+ * concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -871,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -892,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -917,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * MultiXactMember SLRU page is changed so check if this new page
+			 * fall into the different SLRU bank then release the old bank's
+			 * lock and acquire lock on the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -937,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1240,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1343,11 +1361,23 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If the page is on the different SLRU bank then release the lock on the
+	 * previous bank if we are already holding one and acquire the lock on the
+	 * new bank.
+	 */
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1380,7 +1410,22 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * SLRU pageno is changed so check whether this page is falling in
+			 * the different slru bank than on which we are already holding
+			 * the lock and if so release the lock on the old bank and acquire
+			 * that on the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1389,7 +1434,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1398,13 +1444,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1420,6 +1464,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * MultiXactMember SLRU page is changed so check if this new page
+			 * fall into the different SLRU bank then release the old bank's
+			 * lock and acquire lock on the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1443,7 +1501,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1853,14 +1912,14 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
 
@@ -1895,8 +1954,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1905,9 +1966,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1916,7 +1978,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1976,10 +2038,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1994,7 +2058,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2016,13 +2080,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2047,13 +2113,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2068,7 +2134,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2076,18 +2144,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2099,7 +2166,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2114,10 +2183,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2405,6 +2473,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2415,13 +2484,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2454,15 +2524,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int			pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2760,7 +2832,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3242,31 +3314,33 @@ multixact_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3332,7 +3406,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 8697a27555..dd1a4f13b2 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -72,6 +72,21 @@
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index of lock for a given slotno in bank_lock array in
+ * SlruSharedData.
+ *
+ * Basically, the slru buffer pool is divided into banks of buffer and there is
+ * total SLRU_MAX_BANKLOCKS number of locks to protect access to buffer in the
+ * banks.  Since we have max limit on the number of locks we can not always have
+ * one lock for each bank.  So until the number of banks are
+ * <= SLRU_MAX_BANKLOCKS then there would be one lock protecting each bank
+ * otherwise one lock might protect multiple banks based on the number of
+ * banks.
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -93,34 +108,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -147,6 +134,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 /*
  * Initialization of shared memory
@@ -156,6 +144,8 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -165,6 +155,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -181,16 +173,19 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -202,18 +197,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -234,6 +227,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -245,7 +242,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -254,6 +251,15 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
@@ -307,7 +313,7 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -346,12 +352,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -406,6 +413,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -448,9 +456,10 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -459,7 +468,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -503,9 +512,10 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	int			slotno;
 	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
 	/* See if page is already in a buffer */
 	for (slotno = bankstart; slotno < bankend; slotno++)
@@ -525,8 +535,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -548,6 +558,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int			pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -576,7 +587,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -591,7 +602,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1037,7 +1048,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
 
 		/* See if page already has a buffer assigned */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
 		for (slotno = bankstart; slotno < bankend; slotno++)
@@ -1074,7 +1086,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
@@ -1096,7 +1108,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number == pg_atomic_read_u32(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1170,6 +1182,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int			pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1180,10 +1193,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If curlockno differs from prevlockno, release the previous lock and
+		 * acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1197,7 +1223,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1237,6 +1263,7 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1247,25 +1274,38 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u32(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If curlockno differs from prevlockno, release the previous lock and
+		 * acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1295,10 +1335,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
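+		/*
+		 * Release the bank lock before restarting; the restart path acquires
+		 * the bank locks again beginning with the first bank.
+		 */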
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1339,15 +1381,29 @@ SlruDeleteSegment(SlruCtl ctl, int segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If curlockno differs from prevlockno, release the previous lock and
+		 * acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
 
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1381,7 +1437,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1623,6 +1679,38 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	return result;
 }
 
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this macro.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
 /*
  * Helper function for GUC check_hook to check whether slru buffers are in
  * multiples of SLRU_BANK_SIZE.
@@ -1639,3 +1727,37 @@ check_slru_buffers(const char *name, int *newval)
 						SLRU_BANK_SIZE);
 	return false;
 }
+
+/*
+ * Function to acquire all bank locks of the given SlruCtl
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Function to release all bank locks of the given SlruCtl
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 923e706535..ff47985f08 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -78,12 +78,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -101,7 +103,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -131,7 +133,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -194,8 +196,9 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU,
+				  SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -213,8 +216,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -223,7 +227,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -253,6 +257,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int			startPage;
 	int			endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -260,23 +266,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = ShmemVariableCache->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If we need to move to a new bank, release the lock on the old bank
+		 * and acquire the lock on the new one.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If we need to move to a new bank, release the lock on the old bank and
+	 * acquire the lock on the new one.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -310,6 +340,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -321,12 +352,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 98449cbdde..67da0b48bd 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -268,9 +268,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and a bank-wise SLRU lock is
+ * used as the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -571,7 +572,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE);
 
 	if (!found)
@@ -1403,7 +1404,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1413,9 +1414,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int			pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1439,6 +1438,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * wrapped around, but re-zeroing the page is harmless in that case.)
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetSLRUBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1484,6 +1488,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetSLRUBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1510,7 +1525,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1989,9 +2004,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -2011,7 +2026,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetSLRUBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -2052,7 +2067,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2205,7 +2220,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+# 26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+# 38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index e4903c67ec..7632c42978 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -809,8 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -847,12 +848,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int			firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetSLRUBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -902,7 +905,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -920,10 +923,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -935,13 +938,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -954,7 +957,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -966,7 +969,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -984,7 +987,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -993,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1007,12 +1010,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1072,7 +1075,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 51c5762b9f..d9be57de75 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -21,6 +21,7 @@
  * SLRU bank size for slotno hash banks
  */
 #define SLRU_BANK_SIZE		16
+#define	SLRU_MAX_BANKLOCKS	128
 
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
@@ -62,8 +63,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -76,36 +75,52 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int		   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks array protects the I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
 	/*
-	 * Optional array of WAL flush LSNs associated with entries in the SLRU
-	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
-	 * for pg_xact, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
-	 * has lsn_groups_per_page entries per buffer slot, each containing the
-	 * highest LSN known for a contiguous group of SLRU entries on that slot's
-	 * page.
+	 * Locks to protect in-memory buffer slot access within the SLRU banks.
+	 * If the number of banks is <= SLRU_MAX_BANKLOCKS there will be one lock
+	 * per bank; otherwise each lock will protect multiple banks, depending
+	 * on the number of banks.
 	 */
-	XLogRecPtr *group_lsn;
-	int			lsn_groups_per_page;
+	LWLockPadded *bank_locks;
 
 	/*----------
+	 * Instead of a global counter we maintain a bank-wise LRU counter,
+	 * because a) victim buffer selection is done at the bank level, so there
+	 * is no point in having a global counter, and b) manipulating a global
+	 * counter would cause frequent CPU cache invalidation, which would hurt
+	 * performance.
+	 *
 	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
 	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
+	 *		bank_cur_lru_count[bankno] - page_lru_count[slotno]
 	 * The counts will eventually wrap around, but this calculation still
 	 * works as long as no page's age exceeds INT_MAX counts.
 	 *----------
 	 */
-	int			cur_lru_count;
+	int		   *bank_cur_lru_count;
+
+	/*
+	 * Optional array of WAL flush LSNs associated with entries in the SLRU
+	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
+	 * for pg_xact, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
+	 * has lsn_groups_per_page entries per buffer slot, each containing the
+	 * highest LSN known for a contiguous group of SLRU entries on that slot's
+	 * page.
+	 */
+	XLogRecPtr *group_lsn;
+	int			lsn_groups_per_page;
 
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int			latest_page_number;
+	pg_atomic_uint32 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -153,11 +168,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock needs to be acquired in order to access the SLRU buffer slots in
+ * the respective bank.  For more details, refer to the comments in
+ * SlruSharedData.
+ */
+static inline LWLock *
+SimpleLruGetSLRUBankLock(SlruCtl ctl, int pageno)
+{
+	int			banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler);
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 							  TransactionId xid);
@@ -185,5 +213,8 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
+extern LWLock *SimpleLruGetSLRUBankLock(SlruCtl ctl, int pageno);
 extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index b038e599c0..87cb812b84 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index ae21444c47..9a02f33933 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int			pageno = PG_GETARG_INT32(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -215,6 +214,7 @@ test_slru_shmem_startup(void)
 {
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -228,11 +228,13 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
+				  NUM_TEST_BUFFERS, 0, slru_dir_name, test_buffer_tranche_id,
 				  test_tranche_id, SYNC_HANDLER_NONE);
 }
 
-- 
2.39.2 (Apple Git-143)
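
To illustrate how the bank-wise locking in the patch above maps a page to its bank and bank lock, here is a minimal standalone sketch (not part of the patch). It assumes bank_mask is the number of banks minus one, as implied by the masking in SimpleLruReadPage_ReadOnly() and SlruSelectLRUPage(); the example_* helper names are made up for illustration:

    #define SLRU_BANK_SIZE      16
    #define SLRU_MAX_BANKLOCKS  128

    /* Bank that holds the page: low-order bits of the page number. */
    static int
    example_bankno(int pageno, int bank_mask)
    {
        return pageno & bank_mask;
    }

    /* First buffer slot of that bank; the bank is searched linearly from here. */
    static int
    example_bankstart(int pageno, int bank_mask)
    {
        return (pageno & bank_mask) * SLRU_BANK_SIZE;
    }

    /*
     * Lock protecting that bank: one lock per bank while the number of banks
     * is at most SLRU_MAX_BANKLOCKS; beyond that, several banks share a lock.
     */
    static int
    example_banklockno(int pageno, int bank_mask)
    {
        return (pageno & bank_mask) % SLRU_MAX_BANKLOCKS;
    }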

Attachment: v5-0001-Make-all-SLRU-buffer-sizes-configurable.patch (application/octet-stream)
From 8d2e41ed3d7b105cb224608b75e2cc4a2568b266 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Wed, 25 Oct 2023 14:45:00 +0530
Subject: [PATCH v5 1/3] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Default sizes are also set to 64 as sizes much larger than the old
limits have been shown to be useful on modern systems.

Patch by Andrey M. Borodin, Dilip Kumar
Reviewed By Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro
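
As a rough illustration (not part of this commit) of the "0 = auto" sizing rule documented below for xact_buffers: when the GUC is left at 0 the buffer count is derived from shared_buffers, otherwise the configured value is used with a floor of 16. The sketch mirrors the CLOGShmemBuffers() change in this patch; Max/Min come from c.h, CLOG_MAX_ALLOWED_BUFFERS from this patch's clog.h, and the example_* name is made up:

    /* Sketch of the xact_buffers sizing rule: explicit value, else auto-scale. */
    static int
    example_clog_buffers(int xact_buffers_setting, int NBuffers)
    {
        if (xact_buffers_setting > 0)
            return Max(16, xact_buffers_setting);   /* explicit setting, floored at 16 */

        /* auto: one CLOG buffer per 512 shared buffers, capped at the maximum useful size */
        return Min(CLOG_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
    }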
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  23 ++-
 src/backend/access/transam/commit_ts.c        |   7 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  25 ++++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/clog.h                     |  10 ++
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 19 files changed, 305 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bd70ff2e4b..654db076b1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 256, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 4a431d5876..7979bbd00f 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -58,8 +58,8 @@
 
 /* We need two bits per xact, so four xacts fit in a byte */
 #define CLOG_BITS_PER_XACT	2
-#define CLOG_XACTS_PER_BYTE 4
-#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
+StaticAssertDecl((CLOG_BITS_PER_XACT * CLOG_XACTS_PER_BYTE) == BITS_PER_BYTE,
+				 "CLOG_BITS_PER_XACT and CLOG_XACTS_PER_BYTE are inconsistent");
 #define CLOG_XACT_BITMASK	((1 << CLOG_BITS_PER_XACT) - 1)
 
 #define TransactionIdToPage(xid)	((xid) / (TransactionId) CLOG_XACTS_PER_PAGE)
@@ -663,23 +663,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB for every 1GB of shared buffers, up to the
+ * theoretical maximum useful value, but always at least 16 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(16, xact_buffers);
+	return Min(CLOG_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index b897fabc70..9ba5ae6534 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -493,11 +493,16 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 4MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(16, commit_ts_buffers);
+	return Min(256, Max(16, NBuffers / 256));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 57ed34c0a8..62709fcd07 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,13 +1851,13 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 62bb610167..0dd48f40f3 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -184,14 +185,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 38ddae08b8..4bdbbe5cc0 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -117,7 +117,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by slru.c's wraparound limit; see QUEUE_MAX_PAGE below.
@@ -235,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -521,7 +521,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -569,7 +569,7 @@ AsyncShmemInit(void)
 	 * Set up SLRU management of the pg_notify data.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index a88cf5f118..c68d668514 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,29 @@ show_timezone(void)
 	return "unknown";
 }
 
+/*
+ * GUC show_hook for xact_buffers
+ */
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+/*
+ * GUC show_hook for commit_ts_buffers
+ */
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index a794546db3..18ea18316d 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
@@ -1347,7 +1347,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..96d480325b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 64;
+int			multixact_members_buffers = 64;
+int			subtrans_buffers = 64;
+int			notify_buffers = 64;
+int			serial_buffers = 64;
+int			xact_buffers = 64;
+int			commit_ts_buffers = 64;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 7605eff9b9..c82635943b 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2287,6 +2288,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		64, 0, CLOG_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e48c066a5b..364553a314 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 64			# memory for pg_subtrans
+#multixact_offsets_buffers = 64		# memory for pg_multixact/offsets
+#multixact_members_buffers = 64		# memory for pg_multixact/members
+#notify_buffers = 64			# memory for pg_notify
+#serial_buffers = 64			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index d99444f073..a9cd65db36 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -15,6 +15,16 @@
 #include "storage/sync.h"
 #include "lib/stringinfo.h"
 
+/*
+ * Don't allow xact_buffers to be set higher than could possibly be useful or
+ * SLRU would allow.
+ */
+#define CLOG_XACTS_PER_BYTE 4
+#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
+#define CLOG_MAX_ALLOWED_BUFFERS \
+	Min(SLRU_MAX_ALLOWED_BUFFERS, \
+		(((MaxTransactionId / 2) + (CLOG_XACTS_PER_PAGE - 1)) / CLOG_XACTS_PER_PAGE))
+
 /*
  * Possible transaction statuses --- note that all-zeroes is the initial
  * state.
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 0be1355892..18d7ba4ca9 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 552cc19e68..c0d37e3eb3 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..147dc4acc3 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 02da6ba7e1..b3e6815ee4 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f0cc651435..e2473f41de 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..7b68c8f1c7 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 2a191830a8..8597e430de 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -161,4 +161,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)

#26Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Ants Aasma (#23)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

IMO the whole area of SLRU buffering is in horrible shape and many users
are struggling with overall PG performance because of it. An
improvement doesn't have to be perfect -- it just has to be much better
than the current situation, which should be easy enough. We can
continue to improve later, using more scalable algorithms or ones that
allow us to raise the limits higher.

The only point on which we do not have full consensus yet is the need to
have one GUC per SLRU, and a lot of effort seems focused on trying to
fix the problem without adding so many GUCs (for example, using shared
buffers instead, or use a single "scaling" GUC). I think that hinders
progress. Let's just add multiple GUCs, and users can leave most of
them alone and only adjust the one with which they have a performance
problems; it's not going to be the same one for everybody.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)

#27Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#25)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Nov 8, 2023 at 6:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Here is the updated version of the patch, here I have taken the
approach suggested by Andrey and I discussed the same with Alvaro
offlist and he also agrees with it. So the idea is that we will keep
the bank size fixed which is 16 buffers per bank and the allowed GUC
value for each slru buffer must be a multiple of the bank size. We
have removed the centralized lock but instead of one lock per bank, we
have kept the maximum limit on the number of bank locks which is 128.
We kept the max limit as 128 because, in one of the operations (i.e.
ActivateCommitTs), we need to acquire all the bank locks (but this is
not a performance path at all) and at a time we can acquire a max of
200 LWlocks, so we think this limit of 128 is good. So now if the
number of banks is <= 128 then we will be using one lock per bank
otherwise the one lock may protect access of buffer in multiple banks.

Just so I understand, I guess this means that an SLRU is limited to
16*128 = 2k buffers = 16MB?

When we were talking about this earlier, I suggested fixing the number
of banks and allowing the number of buffers per bank to scale
depending on the setting. That seemed simpler than allowing both the
number of banks and the number of buffers to vary, and it might allow
the compiler to optimize some code better, by converting a calculation
like page_no%number_of_banks into a masking operation like page_no&0xf
or whatever. However, because it allows an individual bank to become
arbitrarily large, it more or less requires us to use a buffer mapping
table. Some of the performance problems mentioned could be alleviated
by omitting the hash table when the number of buffers per bank is
small, and we could also create the dynahash with a custom hash
function that just does modular arithmetic on the page number rather
than a real hashing operation. However, maybe we don't really need to
do any of that. I agree that dynahash is clunky on a good day. I
hadn't realized the impact would be so noticeable.
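
To illustrate the masking point, a minimal sketch (the names here are
made up, not patch code): with a compile-time, power-of-two bank count
the mapping can be written either way and the compiler reduces the
modulo to an AND.

#define NUM_BANKS 16					/* hypothetical fixed bank count */

static inline int
page_to_bank(int page_no)
{
	/* page_no % NUM_BANKS, which the compiler can emit as page_no & 0xf */
	return page_no & (NUM_BANKS - 1);
}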

This proposal takes the opposite approach of fixing the number of
buffers per bank, letting the number of banks vary. I think that's
probably fine, although it does reduce the effective associativity of
the cache. If there are more hot buffers in a bank than the bank size,
the bank will be contended, even if other banks are cold. However,
given the way SLRUs are accessed, it seems hard to imagine this being
a real problem in practice. There aren't likely to be say 20 hot
buffers that just so happen to all be separated from one another by a
number of pages that is a multiple of the configured number of banks.
And in the seemingly very unlikely event that you have a workload that
behaves like that, you could always adjust the number of banks up or
down by one, and the problem would go away. So this seems OK to me.

I also agree with a couple of points that Alvaro made, specifically
that (1) this doesn't have to be perfect, just better than now and (2)
separate GUCs for each SLRU is fine. On the latter point, it's worth
keeping in mind that the cost of a GUC that most people don't need to
tune is fairly low. GUCs like work_mem and shared_buffers are
"expensive" because everybody more or less needs to understand what
they are and how to set them and getting the right value can be tricky --
but a GUC like autovacuum_naptime is a lot cheaper, because almost
nobody needs to change it. It seems to me that these GUCs will fall
into the latter category. Users can hopefully just ignore them except
if they see a contention on the SLRU bank locks -- and then they can
consider increasing the number of banks for that particular SLRU. That
seems simple enough. As with autovacuum_naptime, there is a danger
that people will configure a ridiculous value of the parameter for no
good reason and get bad results, so it would be nice if someday we had
a magical system that just got all of this right without the user
needing to configure anything. But in the meantime, it's better to
have a somewhat manual system to relieve pressure on these locks than
no system at all.

--
Robert Haas
EDB: http://www.enterprisedb.com

#28Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#27)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Nov 9, 2023 at 9:39 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Nov 8, 2023 at 6:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Here is the updated version of the patch, here I have taken the
approach suggested by Andrey and I discussed the same with Alvaro
offlist and he also agrees with it. So the idea is that we will keep
the bank size fixed which is 16 buffers per bank and the allowed GUC
value for each slru buffer must be a multiple of the bank size. We
have removed the centralized lock but instead of one lock per bank, we
have kept the maximum limit on the number of bank locks which is 128.
We kept the max limit as 128 because, in one of the operations (i.e.
ActivateCommitTs), we need to acquire all the bank locks (but this is
not a performance path at all) and at a time we can acquire a max of
200 LWlocks, so we think this limit of 128 is good. So now if the
number of banks is <= 128 then we will be using one lock per bank
otherwise the one lock may protect access of buffer in multiple banks.

Just so I understand, I guess this means that an SLRU is limited to
16*128 = 2k buffers = 16MB?

Not really, because 128 is the maximum limit on the number of bank
locks, not on the number of banks. So for example, if you have 16*128
= 2k buffers then each lock will protect one bank, and likewise when
you have 16 * 512 = 8k buffers then each lock will protect 4 banks.
So in short we can get the lock for each bank by a simple computation
(banklockno = bankno % 128).
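
As a rough sketch of that mapping (illustrative names, not the actual
patch code), with the bank size fixed at 16 and the bank locks capped
at 128:

#define MAX_BANK_LOCKS	128			/* upper limit on bank LWLocks */

static inline int
pageno_to_bank(int pageno, int nbanks)
{
	return pageno % nbanks;			/* each pageno lives in exactly one bank */
}

static inline int
bankno_to_lock(int bankno)
{
	return bankno % MAX_BANK_LOCKS;	/* banklockno = bankno % 128 */
}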

When we were talking about this earlier, I suggested fixing the number
of banks and allowing the number of buffers per bank to scale
depending on the setting. That seemed simpler than allowing both the
number of banks and the number of buffers to vary, and it might allow
the compiler to optimize some code better, by converting a calculation
like page_no%number_of_banks into a masking operation like page_no&0xf
or whatever. However, because it allows an individual bank to become
arbitrarily large, it more or less requires us to use a buffer mapping
table. Some of the performance problems mentioned could be alleviated
by omitting the hash table when the number of buffers per bank is
small, and we could also create the dynahash with a custom hash
function that just does modular arithmetic on the page number rather
than a real hashing operation. However, maybe we don't really need to
do any of that. I agree that dynahash is clunky on a good day. I
hadn't realized the impact would be so noticeable.

Yes, one idea is to keep the number of banks fixed; with that, as you
correctly pointed out, the bank size can become quite big for a large
number of buffers, and then we would need a hash table. OTOH what I am
doing here is keeping the bank size fixed and small (16 buffers per
bank), so with a big buffer pool we simply get a larger number of
banks. I feel that having more banks is not really a problem as long
as we don't grow the number of locks beyond a certain limit, since in
some corner cases we need to acquire all the bank locks together and
there is a limit on how many LWLocks a backend can hold at once. So I
like this idea of sharing locks across the banks; with that 1) we can
have enough locks so that lock contention or cache invalidation due to
a common lock should no longer be a problem, 2) we can keep a small
bank size so that the sequential search within a bank stays fast,
which keeps reads fast, and 3) with a small bank size the victim
buffer search, which has to be sequential, is also quite fast.
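
Putting rough numbers on that (a toy illustration only, not patch
code):

#include <stdio.h>

static void
show_layout(int nslots)
{
	int		nbanks = nslots / 16;					/* fixed bank size of 16 */
	int		nlocks = nbanks < 128 ? nbanks : 128;	/* lock count capped at 128 */

	printf("%5d buffers -> %3d banks, %3d locks, %d bank(s) per lock\n",
		   nslots, nbanks, nlocks, nbanks / nlocks);
}

int
main(void)
{
	show_layout(2048);		/* 16 * 128: one bank per lock */
	show_layout(8192);		/* 16 * 512: four banks per lock */
	return 0;
}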

This proposal takes the opposite approach of fixing the number of
buffers per bank, letting the number of banks vary. I think that's
probably fine, although it does reduce the effective associativity of
the cache. If there are more hot buffers in a bank than the bank size,
the bank will be contended, even if other banks are cold. However,
given the way SLRUs are accessed, it seems hard to imagine this being
a real problem in practice. There aren't likely to be say 20 hot
buffers that just so happen to all be separated from one another by a
number of pages that is a multiple of the configured number of banks.
And in the seemingly very unlikely event that you have a workload that
behaves like that, you could always adjust the number of banks up or
down by one, and the problem would go away. So this seems OK to me.

I agree with this

I also agree with a couple of points that Alvaro made, specifically
that (1) this doesn't have to be perfect, just better than now and (2)
separate GUCs for each SLRU is fine. On the latter point, it's worth
keeping in mind that the cost of a GUC that most people don't need to
tune is fairly low. GUCs like work_mem and shared_buffers are
"expensive" because everybody more or less needs to understand what
they are and how to set them and getting the right value can be tricky --
but a GUC like autovacuum_naptime is a lot cheaper, because almost
nobody needs to change it. It seems to me that these GUCs will fall
into the latter category. Users can hopefully just ignore them except
if they see a contention on the SLRU bank locks -- and then they can
consider increasing the number of banks for that particular SLRU. That
seems simple enough. As with autovacuum_naptime, there is a danger
that people will configure a ridiculous value of the parameter for no
good reason and get bad results, so it would be nice if someday we had
a magical system that just got all of this right without the user
needing to configure anything. But in the meantime, it's better to
have a somewhat manual system to relieve pressure on these locks than
no system at all.

+1

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#29Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#26)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Nov 9, 2023 at 4:55 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

IMO the whole area of SLRU buffering is in horrible shape and many users
are struggling with overall PG performance because of it. An
improvement doesn't have to be perfect -- it just has to be much better
than the current situation, which should be easy enough. We can
continue to improve later, using more scalable algorithms or ones that
allow us to raise the limits higher.

I agree with this.

The only point on which we do not have full consensus yet is the need to
have one GUC per SLRU, and a lot of effort seems focused on trying to
fix the problem without adding so many GUCs (for example, using shared
buffers instead, or use a single "scaling" GUC). I think that hinders
progress. Let's just add multiple GUCs, and users can leave most of
them alone and only adjust the one with which they have a performance
problem; it's not going to be the same one for everybody.

+1

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#30Nathan Bossart
nathandbossart@gmail.com
In reply to: Dilip Kumar (#29)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Fri, Nov 10, 2023 at 10:17:49AM +0530, Dilip Kumar wrote:

On Thu, Nov 9, 2023 at 4:55 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

The only point on which we do not have full consensus yet is the need to
have one GUC per SLRU, and a lot of effort seems focused on trying to
fix the problem without adding so many GUCs (for example, using shared
buffers instead, or use a single "scaling" GUC). I think that hinders
progress. Let's just add multiple GUCs, and users can leave most of
them alone and only adjust the one with which they have a performance
problem; it's not going to be the same one for everybody.

+1

+1

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#31Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#25)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

I just noticed that 0003 does some changes to
TransactionGroupUpdateXidStatus() that haven't been adequately
explained AFAICS. How do you know that these changes are safe?

0001 contains one typo in the docs, "cotents".

I'm not a fan of the fact that some CLOG sizing macros moved to clog.h,
leaving others in clog.c. Maybe add commentary cross-linking both.
Alternatively, perhaps allowing xact_buffers to grow beyond 65536 up to
the slru.h-defined limit of 131072 is not that bad, even if it's more
than could possibly be needed for xact_buffers; nobody is going to use
64k buffers, since useful values are below a couple thousand anyhow.
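
For reference, both numbers fall straight out of the macros added in
0001, assuming the default 8kB BLCKSZ:

/*
 * CLOG_XACTS_PER_PAGE      = BLCKSZ * CLOG_XACTS_PER_BYTE = 8192 * 4 = 32768
 * CLOG_MAX_ALLOWED_BUFFERS = ceil((MaxTransactionId / 2) / 32768)
 *                          = ceil(2147483647 / 32768)               = 65536
 * SLRU_MAX_ALLOWED_BUFFERS = (1024 * 1024 * 1024) / BLCKSZ
 *                          = 1073741824 / 8192                      = 131072
 */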

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Tom: There seems to be something broken here.
Teodor: I'm in sackcloth and ashes... Fixed.
/messages/by-id/482D1632.8010507@sigaev.ru

#32Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#31)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Nov 16, 2023 at 3:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I just noticed that 0003 does some changes to
TransactionGroupUpdateXidStatus() that haven't been adequately
explained AFAICS. How do you know that these changes are safe?

IMHO this is safe, and also the logical thing to do w.r.t.
performance.

It's safe because whenever we are updating any page in a group we are
holding the respective bank lock in exclusive mode, and in the extreme
case where the group contains pages from different banks we switch to
the appropriate bank lock before updating those pages. And we do not
wake any process in the group until we have done the status update for
all the processes, so there cannot be any race condition either.

It should not affect performance adversely, and it does not remove the
need for group updates. The main use case of the group update is to
optimize the situation where most of the processes are contending for
status updates on the same page; processes waiting for status updates
on different pages will, on a best-effort basis, go to different
groups. So in short, within a group we try to gather the processes
that are waiting to update the same clog page, which means logically
all the processes in the group will be waiting on the same bank lock.
In the extreme situation where a group contains processes trying to
update different pages, or even pages from different banks, we handle
that by switching the lock. Someone may raise the concern that when
there are processes in the group waiting on different bank locks, we
could wake those processes up as soon as we release one lock; I think
that is not required, because that is exactly the situation we are
trying to avoid (processes updating different pages ending up in the
same group), so there is no point in adding complexity to optimize
that case.
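
To make the locking pattern concrete, here is a heavily simplified
sketch of what the leader does; this is not the actual
TransactionGroupUpdateXidStatus() code, and GetSLRUBankLock() /
UpdateStatusOnPage() are made-up stand-ins for the real helpers:

#include "postgres.h"
#include "storage/lwlock.h"

typedef struct GroupMember
{
	int			clog_page;		/* clog page this member needs to update */
	struct GroupMember *next;
} GroupMember;

/* made-up stand-ins for the real helpers */
extern LWLock *GetSLRUBankLock(int pageno);
extern void UpdateStatusOnPage(GroupMember *member);

static void
group_update_sketch(GroupMember *head)
{
	LWLock	   *prevlock = NULL;

	for (GroupMember *m = head; m != NULL; m = m->next)
	{
		LWLock	   *lock = GetSLRUBankLock(m->clog_page);

		/* switch bank locks only when the next page lives in another bank */
		if (lock != prevlock)
		{
			if (prevlock != NULL)
				LWLockRelease(prevlock);
			LWLockAcquire(lock, LW_EXCLUSIVE);
			prevlock = lock;
		}
		UpdateStatusOnPage(m);
	}

	if (prevlock != NULL)
		LWLockRelease(prevlock);

	/* waiting members of the group are woken up only after this point */
}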

0001 contains one typo in the docs, "cotents".

I'm not a fan of the fact that some CLOG sizing macros moved to clog.h,
leaving others in clog.c. Maybe add commentary cross-linking both.
Alternatively, perhaps allowing xact_buffers to grow beyond 65536 up to
the slru.h-defined limit of 131072 is not that bad, even if it's more
than could possibly be needed for xact_buffers; nobody is going to use
64k buffers, since useful values are below a couple thousand anyhow.

I agree that allowing xact_buffers to grow beyond 65536 up to the
slru.h-defined limit of 131072 is not that bad, so I will change that
in the next version.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#33Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#32)
3 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Fri, Nov 17, 2023 at 1:09 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Nov 16, 2023 at 3:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

PFA, updated patch version, this fixes the comment given by Alvaro and
also improves some of the comments.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v6-0002-Divide-SLRU-buffers-into-banks.patch (application/octet-stream)
From dd32d90d3a6563bba258ee78fe3e3a5c1a413ede Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Fri, 17 Nov 2023 10:24:41 +0530
Subject: [PATCH v6 2/3] Divide SLRU buffers into banks

Now that the SLRU buffer pool is configurable, we want to
eliminate the linear search across the whole SLRU buffer pool.
To do so we divide the SLRU buffers into banks.  Each bank holds
16 buffers, and each SLRU pageno may reside in only one bank.
Adjacent pagenos reside in different banks.  Along with this,
also ensure that the number of SLRU buffers is given in
multiples of the bank size.

Andrey M. Borodin and Dilip Kumar, based on feedback by Alvaro Herrera
---
 src/backend/access/transam/clog.c      | 10 ++++++
 src/backend/access/transam/commit_ts.c | 10 ++++++
 src/backend/access/transam/multixact.c | 19 +++++++++++
 src/backend/access/transam/slru.c      | 45 ++++++++++++++++++++++----
 src/backend/access/transam/subtrans.c  | 10 ++++++
 src/backend/commands/async.c           | 10 ++++++
 src/backend/storage/lmgr/predicate.c   | 10 ++++++
 src/backend/utils/misc/guc_tables.c    | 14 ++++----
 src/include/access/slru.h              | 12 ++++++-
 src/include/utils/guc_hooks.h          | 11 +++++++
 10 files changed, 137 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 8237b40aa6..44008222da 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -1019,3 +1020,12 @@ clogsyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(XactCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for xact_buffers
+ */
+bool
+check_xact_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("xact_buffers", newval);
+}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 9ba5ae6534..96810959ab 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -1017,3 +1018,12 @@ committssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(CommitTsCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for commit_ts_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_ts_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 62709fcd07..77511c6342 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -3419,3 +3420,21 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ed24e1185..b0d90a4bd2 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 #define SlruFileName(ctl, path, seg) \
 	snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
@@ -134,7 +135,6 @@ typedef enum
 static SlruErrorCause slru_errcause;
 static int	slru_errno;
 
-
 static void SimpleLruZeroLSNs(SlruCtl ctl, int slotno);
 static void SimpleLruWaitIO(SlruCtl ctl, int slotno);
 static void SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata);
@@ -258,7 +258,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -266,6 +269,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	 */
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -497,12 +501,18 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in a buffer pool.  The buffer pool is
+	 * divided into banks of buffers and each pageno may reside only in one
+	 * bank so limit the search within the bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1029,9 +1039,15 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		/*
+		 * See if the page is already in a buffer pool.  The buffer pool is
+		 * divided into banks of buffers and each pageno may reside only in one
+		 * bank so limit the search within the bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1066,7 +1082,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int			this_page_number;
@@ -1613,3 +1629,20 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Valid values must be multiples of SLRU_BANK_SIZE */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	/* Value is not a multiple of the bank size */
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0dd48f40f3..923e706535 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -373,3 +374,12 @@ SubTransPagePrecedes(int page1, int page2)
 	return (TransactionIdPrecedes(xid1, xid2) &&
 			TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
 }
+
+/*
+ * GUC check_hook for subtrans_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtrans_buffers", newval);
+}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bdbbe5cc0..98449cbdde 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -149,6 +149,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -2444,3 +2445,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 18ea18316d..e4903c67ec 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -5011,3 +5012,12 @@ AttachSerializableXact(SerializableXactHandle handle)
 	if (MySerializableXact != InvalidSerializableXact)
 		CreateLocalPredicateLockHash();
 }
+
+/*
+ * GUC check_hook for serial_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serial_buffers", newval);
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c1345dab98..8649b066a8 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2296,7 +2296,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_offsets_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_offsets_buffers, NULL, NULL
 	},
 
 	{
@@ -2307,7 +2307,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_members_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_members_buffers, NULL, NULL
 	},
 
 	{
@@ -2318,7 +2318,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&subtrans_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_subtrans_buffers, NULL, NULL
 	},
 	{
 		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
@@ -2328,7 +2328,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&notify_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_notify_buffers, NULL, NULL
 	},
 
 	{
@@ -2339,7 +2339,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&serial_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_serial_buffers, NULL, NULL
 	},
 
 	{
@@ -2350,7 +2350,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&xact_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_xact_buffers
+		check_xact_buffers, NULL, show_xact_buffers
 	},
 
 	{
@@ -2361,7 +2361,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&commit_ts_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_commit_ts_buffers
+		check_commit_ts_buffers, NULL, show_commit_ts_buffers
 	},
 
 	{
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index c0d37e3eb3..51c5762b9f 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks
+ */
+#define SLRU_BANK_SIZE		16
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -139,6 +144,11 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for slotno banks
+	 */
+	Size		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
@@ -175,5 +185,5 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
-
+extern bool check_slru_buffers(const char *name, int *newval);
 #endif							/* SLRU_H */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 7b95acf36e..0edd59f867 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -130,6 +130,17 @@ extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
+extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
 extern void assign_synchronous_commit(int newval, void *extra);
 extern void assign_syslog_facility(int newval, void *extra);
-- 
2.39.2 (Apple Git-143)

v6-0001-Make-all-SLRU-buffer-sizes-configurable.patch (application/octet-stream)
From 37027c2a3560fc3a9c017cdb3a0b6501b85d9522 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Wed, 25 Oct 2023 14:45:00 +0530
Subject: [PATCH v6 1/3] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Default sizes are also set to 64 as sizes much larger than the old
limits have been shown to be useful on modern systems.

Patch by Andrey M. Borodin, Dilip Kumar
Reviewed By Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  19 +--
 src/backend/access/transam/commit_ts.c        |   7 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  25 ++++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 18 files changed, 293 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fc35a46e5e..693a0e6172 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 256, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 4a431d5876..8237b40aa6 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -663,23 +663,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(16, xact_buffers);
+	return Min(SLRU_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index b897fabc70..9ba5ae6534 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -493,11 +493,16 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 4MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(16, commit_ts_buffers);
+	return Min(256, Max(16, NBuffers / 256));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 57ed34c0a8..62709fcd07 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,13 +1851,13 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 62bb610167..0dd48f40f3 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -184,14 +185,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 38ddae08b8..4bdbbe5cc0 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -117,7 +117,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by slru.c's wraparound limit; see QUEUE_MAX_PAGE below.
@@ -235,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -521,7 +521,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -569,7 +569,7 @@ AsyncShmemInit(void)
 	 * Set up SLRU management of the pg_notify data.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index a88cf5f118..c68d668514 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,29 @@ show_timezone(void)
 	return "unknown";
 }
 
+/*
+ * GUC show_hook for xact_buffers
+ */
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+/*
+ * GUC show_hook for commit_ts_buffers
+ */
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index a794546db3..18ea18316d 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
@@ -1347,7 +1347,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..96d480325b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 64;
+int			multixact_members_buffers = 64;
+int			subtrans_buffers = 64;
+int			notify_buffers = 64;
+int			serial_buffers = 64;
+int			xact_buffers = 64;
+int			commit_ts_buffers = 64;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index b764ef6998..c1345dab98 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2287,6 +2288,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e48c066a5b..364553a314 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 64			# memory for pg_subtrans
+#multixact_offsets_buffers = 64		# memory for pg_multixact/offsets
+#multixact_members_buffers = 64		# memory for pg_multixact/members
+#notify_buffers = 64			# memory for pg_notify
+#serial_buffers = 64			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 0be1355892..18d7ba4ca9 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 552cc19e68..c0d37e3eb3 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..147dc4acc3 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 02da6ba7e1..b3e6815ee4 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f0cc651435..e2473f41de 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..7b68c8f1c7 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 3d74483f44..7b95acf36e 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -163,4 +163,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)
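
As a usage note for the patch above: the SLRU pool sizes become ordinary
postmaster-start GUCs, and SHOW xact_buffers / SHOW commit_ts_buffers go
through the new show hooks, so they report the size actually computed by
CLOGShmemBuffers() / CommitTsShmemBuffers().  A minimal illustrative
configuration (example values only; the check_slru_buffers() helper in the
0003 patch expects the counts to be multiples of SLRU_BANK_SIZE) could look
like this:

    # postgresql.conf -- change requires restart
    xact_buffers = 1024                 # pg_xact
    subtrans_buffers = 1024             # pg_subtrans
    multixact_offsets_buffers = 128     # pg_multixact/offsets
    multixact_members_buffers = 256     # pg_multixact/members
    notify_buffers = 64                 # pg_notify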

Attachment: v6-0003-Remove-the-centralized-control-lock-and-LRU-count.patch (application/octet-stream)
From ab0493dee5c682aa0e8d22075b88fd2ca8fb0bfe Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Fri, 17 Nov 2023 14:42:25 +0530
Subject: [PATCH v6 3/3] Remove the centralized control lock and LRU counter

The previous patch divided the SLRU buffer pool into associative
banks.  This patch optimizes it further by introducing multiple
bank locks in place of the single centralized control lock, which
reduces contention on the SLRU control lock.  There are at most
128 bank locks: if the number of banks is <= 128, each lock
covers exactly one bank; otherwise a lock covers multiple banks,
with the bank-to-lock mapping given by (bankno % 128).  This
patch also removes the centralized LRU counter in favour of
bank-wise LRU counters, which avoids the frequent cache-line
invalidation caused by every backend updating a single shared
counter.
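
A rough sketch of the page-to-lock lookup described above (illustrative
only -- the real helper in the patch set is SimpleLruGetSLRUBankLock(),
which may differ in detail):

    /* Sketch: map an SLRU page to the LWLock guarding its bank. */
    static inline LWLock *
    GetBankLockSketch(SlruCtl ctl, int pageno)
    {
        int         bankno = pageno & ctl->bank_mask;

        return &(ctl->shared->bank_locks[bankno % SLRU_MAX_BANKLOCKS].lock);
    }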

Dilip Kumar based on design inputs from Robert Haas, Andrey M. Borodin,
and Alvaro Herrera
---
 src/backend/access/transam/clog.c        | 122 ++++++++----
 src/backend/access/transam/commit_ts.c   |  43 ++--
 src/backend/access/transam/multixact.c   | 175 ++++++++++++-----
 src/backend/access/transam/slru.c        | 238 +++++++++++++++++------
 src/backend/access/transam/subtrans.c    |  58 ++++--
 src/backend/commands/async.c             |  43 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  33 ++--
 src/include/access/slru.h                |  63 ++++--
 src/include/storage/lwlock.h             |   7 +
 src/test/modules/test_slru/test_slru.c   |  32 +--
 12 files changed, 594 insertions(+), 248 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 44008222da..a4fd16ec7f 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -275,15 +275,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock w.r.t. the page we are going to access. */
+	lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on the Xact SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * updates for multiple backends so that the number of times the Xact SLRU
+	 * lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -301,17 +306,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire the SLRU bank lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -324,10 +329,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -346,7 +351,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetSLRUBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -397,14 +403,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -418,6 +423,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -498,13 +505,10 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
-	 * Now that we've got the lock, clear the list of processes waiting for
-	 * group XID status update, saving a pointer to the head of the list.
-	 * Trying to pop elements one at a time could lead to an ABA problem.
+	 * We are the leader, so clear the list of processes waiting for group XID
+	 * status update, saving a pointer to the head of the list. Trying to pop
+	 * elements one at a time could lead to an ABA problem.
 	 */
 	nextidx = pg_atomic_exchange_u32(&procglobal->clogGroupFirst,
 									 INVALID_PGPROCNO);
@@ -512,10 +516,44 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	/* Remember head of list so we can perform wakeups after dropping lock. */
 	wakeidx = nextidx;
 
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group.  If the
+	 * group contains pages that fall under different banks, we will release
+	 * this lock and acquire the new bank's lock before accessing each new
+	 * page.  There is a rare possibility that a group contains more than one
+	 * page (for details, refer to the comment in the while loop above) and
+	 * that those pages belong to different banks, but we are safe because we
+	 * always release the old lock before acquiring the new one, so even if
+	 * concurrent updaters lock in opposite orders there cannot be a deadlock.
+	 */
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetSLRUBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	/* Walk the list and update the status of all XIDs. */
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the bank lock for the current page is not the same as the lock
+		 * of the previous page's bank, release the previous bank's lock and
+		 * acquire the lock on the bank of the page we are going to update
+		 * now.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetSLRUBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -535,7 +573,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -564,10 +603,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with the lock of the slot's SLRU bank held
  */
 static void
-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+						  int slotno)
 {
 	int			byteno = TransactionIdToByte(xid);
 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
@@ -656,7 +696,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -690,8 +730,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -705,8 +745,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -715,7 +756,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -750,14 +791,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u32(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -768,8 +805,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -801,7 +839,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -833,6 +871,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -843,13 +882,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -987,16 +1027,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 96810959ab..ae1badd295 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -219,8 +219,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -230,13 +231,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with the lock of the slot's SLRU bank held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -337,7 +338,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -527,9 +528,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
-				  SYNC_HANDLER_COMMIT_TS);
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU, SYNC_HANDLER_COMMIT_TS);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
@@ -685,9 +685,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -714,12 +712,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -768,9 +767,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -802,6 +801,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -822,12 +822,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -981,16 +983,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
+		lock = SimpleLruGetSLRUBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1002,7 +1006,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 77511c6342..6aa72acf22 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -193,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the bank locks of the MultiXactOffset
+ * and MultiXactMember SLRUs to guard accesses to the two sets of SLRU
+ * buffers.  For concurrency's sake, we avoid holding more than one of these
+ * locks at a time.)
  */
 typedef struct MultiXactStateData
 {
@@ -871,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -892,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -917,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so if the new page
+			 * falls into a different SLRU bank, release the old bank's lock
+			 * and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -937,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1240,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1343,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1380,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1389,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1398,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1420,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1443,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1853,14 +1910,14 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
 
@@ -1895,8 +1952,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1905,9 +1964,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1916,7 +1976,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1976,10 +2036,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1994,7 +2056,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2016,13 +2078,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2047,13 +2111,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2068,7 +2132,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2076,18 +2142,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2099,7 +2164,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2114,10 +2181,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2405,6 +2471,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2415,13 +2482,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2454,15 +2522,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int			pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2760,7 +2830,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3242,31 +3312,33 @@ multixact_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetSLRUBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3332,7 +3404,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index b0d90a4bd2..dfbe0fd5f4 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -72,6 +72,21 @@
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index, in SlruSharedData's bank_locks array, of the lock
+ * that protects a given slotno.
+ *
+ * The SLRU buffer pool is divided into banks of buffers, and access to the
+ * buffers in those banks is protected by at most SLRU_MAX_BANKLOCKS locks.
+ * Because the number of locks is capped, we cannot always have one lock per
+ * bank: as long as the number of banks is <= SLRU_MAX_BANKLOCKS, each bank
+ * is protected by its own lock; otherwise a single lock may protect more
+ * than one bank (the bank number modulo SLRU_MAX_BANKLOCKS selects the
+ * lock).
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
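+/*
+ * Illustrative example only: with SLRU_MAX_BANKLOCKS at 128 and assuming an
+ * SLRU_BANK_SIZE of 16, slotno 2047 lives in bank 127 and maps to bank lock
+ * 127, while slotno 2048 lives in bank 128 and wraps back to bank lock 0.
+ */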
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -93,34 +108,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -147,6 +134,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 /*
  * Initialization of shared memory
@@ -156,6 +144,8 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -165,6 +155,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -181,16 +173,19 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -202,18 +197,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -234,6 +227,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -245,7 +242,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -254,6 +251,15 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
@@ -307,7 +313,7 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -346,12 +352,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -406,6 +413,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -448,9 +456,10 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -459,7 +468,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -503,9 +512,10 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	int			slotno;
 	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
 	/*
 	 * See if the page is already in a buffer pool.  The buffer pool is
@@ -529,8 +539,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -552,6 +562,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int			pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -580,7 +591,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -595,7 +606,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1039,7 +1050,8 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
 		/*
@@ -1081,7 +1093,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
@@ -1103,7 +1115,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number == pg_atomic_read_u32(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1177,6 +1189,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int			pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1187,10 +1200,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1204,7 +1230,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1244,6 +1270,7 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1254,25 +1281,38 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u32(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1302,10 +1342,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1346,15 +1388,29 @@ SlruDeleteSegment(SlruCtl ctl, int segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
 
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1388,7 +1444,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1630,6 +1686,38 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	return result;
 }
 
+/*
+ * Function to mark a buffer slot "most recently used".  Note multiple
+ * evaluation of arguments!
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this macro.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
 /*
  * Helper function for GUC check_hook to check whether slru buffers are in
  * multiples of SLRU_BANK_SIZE.
@@ -1646,3 +1734,37 @@ check_slru_buffers(const char *name, int *newval)
 						SLRU_BANK_SIZE);
 	return false;
 }
+
+/*
+ * Acquire all the bank locks of the given SlruCtl
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Release all the bank locks of the given SlruCtl
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 923e706535..ff47985f08 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -78,12 +78,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -101,7 +103,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -131,7 +133,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -194,8 +196,9 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU,
+				  SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -213,8 +216,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -223,7 +227,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -253,6 +257,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int			startPage;
 	int			endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -260,23 +266,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = ShmemVariableCache->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If the new page falls in a different bank, release the lock on the
+		 * old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the new page falls in a different bank, release the lock on the old
+	 * bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -310,6 +340,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -321,12 +352,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetSLRUBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 98449cbdde..67da0b48bd 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -268,9 +268,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and a bank-wise SLRU lock is
+ * used as the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -571,7 +572,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE);
 
 	if (!found)
@@ -1403,7 +1404,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1413,9 +1414,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int			pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1439,6 +1438,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * wrapped around, but re-zeroing the page is harmless in that case.)
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetSLRUBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1484,6 +1488,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetSLRUBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1510,7 +1525,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1989,9 +2004,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -2011,7 +2026,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetSLRUBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -2052,7 +2067,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2205,7 +2220,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+# 26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+# 38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index e4903c67ec..7632c42978 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -809,8 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -847,12 +848,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int			firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetSLRUBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -902,7 +905,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -920,10 +923,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -935,13 +938,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetSLRUBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -954,7 +957,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -966,7 +969,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -984,7 +987,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -993,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1007,12 +1010,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1072,7 +1075,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 51c5762b9f..d9be57de75 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -21,6 +21,7 @@
  * SLRU bank size for slotno hash banks
  */
 #define SLRU_BANK_SIZE		16
+#define	SLRU_MAX_BANKLOCKS	128
 
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
@@ -62,8 +63,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -76,36 +75,52 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int		   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protect the I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
 	/*
-	 * Optional array of WAL flush LSNs associated with entries in the SLRU
-	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
-	 * for pg_xact, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
-	 * has lsn_groups_per_page entries per buffer slot, each containing the
-	 * highest LSN known for a contiguous group of SLRU entries on that slot's
-	 * page.
+	 * Locks to protect in-memory access to the buffer slots in each SLRU
+	 * bank.  If the number of banks is <= SLRU_MAX_BANKLOCKS then there is
+	 * one lock per bank; otherwise each lock protects multiple banks,
+	 * depending on the number of banks.
 	 */
-	XLogRecPtr *group_lsn;
-	int			lsn_groups_per_page;
+	LWLockPadded *bank_locks;
 
 	/*----------
+	 * Instead of global counter we maintain a bank-wise lru counter because
+	 * a) we are doing the victim buffer selection as bank level so there is
+	 * no point of having a global counter b) manipulating a global counter
+	 * will have frequent cpu cache invalidation and that will affect the
+	 * performance.
+	 *
 	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
 	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
+	 *		bank_cur_lru_count[bankno] - page_lru_count[slotno]
 	 * The counts will eventually wrap around, but this calculation still
 	 * works as long as no page's age exceeds INT_MAX counts.
 	 *----------
 	 */
-	int			cur_lru_count;
+	int		   *bank_cur_lru_count;
+
+	/*
+	 * Optional array of WAL flush LSNs associated with entries in the SLRU
+	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
+	 * for pg_xact, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
+	 * has lsn_groups_per_page entries per buffer slot, each containing the
+	 * highest LSN known for a contiguous group of SLRU entries on that slot's
+	 * page.
+	 */
+	XLogRecPtr *group_lsn;
+	int			lsn_groups_per_page;
 
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int			latest_page_number;
+	pg_atomic_uint32 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -153,11 +168,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock must be acquired in order to access the SLRU buffer slots in the
+ * respective bank.  For more details, see the comments in SlruSharedData.
+ */
+static inline LWLock *
+SimpleLruGetSLRUBankLock(SlruCtl ctl, int pageno)
+{
+	int			banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler);
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 							  TransactionId xid);
@@ -185,5 +213,8 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
+extern LWLock *SimpleLruGetSLRUBankLock(SlruCtl ctl, int pageno);
 extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index b038e599c0..87cb812b84 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index ae21444c47..9a02f33933 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int			pageno = PG_GETARG_INT32(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetSLRUBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -215,6 +214,7 @@ test_slru_shmem_startup(void)
 {
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -228,11 +228,13 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
+				  NUM_TEST_BUFFERS, 0, slru_dir_name, test_buffer_tranche_id,
 				  test_tranche_id, SYNC_HANDLER_NONE);
 }
 
-- 
2.39.2 (Apple Git-143)

#34Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#33)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

In SlruSharedData, a new comment is added that starts:
"Instead of global counter we maintain a bank-wise lru counter because ..."
You don't need to explain what we don't do. Just explain what we do do.
So remove the words "Instead of a global counter" from there, because
they offer no wisdom. Same with the phrase "so there is no point to ...".
I think "The oldest page is therefore" should say "The oldest page *in
the bank* is therefore", for extra clarity.

I wonder what's the deal with false sharing in the new
bank_cur_lru_count array. Maybe instead of using LWLockPadded for
bank_locks, we should have a new struct, with both the LWLock and the
LRU counter; then pad *that* to the cacheline size. This way, both the
lwlock and the counter come to the CPU running this code together.
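
Something along these lines, as a rough untested sketch (SlruBankPadded and the
field names are only illustrative, nothing from the posted patch):

/* assumes storage/lwlock.h and PG_CACHE_LINE_SIZE from pg_config_manual.h */
typedef union SlruBankPadded
{
	struct
	{
		LWLock		lock;			/* protects the slots of this bank */
		int			cur_lru_count;	/* bank-local LRU clock */
	}			bank;
	char		pad[PG_CACHE_LINE_SIZE];	/* pad the pair to one cache line */
} SlruBankPadded;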

Looking at SlruRecentlyUsed, which was a macro and is now a function.
The comment about "multiple evaluation of arguments" no longer applies,
so it needs to be removed; and it shouldn't talk about itself being a
macro.

Using "Size" as type for bank_mask looks odd. For a bitmask, maybe it's
be more appropriate to use bits64 if we do need a 64-bit mask (we don't
have bits64, but it's easy to add a typedef). I bet we don't really
need a 64 bit mask, and a 32-bit or even a 16-bit is sufficient, given
the other limitations on number of buffers.
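
For reference, a minimal sketch of what that could look like (hypothetical,
not in the posted patch), next to the existing bits8/bits16/bits32 typedefs
in c.h:

typedef uint64 bits64;			/* >= 64 bits */

/* ... and bank_mask in SlruCtlData could then be declared as, say, bits16
 * instead of Size, if a narrower mask turns out to be enough. */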

I think SimpleLruReadPage should have this assert at the start:

+ Assert(LWLockHeldByMe(SimpleLruGetSLRUBankLock(ctl, pageno)));

Do we really need one separate lwlock tranche for each SLRU?

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)

#35Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#32)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2023-Nov-17, Dilip Kumar wrote:

On Thu, Nov 16, 2023 at 3:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I just noticed that 0003 does some changes to
TransactionGroupUpdateXidStatus() that haven't been adequately
explained AFAICS. How do you know that these changes are safe?

IMHO this is safe as well as logical to do w.r.t. performance. It's
safe because whenever we are updating any page in a group we are
acquiring the respective bank lock in exclusive mode and in extreme
cases if there are pages from different banks then we do switch the
lock as well before updating the pages from different groups.

Looking at the coverage for this code,
https://coverage.postgresql.org/src/backend/access/transam/clog.c.gcov.html#413
it seems in our test suites we never hit the case where there is
anything in the "nextidx" field for commit groups. To be honest, I
don't understand this group stuff, and so I'm doubly hesitant to go
without any testing here. Maybe it'd be possible to use Michael
Paquier's injection points somehow?

I think in the code comments where you use "w.r.t.", that acronym can be
replaced with "for", which improves readability.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"All rings of power are equal,
But some rings of power are more equal than others."
(George Orwell's The Lord of the Rings)

#36Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#34)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Fri, Nov 17, 2023 at 6:16 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Thanks for the review; all the comments look fine to me. I am replying to
those that need some clarification.

I wonder what's the deal with false sharing in the new
bank_cur_lru_count array. Maybe instead of using LWLockPadded for
bank_locks, we should have a new struct, with both the LWLock and the
LRU counter; then pad *that* to the cacheline size. This way, both the
lwlock and the counter come to the CPU running this code together.

Actually, the array lengths of both LWLock and the LRU counter are
different so I don't think we can move them to a common structure.
The length of the *buffer_locks array is equal to the number of slots,
the length of the *bank_locks array is Min (number_of_banks, 128), and
the length of the *bank_cur_lru_count array is number_of_banks.
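
Just to spell out the mismatch with a rough sketch (illustrative only, not the
patch's actual allocation code; the sz_* locals are hypothetical):

	int			nbanks = shared->num_slots / SLRU_BANK_SIZE;

	/* one I/O lock per buffer slot */
	Size		sz_buffer_locks = shared->num_slots * sizeof(LWLockPadded);
	/* one control lock per bank, capped at SLRU_MAX_BANKLOCKS (128) */
	Size		sz_bank_locks = Min(nbanks, SLRU_MAX_BANKLOCKS) * sizeof(LWLockPadded);
	/* one LRU counter per bank, never capped */
	Size		sz_lru_counts = nbanks * sizeof(int);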

Looking at the coverage for this code,
https://coverage.postgresql.org/src/backend/access/transam/clog.c.gcov.html#413
it seems in our test suites we never hit the case where there is
anything in the "nextidx" field for commit groups. To be honest, I
don't understand this group stuff, and so I'm doubly hesitant to go
without any testing here. Maybe it'd be possible to use Michael
Paquier's injection points somehow?

Sorry, but I am not aware of "Michael Paquier's injection points". Is it
something already in the repo? Can you redirect me to some of the
example test cases if we already have them? Then I will try it out.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#37Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#36)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2023-Nov-18, Dilip Kumar wrote:

On Fri, Nov 17, 2023 at 6:16 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I wonder what's the deal with false sharing in the new
bank_cur_lru_count array. Maybe instead of using LWLockPadded for
bank_locks, we should have a new struct, with both the LWLock and the
LRU counter; then pad *that* to the cacheline size. This way, both the
lwlock and the counter come to the CPU running this code together.

Actually, the array lengths of both LWLock and the LRU counter are
different so I don't think we can move them to a common structure.
The length of the *buffer_locks array is equal to the number of slots,
the length of the *bank_locks array is Min (number_of_banks, 128), and
the length of the *bank_cur_lru_count array is number_of_banks.

Oh.

Looking at the coverage for this code,
https://coverage.postgresql.org/src/backend/access/transam/clog.c.gcov.html#413
it seems in our test suites we never hit the case where there is
anything in the "nextidx" field for commit groups. To be honest, I
don't understand this group stuff, and so I'm doubly hesitant to go
without any testing here. Maybe it'd be possible to use Michael
Paquier's injection points somehow?

Sorry, but I am not aware of "Michael Paquier's injection points". Is it
something already in the repo? Can you redirect me to some of the
example test cases if we already have them? Then I will try it out.

https://postgr.es/m/ZVWufO_YKzTJHEHW@paquier.xyz

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)

#38Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Dilip Kumar (#33)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 17 Nov 2023, at 16:11, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Nov 17, 2023 at 1:09 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Nov 16, 2023 at 3:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

PFA, updated patch version, this fixes the comment given by Alvaro and
also improves some of the comments.

I’ve skimmed through the patch set. Here are some minor notes.

1. The cycles “for (slotno = bankstart; slotno < bankend; slotno++)” in SlruSelectLRUPage() and SimpleLruReadPage_ReadOnly() now have identical comments. I think a little copy-paste is OK.
But SimpleLruReadPage_ReadOnly() does pgstat_count_slru_page_hit(), while SlruSelectLRUPage() does not. This is not related to the patch set, just code nearby.

2. Do we really want these functions all doing the same thing?
extern bool check_multixact_offsets_buffers(int *newval, void **extra, GucSource source);
extern bool check_multixact_members_buffers(int *newval, void **extra, GucSource source);
extern bool check_subtrans_buffers(int *newval, void **extra, GucSource source);
extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
extern bool check_commit_ts_buffers(int *newval, void **extra, GucSource source);

3. The name SimpleLruGetSLRUBankLock() contains the meaning of SLRU twice. I’d suggest truncating the prefix or the infix.

I do not have a hard opinion on any of these items.

Best regards, Andrey Borodin.

#39Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andrey M. Borodin (#38)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Sun, Nov 19, 2023 at 12:39 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

I’ve skimmed through the patch set. Here are some minor notes.

Thanks for the review

1. The cycles “for (slotno = bankstart; slotno < bankend; slotno++)” in SlruSelectLRUPage() and SimpleLruReadPage_ReadOnly() now have identical comments. I think a little copy-paste is OK.
But SimpleLruReadPage_ReadOnly() does pgstat_count_slru_page_hit(), while SlruSelectLRUPage() does not. This is not related to the patch set, just code nearby.

Do you mean to say we need to modify the comments, or are you saying
pgstat_count_slru_page_hit() is missing in SlruSelectLRUPage()? If it
is the latter, then I can see that the caller of SlruSelectLRUPage()
calls pgstat_count_slru_page_hit() and SlruRecentlyUsed().

2. Do we really want these functions all doing the same thing?
extern bool check_multixact_offsets_buffers(int *newval, void **extra, GucSource source);
extern bool check_multixact_members_buffers(int *newval, void **extra, GucSource source);
extern bool check_subtrans_buffers(int *newval, void **extra, GucSource source);
extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
extern bool check_commit_ts_buffers(int *newval, void **extra, GucSource source);

I tried to deduplicate these by doing all the work inside the
check_slru_buffers() function. But I think it is hard to make them a
single function because there is no option to pass an SLRU name to the
GUC check hook, and IMHO in the check hook we need to print the GUC
name. Any suggestions on how we can avoid having so many functions?
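
For illustration, each thin wrapper ends up being roughly this (a sketch only;
"subtrans_buffers" is assumed here to be the GUC name we want in the message):

bool
check_subtrans_buffers(int *newval, void **extra, GucSource source)
{
	return check_slru_buffers("subtrans_buffers", newval);
}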

3. The name SimpleLruGetSLRUBankLock() contains the meaning of SLRU twice. I’d suggest truncating the prefix or the infix.

I do not have a hard opinion on any of these items.

I prefer SimpleLruGetBankLock() so that it is consistent with other
external functions starting with "SimpleLruGet". Are you fine with
this name?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#40Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#35)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Fri, Nov 17, 2023 at 7:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Nov-17, Dilip Kumar wrote:

I think I need some more clarification for some of the review comments

On Thu, Nov 16, 2023 at 3:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I just noticed that 0003 does some changes to
TransactionGroupUpdateXidStatus() that haven't been adequately
explained AFAICS. How do you know that these changes are safe?

IMHO this is safe as well as logical to do w.r.t. performance. It's
safe because whenever we are updating any page in a group we are
acquiring the respective bank lock in exclusive mode and in extreme
cases if there are pages from different banks then we do switch the
lock as well before updating the pages from different groups.

Looking at the coverage for this code,
https://coverage.postgresql.org/src/backend/access/transam/clog.c.gcov.html#413
it seems in our test suites we never hit the case where there is
anything in the "nextidx" field for commit groups.

1)
I was looking into your coverage report and have attached a screenshot
from it; it seems we do hit the block where nextidx is not
INVALID_PGPROCNO, which means there is some process other than the
group leader. I have already started exploring the injection points,
but I just wanted to be sure what your main concern about the coverage
is, so I thought of checking that first.

470           :     /*
471           :      * If the list was not empty, the leader will update the status of our
472           :      * XID.  It is impossible to have followers without a leader because the
473           :      * first process that has added itself to the list will always have
474           :      * nextidx as INVALID_PGPROCNO.
475           :      */
476        98 :     if (nextidx != INVALID_PGPROCNO)
477           :     {
478         2 :         int         extraWaits = 0;
479           :
480           :         /* Sleep until the leader updates our XID status. */
481         2 :         pgstat_report_wait_start(WAIT_EVENT_XACT_GROUP_UPDATE);
482           :         for (;;)
483           :         {
484           :             /* acts as a read barrier */
485         2 :             PGSemaphoreLock(proc->sem);
486         2 :             if (!proc->clogGroupMember)
487         2 :                 break;
488         0 :             extraWaits++;
489           :         }

2) Do we really need one separate lwlock tranche for each SLRU?

IMHO if we use the same lwlock tranche then the wait event will show
the same wait event name, right? And that would be confusing for the
user, who couldn't tell whether we are waiting for Subtransaction or
Multixact or anything else. Is my understanding not correct here?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

image.png (image/png): screenshot of the clog.c coverage report
W�{m���y�f�a$1����(�3����#
�R��@S^}��$����j����r��$j�����;R�B(�+�_$[�#��������4+0����z����m�`�}7��
f|�CwX[;y�.����f�Xp�/��1��2�?���w��g������.�����D��XN�Sv���,,�G�c�	�n��G�W~uG�p�jD������1�>j��0�DU(s�
���������"��"��<��
�L��ax���W�@�-s�]�_�+R7�������{\��RN�t4]�y�UM1��d�e��0?�R,f�&����,B3�pg���,�|i3�����K������#�8�@�p�THz<��jiXq+�!�j��GZ#}�(���tU�xFQ�DBT�	��$j�qr,���;��#�8�C���a�19�@
C��_�$4��nm3^���.�w�.����pG�:���]�:��G�pG�pG�������I~��Wy��Wd��9��w�:���pG�pG�(��������/��"k���|��'��kW�4iR?��pG�pG�p�E�%`�b�wE"����������;��;��#�8��#�8u?���y�J���?Z\n�a����G�pG�p����3`e�����N:t� �:u�?����9�\�s�������yA�C�C9��#�8��#��5\��5����)S,�� ����.�������/H���:��#�8��#�8eE��2���iS����,������/�P��i#c��)#���#�8��#�8��#�`!�V�������/��<��&�l"�G����L8��#�8��#�85g��XC��z���-}����Z���]s�5r�����]�F�I;��#�8��#�,P�
b���~�~��Y`�6lX���t��--��4����zG�pG�pG��,X"����F�P�^�(?���e
�Yb�%,�o��V
��g�pG�pG��8V�zXf�eL}�
�1'�K������YB�v�m+��rG�pG�p���Uvu���f��)��������=NG�pG�p�Z��3`��:�0��#�8��#�8��#P�X��)���y�G�p�����WC��#�8��#�$!�{��rwG�pG�pG�pJ��3`U�����Y�#�����1y�bx���e��c�
��k��]�����Q��������\����_����^���vG�p�"��U��?�����kG�q��e`�u��FM��[/�dl�2���cr��g��_~�sG _|6�����T�o������H��<d�.r�����G
����x�Bu�������������7^\��XN�{S���/r�����#�8��#Pp���������}��[xG�y|��5�x�������'U2FV,��8HN9��b�����&�� ?��]�a=@����_��e��/>���Y�H�R��*���b��)�~���TG�+$R��]���"���'��[���?�H.8��J��AG�pG`~!�X%�n������e����RK//�
f11�+�~��;���C���?
V��M}�y2��{�vr��g�����Ox�u���k��v�O����v�>������7�O8�_��7�x���&a��aa�w������D�o��l9���e��9���/?�:
~
���S�z���CM��t��W���C2E}�����rS/l��r�x���6�
��g!�����2�GAj�r�����?��m�eI���v��O���������;Y��ff���\r��2��j?m��Y���Q]�-=z�"6�fI�������#�8S]�{���#�8�@�X�r�<T&L��u��4�)��_�!L�7�C���%��U�����o��\�����/����^��Wj+�.�Ky�f9��A���;H�z�Lb�������v�������H���C��x����o�*��J���8~����L�{:����&O<|�\r�r�	����w���]Q�������2	��gN���]/���KN>�V�������7D��1��wK���k/=%g��_Z.�J6��G�p]�����2����?���lY�����
�����.�o��9�l��P}���k��������L�l����g����s���8%����.���/{@�-�\X X������������,�D��������d;e�`|�<a����RA"���K[g�B�/G�p��!�X	��d�{r�)��CFJ�6k���r������/p�Vp�������H�6k�d|�$;[Q�;�Yy�5�I�f2��+�P
a`�7Y��,�r%{����#����������e�G���2����������F�.&�]x�������def��yW>$-4O�^(�0D0�0�q����t���^���4��|�����f��9����$u=���� !#���fR���}C�_a���YD�r����������F��T�t��*��KV��i�M"$����W���m����j�������8A�7��#�8�@�@��*V�g�|h*m=��]�>���bk�������kZ��a��%��p s�<��-��Y�G�[��\*�t��H�����?GN<�FYl�y�n����[�Z���C�����gLW	�f�b��,�����<��*��3O�S����'��C��7_{�$c0_�}������>H�������?���4`c�n������N���/�=�d.K��Lh���2Nz�>S�d?�w�|i��6ky�q&��oG�p���3`U���#��)]�o'G���h���(��}����#�}�#H

��*��B��Ie�TP��Xf���:��/�`��/�E����H�
{�R���/��M+2-�6Ny�3�i�TKS1D�00.���>�U���N�.����2AU7����}�5g�:�
r��/s�����P���y�Y7���^+/����*�]C�Q�&&�z���?������P���l�H���]x�I�^}�I��L�#����3�s����
����pG�p�?n������,c������.f}�����m���y+��D�^����~n�����+�"\�=���jVj��<���f��3���#��6�Mp���	F���I��=��>g?{�N�h����Fr��;���WY�Q����3�G�M�++�Z���5����n7�ny�1��[����a���#n��{�`Ph?�
�����+������~��n���1
��J!t���So�����|?���&M���|����P<,��a ��O?��O��G�p�2"�X%�g/
�/����Y��w���o��Wcl�K!�F�G\�}�{3�r��r��'�=?��Bq���Qf������y��
X	J��&[���+���-c�Nq�/��������o�)&�e����]v����������V�m�]��~���F�C���>.����t����v]��fv
j�}v�_�C��e��wX[Vi�������=j��6�(��Eg��5W��s�g�5d�����jP���Z�k�=T��`6���FF�:���|���4n�4��G�pG`�#POW_S�?YO�pD`�Xx�rd�-���,���N%Y��+R:��'��#J���e���`U������Y��,��N���;d�`�$P��nv5`�WG�pG`~"�{��'���#��#��j��Z���
���K��	X.B��_6����e�����{�K����w��#�8��#P����1�G��q���U�g�pG�� PO7��
b�l/�#�8��#�8��#��7�Q^�=uG�pG�pG��C8V�*���8��#�8��#�8�E������;��#�8��#�8�@B��:T�^TG�pG�pG�(/�������N:t� �:u�?�|#'G�pG�pG����9`e��?��C�L�b9������O�pG�pG�p�����s���iS������~��9|�kG�pG�pG��-�9`��&���#�8��#�8��#P�p	X�h������[���+-�[��X��#�8��#�8��#P����!����_~)-[�40c
6���]G�pG�pG`�G�E.e��z��E�����~�8��#�8��#�8�
�������8��#�8��#�8�@MF�%`5�v<o��#�8��#�8��#P�p�VU��pG�pG�pj2�������9��#�8��#�8�@�B��ZU�^G�pG�pG����9`5�v<o!���V$KeI��>��c���?�U�V��A�(O>��<��s���z�I�>}"��r3~�xi���t��9*2X�{������*��5���r��o���W^)�����m�J|�a?�����;��K,!C���R����G����/���/i��#�9���H�R�R�~��$������@2�������q�J�����1:��#������]8?m����.��$�}���d�UW���[��������(��~�I>��c����-7��~�\z��y�����c��0i�7�xCN:�$�����yx��W�������_~�a��S\��/hC�������>�1�����N���>���Z���1B}��Z��g������^XX��;�	&��M7����/�?�ps��������m+�Y�f��Jr����e����3fT*�B��JE<���1��I9�@-@��ZP�u�7�t���3G<��4~��w�P��Lx`5t�����R���\�Zj�(�;� W\q�t��-zW�n&O�,�=�X�"]u�Ur��G���Q�Q�F��/��#���Mk����}�}���I������^y����DY)���o.C��������_5f�'H�������
8�n&>�����K���K�{�w+V�H���Kq--�E�����m���������������{�I�O�m&3�,���O�^��u����V�={�f��A<��Y���;w��������������b��{�W��)��c����N�9��p�I"A@�'D�' }�?�9r��<y���w�|�"D��
&&��{�����3���S��
|t�`������S���)Z�hT�ti��In7���sT�zu�O�d�h��AV}��P�Blj�%KZ�|��Z����S�n�8�h��F���
���d&L�N�:���/�	x�����{�2���3g��u����7o6�O�2��W�N��������\��+��g��aVE�.�#F��2�r��#��O�'O&��� |C��	�'�q�[�.]Z;M�7��9|L������v�_��;�Kwh��q�=�iw�����\�jU�������u4������q�*W����v�khz4a|�/������|��mN�
WhQ�����V�pC_�]�;���������gh���X���G��l�}�8���B�	�%J�;6���_G���R�J����*�������v�Z#9�Y�hQ~�8B�`hr6�]�'���'Nhv��Q#B>M���A�&H��O�6�'��w��9sm#�0��d���/���  |v���P!p��-l���A�Z�b��`QB�EM����I�e����Ma\���L�,j���~�]6V���$M����3Y�$�a��
ZZ�je��P���Y���cQB
�W���q���hQx��������9^����Z%�(5��|���r��jO�E�b�~��K�r��g�c���?��A�����9^-X�����Y3����=��e���EM�-�|���gO������S\�2m2;�WL.O	x����V�9�9<}�����(�?�. W�s"7��/���0����X>l���oy<�t�p��-%�3�����1���1���1/���xw���r�(`Q���}�+3p�O	��'MJ�f^�<u��y�_-p���_�x���'J�cQf����q�~D}��1���\��LY�>��(Q�J��y]�S����KJ���� %X���������c<�Q�fku��/�Ei�����oU7D�)S�R�J�t�0Q&��>@;�`l$G�`,"��U��_._�l�;������o�>��;J ���w�r�����}R��FZ��(�X~�)��E|W�����)+���V�� ��@@4`�W(����c�a;��T���2�KJ0�=����@;�UXL|`n�m�6�0��l�5{]��-������
�	A���pLQ�pa��NMlX��g%�P���	��J ������B�G����H	���C����X�b����X�����Q�E���2e�
3"�p1bD.x�t������������Y��������]�n��p.�=;�fc�hu�3%��Ff�j��cNM,���{��+��
�&�\O�587��,��E0����o�B�
��MC[���)�>Ihka~
�/h�Aj��W�'�������=O W�Bs�w�h^S�J�a���;��)E�������FY��[�a6�{��W��T����[W��wb��%�D�F���+h���<yr#��xv�=a0�s���n�!k��\�?�W�^��I�U�k���6z������&{�:N���  |)D�RH�)'r��lS��d'��A`���s[�u�b"�	�Ge���	�svUR+���8Lz����,F���=�+�8����%�01��H�����y��AhE����D�����4%L���{@4�/y5ObQwm*�
w��fMp�&�I����8�fMX@}0����i1���� aQ��f�������cB�pa&���8!)-+A��B�n�����+��0!T�D��}{����F�#A���{�0b����[�Q_�Y�0���T��;�E�a�XD2����l<���*�����\;�A�!h�����=w�S����C��U�/���R��(I������e^".�} d?�������k�@�	�"�GV��F`n����]A��O{v��ms^G��7���t���W��q��*3�i�r��0�������>w+��@�����o�K�,��8�j��{�����]nC��������f����1�����a��1cF��������� ��+W.��/���7or���I��c]�wu�������3>�`�@��L=y���������B�@�:k�,���C��t�+�|��,�a���V����]�����@���Kk������{///~���7��K�2�c��hP��c�SB�A���{�b1���x�Ah�\��_���j<��;zO�;��#h��;��i�[|�$Ck� `B�
Bz��{��]����W����NP0~�.K�=��sh���{�$L����h�>'�.x��f`��e����`�W0m��0��fo��
��]c��q�C�U�Z��3h���Ul�I�;��V{x�
�|��ln�I&2�S�kw���aUn���������3�g���l��g��C��a��
�}�r�����\�}��`�h��L�	�����a�Z?�-����Y������&p�����HS-��<��`�
�NG�W�d�CZ=����M��(&�������	�����\��X���k�'r3��q�	�4x�\�}���.!�i�Y[��P����R�����H�<�abM	�h��� 8��w	����Z��}�SO\pF'����0]��1���@6���A�Jx"����i��/�� m�}��t�W�s�/��������%,HA��7F��������r����2����z3��p���$Ib����3����Gk��<�C�1�7,�A�6x�`^���/��o�#�.��gC��3�" ����c��~ ��`3�IM�-������
�kz���d�L�>Q��m(���lj�mQ�.��������(����y�
���F���8�]��	2Gk'p���+aNguy��s8A^8���}��k�S�$;��=z���G�|,j�Q��Hqr���d��sZ�j�����m]�[W8k�(C��q5A�g8���v�Q�&W��t��p������0���d���.��,�	$�c���3���J�~]n�|p���\�]	G���S�85��r���4��X�bF���D�e�qVw8x0�����o&8�@�5p�r��vB�����<�L�~%��S�_	HF��i�N8|�;��p.�	�P8)�([��a�0��+�%8���q��D	�>��}��/��M��3�9zO N������	x��*�����#�h�l�i���o���33���a�#���  |.B��z�		A���`Rh����9�
h@R�H�&h�)��q>Vy��g����R!{��9\��-4���{�'�({J��r��y�/(�w��_���A{`;��	D�0��GX�����>��Ywh�Po��0��U�����qUw��^Km�g������

���h[wg<u4+�3����.�
�m%�,����
~����7�������c���fK���O1�`.���K�v|��H9�� �,��a�jL�0�R�����f,�5N(���3�@YY�� ���o�`���� @��^��fw���xsAk�eB���B^���b���O�	:�����RSA@���~��� j4�*��  ��?"P"{�&�A 8  ^�C/JA �#���K�t�������@A � ���S�~����@R%��  QD�c�=|p��\<Mo^{���#;�.N�:}���c���7j���4ax'��pf�������t�AZ�0s8��G��yt��j����9��|�lK�e~�`��{����}������i�=#��A����.G������������7�G�a���i�������t��������7??}��n��d�{�~�n�'������
���4������O:��G�m�w���S��F�5���5{i��x��5���n^����U_! ��`#�z�����j|���V�Je�G�'���dL�=y��
���*��i	�\l}�����������n
�ZPe/�����?'�X�=���	���x����^>��3FP�&���oS��nc<���;7S�\����:���w�l5h��p����Q�)G7o�sx#L���?�2x;���c���k�O�]��8[mB�zyhd�o�e��|��\4�U,�J�K���/�?���)�k��kQ��5*d���wm�5�o\�'��PG�"~��_�����*��zu�����*��]����i�N����:eSR�2)�^����ZC(�U:���y��4w��{�����Cz6�*��Q�
i�a���{��o^������H�j��}H����� ��@IDAT�^��I�������L#��������������K���u�*g8�C���z#��W��&Eh��y�����{���0��x�m�n��
4kJ�$�,[���g���  �A@�p�5�n~�TjDC~�N"E��K~%�x����r�-�K��-0 p��U�$�[�)����x��f�����3A�=F,�W�L�T�b�o���5���9q��m\=���kG	��[�_]�}��hF��
�h�"Mf�|��I��[W(r�h=���������s�J�:��c��%�������J����0��F���_G��"�<^�zI�����]GR��U8o��I�n&���b|������B��^��o���$yZf�������I����+6�fm���C{���@�p1��e|�A�����<�5k�O�>��v>�7�_s�86��~��3���
��LO�W���_��5h��)Y����`~��Bk�f��n����E
*4��y)�x���
A�z��
��4I�.�P����8~��	��@��w�z ����� |j��]zO����������Q]i���	���0��Pd���i8��8ij�i�z��.��m&��\����J���Y���B���{�7�_�o�4���p���u��#{([�"n�	i���&X��")^f�{Bd�V�)n���(�A�m��N.	5i�g������^�u�v��+J3�.=y���h���������-D�{��.c6mBo^�r�/��u��/�;�5���1��mX�n\�`��h�h^=�
:Vpwm_�q��A����.��{|�On�]�l*aUy�����La���V��U[���-F~W7gN��<��g�QGL�@���Z<����1	8�����c�s~�c�fF����+��6H�JM�5a����d��H�g`>aD'���s��Q|����o������	[B�����'�W�;~�_����+��O��8�����|?;5+�[�����Z���;�������;Ox���S���+/���u�i���y}KMH�g���GZ����;���[X���&�IxD��-Fl��;>G'M�����L��q�{v|"E��Y��bB���5����������Z�����FX�q��ja
	�)0K�%���t��3��S������_�3\gMH�k����3V	����s�x	�m�12������2�]��o<�G��?R�X�X�o�&i��9���Z��T�[l�{j����x���Y�R���9�?w)N����T�zT�zs����of�a��eMS�l��]��\�c�vU�+�����)�i��4o�P���x��-(g�J�\m�swL��\���BV����	�N�%�M�ED�A��s�����n��rq;�{��������r�V)�������lv3nh{�:b�����{Q�n�(S�j�����:dL`>7��>4j�zJ�!;]S�P��,E::n�S�]�E��� SL:���'��;�Wc
>��v����=wQ�E�~�T��w���g�>vuwQ�����^�t&����:�	�o�&q�sWya��6�:���&Hc���4�������W/����yE�QxU��]Fp<� �E>��B�p��kA���FiB�o[�0���&] �e��b�M�&��x�������_^�d�����e�ash�n�(M����wgc�@K�X���k\��wC��e�0JC��U����n�O��|��"��\MaZ���Z<{�B�q��EJ���	�ejQ��i�2!�v��0�R��+M�[������A#� �F������
-��N�i4�L������4ks~D#&~�h��~�V�����}9�"B(���L�p���%G�b���vVi�r�+�~/\�|c����[BC�HQ�"�^���NS�d������J��Y�u}��5=����x�7A��U�%�(kN��[��2d��Z����"h���,�U���{{<l���`�Z��{����#k���HO������n�7������3\�����8F���Z�����<0m��*(�H_��������a&�m�R�� `{wnb�z��'��RBdK���5�2��*���"R�di�O��V��5:�V��J�&���x��GS���Q�����o���3Vv�Wk���X��KYs����N��
fq�e�Y��`B�iP���h����B�
�x�$t��n��z�!�q���U�����#V��>|D�������S����{��$�����Y��
Z�3\���mG��/���	��~�B84F������r���1�T�z�)��I��s'X��XfS���(���?�M����Ji��hn���RML�"�-P�"��z\�C�m����:|?���?���O�7�'����	�F|� `[�$���T_�������B��!���X�����4p��%L5�^�9d&�����������s��N��y d��aQ'T(��,w�;KsYh�!b��}e��y�F���xk�!���?aR���]��V����y�E��G��lI~�m�gA@p��`��q3�m0O�V��w������6Kq'����@���V��e���3����s	S��d��������''�qo�m4�T�i�!�>��?��"<}�@Mf�����U����X%�V��%HB��o7��*o�S�^a�	a"���3��!(m�:��x��
���m�V0���M������3G���xR��
��t�;����������oq
.<_���I�����4�E��}���J�u��A��>����@	�/)��l��5�\�?hi!�@	S<�Y����R�o~0���M����U���_Q���#��g<0M-[�1�����*[����W��E��A������1kL�KW��W.�6��Z�
a�$���u�2�2R���F����6C��h~���dW�*�����i�g�N�$7���c#7&�
�hSde�
A
�H�X�����l�����AS�i�_k<����b�3��eO���tg�I�  ��=`f4<��$����x/6�WH�a�
x��!B�#�?M8 ������a��	�"3�l�U�a�~�}��h� ������p`�'���x��X?v�f���$v�`��&�~��g�z�zcb�	'���M�M���JS�v�PZ+V�5a�"x�N��Mx��
��4w������������)/�SX��Ei����UY����1G �Q_����;��U���p����C�x�`�����������-�Y���1[0�<w�y)S����Lr����4�?h�1���o�qvv�y,41���t�q�������=�>������vB�W8K��R�isU���0��"�1�K��\Z�����O�c�y��chR1f�N���&=�$J��xk{��}��	�a��3S.�D�������C��I���O�������wi���%Q)���c�"�:>�~l����7N��x�=�*M��88���B��  8B@0G����/h$j6���;�?j�8%������w5���	��R������,���G��Gs�J{Gp��^Zf�.�n�|P�����<!~��{<�?{����>w���a�y�-a"y��.���������X�p��	"�����j6~��Y�|�k�������f}LF�UG(�lq��+x���������}f���mz�,�r+����TZx�[��g��Q+���
1��U���L����1��n��bOLO����������U�1��7�v,�N��Yxz�v�� X ��'�/���Q9Bi0����=��'A�_���X��2Lf����l	X�|�j�0M4M`����Q&p�{N`��{�g�[���-\���^8Y��m-����.��Ph����Z#f��-;
�:oR���
���8���XT��V�'���e�|6[F9xO@���=|�xG�;<A)R~23uU�#������	�;!����;��x��c<ca�]�u�����q��UV��������I|�(�!A@! &���q~F���>����Y��V.qu��������������qo��b�����9"\�dTN�'a_��@7����'�gz��	�?��7n����
�f~�8�G����N:�*�3a������+RV�C�f���9`�"*U��f�����K��u�O�N�B�@�a��b�]���C��mV��������&�� ���T�@M����5[����AE�M�+�^&m
����f�]�����.4o���\;4)���y���o�n��L����oqwgL��������'p�/�p��1�����q��&L����"�����cO�c5��VBOH�O�����}�oT[M���n����&�5�� �����_�o{��?�fQva����&|��a�}�wd���r��a�ls2_�C������kh��
�[�=o�mO��W��|�(�^;��s���|����v���9���"�u��2�����0���Z�:�}�G����;b�*#���Ki�����0b���r<��������rux��<&���@�����P8$����2��jq
�v��CX�l�x�����4q����e��}2{�p:}�~#9����V��XT��3��k�r���@��G�Y�~3�!����:9�i�M�0�d���fZ�_�����_2�@s����j���SB��t�:E���o�����P��T���hB��\O*�eC�1bz�%���> q
c�Bc�Y����uw��l��f�=��V@����'�Vu�8��9{f��7����	��o�(�P,���QP!������?���
$���b�G0��T"{}+WA@���@���s#`�Y��j��b��c&��^�&%�
_sb�,,]��,VQ?:���3V�&4Z+d/��	��t���W�|���:�[����nm&&�.�����u��u�&������d`w����;�V�&�GX|r��z�z��e:�zRw�K5{bt�3��������U�&�j�)�U]�#������8G^��n��c�pb��g�2�,9
�C��A$�Y�e���R�����:�w'g���	�/I_�m�mO`��o��e\s������q�]�=�_�7����n�9�$0�S�	��P�TKL�EWJ#A@A@A �# ^{I�-W�\�{������J>}����;�_���  ��  �@ C@�@�!R��F������	h���g�����f�J'N��n�X@�K�.]�x�S��	��  ��  D6]�R�re�W�m������>��s�1c�J�*�(Q�_��:u����c���0A@A@����R�}�6%L��<H�r}:��C6�J���+J�"IM�4�g���+^�8]�~�'N�,��	��  ��  1D������#���G�g�.kt�~��cV�pa�t����f��,X@Y�d��q���O��������i��y��4i�$��!E����9y����;w�X1���m��=��4}�t�(~�]�6�8p�n|��9|���v�%PA@A �" ��`��}����v���LB����r�J:{�,U�X�f���Z��1cr�={��R={����g�`3��7o����=�7o;�6l%I���^�����)N�<�w��%���+���[���;v�����3�����������2kB��  ��  ���/�E�F�'O��	&kt���i��!��R�LIfAh��U\G�AH;f�Z�d	m���3sK���Ki�:>����|�S�&G}V�V-����k�f�0���sV6������A@A@����`�
Z��#G�L:tp�PR����k�(_�|,|!S��IY C���ar&����C��p����
��+��[�8�F�9�2�!|iSD#PnA@A@�<�,������f���	��tD�	������w�\�#�L�c�6?zt��e Gf�^^^�a����=��$�o��a/�`�
K]��D(��  �� $,�
fh��W��8����(S�;��:u*A��	"�?��g�������g������K	.�9����7�;r���4+R��=�&��  �� �,;/T�PF��{#Pn"���W��B�
��_?j��I��p{�4!���[:��k�h���4r�H����wK���#xP�j�tZxt������:�\A@A@�rX��3���|��=&O�����r���$� ���/o�����x��}�5u������;wZ������)S�Lr�d$@A@����O��]���@�H��S�������V���fQ�D��'��P����h��4u�?�����K�  ��  �@�F@4`A������  ��  �� ��=`A������  ��  �� �,h�_����3gh��M]��x�A@A@�@��`��;�2� W�3f��+���G=�*iA@A@�A@���R� ��W_Q�1h���.\8�`)<A@A@����h�>+���s"������/���Yx��  ��  �+"��+��1��#G��'�{������^�z��
��"��  ��  f�����o�{�>|��5	zE�8q������B��  ��  �@PA@���h���������`�u��z�8�w��T�\�@]_��  ��  �� `F@�p����@���7o����,X@#G����OS�}�����  ��  �,�����i���\�*U�P���%���x��=%N��@:tp7��A@A@1A@�=zD��W�@>|�&h
���	��'O�V�����  ��  !Q�`��
�(�|o��S���K�&���4�D
��  ��  ���'�����9H���h��-[��c����s���  ��  ���C@��}9������R�v�%I������J���  ��  ���! ��a)�A@A@A@p��� :�G"A@A@A@�?D�?,��  ��  ��  N�)<)8GT>|��<��+p����S�j��/h��	t�����D3f����/�"����		��  ���{K�>+
6���+�2<a���Q�v�<�"i=DG�����\��=��]������J�9�n���?�9������S�X�����~�#�A@����`����~��'N��%K�_�~>��}�vz������p��U��3��.�?p!��x��_�Ro���M�69-��������CQ�Dq�V"A@����`��p��#G��'�{�.k"E��#FP�2e�p������:t�.�wV��7X�=zt����
"rf:u�U�^�����k&�G�����S'N�q���a����=���;S�4i�������sF��w��f��?�� A2d�3�Vq���+e�����,Y2����G�`��B �F������\���x�n(g���F���7-Z���^�t����)�h����w��-F��73g�d����X�f
m�u�2e�������_��hC��u���M�F��7�I�&Q���0{��8��7�|��D�����(��l�{��I\�j$�s������+)t��n���O/_��������^A@��=,B���[�,jT�G�P-�X� p��I��;v�K�*eQ�E���d����&��y��Y���%���\�x���O?Y�f���Y���q��m��&��h��Y��g��u+�Q�7�2�����E	S5y�<��u���I�&�9r���N��C�������t�\%�Y�����,�/�E��*�3�l,W�\�(a�1���\��	���w	�(��g����d�w�����{s������,���E�Y��?�i����g��-J������o�&���;��C��S��(-���������������>��9s8~��uF�_�e�PB�e���\5
<���)�}�k�.�Du4���J�3�qS�Z5K���-�X.X��E	�Fg����(a�1�8q"��Y	{��wC	aF.��q9|�p#LnA@���VZ�L�'O����$��#�z�@�&M,���	�NL"1����W�:u,J;eN�������44<�L�A(�A��Z������`��a���[{I\�9����r�&�B�K���>@���V�dI4M�No����=>�)
LJ�}3r�H#-�P���l������$^���6�cB�&�39��!m�p���?���	���=�����YS9���Z�2pA;Q���������,�a�+`��1���9��yg���e>|��,���  �@�E@L�� �(r��S5|��	P��r=D���3l2��
��1b���e�R�)8��*T�@3f�J��AMX�t�\�r�&�g x�)-�I,)m
?{��{����G���cs/5�u�����y���Qn6�C���={R��		f�JC0���v���M�t>�Y�o`j�?�j���q�r��+WR����m���@�J� x�S2��-�fG�r�2���1���3�aZ
�Q�	���EJ�f�0��%�-G��X�6mZG�v�����������W��qc6M4g���9��}��)y��8�8�
0�����{�&1c���rA@�7�V�
S�������&zA��RE ��Y3�������`>�+VX�����g0/sE��;wn�rk�I�
Ro67D�6/T���R��8����q�:�7L��!Wu3���CM�Y�
�+��u1��+V���\���ZSsB[B���qD�3�;j�4r ���������cGn���F� ���/F�*U�X����0QE�o����j�?+�Fz=t}���7�/L]�1k�4���#r���<�6f�j1��/���#�74�Z��4n��
	��  !����A@�"`�?>jBP��r=@�/5a�U{�l��wg7��-=4U���� F��'�M��9U�T��p�r���U�J�*�3�8@�m4N�j��7/�.]���0B��I	8�/|����R_���xgWWu����k���*I�$�� w�u�x�8-42�z-\���`��4h��p�I���A;�% 8�P{���4A8/�L�O�_�:Djo�Es�[��n�O���O�~A�	���������C�����2v������b|*�������j���~.��s����������2�1c���_�}�����������Pa ��  B����W��X��IQ`����hS��qEp������N!���n��������
��i�qSB��
;��V��L0#��4o\�f>��mn\�
NCt��
�����J��pX��s�<�K���\3V���S������{��a��G��'h�������������C������
m'4�:���UhR�09^������M:�
�?4o���@pz������Q�+\u:gW8�����A�E�eFE�A@B��H��+$.�v ]�t��P8\��}4�1�/�=��H	����	���%h������SrV7%L0o�'Q�D��&hD�����<��Fhx��Gn7���mm
\���)l��v�!/��9��?b���=r�W��J���6��<�=�,���m�'E[�yl���]m�'�$�  �� �,����7���&�JC`u������ ����|	�/8`&mA@>�|�RZ"��  ��  ���}��(�A@N��.�n������  x���&��JjA@�O���7/�6}���x���_9�9����  �@�C@0?v��w�������'W�~d)�1p4���{�8�������/�X���d/_<��N&���0~W,���� U�/ `�9������;���d?�w��n��t������Q�*�����  ��  ���`ne����3��N.��UbjZ=+���V,�h�L��m�R�"��b�X���.�u����{w��w-JS��q�q�LT�x|��s�?e��T�PLjV#U-��Z��g������i���t������nu��w5?K��^�����v�)��'����S��-����O$J��J��MK���(w��  �� �����8^��TjD���\��-{�������g��`�@���i�F/4f	y�:D]����Y�vumY���q���G4`�Bu����c#-�?�FMY�qK6���L�C#�I�C��&��bm���#�^�+��{���)	��  A�|�{i�g�����$��R�x��n��������KO?`-� ���s7��`�F�IX��_�5i������1����t��Y�a�4�"G���*R�ru��;�\M7��'�%L���5�f�����I����9�R��i��+8����vlV��y�1�k_�*����w9��eS9��)��&��J�li���:k��<��lDO�<��|��k3���f�H�c�nXh�o�� �o�U�3'�������h���s�m�Z<��T��m��MI:��U~g��<�7x��f���-0C��f����I��S����Qc��?����r�/i��7�a�2M���*��>��u\G�u����l�x7��4�����������*��-�Z>G�h�h�?�������X�8�;/���Y!���Q�Ve��(qh���2g/��'T}����uy\!A@A@ ��F�����S��E���^iL��=Fg��s�q�2��c ���[��<U��X����e�Y��kr��?4`�����V���<H����\���Y�Q���h��}�y��iMEJV���F���W�E%��g#�	&,u��-�r�z�M�6/��S�e\:��B�n���^L����M2�q�u�"j�},5o��fN���~��^��.Y�~]��'K��?����/�q!������-�K9rCv���\���=h����4y:��1���!�k��)r��4v�f*[�1�����B�;�UmB�����&[|Q�A�5.L�k�d�9]�������{wn�S�tYi�/)b��J�jo���/c`����z�<���Yw��C��)k�B4S	��Dg�}j~���w+M��5k��5���JQ��0X�����.����5�0�������u^\9y�S��F-{q;
�&l�p�,Ez�x��9����{�CHA@�o���w�la��V�M���2uf�F���������	�v>I�@S����gNX��0���MD����1����Z��ISS�^(_��4x�RZ<gAY��o~`-2��m�����._<Eq�'&t��[@�+xk��*���^G	��4�������L���>���b)Rg�sg>�OC�"_U�
��s���[���y�����w�<K�30L��mF}[vF�C��v�F��w`��L35m���Wk�A�����9c)k�����|�4�}��9-N�����}�;���V������z.���7�&"0�� �����6�)���*_����AG\��=�n�P��0�����	�a����������P�P��0��BC����]>��v+\�T�~{j������~_N�K6\x��s</]���;%�C�����v�����ZF_(�0�dQ����T}�q:<g�]��>O�R7~6����q�)q��t���=vM��#8�I��v!A@A@��c�N�J��3O(b��<��$��
x��H��zT�Y7��� ���+/AR��c�Q�"%��G�#���a���(S�?���FM�gM@�����s�;�(j�l�G�Y�X��:L�1�������o_���	��4
��O(�'���3����*mV�5e �+���J��;n�D��r��1K^c���B(A�-�-��N��<��%HB��o��T�n[��q}!4P��i2y3f���\�g�B�=�A#���I"���VK�8'��m���r���p]8��@��d�:V���Z��Jk�R����G�d�(IJ�&�����������F-z����#����6Z�!3i���1���z���Ty�c/<��>
����)�U��C�J
m��YA@����������p�����cr�.K����k'/�x�W���!��<)�da,�%�I��5b��B��u��	��JE��'�����;`�%��]�N�e�x\���P�8�	Z�'��B�+qZ���u���i������t��0Af���P������?M��A�A�-a���`��P�T�^��i������o\=O�Rf����
B+��`u��~�������#Pd%�viQJiz
ZE��O���{���3Gx?����Ti�p�/Z�O�0���w���f�����K�����%�����R�oi������C����e�|��(M��I�h����c�a��\�&>����4^��{F4�L0S���=������)m���_�^	A@B�lHBR�����C���~����^�3�=6z���"`�'
��w�i������c�U����	�����JoX5��l�V~��_i�@03D�����t��ph��3�(flo!���nXDG���-�$��������/�����&������\7��A�Z2�'N�3Oq��m+�l�NJ��9����
B�&|�2����Z��l`���]�pR�t~�>U���6���f���T/nh��r�-�	��A8�������Q���� �+��n�t63]���&�w\���^90;����A	���9y�?l^��/�9��Q!Y,��@Z�a�J�Lc!���	�@	�`�	�q^�{������{y$LA@�������?0��^�4a�;���E��"�4n��D�h&�����5�����fF���� �Gh��x�
K�4 ��u��+�:�{�C4GC����Yk�g&�.B[�����X��L�N��)�)�qB��m��>��P&���%L���^��a��qh2��a���������>	|�	-�&L����d��1S71.�#�5WB�N^�A+3������;��O{�~4]����_�oz�p��K
����r���+���.Ccdk����2���>:��;�.{�4V:Wh���f������/8���TZzx�����='�#�N����Ux���=}�O
/7x�i��C����e.���n��$��  �@�C ��#�,��U� A ���g�IC���&��2�i���7�4��(�-v��V�]5�`���
���Rz�(�L
���	��������7)�2����)��{�`�iv`�h��B��l�A����K�0�m�3\�C��N(�}�s������z��/Qws;�������y�I�a�>���`o��t��i>�g!��b��u=�c��i*�=����  �� ��|� ��#�����.4%�&���&A�� ���e�'|���(��1���o)\�����	���9#�
�(|K���c�����n�
}��;�W8���n���#�:�
���Hs��;oi���:�\A@���`!~!
��)$�3�l���|"�G�-W��h��h	A@B$b�"�]-�@�! &����,�� ����:��)5B�kOB�@PFg�YQHA@>! �',|u�M������B!8��G>88�G>|`w����t�K�(l���4ax'G�n`�0s���f��1�?���h7�p�RG�=u��5X6���n^�������  �@PC@0_���Kg�e5<|5������J+N�%7���y��4��Vgn���������}��`��8
{��9p�m���c%����*/�!������������=���^������HH=;T����� ��{=����5����������t]4k4?��O<�e����t�8g�$NA@Q����~��
����@���-{�������|�Q�%R�3���<J�����vl���4���4b�j*_��GM���=8��P��v�Y,!�������n��(�����/�[7,�qC�S��h���h��s�<UF�������g>l\��6��k�M���tp{�i��_������0A@�:"���q�l�����k����+s��4c����X���~>
J*w��A;��aA���R4���V0x��A��������a��	���;�/\V����YI}:W��c�s���xm�dc��e��������s��\�3'�����$�mR5���O��6x�<��)���0�'h�]�����Fi��e����<����C��z��>�k#�Y�!�.�;�y������o�C�A8`B2T"E����;|?���4h���+Ofl�����n��a��>��|��1��}����|�z7!�T�=���5mBo��Q�Cj��l�����=�3�2f���8���S�N�
�w������xD?�nh7����&����PG��C}2�`b��#�>��!V�*��2��g��w�a&
:��A@"��i���9e�]�m��������=X�
��W�`�>�4>}���d����VP�����?���e+7��a���S��AbL�!�n���^L����M��s��Y�e�A�����c�>P�vm�5k��N�Le�4���;�& �������D���E�R����gO���{H�:#e�Q�f*�|��,��n�1�}r�:V�J���������d���0Wu�:}�����L��������[}�}�pA,Gno�����jn^�?��y��8�F���b���A���sg�0���CW���M�q�������x��q>�88������}�M�A4o�P���&�\�&����2���	�%���Zv��&OGC{y�7�O��uZS��U	f��^E��=�l�,T��R�z���gOT���9�@x�4mu�9���]��q��|����4�+�����-��K�������0���  �� ��s�����]����Z�7�R�����I�S��z�0�i;�.D�+8��i��>��q8������0q
��0��t�0i�cg�)G��Y��a[*�UU�P�9W���vl[M���J+v|�=l����'L�[uN��Bk�i�\
-�������'4E ��bk�O3�>�����@5���&�����~_N������`L����A��6w�9��W������S������������ug�\����G
kP����������Bo��Y�~�)[~��z�m:������c�:vh7c�������v����PN�m�R:���9���(�
G}��MK7_�>��KYsf���K���6�������L�������&}v��:M�KT�J5�����Z���c!Z-��<J���8>���W�335U��Mm3d�c�����R���|�K�  �� ������N�JP����I��q�9&�B��F �J ���uU���K��'��6���g�e�J��fMH/_<�	��(Y��!���{w����L�IQ�����m�z��X�k�.*��K%��4�fJ�$%k��$O�4K�k����2��Z/3OWu��b�R�����q2+m���+]�`�	�����A��3?k��.cNR�f�:��}�)�����60h�����	%��X�z��5���n�W������.��}��o�	��.�����xh���/�&|@�a�iKy{������6E�arA@B*"����!]���IDAT|uQ{�����xBo��d��-�����i~O���AW/�uXah�&��d�fN�'��~��~�N��E��������o\=O�Rz�?^P�kZp��[P�R.���������6�x��f0[�s����Z�q9y-E���%T��;�����3��;��TZ%M�i�)
`�9�	``�-�O�b���� �h��@�e�>��|�Qc����h�h�|%9�Z��L01��0a�r9��v�;�6�n���#�l�S�/�2�A���� ���������:#��������J��^��	��  !�$���cr�f>5vRg���=8zE�q�O1p��
����O��;h50��������Z�E��>����������wP3L	�=�|���3������Vu��(�����@������Q���E���|7�
������`v��O=(�,�;��#���dL���o�9
40�NhKX�� A��������V�������{i��Kt�+| ������U��c�o3�L��_��2�<�O8�X�d�m��y���S�������0��J5[���bOhlX9�M2���4X�q	p�`�
�����g�b�1O���R8����`
SI!A@A@'�g�&~��`B/`�kf����S��i2������IJ�B`���T&od%|�,�U�)}���>-W�)�S��mk���K�O�>�'�����p��(M�jdS��T�n�4����W�8	h���X��Y:)�9�������-p0�3��Q9��b�V��<��{�������O'����=���V�x|�R,�����N�k]u6�q^W���{�\�&}�fN�O��-O�>z���'{���3�&��D����q�b���?�3G^){
�MO?�&��(l�Q�-j�y�1A{�3���L��le
�m@�i\%uoS����N������^C�� k������zw��g�3�X(�-�.#�>Sn���������Y4p���vx���5*D�n\6X�zvn�g�isO;I$HA@Q��~�]�;T(Du�46("�j�/�Ey���h���oX)=Ota��	p��$\-�����{�-�T���D�IP>�&c��6�s������� ,�:�'-���kh1��=o�����yc8�
B��z�;[!yj��f-?��L�^���sf�=f���'<��9�h�aJ7~����8�g%j�F��Dvq:�I�HZA@�����^�)�	&@s�u��j
�Tfg
�0����/3����q~���)a���b���0�-�U5�6�9-Wf��+��Q�`&H��G��f�6U[��6���h��s��j���l�@�����
�P;�����A@�"��^�6; �x:!����5|��
J�U��=�|�����n~��Y�����5)J�h��4q����Z�@�<��  �� �+��W�I&A@p17s)I'��  !��zY�(��  ��  ����
pv���p-$|N�^�i�I��9; : �%e
��  �� ���=�3s�~n����l���b�D_r�l��kv�C�7���:�gH����<�;��q!A@A@�# �/�{��
���e�b�q��4yt7:r�_r�l��  ��  �� ���=�&}6���{v���u�veNW�f�]�YJ������p��� �
����su�X8�oR��N��2fPkZ6<��������o�y��	M��)������?�g�;�-�������N8�L��e$�xs��>��_a5�{]�gN�g.��  ��  ���O#���m�)[�"ns|�g=��=u�p�� �
��G�Rr�R}����uy�I	Mf�z�,�2�z����?j!�<�p>hD��������/4j�z6'����Fv,$���n>����?v�=;6��f�����qa�\�%�,[[����q�/_>�
�A@A@o�0	��x��A����)e��ns�%u���o,L�0n���!�V���P��M���A+f>(7V���{�
*����Wh��C����g����mX9��/b�(����S�3's�����s��P�J�����?6.�J�+U���o���G��>�$�^!A@A@����B���S�<m���Gy#F�L5���(�$�d�Y��h�L���8�&��T�\+����B����)����`�F���uY�z�,e�]���K��_��������,�?��]�L���O�	��  ��  �G@0���
������p�����
�3�N����k�UH��4�[��D�
�^��\c��Uo��i�g�jc�h����;N+R�
����G��u��������k;nBz��d�O�����9����(EYr���[�������)m���F!A@A@k<�����'�tA��� 	�l�I�v�.�?��`�S������8�p��������v�4��&�\2�M^�'L��DIS�gk�M%I>���W�n�5�{wn�c��M��7��<R���#oq6[���2Q<a���?��^��	�!A@A@���`>1q+�d�����Yx2l^3;V/��V~Nd��c���>I������FQ�����?VR�?YA:t��H��ma_�'���U�H�����s�-A��6��
�S�	���#O���2y�+k�M*P�<��R�~�J<zp�o�{�t�\A@A@�Bm?��"@�@�A3���Bq�'��j����=_����8��!��������;�3�9��{8���{h�&����S"�8�1��A@A@���?����Y	��u��D�N��#���-]��E���6���[�OA@�`��h��}K��E@4`���.��  ��P�9�1p���F<A`�!ORKZA@A@q���K���  ��  �� ��z�������(�Ja�K�[,do"���y��K���#_�D�	��  ��  8C@0g���;��%%[s���>A��������!����c���[O�
�a���������M�����pg[U�F��m7���n!(��  �@0F@0_v.45v\��q����Y�k�x�n�5:��B��3�]:����$����G�@����������9FD���M��Mi7�_�����d����{����W���\A@A@v���.������
����E
G��D����9>x��rl:Cy������?O�����]g�g�G���������/�=��L��u�z���\��������|��7������P��^j�a��yFE��"��Ou�����"����
�^Z��d������e�x�yBs�E�������=��[�R��h�qk
�;u_s�1U���O��8
9~���o������68����P��7�F�>u���~���x	A@A (  �������7c��4��:��5�K�.8Y��-
t�1�x�N(���h��
�R��|�(F�0���,���'	}�0����)�h_��4X	 �*�iA�_*
P���S���&P��*~�"��.�)�j|����Q�>��C��T Nz���PeW��"��A	YY@�0�.~���6m\�R���JS6>w����>}E
��!L(�31�g���JN3��a�L���r!����{_�8�']��������^��  ��  |QD�%����
tG�KA;��TZ��4^����.4M�����y�P G���7-O�4q�>��D������(J��4Q	L��8G���
��������JS
�&4����D��L�kv�,	��G-��@��)=���3V��1s��J�5K�c����S�O+�?}��2|����0�w��U�B�a��lTUg	?��\/Wu�{CH�P�C	s������3w����WtM	R�K��de"�J�
�v�=Wu������|A������-���  ��  1D�e���8����}�1>O
��f\5�v�"��84BA�+��5/05k���M0M��jp��l��B	<�*���*���V^}L��^���"R��������VBOz%8A�A��'�N	��%��=�UB�������T�t*��"�������n����F�HL��Dg�O>�WJ���!T�g������7�^S��Q	!A@A@�2�KA�����)��g�/��w�|�C����.
���7�p�*Ib��
�$Tf}��J�h���|���jZ��J��4��}��<&�U�+3�U�"h��Q���9M�W��P�b������'�Gm��X���#G��MI#�#���=T���6���j/�[��0�����8e,�{#!�5������ ����A�w������Ak�~�{����cz�������<rx��3�"f>r/��  �@@"�?����*�`��<�M�&�0���&��@4H�����f�'n�`	S+!�H�(J�t_��z��lf���*�h
%��*����JL�w^d��\����������T��](��Z�z���@�5C	x�����/���W����\�9(�G�f��pY���s����?4N�@�S���Y�D�������6�7��>
��9��^y�N>�����:NA@A H! 0_v���U�����S�����j'��Y�e�w7����@C��pJj��
{(��7Q&�p�1Dy�'���2\�wS�������
��M������X-�^���N2�k��$�]9C���6J�u�Q��^��b��<�"+��o	�3-xi�p2)OR�0a�b�������Ap�T�w/�C�����5� 
��3?&=�� �����"=��4�C(�?�?����7���M��HCBT��^r�z2HB��ML�u��f�]wg�|��qv�>/[���r�!n�:��Q��^��?��=gz��z�u������������������/���p��@�6�d���^����1@@�����7���}1����2j��e�!42&�=_��������AW�4)��8n�[5�����t���P�Q���.�B_S�����.�������MQ � 03V����6��3���[+J�2Qt��EW�f1��x���A��I�u�����l�er�Ur��G@�,����Cl `����ch � ������^�@@@�dA�� �}�r�,���rqqq�>!� � �rV�\>@�7,���!�z]b���������	�[� � ���6�!5<�@�V���i6��h4�u�C@@`r&`��q�A���m��v�m0
�F@@`<&`�y9zv���D"![[[ruu�h�O��������$���L�@@�/��������D�������`$�kzeeE����J%&��>"F@|+�������J.��_��f�k��S7����X,J(�������@@f-@�YS��������K6����M	��3k��@@pB�	�������R��Wg2��cA����nW���
���{�e�� � ��Q��LFC�o��NGvwwm��---�c���oD"�1��t@@�	��b�^��]����;����f-L�F� � ��X38(kkk���
F������I*����uov��@@|)�0_��;]�VE3HF�Q��y{(�@�����F�~IEND�B`�
#41Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Dilip Kumar (#40)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 20 Nov 2023, at 13:51, Dilip Kumar <dilipbalaut@gmail.com> wrote:

2) Do we really need one separate lwlock tranche for each SLRU?

IMHO if we use the same lwlock tranche then the wait event will show
the same wait event name, right? And that would be confusing for the
user, whether we are waiting for Subtransaction or Multixact or
anything else. Is my understanding not correct here?

If we give the user multiple GUCs to tweak, I think we should also give them a way to understand which GUC to tweak when they observe wait times.

Best regards, Andrey Borodin.

#42Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andrey M. Borodin (#41)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Nov 20, 2023 at 2:37 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 20 Nov 2023, at 13:51, Dilip Kumar <dilipbalaut@gmail.com> wrote:

2) Do we really need one separate lwlock tranche for each SLRU?

IMHO if we use the same lwlock tranche then the wait event will show
the same wait event name, right? And that would be confusing for the
user, whether we are waiting for Subtransaction or Multixact or
anything else. Is my understanding not correct here?

If we give the user multiple GUCs to tweak, I think we should also give them a way to understand which GUC to tweak when they observe wait times.

+1
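
For example, once each SLRU has its own tranche, a user could check
which SLRU lock the backends are actually waiting on with something
like the query below (just a sketch; the reported names will be
whatever tranche names we end up registering), and whichever
SLRU-related wait event dominates points at the *_buffers GUC to raise:

SELECT wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
GROUP BY wait_event
ORDER BY count(*) DESC;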

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#43Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#42)
3 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Nov 20, 2023 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Nov 20, 2023 at 2:37 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 20 Nov 2023, at 13:51, Dilip Kumar <dilipbalaut@gmail.com> wrote:

2) Do we really need one separate lwlock tranche for each SLRU?

IMHO if we use the same lwlock tranche then the wait event will show
the same wait event name, right? And that would be confusing for the
user, whether we are waiting for Subtransaction or Multixact or
anything else. Is my understanding not correct here?

If we give the user multiple GUCs to tweak, I think we should also give them a way to understand which GUC to tweak when they observe wait times.

PFA the updated patch set; I have addressed the review comments from
Alvaro and Andrey. The only remaining open comment is about clog group
commit testing, and my question there, as in the previous email, is
exactly which part of the coverage report we are worried about.

The second point is: if we want to force a group update, we have to
place the injection point after we have acquired the control lock, so
that the other processes take the group-update path.  Then, to wake up
the waiting process that is holding the SLRU control lock in exclusive
mode, we need to call a function ('test_injection_points_wake()').
Calling that function means acquiring the SLRU lock again in read mode
for the catalog visibility check that fetches the procedure row, so
the wake-up session blocks on the control lock held by the session
waiting on the injection point, and we end up with a deadlock.  Maybe
with the bank-wise lock we can create enough transactions that these
two fall into different banks and then somehow test this, but that
would mean generating 16 * 4096 = 64k transactions so that the SLRU
bank used by the transaction that inserted the procedure row into the
system table is different from the bank used by the transaction in
which we are trying to do the group commit.
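
Just to make that arithmetic concrete (purely as an illustration, not
part of the proposed test), one way to burn through roughly 64k XIDs
from SQL is to run each statement in its own subtransaction, since
subtransaction IDs are drawn from the same XID counter; whether this
actually pushes the two transactions into different banks depends on
the bank mapping used by the patch:

CREATE TEMP TABLE xid_burner(x int);

DO $$
BEGIN
    FOR i IN 1..65536 LOOP
        BEGIN                                   -- each block starts a subtransaction
            INSERT INTO xid_burner VALUES (i);  -- forces an XID assignment
        EXCEPTION WHEN OTHERS THEN
            NULL;
        END;
    END LOOP;
END $$;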

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v7-0001-Make-all-SLRU-buffer-sizes-configurable.patch
From e11bda7a623728f281e921aff9ba93c5ca299b69 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Wed, 25 Oct 2023 14:45:00 +0530
Subject: [PATCH v7 1/3] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Default sizes are also set to 64 as sizes much larger than the old
limits have been shown to be useful on modern systems.

Patch by Andrey M. Borodin, Dilip Kumar
Reviewed By Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  19 +--
 src/backend/access/transam/commit_ts.c        |   7 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  25 ++++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 18 files changed, 293 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9398afbcbd..bcbae61bb3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 256, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 4a431d5876..8237b40aa6 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -663,23 +663,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB of memory for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(16, xact_buffers);
+	return Min(SLRU_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index b897fabc70..9ba5ae6534 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -493,11 +493,16 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 4MB of memory for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(16, commit_ts_buffers);
+	return Min(256, Max(16, NBuffers / 256));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 57ed34c0a8..62709fcd07 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,13 +1851,13 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 62bb610167..0dd48f40f3 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -184,14 +185,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 38ddae08b8..4bdbbe5cc0 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -117,7 +117,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by slru.c's wraparound limit; see QUEUE_MAX_PAGE below.
@@ -235,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -521,7 +521,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -569,7 +569,7 @@ AsyncShmemInit(void)
 	 * Set up SLRU management of the pg_notify data.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index a88cf5f118..c68d668514 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,29 @@ show_timezone(void)
 	return "unknown";
 }
 
+/*
+ * GUC show_hook for xact_buffers
+ */
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+/*
+ * GUC show_hook for commit_ts_buffers
+ */
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index a794546db3..18ea18316d 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
@@ -1347,7 +1347,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..96d480325b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 64;
+int			multixact_members_buffers = 64;
+int			subtrans_buffers = 64;
+int			notify_buffers = 64;
+int			serial_buffers = 64;
+int			xact_buffers = 64;
+int			commit_ts_buffers = 64;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index b764ef6998..c1345dab98 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2287,6 +2288,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e48c066a5b..364553a314 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 64			# memory for pg_subtrans
+#multixact_offsets_buffers = 64		# memory for pg_multixact/offsets
+#multixact_members_buffers = 64		# memory for pg_multixact/members
+#notify_buffers = 64			# memory for pg_notify
+#serial_buffers = 64			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 0be1355892..18d7ba4ca9 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 552cc19e68..c0d37e3eb3 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..147dc4acc3 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 02da6ba7e1..b3e6815ee4 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f0cc651435..e2473f41de 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..7b68c8f1c7 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 3d74483f44..7b95acf36e 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -163,4 +163,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)
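
For illustration only (not part of the patch above): with these GUCs in place, an
operator could size the SLRU pools explicitly in postgresql.conf along the
following lines.  The values are hypothetical, each explicit value must be a
multiple of SLRU_BANK_SIZE (enforced by the GUC check hook), and all of these
settings require a server restart:

    xact_buffers = 0                 # 0 = auto-sized
    commit_ts_buffers = 0            # 0 = auto-sized
    subtrans_buffers = 128
    multixact_offsets_buffers = 128
    multixact_members_buffers = 256
    notify_buffers = 64
    serial_buffers = 64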

Attachment: v7-0003-Remove-the-centralized-control-lock-and-LRU-count.patch
From 57e54857cb815f479016dfbf95b796159368ed2c Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Fri, 17 Nov 2023 14:42:25 +0530
Subject: [PATCH v7 3/3] Remove the centralized control lock and LRU counter

The previous patch divided the SLRU buffer pool into associative
banks.  This patch optimizes it further by introducing multiple
SLRU locks instead of a single centralized lock, which reduces
contention on the SLRU control lock.  There are at most 128 bank
locks: if the number of banks is <= 128, each lock covers exactly
one bank; otherwise each lock covers multiple banks, with the
bank-to-lock mapping computed as (bankno % 128).  This patch also
replaces the centralized LRU counter with bank-wise LRU counters,
which avoids the frequent cache invalidation caused by updating a
single shared variable.

Dilip Kumar based on design inputs from Robert Haas, Andrey M. Borodin,
and Alvaro Herrera
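
For illustration only (not part of this patch), the bank-to-lock mapping
described above boils down to the standalone sketch below.  The constant names
mirror the patch, SLRU_BANK_SIZE is assumed here to be 16, and the helper
function is hypothetical:

    #include <stdio.h>

    #define SLRU_BANK_SIZE      16    /* buffers per associative bank (assumed) */
    #define SLRU_MAX_BANKLOCKS  128   /* upper bound on the number of bank locks */

    /* Map a buffer slot to the index of the lock protecting its bank. */
    static int
    slot_to_banklockno(int slotno)
    {
        int     bankno = slotno / SLRU_BANK_SIZE;

        return bankno % SLRU_MAX_BANKLOCKS;  /* wraps once there are > 128 banks */
    }

    int
    main(void)
    {
        /* With 4096 slots there are 256 banks, so two banks share each lock. */
        printf("slot 0    -> bank lock %d\n", slot_to_banklockno(0));     /* 0 */
        printf("slot 2048 -> bank lock %d\n", slot_to_banklockno(2048));  /* 0 again */
        printf("slot 2064 -> bank lock %d\n", slot_to_banklockno(2064));  /* 1 */
        return 0;
    }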
---
 src/backend/access/transam/clog.c        | 122 ++++++++----
 src/backend/access/transam/commit_ts.c   |  43 ++--
 src/backend/access/transam/multixact.c   | 175 +++++++++++-----
 src/backend/access/transam/slru.c        | 244 +++++++++++++++++------
 src/backend/access/transam/subtrans.c    |  58 ++++--
 src/backend/commands/async.c             |  43 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  33 +--
 src/include/access/slru.h                |  62 ++++--
 src/include/storage/lwlock.h             |   7 +
 src/test/modules/test_slru/test_slru.c   |  32 +--
 12 files changed, 597 insertions(+), 250 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 44008222da..18ec2a47b5 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -275,15 +275,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on the Xact SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * updates for multiple backends so that the number of times the Xact SLRU
+	 * lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -301,17 +306,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire the SLRU bank lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -324,10 +329,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -346,7 +351,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -397,14 +403,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -418,6 +423,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -498,13 +505,10 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
-	 * Now that we've got the lock, clear the list of processes waiting for
-	 * group XID status update, saving a pointer to the head of the list.
-	 * Trying to pop elements one at a time could lead to an ABA problem.
+	 * We are the leader, so clear the list of processes waiting for group XID
+	 * status update, saving a pointer to the head of the list. Trying to pop
+	 * elements one at a time could lead to an ABA problem.
 	 */
 	nextidx = pg_atomic_exchange_u32(&procglobal->clogGroupFirst,
 									 INVALID_PGPROCNO);
@@ -512,10 +516,44 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	/* Remember head of list so we can perform wakeups after dropping lock. */
 	wakeidx = nextidx;
 
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group.  If the
+	 * group contains pages that fall under different banks, we release this
+	 * lock and acquire the new bank's lock before accessing each new page.
+	 * There is a rare possibility that a group contains more than one page
+	 * (for details, see the comment in the while loop above) and that the
+	 * extra page belongs to a different bank, but this is safe because we
+	 * always release the old lock before acquiring the new one, so even if
+	 * concurrent updaters lock in opposite orders, no deadlock can occur.
+	 */
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	/* Walk the list and update the status of all XIDs. */
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page is not the same as the
+		 * lock for the previous page, release the lock on the previous bank
+		 * and acquire the lock on the bank that holds the page we are about
+		 * to update.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -535,7 +573,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -564,10 +603,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+						  int slotno)
 {
 	int			byteno = TransactionIdToByte(xid);
 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
@@ -656,7 +696,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -690,8 +730,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -705,8 +745,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -715,7 +756,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -750,14 +791,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u32(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -768,8 +805,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -801,7 +839,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -833,6 +871,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -843,13 +882,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -987,16 +1027,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 96810959ab..9afa3beaa7 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -219,8 +219,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -230,13 +231,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -337,7 +338,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -527,9 +528,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
-				  SYNC_HANDLER_COMMIT_TS);
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU, SYNC_HANDLER_COMMIT_TS);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
@@ -685,9 +685,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -714,12 +712,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -768,9 +767,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -802,6 +801,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -822,12 +822,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -981,16 +983,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1002,7 +1006,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 77511c6342..f204cb4db3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -193,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the MultiXactOffset and MultiXactMember
+ * SLRU bank locks to guard accesses to the two sets of SLRU buffers.  For
+ * concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -871,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -892,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -917,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so if the new page
+			 * falls into a different SLRU bank, release the old bank's lock
+			 * and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -937,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1240,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1343,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1380,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1389,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1398,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1420,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1443,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1853,14 +1910,14 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
 
@@ -1895,8 +1952,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1905,9 +1964,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1916,7 +1976,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1976,10 +2036,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1994,7 +2056,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2016,13 +2078,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2047,13 +2111,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2068,7 +2132,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2076,18 +2142,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2099,7 +2164,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2114,10 +2181,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2405,6 +2471,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2415,13 +2482,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2454,15 +2522,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int			pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2760,7 +2830,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3242,31 +3312,33 @@ multixact_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3332,7 +3404,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index b0d90a4bd2..902f76def7 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -72,6 +72,21 @@
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index, within the bank_locks array of SlruSharedData, of
+ * the lock protecting a given slotno.
+ *
+ * The SLRU buffer pool is divided into banks of buffers, and there are at
+ * most SLRU_MAX_BANKLOCKS locks protecting access to the buffers in those
+ * banks.  Because the number of locks is capped, we cannot always have one
+ * lock per bank: as long as the number of banks is <= SLRU_MAX_BANKLOCKS,
+ * each bank is protected by its own lock; otherwise a single lock may
+ * protect multiple banks, with the bank-to-lock mapping computed as
+ * (bankno % SLRU_MAX_BANKLOCKS).
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -93,34 +108,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -147,6 +134,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 /*
  * Initialization of shared memory
@@ -156,6 +144,8 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -165,6 +155,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -181,16 +173,19 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -202,18 +197,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -234,6 +227,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -245,7 +242,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -254,6 +251,15 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
@@ -307,7 +313,7 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -346,12 +352,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -402,10 +409,14 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -448,9 +459,10 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -459,7 +471,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -503,9 +515,10 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	int			slotno;
 	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
 	/*
 	 * See if the page is already in a buffer pool.  The buffer pool is
@@ -529,8 +542,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -552,6 +565,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int			pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -580,7 +594,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -595,7 +609,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1039,13 +1053,14 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
 		/*
 		 * See if the page is already in a buffer pool.  The buffer pool is
-		 * divided into banks of buffers and each pageno may reside only in one
-		 * bank so limit the search within the bank.
+		 * divided into banks of buffers and each pageno may reside only in
+		 * one bank so limit the search within the bank.
 		 */
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
@@ -1081,7 +1096,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
@@ -1103,7 +1118,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number == pg_atomic_read_u32(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1177,6 +1192,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int			pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1187,10 +1203,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1204,7 +1233,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1244,6 +1273,7 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1254,25 +1284,38 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u32(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1302,10 +1345,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1346,15 +1391,29 @@ SlruDeleteSegment(SlruCtl ctl, int segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
+		/*
+		 * If the current bank lock is not same as the previous bank lock then
+		 * release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1388,7 +1447,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1630,6 +1689,37 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	return result;
 }
 
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
 /*
  * Helper function for GUC check_hook to check whether slru buffers are in
  * multiples of SLRU_BANK_SIZE.
@@ -1646,3 +1736,37 @@ check_slru_buffers(const char *name, int *newval)
 						SLRU_BANK_SIZE);
 	return false;
 }
+
+/*
+ * Acquire all bank locks of the given SlruCtl.
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Release all bank locks of the given SlruCtl.
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 923e706535..44c3969650 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -78,12 +78,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -101,7 +103,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -131,7 +133,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -194,8 +196,9 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU,
+				  SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -213,8 +216,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -223,7 +227,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -253,6 +257,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int			startPage;
 	int			endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -260,23 +266,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = ShmemVariableCache->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * Check whether we need the lock on a new bank; if so, release the
+		 * lock on the old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * Check whether we need the lock on a new bank; if so, release the lock
+	 * on the old bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -310,6 +340,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -321,12 +352,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 98449cbdde..4b5bb0ed16 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -268,9 +268,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and a bank-wise SLRU lock is
+ * used as the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -571,7 +572,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE);
 
 	if (!found)
@@ -1403,7 +1404,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1413,9 +1414,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int			pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1439,6 +1438,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * wrapped around, but re-zeroing the page is harmless in that case.)
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and the SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1484,6 +1488,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1510,7 +1525,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1989,9 +2004,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -2011,7 +2026,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -2052,7 +2067,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2205,7 +2220,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+# 26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+# 38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index e4903c67ec..452b918181 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -809,8 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -847,12 +848,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int			firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -902,7 +905,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -920,10 +923,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -935,13 +938,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding the SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -954,7 +957,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -966,7 +969,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -984,7 +987,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -993,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1007,12 +1010,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1072,7 +1075,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 8f20b66776..faa47698c4 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -21,6 +21,7 @@
  * SLRU bank size for slotno hash banks
  */
 #define SLRU_BANK_SIZE		16
+#define	SLRU_MAX_BANKLOCKS	128
 
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
@@ -62,8 +63,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -76,8 +75,35 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int		   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protect the I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
+	/*
+	 * Locks to protect the in-memory buffer slot access within an SLRU bank.
+	 * If the number of banks is <= SLRU_MAX_BANKLOCKS then there will be one
+	 * lock per bank; otherwise each lock protects multiple banks, depending
+	 * on the number of banks.
+	 */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because we do a victim buffer
+	 * search within a bank. Furthermore, manipulating an individual bank
+	 * counter avoids frequent cache invalidation since we update it every time
+	 * we access the page.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -89,23 +115,12 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int			latest_page_number;
+	pg_atomic_uint32 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -154,11 +169,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock needs to be acquired in order to access the slru buffer slots in
+ * the respective bank.  For more details, refer to the comments in SlruSharedData.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int pageno)
+{
+	int			banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler);
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 							  TransactionId xid);
@@ -187,4 +215,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
 extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index b038e599c0..87cb812b84 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index ae21444c47..9b48eb07c8 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int			pageno = PG_GETARG_INT32(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -215,6 +214,7 @@ test_slru_shmem_startup(void)
 {
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -228,11 +228,13 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
+				  NUM_TEST_BUFFERS, 0, slru_dir_name, test_buffer_tranche_id,
 				  test_tranche_id, SYNC_HANDLER_NONE);
 }
 
-- 
2.39.2 (Apple Git-143)

v7-0002-Divide-SLRU-buffers-into-banks.patch (application/octet-stream)
From 00435c2c547e6121896863c4b94082b1fcc0d75b Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Fri, 17 Nov 2023 10:24:41 +0530
Subject: [PATCH v7 2/3] Divide SLRU buffers into banks

As we have made the slru buffer pool configurable, we want to
eliminate the linear search within the whole SLRU buffer pool.  To
do so we divide the SLRU buffers into banks.  Each bank holds 16
buffers.  Each SLRU pageno may reside in only one bank.  Adjacent
pagenos reside in different banks.  Along with this, also ensure
that the number of slru buffers is given in multiples of the bank
size.

Andrey M. Borodin and Dilip Kumar, based on feedback by Alvaro Herrera
---
 src/backend/access/transam/clog.c      | 10 ++++++
 src/backend/access/transam/commit_ts.c | 10 ++++++
 src/backend/access/transam/multixact.c | 19 +++++++++++
 src/backend/access/transam/slru.c      | 45 ++++++++++++++++++++++----
 src/backend/access/transam/subtrans.c  | 10 ++++++
 src/backend/commands/async.c           | 10 ++++++
 src/backend/storage/lmgr/predicate.c   | 10 ++++++
 src/backend/utils/misc/guc_tables.c    | 14 ++++----
 src/include/access/slru.h              | 13 +++++++-
 src/include/utils/guc_hooks.h          | 11 +++++++
 10 files changed, 138 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 8237b40aa6..44008222da 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -1019,3 +1020,12 @@ clogsyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(XactCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for xact_buffers
+ */
+bool
+check_xact_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("xact_buffers", newval);
+}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 9ba5ae6534..96810959ab 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -1017,3 +1018,12 @@ committssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(CommitTsCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for commit_ts_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_ts_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 62709fcd07..77511c6342 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -3419,3 +3420,21 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ed24e1185..b0d90a4bd2 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 #define SlruFileName(ctl, path, seg) \
 	snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
@@ -134,7 +135,6 @@ typedef enum
 static SlruErrorCause slru_errcause;
 static int	slru_errno;
 
-
 static void SimpleLruZeroLSNs(SlruCtl ctl, int slotno);
 static void SimpleLruWaitIO(SlruCtl ctl, int slotno);
 static void SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata);
@@ -258,7 +258,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -266,6 +269,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	 */
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -497,12 +501,18 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in a buffer pool.  The buffer pool is
+	 * divided into banks of buffers and each pageno may reside only in one
+	 * bank so limit the search within the bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1029,9 +1039,15 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		/*
+		 * See if the page is already in a buffer pool.  The buffer pool is
+		 * divided into banks of buffers and each pageno may reside only in one
+		 * bank so limit the search within the bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1066,7 +1082,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int			this_page_number;
@@ -1613,3 +1629,20 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* The value must be a multiple of SLRU_BANK_SIZE */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	/* Value does not fall within any allowable range */
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0dd48f40f3..923e706535 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -373,3 +374,12 @@ SubTransPagePrecedes(int page1, int page2)
 	return (TransactionIdPrecedes(xid1, xid2) &&
 			TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
 }
+
+/*
+ * GUC check_hook for subtrans_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtrans_buffers", newval);
+}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bdbbe5cc0..98449cbdde 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -149,6 +149,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -2444,3 +2445,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 18ea18316d..e4903c67ec 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -5011,3 +5012,12 @@ AttachSerializableXact(SerializableXactHandle handle)
 	if (MySerializableXact != InvalidSerializableXact)
 		CreateLocalPredicateLockHash();
 }
+
+/*
+ * GUC check_hook for serial_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serial_buffers", newval);
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c1345dab98..8649b066a8 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2296,7 +2296,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_offsets_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_offsets_buffers, NULL, NULL
 	},
 
 	{
@@ -2307,7 +2307,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_members_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_members_buffers, NULL, NULL
 	},
 
 	{
@@ -2318,7 +2318,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&subtrans_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_subtrans_buffers, NULL, NULL
 	},
 	{
 		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
@@ -2328,7 +2328,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&notify_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_notify_buffers, NULL, NULL
 	},
 
 	{
@@ -2339,7 +2339,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&serial_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_serial_buffers, NULL, NULL
 	},
 
 	{
@@ -2350,7 +2350,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&xact_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_xact_buffers
+		check_xact_buffers, NULL, show_xact_buffers
 	},
 
 	{
@@ -2361,7 +2361,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&commit_ts_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_commit_ts_buffers
+		check_commit_ts_buffers, NULL, show_commit_ts_buffers
 	},
 
 	{
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index c0d37e3eb3..8f20b66776 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks
+ */
+#define SLRU_BANK_SIZE		16
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -139,6 +144,12 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for slotno banks; considering a 1GB SLRU buffer pool size and
+	 * SLRU_BANK_SIZE, bits16 should be sufficient for the bank mask.
+	 */
+	bits16		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
@@ -175,5 +186,5 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
-
+extern bool check_slru_buffers(const char *name, int *newval);
 #endif							/* SLRU_H */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 7b95acf36e..0edd59f867 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -130,6 +130,17 @@ extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
+extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
 extern void assign_synchronous_commit(int newval, void *extra);
 extern void assign_syslog_facility(int newval, void *extra);
-- 
2.39.2 (Apple Git-143)

#44Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#43)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Tue, Nov 21, 2023 at 2:03 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Nov 20, 2023 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Nov 20, 2023 at 2:37 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 20 Nov 2023, at 13:51, Dilip Kumar <dilipbalaut@gmail.com> wrote:

2) Do we really need one separate lwlock tranche for each SLRU?

IMHO if we use the same lwlock tranche then the wait event will show
the same wait event name, right? And that would be confusing for the
user, whether we are waiting for Subtransaction or Multixact or
anything else. Is my understanding not correct here?

If we give a user multiple GUCs to tweak, I think we should give them a way to understand which GUC to tweak when they observe wait times.

PFA the updated patch set; I have worked on the review comments from
Alvaro and Andrey. So the only open comments are about the clog group
commit testing, and for that my question, as I sent in the previous
email, is exactly what part of the coverage report we are worried
about.

The second point is that if we want to generate a group update, we
have to create the injection point after we hold the control lock, so
that the other processes go for the group update. Then, to wake up
the waiting process that is holding the SLRU control lock in
exclusive mode, we would need to call a function
('test_injection_points_wake()'), and calling that function requires
acquiring the SLRU lock again in read mode for the catalog visibility
check that fetches the procedure row. That wake-up session will then
block on the control lock held by the session waiting on the
injection point, which creates a deadlock. Maybe with the bank-wise
lock we can create a lot of transactions so that these two fall in
different banks and then we can somehow test this, but then we would
have to generate 16 * 4096 = 64k transactions so that the SLRU bank
is different for the transaction which inserted the procedure row
into the system table and the transaction in which we are trying to
do the group commit.
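
To make the bank mapping concrete, here is a rough sketch (not part
of the patches; the helper name is made up, and it simply restates
TransactionIdToPage() from clog.c plus SimpleLruGetBankLock() from
the bank patch) of how an xid ends up on a bank lock:

/*
 * Illustration only: clog maps the xid to a page, and the bank patch
 * maps that page to a bank lock.  bank_mask is set in SimpleLruInit()
 * as (xact_buffers / SLRU_BANK_SIZE) - 1.
 */
static LWLock *
xid_to_bank_lock(SlruCtl ctl, TransactionId xid)
{
	int			pageno = xid / CLOG_XACTS_PER_PAGE;	/* TransactionIdToPage() */
	int			banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;

	return &(ctl->shared->bank_locks[banklockno].lock);
}

So two xids use different bank locks only when their pagenos differ
modulo the number of banks, which is why the test has to consume a
large number of transactions before the two transactions of interest
land in different banks.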

I have attached a POC patch for testing the group update using the
injection point framework. This is just for testing the group update
part and is not yet a committable test. I have added a bunch of logs
in the code so that we can see what's going on with the group update.
From the below logs, we can see that multiple processes are getting
accumulated for the group update and the leader is updating their xid
status.

Note: With this testing, we have found a bug in the bank-wise
approach: basically, we are clearing procglobal->clogGroupFirst even
before acquiring the bank lock, which means that in most cases there
will be a single process in each group acting as the group leader (I
think this is what Alvaro was pointing at in his coverage report). I
have added the fix in this POC just for testing purposes, but in my
next version I will add the fix to the proper patch version after a
proper review and a bit more testing.
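
In outline, the intent of the fix (simplified here around the
leader's own member page; the actual change is in the attached POC
patch) is to take the bank lock before detaching the waiter list:

	/* Leader: first acquire the bank lock covering our clog page ... */
	prevpageno = proc->clogGroupMemberPage;
	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
	LWLockAcquire(prevlock, LW_EXCLUSIVE);

	/*
	 * ... and only then detach the list of waiting processes, so that
	 * followers keep attaching to the group while the leader is still
	 * waiting for the lock.
	 */
	nextidx = pg_atomic_exchange_u32(&procglobal->clogGroupFirst,
									 INVALID_PGPROCNO);
	wakeidx = nextidx;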

Here is the output after running the test
==============
2023-11-23 05:55:29.399 UTC [93367] 003_clog_group_commit.pl LOG:
procno 6 got the lock
2023-11-23 05:55:29.399 UTC [93367] 003_clog_group_commit.pl
STATEMENT: SELECT txid_current();
2023-11-23 05:55:29.406 UTC [93369] 003_clog_group_commit.pl LOG:
statement: SELECT test_injection_points_attach('ClogGroupCommit',
'wait');
2023-11-23 05:55:29.415 UTC [93371] 003_clog_group_commit.pl LOG:
statement: INSERT INTO test VALUES(1);
2023-11-23 05:55:29.416 UTC [93371] 003_clog_group_commit.pl LOG:
procno 4 got the lock
2023-11-23 05:55:29.416 UTC [93371] 003_clog_group_commit.pl
STATEMENT: INSERT INTO test VALUES(1);
2023-11-23 05:55:29.424 UTC [93373] 003_clog_group_commit.pl LOG:
statement: INSERT INTO test VALUES(2);
2023-11-23 05:55:29.425 UTC [93373] 003_clog_group_commit.pl LOG:
procno 3 for xid 128742 added for group update
2023-11-23 05:55:29.425 UTC [93373] 003_clog_group_commit.pl
STATEMENT: INSERT INTO test VALUES(2);
2023-11-23 05:55:29.431 UTC [93376] 003_clog_group_commit.pl LOG:
statement: INSERT INTO test VALUES(3);
2023-11-23 05:55:29.438 UTC [93378] 003_clog_group_commit.pl LOG:
statement: INSERT INTO test VALUES(4);
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl LOG:
procno 2 for xid 128743 added for group update
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl
STATEMENT: INSERT INTO test VALUES(3);
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl LOG:
procno 2 is follower and wait for group leader to update commit status
of xid 128743
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl
STATEMENT: INSERT INTO test VALUES(3);
2023-11-23 05:55:29.439 UTC [93378] 003_clog_group_commit.pl LOG:
procno 1 for xid 128744 added for group update
2023-11-23 05:55:29.439 UTC [93378] 003_clog_group_commit.pl
STATEMENT: INSERT INTO test VALUES(4);
2023-11-23 05:55:29.439 UTC [93378] 003_clog_group_commit.pl LOG:
procno 1 is follower and wait for group leader to update commit status
of xid 128744
2023-11-23 05:55:29.439 UTC [93378] 003_clog_group_commit.pl
STATEMENT: INSERT INTO test VALUES(4);
2023-11-23 05:55:29.445 UTC [93380] 003_clog_group_commit.pl LOG:
statement: INSERT INTO test VALUES(5);
2023-11-23 05:55:29.446 UTC [93380] 003_clog_group_commit.pl LOG:
procno 0 for xid 128745 added for group update
2023-11-23 05:55:29.446 UTC [93380] 003_clog_group_commit.pl
STATEMENT: INSERT INTO test VALUES(5);
2023-11-23 05:55:29.446 UTC [93380] 003_clog_group_commit.pl LOG:
procno 0 is follower and wait for group leader to update commit status
of xid 128745
2023-11-23 05:55:29.446 UTC [93380] 003_clog_group_commit.pl
STATEMENT: INSERT INTO test VALUES(5);
2023-11-23 05:55:29.451 UTC [93382] 003_clog_group_commit.pl LOG:
statement: SELECT test_injection_points_wake();
2023-11-23 05:55:29.460 UTC [93384] 003_clog_group_commit.pl LOG:
statement: SELECT test_injection_points_detach('ClogGroupCommit');

=============

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

0001-test-group-update-poc-no-for-commit.patch (application/octet-stream)
From b085a49c49726c44b3a6a5f9be09170207afbd72 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Wed, 22 Nov 2023 16:32:56 +0530
Subject: [PATCH] test-group-update-poc-no-for-commit

---
 src/backend/access/transam/clog.c             | 34 ++++---
 .../modules/test_injection_points/Makefile    |  2 +-
 .../t/003_clog_group_commit.pl                | 97 +++++++++++++++++++
 3 files changed, 121 insertions(+), 12 deletions(-)
 create mode 100644 src/test/modules/test_injection_points/t/003_clog_group_commit.pl

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 18ec2a47b5..fa985bcd6e 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -44,6 +44,7 @@
 #include "storage/proc.h"
 #include "storage/sync.h"
 #include "utils/guc_hooks.h"
+#include "utils/injection_point.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -313,6 +314,8 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 		 */
 		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
+			elog(LOG, "procno %d got the lock", MyProc->pgprocno);
+			INJECTION_POINT("ClogGroupCommit");
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
@@ -321,6 +324,7 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
 		{
+			elog(LOG, "procno %d completed group update", MyProc->pgprocno);
 			/* Group update mechanism has done the work. */
 			return;
 		}
@@ -472,7 +476,10 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		if (pg_atomic_compare_exchange_u32(&procglobal->clogGroupFirst,
 										   &nextidx,
 										   (uint32) proc->pgprocno))
+		{
+			elog(LOG, "procno %d for xid %d added for group update", proc->pgprocno, xid);
 			break;
+		}
 	}
 
 	/*
@@ -485,6 +492,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	{
 		int			extraWaits = 0;
 
+		elog(LOG, "procno %d is follower and wait for group leader to update commit status of xid %d", proc->pgprocno, xid);
+
 		/* Sleep until the leader updates our XID status. */
 		pgstat_report_wait_start(WAIT_EVENT_XACT_GROUP_UPDATE);
 		for (;;)
@@ -502,20 +511,10 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		/* Fix semaphore count for any absorbed wakeups */
 		while (extraWaits-- > 0)
 			PGSemaphoreUnlock(proc->sem);
+		elog(LOG, "procno %d is follower and commit status of xid %d is updated by leader", proc->pgprocno, xid);
 		return true;
 	}
 
-	/*
-	 * We are leader so clear the list of processes waiting for group XID
-	 * status update, saving a pointer to the head of the list. Trying to pop
-	 * elements one at a time could lead to an ABA problem.
-	 */
-	nextidx = pg_atomic_exchange_u32(&procglobal->clogGroupFirst,
-									 INVALID_PGPROCNO);
-
-	/* Remember head of list so we can perform wakeups after dropping lock. */
-	wakeidx = nextidx;
-
 	/*
 	 * Acquire the SLRU bank lock for the first page in the group.  And if
 	 * there are multiple pages in the group which falls under different banks
@@ -529,6 +528,18 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
 	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
 	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+	elog(LOG, "procno %d is group leader and got the lock", proc->pgprocno);
+
+	/*
+	 * We are leader so clear the list of processes waiting for group XID
+	 * status update, saving a pointer to the head of the list. Trying to pop
+	 * elements one at a time could lead to an ABA problem.
+	 */
+	nextidx = pg_atomic_exchange_u32(&procglobal->clogGroupFirst,
+									 INVALID_PGPROCNO);
+
+	/* Remember head of list so we can perform wakeups after dropping lock. */
+	wakeidx = nextidx;
 
 	/* Walk the list and update the status of all XIDs. */
 	while (nextidx != INVALID_PGPROCNO)
@@ -567,6 +578,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 										   nextproc->clogGroupMemberXidStatus,
 										   nextproc->clogGroupMemberLsn,
 										   nextproc->clogGroupMemberPage);
+		elog(LOG, "group leader updated status of xid %d", nextproc->clogGroupMemberXid);
 
 		/* Move to next proc in list. */
 		nextidx = pg_atomic_read_u32(&nextproc->clogGroupNext);
diff --git a/src/test/modules/test_injection_points/Makefile b/src/test/modules/test_injection_points/Makefile
index 4696c1b013..8974182b56 100644
--- a/src/test/modules/test_injection_points/Makefile
+++ b/src/test/modules/test_injection_points/Makefile
@@ -8,7 +8,7 @@ PGFILEDESC = "test_injection_points - test injection points"
 
 EXTENSION = test_injection_points
 DATA = test_injection_points--1.0.sql
-REGRESS = test_injection_points
+#REGRESS = test_injection_points
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/test_injection_points/t/003_clog_group_commit.pl b/src/test/modules/test_injection_points/t/003_clog_group_commit.pl
new file mode 100644
index 0000000000..229c798144
--- /dev/null
+++ b/src/test/modules/test_injection_points/t/003_clog_group_commit.pl
@@ -0,0 +1,97 @@
+# Test consistent of initial snapshot data.
+
+# This requires a node with wal_level=logical combined with an injection
+# point that forces a failure when a snapshot is initially built with a
+# logical slot created.
+#
+# See bug https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init(allows_streaming => 'logical');
+$node->start;
+
+$node->safe_psql('postgres', 'CREATE EXTENSION test_injection_points;');
+$node->safe_psql('postgres', 'CREATE TABLE test(a int);');
+
+# Consume multiple xids so that next xids get generated in new banks
+$node->safe_psql(
+	'postgres', q{
+do $$
+begin
+  for i in 1..128001 loop
+    -- use an exception block so that each iteration eats an XID
+    begin
+      insert into test values (i);
+    exception
+      when division_by_zero then null;
+    end;
+  end loop;
+end$$;
+});
+
+my $result = $node->safe_psql('postgres',
+	"SELECT txid_current();");
+is($result, qq(128740),
+	'check column trigger applied even on update for other column');
+
+$node->safe_psql('postgres',
+  "SELECT test_injection_points_attach('ClogGroupCommit', 'wait');");
+
+
+# First session will get the slru lock and will wait on injection point
+my $session1 = $node->background_psql('postgres');
+
+$session1->query_until(
+	qr/start/, q(
+\echo start
+INSERT INTO test VALUES(1);
+));
+
+# Create another 4 sessions which will not get the lock, as the first session is holding it,
+# so these will all go for the group update.
+my $session2 = $node->background_psql('postgres');
+
+$session2->query_until(
+	qr/start/, q(
+\echo start
+INSERT INTO test VALUES(2);
+));
+
+my $session3 = $node->background_psql('postgres');
+
+$session3->query_until(
+	qr/start/, q(
+\echo start
+INSERT INTO test VALUES(3);
+));
+
+my $session4 = $node->background_psql('postgres');
+
+$session4->query_until(
+	qr/start/, q(
+\echo start
+INSERT INTO test VALUES(4);
+));
+
+my $session5 = $node->background_psql('postgres');
+
+$session5->query_until(
+	qr/start/, q(
+\echo start
+INSERT INTO test VALUES(5);
+));
+
+# Now wake up the first session and let the next 4 sessions perform the group update
+$node->safe_psql('postgres',
+  "SELECT test_injection_points_wake();");
+$node->safe_psql('postgres',
+  "SELECT test_injection_points_detach('ClogGroupCommit');");
+
+done_testing();
-- 
2.39.2 (Apple Git-143)

#45Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#44)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Nov 23, 2023 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Note: With this testing, we have found a bug in the bank-wise
approach, basically we are clearing a procglobal->clogGroupFirst, even
before acquiring the bank lock that means in most of the cases there
will be a single process in each group as a group leader

I realized that the bug fix I have done is not proper, so will send
the updated patch set with the proper fix soon.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#46Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#45)
4 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Fri, Nov 24, 2023 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Nov 23, 2023 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Note: With this testing, we have found a bug in the bank-wise
approach, basically we are clearing a procglobal->clogGroupFirst, even
before acquiring the bank lock that means in most of the cases there
will be a single process in each group as a group leader

I realized that the bug fix I have done is not proper, so will send
the updated patch set with the proper fix soon.

PFA, the updated patch set fixes the bug found during the testing of
the group update using the injection point. I have also attached a
patch to test the injection point, but for that we need to apply the
injection point patches [1].

[1]: /messages/by-id/ZWACtHPetBFIvP61@paquier.xyz

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v8-0002-Divide-SLRU-buffers-into-banks.patch (application/octet-stream)
From d537021f65f8104f327ebdd2f1820abdd2e075af Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Fri, 17 Nov 2023 10:24:41 +0530
Subject: [PATCH v8 2/3] Divide SLRU buffers into banks

As we have made the slru buffer pool configurable, we want to
eliminate the linear search within the whole SLRU buffer pool.  To
do so we divide the SLRU buffers into banks.  Each bank holds 16
buffers.  Each SLRU pageno may reside in only one bank.  Adjacent
pagenos reside in different banks.  Along with this, also ensure
that the number of slru buffers is given in multiples of the bank
size.

Andrey M. Borodin and Dilip Kumar, based on feedback by Alvaro Herrera
---
 src/backend/access/transam/clog.c      | 10 ++++++
 src/backend/access/transam/commit_ts.c | 10 ++++++
 src/backend/access/transam/multixact.c | 19 +++++++++++
 src/backend/access/transam/slru.c      | 45 ++++++++++++++++++++++----
 src/backend/access/transam/subtrans.c  | 10 ++++++
 src/backend/commands/async.c           | 10 ++++++
 src/backend/storage/lmgr/predicate.c   | 10 ++++++
 src/backend/utils/misc/guc_tables.c    | 14 ++++----
 src/include/access/slru.h              | 13 +++++++-
 src/include/utils/guc_hooks.h          | 11 +++++++
 10 files changed, 138 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 8237b40aa6..44008222da 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -1019,3 +1020,12 @@ clogsyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(XactCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for xact_buffers
+ */
+bool
+check_xact_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("xact_buffers", newval);
+}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 9ba5ae6534..96810959ab 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -1017,3 +1018,12 @@ committssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(CommitTsCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for commit_ts_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_ts_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 62709fcd07..77511c6342 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -3419,3 +3420,21 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ed24e1185..b0d90a4bd2 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 #define SlruFileName(ctl, path, seg) \
 	snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
@@ -134,7 +135,6 @@ typedef enum
 static SlruErrorCause slru_errcause;
 static int	slru_errno;
 
-
 static void SimpleLruZeroLSNs(SlruCtl ctl, int slotno);
 static void SimpleLruWaitIO(SlruCtl ctl, int slotno);
 static void SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata);
@@ -258,7 +258,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -266,6 +269,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	 */
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -497,12 +501,18 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in the buffer pool.  The buffer pool is
+	 * divided into banks of buffers, and each pageno can reside in only one
+	 * bank, so limit the search to that bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1029,9 +1039,15 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		/*
+		 * See if the page is already in the buffer pool.  The buffer pool is
+		 * divided into banks of buffers, and each pageno can reside in only
+		 * one bank, so limit the search to that bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1066,7 +1082,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int			this_page_number;
@@ -1613,3 +1629,20 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Helper function for GUC check_hooks to verify that the number of SLRU
+ * buffers is a multiple of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Buffer pool size must be an exact multiple of the bank size */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	/* Value is not a multiple of the bank size */
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0dd48f40f3..923e706535 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -373,3 +374,12 @@ SubTransPagePrecedes(int page1, int page2)
 	return (TransactionIdPrecedes(xid1, xid2) &&
 			TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
 }
+
+/*
+ * GUC check_hook for subtrans_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtrans_buffers", newval);
+}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bdbbe5cc0..98449cbdde 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -149,6 +149,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -2444,3 +2445,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 18ea18316d..e4903c67ec 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -5011,3 +5012,12 @@ AttachSerializableXact(SerializableXactHandle handle)
 	if (MySerializableXact != InvalidSerializableXact)
 		CreateLocalPredicateLockHash();
 }
+
+/*
+ * GUC check_hook for serial_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serial_buffers", newval);
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c1345dab98..8649b066a8 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2296,7 +2296,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_offsets_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_offsets_buffers, NULL, NULL
 	},
 
 	{
@@ -2307,7 +2307,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_members_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_members_buffers, NULL, NULL
 	},
 
 	{
@@ -2318,7 +2318,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&subtrans_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_subtrans_buffers, NULL, NULL
 	},
 	{
 		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
@@ -2328,7 +2328,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&notify_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_notify_buffers, NULL, NULL
 	},
 
 	{
@@ -2339,7 +2339,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&serial_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_serial_buffers, NULL, NULL
 	},
 
 	{
@@ -2350,7 +2350,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&xact_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_xact_buffers
+		check_xact_buffers, NULL, show_xact_buffers
 	},
 
 	{
@@ -2361,7 +2361,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&commit_ts_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_commit_ts_buffers
+		check_commit_ts_buffers, NULL, show_commit_ts_buffers
 	},
 
 	{
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index c0d37e3eb3..8f20b66776 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size, i.e., the number of buffer slots in each slotno bank
+ */
+#define SLRU_BANK_SIZE		16
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -139,6 +144,12 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for slotno banks.  Given the maximum 1GB SLRU buffer pool size and
+	 * SLRU_BANK_SIZE, a bits16 is sufficient for the bank mask.
+	 */
+	bits16		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
@@ -175,5 +186,5 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
-
+extern bool check_slru_buffers(const char *name, int *newval);
 #endif							/* SLRU_H */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 7b95acf36e..0edd59f867 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -130,6 +130,17 @@ extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
+extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
 extern void assign_synchronous_commit(int newval, void *extra);
 extern void assign_syslog_facility(int newval, void *extra);
-- 
2.39.2 (Apple Git-143)
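
To illustrate the bank mapping used in the patch above, here is a minimal
standalone sketch (not part of the patch; the pool size below is just an
example) showing how each pageno maps to exactly one bank of
SLRU_BANK_SIZE slots, so a lookup scans at most 16 slots no matter how
large the buffer pool is configured:

#include <stdio.h>

#define SLRU_BANK_SIZE 16

int
main(void)
{
	int		nslots = 128;	/* hypothetical pool size, a multiple of 16 */
	int		bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
	int		pageno;

	for (pageno = 0; pageno < 20; pageno++)
	{
		int		bankstart = (pageno & bank_mask) * SLRU_BANK_SIZE;
		int		bankend = bankstart + SLRU_BANK_SIZE;

		printf("page %2d -> slots [%3d, %3d)\n", pageno, bankstart, bankend);
	}
	return 0;
}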

Attachment: v8-0001-Make-all-SLRU-buffer-sizes-configurable.patch (application/octet-stream)
From 2e260ccd85812e9eefd033079fcce4485f641606 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Wed, 25 Oct 2023 14:45:00 +0530
Subject: [PATCH v8 1/3] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using
hard-coded defaults.

The default sizes are also set to 64, since sizes well above the old
limits have been shown to be useful on modern systems.

Patch by Andrey M. Borodin, Dilip Kumar
Reviewed By Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  19 +--
 src/backend/access/transam/commit_ts.c        |   7 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  25 ++++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 18 files changed, 293 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 93735e3aea..a5b189ab73 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 256, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 4a431d5876..8237b40aa6 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -663,23 +663,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(16, xact_buffers);
+	return Min(SLRU_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index b897fabc70..9ba5ae6534 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -493,11 +493,16 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 4MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(16, commit_ts_buffers);
+	return Min(256, Max(16, NBuffers / 256));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 57ed34c0a8..62709fcd07 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,13 +1851,13 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 62bb610167..0dd48f40f3 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -184,14 +185,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 38ddae08b8..4bdbbe5cc0 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -117,7 +117,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by slru.c's wraparound limit; see QUEUE_MAX_PAGE below.
@@ -235,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -521,7 +521,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -569,7 +569,7 @@ AsyncShmemInit(void)
 	 * Set up SLRU management of the pg_notify data.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index a88cf5f118..c68d668514 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,29 @@ show_timezone(void)
 	return "unknown";
 }
 
+/*
+ * GUC show_hook for xact_buffers
+ */
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+/*
+ * GUC show_hook for commit_ts_buffers
+ */
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index a794546db3..18ea18316d 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
@@ -1347,7 +1347,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..96d480325b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 64;
+int			multixact_members_buffers = 64;
+int			subtrans_buffers = 64;
+int			notify_buffers = 64;
+int			serial_buffers = 64;
+int			xact_buffers = 64;
+int			commit_ts_buffers = 64;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index b764ef6998..c1345dab98 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2287,6 +2288,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e48c066a5b..364553a314 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 64			# memory for pg_subtrans
+#multixact_offsets_buffers = 64		# memory for pg_multixact/offsets
+#multixact_members_buffers = 64		# memory for pg_multixact/members
+#notify_buffers = 64			# memory for pg_notify
+#serial_buffers = 64			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 0be1355892..18d7ba4ca9 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 552cc19e68..c0d37e3eb3 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..147dc4acc3 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 02da6ba7e1..b3e6815ee4 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f0cc651435..e2473f41de 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..7b68c8f1c7 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 3d74483f44..7b95acf36e 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -163,4 +163,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)
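
As a quick illustration of the auto-sizing formula in CLOGShmemBuffers()
above, here is a rough standalone sketch (assuming the default BLCKSZ of
8192; the shared_buffers values are just examples) of what
xact_buffers = 0 resolves to:

#include <stdio.h>

#define BLCKSZ 8192
#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
#define Max(x, y) ((x) > (y) ? (x) : (y))
#define Min(x, y) ((x) < (y) ? (x) : (y))

/* same computation as CLOGShmemBuffers() when xact_buffers is 0 */
static int
xact_buffers_auto(int NBuffers)		/* shared_buffers, in blocks */
{
	return Min(SLRU_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
}

int
main(void)
{
	int		sizes[] = {1024, 16384, 131072, 1048576};	/* 8MB .. 8GB */
	int		i;

	for (i = 0; i < 4; i++)
		printf("shared_buffers = %7d blocks -> xact_buffers = %d\n",
			   sizes[i], xact_buffers_auto(sizes[i]));
	return 0;
}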

Attachment: v8-0003-Remove-the-centralized-control-lock-and-LRU-count.patch (application/octet-stream)
From f1ae603e4032aec63cbccfeadbd731cf60a467e6 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Fri, 17 Nov 2023 14:42:25 +0530
Subject: [PATCH v8 3/3] Remove the centralized control lock and LRU counter

The previous patch divided the SLRU buffer pool into associative
banks.  This patch optimizes it further by introducing multiple SLRU
locks instead of a single centralized lock, which reduces contention
on the SLRU control lock.  Basically, we have at most 128 bank locks;
if the number of banks is <= 128 then each lock covers exactly one
bank, otherwise each lock covers multiple banks, and the bank-to-lock
mapping is computed as (bankno % 128).  This patch also replaces the
centralized LRU counter with bank-wise LRU counters, which avoids
frequent cache invalidation when the counter is modified.

Dilip Kumar based on design inputs from Robert Haas, Andrey M. Borodin,
and Alvaro Herrera
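
For clarity, the bank-to-lock mapping described above amounts to
something like the following (illustrative sketch only; the constant and
function names here are placeholders, not the ones used in the patch):

#include <stdio.h>

#define SLRU_MAX_BANKLOCKS	128		/* assumed cap on the number of bank locks */

/* pageno -> bank (as in the banking patch), then bank -> lock (bankno % 128) */
static int
bank_lock_index(int pageno, int bank_mask)
{
	int		bankno = pageno & bank_mask;

	return bankno % SLRU_MAX_BANKLOCKS;
}

int
main(void)
{
	int		bank_mask = 256 - 1;	/* a hypothetical pool of 256 banks */

	/* banks 0 and 128 end up sharing the same bank lock */
	printf("page 0 -> lock %d, page 128 -> lock %d\n",
		   bank_lock_index(0, bank_mask), bank_lock_index(128, bank_mask));
	return 0;
}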
---
 src/backend/access/transam/clog.c        | 123 ++++++++----
 src/backend/access/transam/commit_ts.c   |  43 ++--
 src/backend/access/transam/multixact.c   | 175 +++++++++++-----
 src/backend/access/transam/slru.c        | 244 +++++++++++++++++------
 src/backend/access/transam/subtrans.c    |  58 ++++--
 src/backend/commands/async.c             |  43 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  33 +--
 src/include/access/slru.h                |  62 ++++--
 src/include/storage/lwlock.h             |   7 +
 src/test/modules/test_slru/test_slru.c   |  32 +--
 12 files changed, 602 insertions(+), 246 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 44008222da..9745a1f9f9 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -275,15 +275,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on Xact SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * updates for multiple backends so that the number of times the Xact SLRU
+	 * lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -301,17 +306,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire SLRU lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -324,10 +329,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -346,7 +351,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -397,14 +403,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -418,6 +423,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -498,8 +505,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group before we
+	 * close the group by setting procglobal->clogGroupFirst to
+	 * INVALID_PGPROCNO.  Otherwise we would stop accepting new members into
+	 * the group before we even hold the lock, defeating the purpose of the
+	 * group update.
+	 */
+	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -516,6 +532,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page is not the same as that
+		 * of the last page then we need to release the lock on the previous
+		 * bank and acquire the lock on the bank for the page we are going to
+		 * update now.
+		 *
+		 * Although on a best-effort basis we try to keep all the requests
+		 * within a group on the same clog page, a group can still contain
+		 * requests for more than one page (for details refer to the comment
+		 * in the previous while loop).  That scenario might not perform very
+		 * well, because while switching locks the group leader might need to
+		 * wait on the new lock if the pages belong to different SLRU banks.
+		 * It is nevertheless safe because a) we release the old lock before
+		 * acquiring the new one, so there should be no deadlock situation,
+		 * and b) we always modify the page under the correct SLRU bank
+		 * lock.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -535,7 +582,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -564,10 +612,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+						  int slotno)
 {
 	int			byteno = TransactionIdToByte(xid);
 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
@@ -656,7 +705,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -690,8 +739,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -705,8 +754,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -715,7 +765,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -750,14 +800,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u32(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -768,8 +814,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int			pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -801,7 +848,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -833,6 +880,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -843,13 +891,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -987,16 +1036,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 96810959ab..9afa3beaa7 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -219,8 +219,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -230,13 +231,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -337,7 +338,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -527,9 +528,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
-				  SYNC_HANDLER_COMMIT_TS);
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU, SYNC_HANDLER_COMMIT_TS);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
@@ -685,9 +685,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -714,12 +712,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -768,9 +767,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -802,6 +801,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -822,12 +822,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -981,16 +983,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1002,7 +1006,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u32(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 77511c6342..f204cb4db3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -193,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the MultiXactOffset and MultiXactMember
+ * SLRU bank locks to guard accesses to the two sets of SLRU buffers.  For
+ * concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -871,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -892,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -917,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so if this new page
+			 * falls into a different SLRU bank, release the old bank's lock
+			 * and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -937,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1240,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1343,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1380,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1389,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1398,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1420,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1443,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1853,14 +1910,14 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
 
@@ -1895,8 +1952,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1905,9 +1964,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1916,7 +1976,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1976,10 +2036,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1994,7 +2056,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2016,13 +2078,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u32(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2047,13 +2111,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2068,7 +2132,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2076,18 +2142,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2099,7 +2164,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2114,10 +2181,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2405,6 +2471,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2415,13 +2482,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2454,15 +2522,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int			pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2760,7 +2830,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3242,31 +3312,33 @@ multixact_redo(XLogReaderState *record)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int			pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(int));
-
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3332,7 +3404,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u32(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index b0d90a4bd2..902f76def7 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -72,6 +72,21 @@
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index of the lock, within SlruSharedData's bank_locks
+ * array, that protects the bank containing a given slotno.
+ *
+ * The SLRU buffer pool is divided into banks of buffers, and at most
+ * SLRU_MAX_BANKLOCKS locks protect access to the buffers in those banks.
+ * Because the number of locks is capped, we cannot always have one lock per
+ * bank: as long as the number of banks is less than or equal to
+ * SLRU_MAX_BANKLOCKS, there is one lock protecting each bank; otherwise a
+ * single lock may protect multiple banks, depending on the total number of
+ * banks.
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -93,34 +108,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -147,6 +134,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 /*
  * Initialization of shared memory
@@ -156,6 +144,8 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -165,6 +155,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -181,16 +173,19 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -202,18 +197,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -234,6 +227,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -245,7 +242,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -254,6 +251,15 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
@@ -307,7 +313,7 @@ SimpleLruZeroPage(SlruCtl ctl, int pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u32(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -346,12 +352,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -402,10 +409,14 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -448,9 +459,10 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -459,7 +471,7 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -503,9 +515,10 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	int			slotno;
 	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
 	/*
 	 * See if the page is already in a buffer pool.  The buffer pool is
@@ -529,8 +542,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -552,6 +565,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int			pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -580,7 +594,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -595,7 +609,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1039,13 +1053,14 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int			best_invalid_page_number = 0;	/* keep compiler quiet */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
 		/*
 		 * See if the page is already in a buffer pool.  The buffer pool is
-		 * divided into banks of buffers and each pageno may reside only in one
-		 * bank so limit the search within the bank.
+		 * divided into banks of buffers and each pageno may reside only in
+		 * one bank so limit the search within the bank.
 		 */
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
@@ -1081,7 +1096,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
@@ -1103,7 +1118,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number == pg_atomic_read_u32(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1177,6 +1192,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int			pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1187,10 +1203,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank
+		 * lock, release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1204,7 +1233,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1244,6 +1273,7 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1254,25 +1284,38 @@ SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u32(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank
+		 * lock, release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1302,10 +1345,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1346,15 +1391,29 @@ SlruDeleteSegment(SlruCtl ctl, int segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous bank
+		 * lock, release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1388,7 +1447,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1630,6 +1689,37 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	return result;
 }
 
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
 /*
  * Helper function for GUC check_hook to check whether slru buffers are in
  * multiples of SLRU_BANK_SIZE.
@@ -1646,3 +1736,37 @@ check_slru_buffers(const char *name, int *newval)
 						SLRU_BANK_SIZE);
 	return false;
 }
+
+/*
+ * Function to acquire all bank locks of the given SlruCtl
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Function to release all bank locks of the given SlruCtl
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 923e706535..44c3969650 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -78,12 +78,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -101,7 +103,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -131,7 +133,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -194,8 +196,9 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU,
+				  SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -213,8 +216,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -223,7 +227,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -253,6 +257,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int			startPage;
 	int			endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -260,23 +266,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = ShmemVariableCache->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If the new page falls into a different bank, release the lock on
+		 * the old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the new page falls into a different bank, release the lock on the
+	 * old bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -310,6 +340,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int			pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -321,12 +352,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 98449cbdde..4b5bb0ed16 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -268,9 +268,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the per-bank SLRU lock is
+ * used as the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -571,7 +572,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE);
 
 	if (!found)
@@ -1403,7 +1404,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1413,9 +1414,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int			pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1439,6 +1438,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * wrapped around, but re-zeroing the page is harmless in that case.)
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1484,6 +1488,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1510,7 +1525,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1989,9 +2004,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -2011,7 +2026,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -2052,7 +2067,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2205,7 +2220,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+#38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index e4903c67ec..452b918181 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -809,8 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -847,12 +848,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int			firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -902,7 +905,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -920,10 +923,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -935,13 +938,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -954,7 +957,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -966,7 +969,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -984,7 +987,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -993,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1007,12 +1010,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1072,7 +1075,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 8f20b66776..faa47698c4 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -21,6 +21,7 @@
  * SLRU bank size for slotno hash banks
  */
 #define SLRU_BANK_SIZE		16
+#define	SLRU_MAX_BANKLOCKS	128
 
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
@@ -62,8 +63,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -76,8 +75,35 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int		   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks array protects I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
+	/*
+	 * Locks to protect access to the in-memory buffer slots in each SLRU
+	 * bank.  If the number of banks is <= SLRU_MAX_BANKLOCKS there will be
+	 * one lock per bank; otherwise each lock will protect multiple banks,
+	 * depending on the number of banks.
+	 */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because we do a victim buffer
+	 * search within a bank. Furthermore, manipulating an individual bank
+	 * counter avoids frequent cache invalidation since we update it every time
+	 * we access the page.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -89,23 +115,12 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int			latest_page_number;
+	pg_atomic_uint32 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -154,11 +169,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock must be acquired in order to access the SLRU buffer slots in the
+ * respective bank.  For more details, see the comments in SlruSharedData.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int pageno)
+{
+	int			banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler);
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 							  TransactionId xid);
@@ -187,4 +215,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int segpage,
 								   void *data);
 extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index b038e599c0..87cb812b84 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index ae21444c47..9b48eb07c8 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int			pageno = PG_GETARG_INT32(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int			pageno = PG_GETARG_INT32(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -215,6 +214,7 @@ test_slru_shmem_startup(void)
 {
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -228,11 +228,13 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
+				  NUM_TEST_BUFFERS, 0, slru_dir_name, test_buffer_tranche_id,
 				  test_tranche_id, SYNC_HANDLER_NONE);
 }
 
-- 
2.39.2 (Apple Git-143)
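
For readers following the slru.c and slru.h changes above, here is a small
worked example (not part of the patch) of how the slot/page to bank-lock
mapping comes out.  SLRU_BANK_SIZE = 16 and SLRU_MAX_BANKLOCKS = 128 are taken
from the patch; the 512-slot pool size is made up, and bank_mask is assumed to
be nbanks - 1, as the bankstart computation in SimpleLruReadPage_ReadOnly()
suggests.

/* Standalone sketch; compiles with any C compiler. */
#include <assert.h>

int
main(void)
{
	int			nslots = 512;
	int			nbanks = nslots / 16;	/* 32 banks of 16 slots each */
	int			bank_mask = nbanks - 1; /* 31, assuming power-of-two nbanks */

	/* Slot 37 lives in bank 37 / 16 = 2, so SLRU_SLOTNO_GET_BANKLOCKNO -> 2 */
	assert((37 / 16) % 128 == 2);

	/* Page 1234 hashes to bank 1234 & 31 = 18, so SimpleLruGetBankLock -> 18 */
	assert((1234 & bank_mask) % 128 == 18);

	/*
	 * That page can only occupy slots 288..303 (bank 18), and those slots map
	 * back to the same lock index, so the page-based and slot-based lookups
	 * agree.  With 32 banks (<= 128), every bank gets its own lock.
	 */
	assert((288 / 16) % 128 == 18 && (303 / 16) % 128 == 18);

	return 0;
}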

test_group_commit.patch (application/octet-stream)
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 9745a1f9f9..fe56d42a07 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -44,6 +44,7 @@
 #include "storage/proc.h"
 #include "storage/sync.h"
 #include "utils/guc_hooks.h"
+#include "utils/injection_point.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -313,6 +314,7 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 		 */
 		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
+			INJECTION_POINT("ClogGroupCommit");
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
@@ -472,7 +474,10 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		if (pg_atomic_compare_exchange_u32(&procglobal->clogGroupFirst,
 										   &nextidx,
 										   (uint32) proc->pgprocno))
+		{
+			elog(LOG, "procno %d for xid %d added for group update", proc->pgprocno, xid);
 			break;
+		}
 	}
 
 	/*
@@ -485,6 +490,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	{
 		int			extraWaits = 0;
 
+		elog(LOG, "procno %d is follower and wait for group leader to update commit status of xid %d", proc->pgprocno, xid);
+
 		/* Sleep until the leader updates our XID status. */
 		pgstat_report_wait_start(WAIT_EVENT_XACT_GROUP_UPDATE);
 		for (;;)
@@ -502,6 +509,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		/* Fix semaphore count for any absorbed wakeups */
 		while (extraWaits-- > 0)
 			PGSemaphoreUnlock(proc->sem);
+		elog(LOG, "procno %d is follower and commit status of xid %d is updated by leader", proc->pgprocno, xid);
 		return true;
 	}
 
@@ -516,6 +524,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
 	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
 	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+	elog(LOG, "procno %d is group leader and got the lock", proc->pgprocno);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -576,6 +585,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 										   nextproc->clogGroupMemberXidStatus,
 										   nextproc->clogGroupMemberLsn,
 										   nextproc->clogGroupMemberPage);
+		elog(LOG, "group leader updated status of xid %d", nextproc->clogGroupMemberXid);
 
 		/* Move to next proc in list. */
 		nextidx = pg_atomic_read_u32(&nextproc->clogGroupNext);
diff --git a/src/test/modules/test_injection_points/Makefile b/src/test/modules/test_injection_points/Makefile
index 4696c1b013..8974182b56 100644
--- a/src/test/modules/test_injection_points/Makefile
+++ b/src/test/modules/test_injection_points/Makefile
@@ -8,7 +8,7 @@ PGFILEDESC = "test_injection_points - test injection points"
 
 EXTENSION = test_injection_points
 DATA = test_injection_points--1.0.sql
-REGRESS = test_injection_points
+#REGRESS = test_injection_points
 
 TAP_TESTS = 1
 
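To summarize the locking discipline the patches above follow whenever a code
path has to touch pages that may live in different banks, here is a minimal
sketch (not part of either patch).  SimpleLruGetBankLock() and
SimpleLruReadPage() are the interfaces introduced above; walk_slru_pages() and
its arguments are hypothetical and only illustrate the release-old/acquire-new
pattern used in RecordNewMultiXact(), GetMultiXactIdMembers(), and friends.

/* Hypothetical caller iterating over SLRU pages under bank-wise locks. */
static void
walk_slru_pages(SlruCtl ctl, int first_pageno, int npages)
{
	LWLock	   *prevlock = NULL;
	int			pageno;

	for (pageno = first_pageno; pageno < first_pageno + npages; pageno++)
	{
		LWLock	   *lock = SimpleLruGetBankLock(ctl, pageno);
		int			slotno;

		/* The page may fall into a different bank; swap the lock we hold. */
		if (lock != prevlock)
		{
			if (prevlock != NULL)
				LWLockRelease(prevlock);
			LWLockAcquire(lock, LW_EXCLUSIVE);
			prevlock = lock;
		}

		/* The bank lock is held here, as SimpleLruReadPage() now asserts. */
		slotno = SimpleLruReadPage(ctl, pageno, true, InvalidTransactionId);

		/* ... read or modify ctl->shared->page_buffer[slotno] here ... */
	}

	if (prevlock != NULL)
		LWLockRelease(prevlock);
}
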
#47tender wang
tndrwang@gmail.com
In reply to: Dilip Kumar (#46)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

The v8-0001 patch failed to apply in my local repo as below:

git apply v8-0001-Make-all-SLRU-buffer-sizes-configurable.patch
error: patch failed: src/backend/access/transam/multixact.c:1851
error: src/backend/access/transam/multixact.c: patch does not apply
error: patch failed: src/backend/access/transam/subtrans.c:184
error: src/backend/access/transam/subtrans.c: patch does not apply
error: patch failed: src/backend/commands/async.c:117
error: src/backend/commands/async.c: patch does not apply
error: patch failed: src/backend/storage/lmgr/predicate.c:808
error: src/backend/storage/lmgr/predicate.c: patch does not apply
error: patch failed: src/include/commands/async.h:15
error: src/include/commands/async.h: patch does not apply

My local head commit is 15c9ac36299. Is there something I missed?

Dilip Kumar <dilipbalaut@gmail.com> wrote on Fri, Nov 24, 2023 at 17:08:

On Fri, Nov 24, 2023 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Nov 23, 2023 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Note: With this testing, we have found a bug in the bank-wise
approach: basically, we are clearing procglobal->clogGroupFirst even
before acquiring the bank lock, which means that in most cases there
will be a single process in each group as the group leader.

I realized that the bug fix I have done is not proper, so I will send
the updated patch set with the proper fix soon.

PFA the updated patch set, which fixes the bug found during testing of
the group update using the injection point. Also attached is a patch to
test the injection point, but for that we need to apply the injection
point patches [1]

[1] /messages/by-id/ZWACtHPetBFIvP61@paquier.xyz

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#48Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: tender wang (#47)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2023-Nov-29, tender wang wrote:

The v8-0001 patch failed to apply in my local repo as below:

git apply v8-0001-Make-all-SLRU-buffer-sizes-configurable.patch
error: patch failed: src/backend/access/transam/multixact.c:1851
error: src/backend/access/transam/multixact.c: patch does not apply
error: patch failed: src/backend/access/transam/subtrans.c:184
error: src/backend/access/transam/subtrans.c: patch does not apply
error: patch failed: src/backend/commands/async.c:117
error: src/backend/commands/async.c: patch does not apply
error: patch failed: src/backend/storage/lmgr/predicate.c:808
error: src/backend/storage/lmgr/predicate.c: patch does not apply
error: patch failed: src/include/commands/async.h:15
error: src/include/commands/async.h: patch does not apply

Yeah, this patch series conflicts with today's commit 4ed8f0913bfd.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.

#49Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#48)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Nov 29, 2023 at 3:29 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Nov-29, tender wang wrote:

The v8-0001 patch failed to apply in my local repo as below:

git apply v8-0001-Make-all-SLRU-buffer-sizes-configurable.patch
error: patch failed: src/backend/access/transam/multixact.c:1851
error: src/backend/access/transam/multixact.c: patch does not apply
error: patch failed: src/backend/access/transam/subtrans.c:184
error: src/backend/access/transam/subtrans.c: patch does not apply
error: patch failed: src/backend/commands/async.c:117
error: src/backend/commands/async.c: patch does not apply
error: patch failed: src/backend/storage/lmgr/predicate.c:808
error: src/backend/storage/lmgr/predicate.c: patch does not apply
error: patch failed: src/include/commands/async.h:15
error: src/include/commands/async.h: patch does not apply

Yeah, this patch series conflicts with today's commit 4ed8f0913bfd.

I will send a rebased version by tomorrow.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#50Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#49)
3 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Nov 29, 2023 at 4:58 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Nov 29, 2023 at 3:29 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Nov-29, tender wang wrote:

The v8-0001 patch failed to apply in my local repo as below:

git apply v8-0001-Make-all-SLRU-buffer-sizes-configurable.patch
error: patch failed: src/backend/access/transam/multixact.c:1851
error: src/backend/access/transam/multixact.c: patch does not apply
error: patch failed: src/backend/access/transam/subtrans.c:184
error: src/backend/access/transam/subtrans.c: patch does not apply
error: patch failed: src/backend/commands/async.c:117
error: src/backend/commands/async.c: patch does not apply
error: patch failed: src/backend/storage/lmgr/predicate.c:808
error: src/backend/storage/lmgr/predicate.c: patch does not apply
error: patch failed: src/include/commands/async.h:15
error: src/include/commands/async.h: patch does not apply

Yeah, this patch series conflicts with today's commit 4ed8f0913bfd.

I will send a rebased version by tomorrow.

PFA a rebased version of the patch set. I have avoided attaching the
injection-point test patch because a) that patch is a POC to show the
coverage and it has a dependency on the other thread, and b) the old
patch still applies so it doesn't need a rebase.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v9-0001-Make-all-SLRU-buffer-sizes-configurable.patch (application/octet-stream)
From 2e14ffc4f4934e2dcf2dd1613ab612bf3b8d4aed Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:32:01 +0530
Subject: [PATCH v9 1/3] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Default sizes are also set to 64 as sizes much larger than the old
limits have been shown to be useful on modern systems.

Patch by Andrey M. Borodin, Dilip Kumar
Reviewed By Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  19 +--
 src/backend/access/transam/commit_ts.c        |   7 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  25 ++++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 18 files changed, 293 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 94d1eb2b81..1589f2e189 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 256, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index cc60eab1e2..58d0ab4c81 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -673,23 +673,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(16, xact_buffers);
+	return Min(SLRU_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 7c642f7b59..d5f58ce201 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -502,11 +502,16 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 4MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(16, commit_ts_buffers);
+	return Min(256, Max(16, NBuffers / 256));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index db3423f12e..89e6bafb27 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,14 +1851,14 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 64673eaef6..bf39a82f59 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -193,14 +194,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
 				  false);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 2651d8904b..f52976d1b8 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -234,7 +234,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -492,7 +492,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,7 +541,7 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE, true);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index c361bb2079..452369d56d 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,29 @@ show_timezone(void)
 	return "unknown";
 }
 
+/*
+ * GUC show_hook for xact_buffers
+ */
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+/*
+ * GUC show_hook for commit_ts_buffers
+ */
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index ff8df7c0bc..ad2f951c7a 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
 				  false);
 #ifdef USE_ASSERT_CHECKING
@@ -1348,7 +1348,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..96d480325b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 64;
+int			multixact_members_buffers = 64;
+int			subtrans_buffers = 64;
+int			notify_buffers = 64;
+int			serial_buffers = 64;
+int			xact_buffers = 64;
+int			commit_ts_buffers = 64;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 6474e35ec0..96511bd204 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2287,6 +2288,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cf9f283cfe..5dd49d7294 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 64			# memory for pg_subtrans
+#multixact_offsets_buffers = 64		# memory for pg_multixact/offsets
+#multixact_members_buffers = 64		# memory for pg_multixact/members
+#notify_buffers = 64			# memory for pg_notify
+#serial_buffers = 64			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 0be1355892..18d7ba4ca9 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 091e2202c9..be047e3032 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..147dc4acc3 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index a44472b352..351382d3e0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f0cc651435..e2473f41de 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..7b68c8f1c7 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 3d74483f44..7b95acf36e 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -163,4 +163,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)
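
As a quick way to see what the "0 = auto" rule in this 0001 patch works
out to, here is a minimal standalone sketch that mirrors the
CLOGShmemBuffers() formula above. It is an illustration only, not code
from the patch set, and it assumes BLCKSZ = 8192 together with the
SLRU_MAX_ALLOWED_BUFFERS definition from the patch:

#include <stdio.h>

#define BLCKSZ 8192
#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
#define Min(a, b) ((a) < (b) ? (a) : (b))
#define Max(a, b) ((a) > (b) ? (a) : (b))

/* Same shape as CLOGShmemBuffers() in v9-0001 when xact_buffers == 0. */
static long long
xact_buffers_auto(long long nbuffers)
{
	return Min(SLRU_MAX_ALLOWED_BUFFERS, Max(16, nbuffers / 512));
}

int
main(void)
{
	/* 8GB of shared_buffers, expressed as BLCKSZ-sized buffers (NBuffers) */
	long long	nbuffers = 8LL * 1024 * 1024 * 1024 / BLCKSZ;
	long long	xact = xact_buffers_auto(nbuffers);

	printf("NBuffers=%lld -> xact_buffers=%lld (%lld MB)\n",
		   nbuffers, xact, xact * BLCKSZ / (1024 * 1024));
	return 0;
}

With 8GB of shared_buffers (NBuffers = 1048576) this prints
xact_buffers = 2048, i.e. 16MB of pg_xact cache, which matches the
2MB-per-1GB rule mentioned in the patch's comment.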

v9-0002-Divide-SLRU-buffers-into-banks.patch (application/octet-stream)
From d805eaf3931092aa0e9a06b26e92a507d91fa5f3 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:41:50 +0530
Subject: [PATCH v9 2/3] Divide SLRU buffers into banks

As we have made the SLRU buffer pool configurable, we want to
eliminate the linear search within the whole SLRU buffer pool.  To do
so we divide the SLRU buffers into banks.  Each bank holds 16 buffers,
each SLRU pageno may reside in only one bank, and adjacent pagenos
reside in different banks.  Along with this, also ensure that the
number of SLRU buffers is given in multiples of the bank size.

Andrey M. Borodin and Dilip Kumar, based on feedback by Alvaro Herrera
---
 src/backend/access/transam/clog.c      | 10 ++++++
 src/backend/access/transam/commit_ts.c | 10 ++++++
 src/backend/access/transam/multixact.c | 19 +++++++++++
 src/backend/access/transam/slru.c      | 44 +++++++++++++++++++++++---
 src/backend/access/transam/subtrans.c  | 10 ++++++
 src/backend/commands/async.c           | 10 ++++++
 src/backend/storage/lmgr/predicate.c   | 10 ++++++
 src/backend/utils/misc/guc_tables.c    | 14 ++++----
 src/include/access/slru.h              | 13 +++++++-
 src/include/utils/guc_hooks.h          | 11 +++++++
 10 files changed, 138 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 58d0ab4c81..2c82463018 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -1029,3 +1030,12 @@ clogsyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(XactCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for xact_buffers
+ */
+bool
+check_xact_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("xact_buffers", newval);
+}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index d5f58ce201..549c24844c 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -1027,3 +1028,12 @@ committssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(CommitTsCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for commit_ts_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_ts_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 89e6bafb27..65739b2f9c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -3421,3 +3422,21 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index ac49c99c8b..6a98a8f7ae 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static int	inline
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -284,7 +285,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -293,6 +297,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -524,12 +529,18 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in a buffer pool.  The buffer pool is
+	 * divided into banks of buffers and each pageno may reside only in one
+	 * bank so limit the search within the bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1056,9 +1067,15 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		/*
+		 * See if the page is already in a buffer pool.  The buffer pool is
+		 * divided into banks of buffers and each pageno may reside only in one
+		 * bank so limit the search within the bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1093,7 +1110,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
@@ -1666,3 +1683,20 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* The value must be a multiple of the SLRU bank size. */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	/* Reject values that are not a multiple of the bank size. */
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index bf39a82f59..53c130a457 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -383,3 +384,12 @@ SubTransPagePrecedes(int64 page1, int64 page2)
 	return (TransactionIdPrecedes(xid1, xid2) &&
 			TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
 }
+
+/*
+ * GUC check_hook for subtrans_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtrans_buffers", newval);
+}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index f52976d1b8..f116d1bad5 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -2382,3 +2383,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index ad2f951c7a..019e58a62b 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -5012,3 +5013,12 @@ AttachSerializableXact(SerializableXactHandle handle)
 	if (MySerializableXact != InvalidSerializableXact)
 		CreateLocalPredicateLockHash();
 }
+
+/*
+ * GUC check_hook for serial_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serial_buffers", newval);
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 96511bd204..3468e7fb22 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2296,7 +2296,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_offsets_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_offsets_buffers, NULL, NULL
 	},
 
 	{
@@ -2307,7 +2307,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_members_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_members_buffers, NULL, NULL
 	},
 
 	{
@@ -2318,7 +2318,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&subtrans_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_subtrans_buffers, NULL, NULL
 	},
 	{
 		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
@@ -2328,7 +2328,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&notify_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_notify_buffers, NULL, NULL
 	},
 
 	{
@@ -2339,7 +2339,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&serial_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_serial_buffers, NULL, NULL
 	},
 
 	{
@@ -2350,7 +2350,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&xact_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_xact_buffers
+		check_xact_buffers, NULL, show_xact_buffers
 	},
 
 	{
@@ -2361,7 +2361,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&commit_ts_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_commit_ts_buffers
+		check_commit_ts_buffers, NULL, show_commit_ts_buffers
 	},
 
 	{
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index be047e3032..ebaffd7d2c 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks
+ */
+#define SLRU_BANK_SIZE		16
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -147,6 +152,12 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for slotno banks; considering the 1GB SLRU buffer pool size limit
+	 * and SLRU_BANK_SIZE, bits16 is sufficient for the bank mask.
+	 */
+	bits16		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
@@ -184,5 +195,5 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
-
+extern bool check_slru_buffers(const char *name, int *newval);
 #endif							/* SLRU_H */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 7b95acf36e..0edd59f867 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -130,6 +130,17 @@ extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
+extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
 extern void assign_synchronous_commit(int newval, void *extra);
 extern void assign_syslog_facility(int newval, void *extra);
-- 
2.39.2 (Apple Git-143)
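
To see how the bank mapping introduced by this 0002 patch behaves, here
is a minimal standalone sketch (an illustration only, not code from the
patch set) that mirrors the bank_mask and bankstart arithmetic in
slru.c. It assumes SLRU_BANK_SIZE = 16 and a slot count that is a
multiple of the bank size, as check_slru_buffers() enforces:

#include <stdint.h>
#include <stdio.h>

#define SLRU_BANK_SIZE 16

int
main(void)
{
	int			nslots = 64;	/* e.g. the default xact_buffers */
	uint16_t	bank_mask = (nslots / SLRU_BANK_SIZE) - 1;	/* 4 banks -> 0x3 */
	long long	pageno;

	for (pageno = 0; pageno < 6; pageno++)
	{
		/* Same arithmetic as SimpleLruReadPage_ReadOnly() in v9-0002. */
		int			bankstart = (int) (pageno & bank_mask) * SLRU_BANK_SIZE;
		int			bankend = bankstart + SLRU_BANK_SIZE;

		printf("pageno %lld -> bank %lld, slots [%d, %d)\n",
			   pageno, (long long) (pageno & bank_mask), bankstart, bankend);
	}
	return 0;
}

With 64 slots there are 4 banks, so pagenos 0-3 map to banks 0-3 and
pageno 4 wraps back to bank 0; a lookup therefore never scans more than
16 slots regardless of how large the pool is configured.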

v9-0003-Remove-the-centralized-control-lock-and-LRU-count.patch (application/octet-stream)
From 6250e4cd15a575c76a4e27b00269539796866332 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 14:07:32 +0530
Subject: [PATCH v9 3/3] Remove the centralized control lock and LRU counter

The previous patch divided the SLRU buffer pool into associative
banks.  This patch optimizes it further by introducing multiple SLRU
locks instead of a common centralized lock, which reduces contention
on the SLRU control lock.  Basically, we will have at most 128 bank
locks; if the number of banks is <= 128 then each lock covers exactly
one bank, otherwise each lock covers multiple banks, and we find the
bank-to-lock mapping by (bankno % 128).  This patch also removes the
centralized LRU counter; we now have bank-wise LRU counters, which
helps avoid the cache invalidation caused by frequent updates of a
single shared counter.

Dilip Kumar based on design inputs from Robert Haas, Andrey M. Borodin,
and Alvaro Herrera
---
 src/backend/access/transam/clog.c        | 123 ++++++++----
 src/backend/access/transam/commit_ts.c   |  42 ++--
 src/backend/access/transam/multixact.c   | 173 +++++++++++-----
 src/backend/access/transam/slru.c        | 245 +++++++++++++++++------
 src/backend/access/transam/subtrans.c    |  58 ++++--
 src/backend/commands/async.c             |  43 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  34 ++--
 src/include/access/slru.h                |  62 ++++--
 src/include/storage/lwlock.h             |   7 +
 src/test/modules/test_slru/test_slru.c   |  35 ++--
 12 files changed, 604 insertions(+), 246 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 2c82463018..bf4db8d5ba 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -285,15 +285,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on Xact SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * updates for multiple backends so that the number of times the Xact SLRU
+	 * lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -311,17 +316,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire SLRU lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -334,10 +339,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -356,7 +361,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -407,14 +413,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -428,6 +433,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -508,8 +515,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group before we
+	 * close this group by setting procglobal->clogGroupFirst to
+	 * INVALID_PGPROCNO.  Otherwise we would close the group to new entries
+	 * before even getting the lock, losing the whole purpose of the group
+	 * update.
+	 */
+	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -526,6 +542,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page is not the same as that
+		 * of the last page then we need to release the lock on the previous
+		 * bank and acquire the lock on the bank for the page we are going to
+		 * update now.
+		 *
+		 * Although, on a best-effort basis, we try to ensure that all the
+		 * requests within a group are for the same clog page, it is
+		 * possible that a group contains requests for more than one page
+		 * (for details refer to the comment in the previous while loop).
+		 * That scenario might not be very performant, because while
+		 * switching locks the group leader might need to wait on the new
+		 * lock if the pages are from different SLRU banks, but it is safe
+		 * because a) we release the old lock before acquiring the new one,
+		 * so there should not be any deadlock, and b) we always modify the
+		 * page under the correct SLRU bank lock.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -545,7 +592,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -574,10 +622,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+						  int slotno)
 {
 	int			byteno = TransactionIdToByte(xid);
 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
@@ -666,7 +715,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -700,8 +749,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -715,8 +764,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -725,7 +775,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -760,14 +810,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -778,8 +824,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -811,7 +858,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -843,6 +890,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -853,13 +901,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -997,16 +1046,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 549c24844c..326e22fd62 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -228,8 +228,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -239,13 +240,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with the SLRU bank lock of this xid's page held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -346,7 +347,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -536,8 +537,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -695,9 +696,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -724,12 +723,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -778,9 +778,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -812,6 +812,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -832,12 +833,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -991,16 +994,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1012,7 +1017,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 65739b2f9c..fd4c7baf6e 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -193,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the SLRU bank locks of MultiXactOffset
+ * and MultiXactMember to guard accesses to the two sets of SLRU buffers.
+ * For concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -871,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -892,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -917,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * We are moving to a different MultiXactMember SLRU page.  If the
+			 * new page falls in a different SLRU bank, release the old bank's
+			 * lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -937,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1240,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1343,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1380,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1389,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1398,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1420,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1443,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1853,15 +1910,15 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1897,8 +1954,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1907,9 +1966,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1918,7 +1978,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1978,10 +2038,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1996,7 +2058,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2018,13 +2080,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2049,13 +2113,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2070,7 +2134,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2078,18 +2144,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2101,7 +2166,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2116,10 +2183,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2407,6 +2473,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2417,13 +2484,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2456,15 +2524,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock     *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2762,7 +2832,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3244,31 +3314,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3334,7 +3408,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 6a98a8f7ae..f6f8a70568 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -97,6 +97,21 @@ SlruFileName(SlruCtl ctl, char *path, int64 segno)
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index of the lock, within the bank_locks array of
+ * SlruSharedData, that protects the bank containing a given slotno.
+ *
+ * The SLRU buffer pool is divided into banks of buffers, and at most
+ * SLRU_MAX_BANKLOCKS locks protect access to the buffers in those banks.
+ * Because the number of locks is capped, we cannot always have one lock for
+ * each bank.  As long as the number of banks is <= SLRU_MAX_BANKLOCKS there
+ * is one lock protecting each bank; otherwise a single lock protects
+ * multiple banks, with the mapping determined by the number of banks and
+ * SLRU_MAX_BANKLOCKS.
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -118,34 +133,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -173,6 +160,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -183,6 +171,8 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -192,6 +182,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -208,16 +200,18 @@
  * nlsns: number of LSN groups per page (set to zero if not relevant).
- * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -229,18 +224,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -261,6 +254,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -272,7 +269,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -281,6 +278,15 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
@@ -335,7 +341,7 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -374,12 +380,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -430,10 +437,14 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -476,9 +487,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -487,7 +499,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -531,9 +543,10 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	int			slotno;
 	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
 	/*
 	 * See if the page is already in a buffer pool.  The buffer pool is
@@ -557,8 +570,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -580,6 +593,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -608,7 +622,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -623,7 +637,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1067,13 +1081,14 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
 		/*
 		 * See if the page is already in a buffer pool.  The buffer pool is
-		 * divided into banks of buffers and each pageno may reside only in one
-		 * bank so limit the search within the bank.
+		 * divided into banks of buffers and each pageno may reside only in
+		 * one bank so limit the search within the bank.
 		 */
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
@@ -1109,7 +1124,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
@@ -1131,7 +1146,8 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number ==
+				pg_atomic_read_u64(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1205,6 +1221,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int64		pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1215,10 +1232,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1232,7 +1262,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1272,6 +1302,7 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1282,25 +1313,38 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1330,10 +1374,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1374,15 +1420,29 @@ SlruDeleteSegment(SlruCtl ctl, int64 segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1416,7 +1476,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1684,6 +1744,37 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	return result;
 }
 
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
 /*
  * Helper function for GUC check_hook to check whether slru buffers are in
  * multiples of SLRU_BANK_SIZE.
@@ -1700,3 +1791,37 @@ check_slru_buffers(const char *name, int *newval)
 						SLRU_BANK_SIZE);
 	return false;
 }
+
+/*
+ * Function to acquire all bank's lock of the given SlruCtl
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Release all the bank locks of the given SlruCtl.
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
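
To make the bank/lock mapping easier to review, here is a small standalone
sketch (not part of the patch) that mirrors the arithmetic of
SLRU_SLOTNO_GET_BANKLOCKNO and SimpleLruGetBankLock.  SLRU_BANK_SIZE and
SLRU_MAX_BANKLOCKS match the values used above; the 4096-slot buffer pool is
only an example, and the sketch assumes the number of banks is a power of two,
as the bank_mask computation implies.

#include <stdio.h>
#include <stdint.h>

#define SLRU_BANK_SIZE		16
#define SLRU_MAX_BANKLOCKS	128

int
main(void)
{
	int			nslots = 4096;		/* example pool size, multiple of bank size */
	int			nbanks = nslots / SLRU_BANK_SIZE;	/* 256 banks */
	int64_t		bank_mask = nbanks - 1; /* assumes nbanks is a power of two */
	int64_t		samples[] = {0, 1, 127, 128, 200, 255, 256, 100000};
	int			i;

	for (i = 0; i < (int) (sizeof(samples) / sizeof(samples[0])); i++)
	{
		int64_t		pageno = samples[i];
		int64_t		bankno = pageno & bank_mask;	/* bank holding this page */
		int			bankstart = (int) bankno * SLRU_BANK_SIZE;	/* first slot */
		int			banklockno = (int) (bankno % SLRU_MAX_BANKLOCKS);	/* lock */

		printf("page %lld -> bank %lld, slots [%d, %d), bank lock %d\n",
			   (long long) pageno, (long long) bankno,
			   bankstart, bankstart + SLRU_BANK_SIZE, banklockno);
	}
	return 0;
}

With 256 banks and 128 locks, banks 72 and 200 end up sharing lock 72, which
is exactly the lock-sharing case the macro's comment describes.
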
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 53c130a457..05a440fccf 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -87,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -110,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -140,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -203,9 +205,8 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -223,8 +224,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -233,7 +235,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -263,6 +265,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock     *prevlock;
+	LWLock     *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -270,23 +274,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = ShmemVariableCache->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If the new page falls in a different bank, release the lock on the
+		 * old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the final page falls in a different bank, release the lock on the
+	 * old bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -320,6 +348,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -331,12 +360,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index f116d1bad5..a16afb25d3 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -267,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the bank-wise SLRU locks
+ * are used as the control locks for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -543,7 +544,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1357,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1367,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1390,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1435,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1461,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1932,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1954,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1995,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2148,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+#38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 019e58a62b..a5768d54f2 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -809,9 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -848,12 +848,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -903,7 +905,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -921,10 +923,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -936,13 +938,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -955,7 +957,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -967,7 +969,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -985,7 +987,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -994,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1008,12 +1010,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1073,7 +1075,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index ebaffd7d2c..b48167913a 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -21,6 +21,7 @@
  * SLRU bank size for slotno hash banks
  */
 #define SLRU_BANK_SIZE		16
+#define	SLRU_MAX_BANKLOCKS	128
 
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
@@ -62,8 +63,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -76,8 +75,35 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protect the I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
+	/*
+	 * Locks to protect in-memory access to the buffer slots of an SLRU bank.
+	 * If the number of banks is <= SLRU_MAX_BANKLOCKS there is one lock per
+	 * bank; otherwise each lock protects multiple banks, depending on the
+	 * number of banks.
+	 */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because we search for a victim
+	 * buffer within a bank.  Furthermore, updating a per-bank counter avoids
+	 * frequent cache-line invalidation, since the counter is bumped every
+	 * time a page is accessed.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -89,23 +115,12 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int64		latest_page_number;
+	pg_atomic_uint64 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -162,11 +177,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock must be acquired in order to access the SLRU buffer slots in the
+ * respective bank.  For more details, refer to the comments in SlruSharedData.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int		banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -196,4 +224,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
 extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index b038e599c0..87cb812b84 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index d0fb9444e8..6b084f8dc0 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
-- 
2.39.2 (Apple Git-143)

#51Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#50)
3 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Nov 30, 2023 at 3:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Nov 29, 2023 at 4:58 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Here is the updated patch based on some comments by tender wang (those
comments were sent only to me)

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v10-0002-Divide-SLRU-buffers-into-banks.patch
From 3d8b1dabbea1a00b2259ba0f6c7192ae23a516d0 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:41:50 +0530
Subject: [PATCH v10 2/3] Divide SLRU buffers into banks

Now that the SLRU buffer pool is configurable, we want to
eliminate the linear search across the whole SLRU buffer pool.
To do so we divide the SLRU buffers into banks.  Each bank holds
16 buffers, each SLRU pageno may reside in only one bank, and
adjacent pagenos reside in different banks.  Along with this,
also ensure that the number of SLRU buffers is a multiple of the
bank size.
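
To illustrate the mapping, here is a minimal C sketch of how a pageno
is mapped to its bank and how the slot search is bounded, mirroring the
logic in the slru.c hunks below (names as in the patch):

    /* set up in SimpleLruInit(); nslots is a multiple of SLRU_BANK_SIZE */
    ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;

    /* limit the page search to the one bank that may hold this pageno */
    bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
    bankend   = bankstart + SLRU_BANK_SIZE;
    for (slotno = bankstart; slotno < bankend; slotno++)
    {
        /* inspect shared->page_number[slotno], page_status[slotno], ... */
    }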

Andrey M. Borodin and Dilip Kumar, based on feedback by Alvaro Herrera
---
 src/backend/access/transam/clog.c      | 10 ++++++
 src/backend/access/transam/commit_ts.c | 10 ++++++
 src/backend/access/transam/multixact.c | 19 +++++++++++
 src/backend/access/transam/slru.c      | 44 +++++++++++++++++++++++---
 src/backend/access/transam/subtrans.c  | 10 ++++++
 src/backend/commands/async.c           | 10 ++++++
 src/backend/storage/lmgr/predicate.c   | 10 ++++++
 src/backend/utils/misc/guc_tables.c    | 14 ++++----
 src/include/access/slru.h              | 16 +++++++++-
 src/include/utils/guc_hooks.h          | 11 +++++++
 10 files changed, 141 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 6e6b73a877..fc70b91bc9 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -1029,3 +1030,12 @@ clogsyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(XactCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for xact_buffers
+ */
+bool
+check_xact_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("xact_buffers", newval);
+}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index a323fab4ff..10e378f846 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -1027,3 +1028,12 @@ committssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(CommitTsCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for commit_ts_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_ts_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 89e6bafb27..65739b2f9c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -3421,3 +3422,21 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 7a371d9034..ce589493e4 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -284,7 +285,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -293,6 +297,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -524,12 +529,18 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in the buffer pool.  The buffer pool is
+	 * divided into banks of buffers, and each pageno may reside in only one
+	 * bank, so limit the search to that bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1056,9 +1067,15 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		/*
+		 * See if the page is already in the buffer pool.  The buffer pool is
+		 * divided into banks of buffers, and each pageno may reside in only
+		 * one bank, so limit the search to that bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1093,7 +1110,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
@@ -1666,3 +1683,20 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Helper function for GUC check_hooks to verify that the number of SLRU
+ * buffers is a multiple of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Valid values must be multiples of SLRU_BANK_SIZE */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	/* Value is not a multiple of the bank size */
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 2259f882ef..3f2444a37e 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -383,3 +384,12 @@ SubTransPagePrecedes(int64 page1, int64 page2)
 	return (TransactionIdPrecedes(xid1, xid2) &&
 			TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
 }
+
+/*
+ * GUC check_hook for subtrans_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtrans_buffers", newval);
+}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b80f75193..87082b8f86 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -2378,3 +2379,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 02eb2c9822..9175aaabd1 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -5012,3 +5013,12 @@ AttachSerializableXact(SerializableXactHandle handle)
 	if (MySerializableXact != InvalidSerializableXact)
 		CreateLocalPredicateLockHash();
 }
+
+/*
+ * GUC check_hook for serial_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serial_buffers", newval);
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 96511bd204..3468e7fb22 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2296,7 +2296,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_offsets_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_offsets_buffers, NULL, NULL
 	},
 
 	{
@@ -2307,7 +2307,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_members_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_members_buffers, NULL, NULL
 	},
 
 	{
@@ -2318,7 +2318,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&subtrans_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_subtrans_buffers, NULL, NULL
 	},
 	{
 		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
@@ -2328,7 +2328,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&notify_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_notify_buffers, NULL, NULL
 	},
 
 	{
@@ -2339,7 +2339,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&serial_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_serial_buffers, NULL, NULL
 	},
 
 	{
@@ -2350,7 +2350,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&xact_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_xact_buffers
+		check_xact_buffers, NULL, show_xact_buffers
 	},
 
 	{
@@ -2361,7 +2361,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&commit_ts_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_commit_ts_buffers
+		check_commit_ts_buffers, NULL, show_commit_ts_buffers
 	},
 
 	{
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index be047e3032..d76df1d2cd 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,14 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks.  Limit the bank size to 16 because we
+ * perform a sequential search within a bank (both while looking for a target
+ * page and while searching for a victim buffer), and making the bank too big
+ * may hurt performance.
+ */
+#define SLRU_BANK_SIZE		16
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -147,6 +155,12 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for slotno banks.  Considering a 1GB SLRU buffer pool size and the
+	 * SLRU_BANK_SIZE of 16, bits16 is sufficient for the bank mask.
+	 */
+	bits16		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
@@ -184,5 +198,5 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
-
+extern bool check_slru_buffers(const char *name, int *newval);
 #endif							/* SLRU_H */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 7b95acf36e..0edd59f867 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -130,6 +130,17 @@ extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
+extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
 extern void assign_synchronous_commit(int newval, void *extra);
 extern void assign_syslog_facility(int newval, void *extra);
-- 
2.39.2 (Apple Git-143)

v10-0003-Remove-the-centralized-control-lock-and-LRU-coun.patch
From eac07fc14e150692ceff85594eb39c031da26aab Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Mon, 11 Dec 2023 10:13:28 +0530
Subject: [PATCH v10 3/3] Remove the centralized control lock and LRU counter

The previous patch divided the SLRU buffer pool into associative
banks.  This patch further optimizes it by introducing multiple
SLRU locks instead of a single centralized lock, which reduces
contention on the SLRU control lock.  We have at most 128 bank
locks: if the number of banks is <= 128 then each lock covers
exactly one bank, otherwise each lock covers multiple banks and
the bank-to-lock mapping is (bankno % 128).  This patch also
removes the centralized LRU counter and introduces bank-wise LRU
counters instead, which avoids frequent cache invalidation when
the counter is updated.
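
To make the mapping concrete, here is a minimal sketch of the scheme
(names as in the slru.c/slru.h hunks below):

    /* bank number and bank lock for a given pageno */
    bankno     = pageno & ctl->bank_mask;
    banklockno = bankno % SLRU_MAX_BANKLOCKS;  /* one lock per bank while nbanks <= 128 */
    lock       = &shared->bank_locks[banklockno].lock;

    /* mark a slot "most recently used" against its own bank's counter */
    shared->page_lru_count[slotno] = ++shared->bank_cur_lru_count[bankno];

Callers obtain the lock with SimpleLruGetBankLock(ctl, pageno) and hold
it while accessing any slot in that bank, as the converted call sites
below do.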

Dilip Kumar based on design inputs from Robert Haas, Andrey M. Borodin,
and Alvaro Herrera
---
 src/backend/access/transam/clog.c        | 123 ++++++++----
 src/backend/access/transam/commit_ts.c   |  42 ++--
 src/backend/access/transam/multixact.c   | 173 +++++++++++-----
 src/backend/access/transam/slru.c        | 245 +++++++++++++++++------
 src/backend/access/transam/subtrans.c    |  58 ++++--
 src/backend/commands/async.c             |  43 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  34 ++--
 src/include/access/slru.h                |  64 ++++--
 src/include/storage/lwlock.h             |   7 +
 src/test/modules/test_slru/test_slru.c   |  35 ++--
 12 files changed, 606 insertions(+), 246 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index fc70b91bc9..43ccf70155 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -285,15 +285,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on Xact SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * updates for multiple backends so that the number of times the Xact SLRU
+	 * lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -311,17 +316,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire SLRU lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -334,10 +339,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -356,7 +361,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -407,14 +413,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -428,6 +433,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -508,8 +515,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group before we
+	 * close the group by setting procglobal->clogGroupFirst to
+	 * INVALID_PGPROCNO.  Otherwise we would stop new entries from joining
+	 * the group before we even hold the lock, defeating the whole purpose of
+	 * the group update.
+	 */
+	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -526,6 +542,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page is not the same as the
+		 * one for the last page, release the lock on the previous bank and
+		 * acquire the lock on the bank for the page we are about to update.
+		 *
+		 * Although, on a best-effort basis, we try to keep all requests
+		 * within a group on the same clog page, a group may still contain
+		 * requests for more than one page (for details refer to the comment
+		 * in the previous while loop).  That scenario might not be very
+		 * performant because the group leader may have to wait on the new
+		 * lock when the pages fall into different SLRU banks, but it is safe
+		 * because a) we release the old lock before acquiring the new one,
+		 * so there should not be any deadlock, and b) we always modify the
+		 * page under the correct SLRU lock.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -545,7 +592,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -574,10 +622,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with the slot-specific SLRU bank's lock held
  */
 static void
-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+						  int slotno)
 {
 	int			byteno = TransactionIdToByte(xid);
 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
@@ -666,7 +715,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -700,8 +749,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -715,8 +764,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -725,7 +775,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -760,14 +810,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -778,8 +824,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -811,7 +858,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -843,6 +890,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -853,13 +901,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -997,16 +1046,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 10e378f846..ed65f2e910 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -228,8 +228,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -239,13 +240,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with the slot-specific SLRU bank's lock held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -346,7 +347,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -536,8 +537,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -695,9 +696,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -724,12 +723,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -778,9 +778,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -812,6 +812,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -832,12 +833,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -991,16 +994,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1012,7 +1017,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 65739b2f9c..fd4c7baf6e 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -193,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the SLRU bank locks of MultiXactOffset
+ * and MultiXactMember to guard accesses to the two sets of SLRU buffers.  For
+ * concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -871,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -892,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -917,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether the
+			 * new page falls into a different SLRU bank; if so, release the
+			 * old bank's lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -937,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1240,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1343,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1380,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1389,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1398,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1420,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1443,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1853,15 +1910,15 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1897,8 +1954,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1907,9 +1966,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1918,7 +1978,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1978,10 +2038,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1996,7 +2058,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2018,13 +2080,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2049,13 +2113,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2070,7 +2134,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2078,18 +2144,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2101,7 +2166,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2116,10 +2183,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2407,6 +2473,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2417,13 +2484,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2456,15 +2524,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock     *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2762,7 +2832,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3244,31 +3314,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3334,7 +3408,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index ce589493e4..01ffdf3cb7 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -97,6 +97,21 @@ SlruFileName(SlruCtl ctl, char *path, int64 segno)
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index of the lock protecting a given slotno in the
+ * bank_locks array in SlruSharedData.
+ *
+ * The SLRU buffer pool is divided into banks of buffers, and there are at
+ * most SLRU_MAX_BANKLOCKS locks protecting access to the buffers in those
+ * banks.  Since the number of locks is capped, we cannot always have one
+ * lock for each bank.  As long as the number of banks is
+ * <= SLRU_MAX_BANKLOCKS there is one lock protecting each bank; otherwise
+ * one lock may protect multiple banks, depending on the number of
+ * banks.
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -118,34 +133,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -173,6 +160,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -183,6 +171,8 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -192,6 +182,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -208,16 +200,19 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -229,18 +224,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -261,6 +254,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -272,7 +269,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -281,6 +278,15 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
@@ -335,7 +341,7 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -374,12 +380,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -430,10 +437,14 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -476,9 +487,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -487,7 +499,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -531,9 +543,10 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	int			slotno;
 	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
 	/*
 	 * See if the page is already in a buffer pool.  The buffer pool is
@@ -557,8 +570,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -580,6 +593,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -608,7 +622,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -623,7 +637,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1067,13 +1081,14 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
 		/*
 		 * See if the page is already in a buffer pool.  The buffer pool is
-		 * divided into banks of buffers and each pageno may reside only in one
-		 * bank so limit the search within the bank.
+		 * divided into banks of buffers and each pageno may reside only in
+		 * one bank so limit the search within the bank.
 		 */
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
@@ -1109,7 +1124,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
@@ -1131,7 +1146,8 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number ==
+				pg_atomic_read_u64(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1205,6 +1221,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int64		pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1215,10 +1232,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous one,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1232,7 +1262,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1272,6 +1302,7 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1282,25 +1313,38 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous one,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1330,10 +1374,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1374,15 +1420,29 @@ SlruDeleteSegment(SlruCtl ctl, int64 segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous one,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1416,7 +1476,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1684,6 +1744,37 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	return result;
 }
 
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
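/*
 * Illustrative example (not part of the patch): if bank_cur_lru_count[bankno]
 * is 1000 and the bank's page_lru_count values are {998, 1000, 700, ...}, the
 * ages computed by SlruSelectLRUPage() are {2, 0, 300, ...}; the slot with
 * age 300 becomes the eviction candidate, unless it holds the latest page or
 * its I/O is still in progress.
 */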
 /*
  * Helper function for GUC check_hook to check whether slru buffers are in
  * multiples of SLRU_BANK_SIZE.
@@ -1700,3 +1791,37 @@ check_slru_buffers(const char *name, int *newval)
 						SLRU_BANK_SIZE);
 	return false;
 }
+
+/*
+ * Acquire all bank locks of the given SlruCtl.
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Release all bank locks of the given SlruCtl.
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
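
SimpleLruWriteAll(), SimpleLruTruncate(), and SlruDeleteSegment() above all scan
the whole buffer pool while holding exactly one bank lock at a time, handing the
lock over whenever the scan crosses a bank boundary.  The recurring handoff boils
down to the sketch below; the helper name is made up purely for illustration, the
patch open-codes the same steps at each call site.

static int
hold_bank_lock_for_slot(SlruShared shared, int slotno, int prevlockno)
{
	int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);

	/* Hand the lock over only when the scan enters a different bank. */
	if (curlockno != prevlockno)
	{
		LWLockRelease(&shared->bank_locks[prevlockno].lock);
		LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
	}

	/* Return the lock number now held; pass it back on the next iteration. */
	return curlockno;
}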
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 3f2444a37e..90544fb007 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -87,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -110,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -140,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -203,9 +205,8 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -223,8 +224,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -233,7 +235,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -263,6 +265,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock     *prevlock;
+	LWLock     *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -270,23 +274,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If this page falls into a different bank, release the lock on the
+		 * old bank and acquire the lock on the new one.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the final page falls into a different bank, release the lock on the
+	 * old bank and acquire the lock on the new one.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -320,6 +348,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -331,12 +360,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
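
Single-page callers such as SubTransSetParent() and SubTransGetParent() above
convert mechanically: derive the bank lock from the page number with
SimpleLruGetBankLock() and hold it across the slot access, exactly where the old
centralized control lock used to be held.  A minimal sketch of that pattern,
assuming the slru.h interfaces from this patch (read_one_entry and its entryno
argument are purely illustrative):

static TransactionId
read_one_entry(SlruCtl ctl, int64 pageno, TransactionId xid, int entryno)
{
	LWLock	   *banklock = SimpleLruGetBankLock(ctl, pageno);
	int			slotno;
	TransactionId result;

	/* The bank lock now plays the role of the old per-SLRU control lock. */
	LWLockAcquire(banklock, LW_EXCLUSIVE);

	/* Slot access is only valid while the page's bank lock is held. */
	slotno = SimpleLruReadPage(ctl, pageno, true, xid);
	result = ((TransactionId *) ctl->shared->page_buffer[slotno])[entryno];

	LWLockRelease(banklock);
	return result;
}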
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 87082b8f86..33acb60c9d 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -267,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the bank-wise SLRU lock is
+ * used as the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -543,7 +544,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1357,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1367,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1390,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and the bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1435,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1461,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1932,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1954,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1995,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2148,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+# 26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+# 38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 9175aaabd1..79c419c698 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -809,9 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -848,12 +848,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -903,7 +905,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -921,10 +923,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -936,13 +938,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding the SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -955,7 +957,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -967,7 +969,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -985,7 +987,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -994,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1008,12 +1010,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1073,7 +1075,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index d76df1d2cd..362aecc8fe 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -25,6 +25,14 @@
  */
 #define SLRU_BANK_SIZE		16
 
+/*
+ * Maximum number of bank locks that protect the in-memory buffer slot access
+ * within the SLRU banks.  If the number of banks is <= SLRU_MAX_BANKLOCKS,
+ * there is one lock per bank; otherwise each lock protects multiple banks,
+ * depending on the number of banks.
+ */
+#define	SLRU_MAX_BANKLOCKS	128
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -65,8 +73,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -79,8 +85,30 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protect the I/O on individual buffer slots */
 	LWLockPadded *buffer_locks;
 
+	/* Locks protecting in-memory buffer slot access within each SLRU bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because the victim buffer search
+	 * is done within a bank.  Keeping the counter per bank also avoids
+	 * frequent cache-line invalidation, since the counter is updated on
+	 * every page access.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -92,23 +120,12 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int64		latest_page_number;
+	pg_atomic_uint64 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -165,11 +182,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock must be held to access the SLRU buffer slots in the
+ * corresponding bank.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int		banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -199,4 +229,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
 extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 #endif							/* SLRU_H */
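
To make the two mapping layers concrete: a slot belongs to bank
slotno / SLRU_BANK_SIZE, and the bank is folded onto a lock modulo
SLRU_MAX_BANKLOCKS, while SimpleLruGetBankLock() starts from the page number; the
two agree because pageno & bank_mask yields the page's bank, assuming bank_mask is
the number of banks minus one (a power-of-two bank count, as the bankstart
computation in slru.c suggests).  A worked example with this patch's constants:

/*
 * SLRU_BANK_SIZE = 16, SLRU_MAX_BANKLOCKS = 128.
 *
 * With 64 buffers there are 64 / 16 = 4 banks and bank_mask = 3:
 *   slotno 37    ->  bank 37 / 16 = 2    ->  bank lock 2 % 128 = 2
 *   pageno 1026  ->  bank 1026 & 3 = 2   ->  the same bank lock 2
 *
 * Only with more than 128 * 16 = 2048 buffers does the modulo wrap, so in
 * practice each bank normally gets its own lock.
 */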
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index b038e599c0..87cb812b84 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index d0fb9444e8..6b084f8dc0 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
-- 
2.39.2 (Apple Git-143)

Attachment: v10-0001-Make-all-SLRU-buffer-sizes-configurable.patch (application/octet-stream)
From f4b83a727ea4da0dcb835c41e4385cbcc6108902 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:32:01 +0530
Subject: [PATCH v10 1/3] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Default sizes are also set to 64 as sizes much larger than the old
limits have been shown to be useful on modern systems.

Patch by Andrey M. Borodin, Dilip Kumar
Reviewed By Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  19 +--
 src/backend/access/transam/commit_ts.c        |   7 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  25 ++++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 18 files changed, 293 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 94d1eb2b81..1589f2e189 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 256, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 7dca1df61b..6e6b73a877 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -673,23 +673,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(16, xact_buffers);
+	return Min(SLRU_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
 }
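/*
 * Illustration of the default sizing above (not part of the patch; assumes
 * the standard 8kB BLCKSZ):
 *
 *   shared_buffers = 8GB   ->  NBuffers = 1,048,576; / 512 = 2,048 buffers,
 *                              i.e. 16MB of pg_xact cache, 2MB per 1GB
 *   shared_buffers = 64MB  ->  NBuffers = 8,192; / 512 = 16, the new floor
 */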
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index e6fd9b3349..a323fab4ff 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -502,11 +502,16 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 4MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(16, commit_ts_buffers);
+	return Min(256, Max(16, NBuffers / 256));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index db3423f12e..89e6bafb27 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,14 +1851,14 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 1b3b3ad720..2259f882ef 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -193,14 +194,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
 				  false);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 264f25a8f9..8b80f75193 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -234,7 +234,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -492,7 +492,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,7 +541,7 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE, true);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index c361bb2079..452369d56d 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,29 @@ show_timezone(void)
 	return "unknown";
 }
 
+/*
+ * GUC show_hook for xact_buffers
+ */
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+/*
+ * GUC show_hook for commit_ts_buffers
+ */
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 1129b8e4f2..02eb2c9822 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
 				  false);
 #ifdef USE_ASSERT_CHECKING
@@ -1348,7 +1348,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..96d480325b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 64;
+int			multixact_members_buffers = 64;
+int			subtrans_buffers = 64;
+int			notify_buffers = 64;
+int			serial_buffers = 64;
+int			xact_buffers = 64;
+int			commit_ts_buffers = 64;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 6474e35ec0..96511bd204 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2287,6 +2288,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cf9f283cfe..5dd49d7294 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 64			# memory for pg_subtrans
+#multixact_offsets_buffers = 64		# memory for pg_multixact/offsets
+#multixact_members_buffers = 64		# memory for pg_multixact/members
+#notify_buffers = 64			# memory for pg_notify
+#serial_buffers = 64			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 0be1355892..18d7ba4ca9 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 091e2202c9..be047e3032 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..147dc4acc3 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index a44472b352..351382d3e0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f0cc651435..e2473f41de 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..7b68c8f1c7 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 3d74483f44..7b95acf36e 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -163,4 +163,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)

#52Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#51)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

[Added Andrey again in CC, because as I understand they are using this
code or something like it in production. Please don't randomly remove
people from CC lists.]

I've been looking at this some more, and I'm not confident that the
group clog update stuff is correct. I think the injection points test
case was good enough to discover a problem, but it's hard to get peace
of mind that there aren't other, more subtle problems.

The problem I see is that the group update mechanism is designed around
contention of the global xact-SLRU control lock; it uses atomics to
coordinate a single queue when the lock is contended. So if we split up
the global SLRU control lock using banks, then multiple processes using
different bank locks might not contend. OK, this is fine, but what
happens if two separate groups of processes encounter contention on two
different bank locks? I think they will both try to update the same
queue, and coordinate access to that *using different bank locks*. I
don't see how this can work correctly.

I suspect the first part of that algorithm, where atomics are used to
create the list without a lock, might work fine. But will each "leader"
process, each of which is potentially using a different bank lock,
coordinate correctly? Maybe this works correctly because only one
process will find the queue head not empty? If this is what happens,
then there needs to be comments about it. Without any explanation,
this seems broken and potentially dangerous, as some transaction commit
bits might become lost given high enough concurrency and bad luck.

Maybe this can be made to work by having one more lwlock that we use
solely to coordinate this task. Though we would have to demonstrate
that coordinating this task with a different lock works correctly in
conjunction with the per-bank lwlock usage in the regular slru.c paths.

Andrey, do you have any stress tests or anything else that you used to
gain confidence in this code?

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).

#53Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#52)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Tue, Dec 12, 2023 at 6:58 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

[Added Andrey again in CC, because as I understand they are using this
code or something like it in production. Please don't randomly remove
people from CC lists.]

Oh, glad to know that. Yeah, I generally do not remove people, but I
have noticed that in the mail chain some of the reviewers replied only
to me and the hackers' list, and from that point onwards I lost track
of the CC list.

I've been looking at this some more, and I'm not confident in that the
group clog update stuff is correct. I think the injection points test
case was good enough to discover a problem, but it's hard to get peace
of mind that there aren't other, more subtle problems.

Yeah, I agree.

The problem I see is that the group update mechanism is designed around
contention of the global xact-SLRU control lock; it uses atomics to
coordinate a single queue when the lock is contended. So if we split up
the global SLRU control lock using banks, then multiple processes using
different bank locks might not contend. OK, this is fine, but what
happens if two separate groups of processes encounter contention on two
different bank locks? I think they will both try to update the same
queue, and coordinate access to that *using different bank locks*. I
don't see how can this work correctly.

Let's back up a bit and start from the current design with the
centralized lock.  With that, if one process is holding the lock, the
other processes will try to perform the group update.  If there is
already a group that hasn't yet got the lock but is trying to update a
different CLOG page than the one this process wants to update, then
the process will not add itself to the group update; instead it will
fall back to the normal lock wait.  In another situation it may happen
that the group leader of the other group has already got the control
lock, in which case it would have cleared
'procglobal->clogGroupFirst', which means we will now start forming a
different group.  So, speaking only about the optimization part, the
assumption is that when a lot of concurrent xids are committing at the
same time, those xids are mostly from the same range, will fall on the
same SLRU page, and the group update will help them.  But if we get
some out-of-range xid from a long-running transaction, it might not
even go for the group update, as the page number will be different.
The situation might actually be better here with a bank-wise lock,
because if those xids fall in an altogether different bank they might
not even contend.

Now, let's talk about correctness.  Even though the processes might be
contending on different bank locks, we still ensure that in a group
all the processes are trying to update the same SLRU page (i.e. the
same bank as well; we will talk about the exception later[1]).  One of
the processes becomes the leader, and as soon as the leader gets the
lock it detaches the queue from 'procglobal->clogGroupFirst' by
setting it to INVALID_PGPROCNO, so that other group-update requesters
can now form another parallel group.  So here I do not see a problem
with correctness.
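
To make the pattern concrete, here is a minimal sketch of the
coordination being described, written with C11 atomics.  This is an
illustration only, not clog.c: names such as GroupQueue, join_group
and INVALID_PROCNO merely stand in for the real
procglobal->clogGroupFirst / INVALID_PGPROCNO machinery, and the
fallback to a normal lock wait when the group targets a different page
is elided into a comment.

/*
 * Illustrative sketch only (not PostgreSQL code).  Waiters push
 * themselves onto a single shared list head with CAS; the process
 * whose push finds the list empty becomes the leader.  Once the
 * leader has acquired the lock it needs, it detaches the whole list
 * by resetting the head, so the next group can start forming in
 * parallel.  group_first must be initialized to INVALID_PROCNO.
 */
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_PROCNO  (-1)
#define MAX_PROCS       128

typedef struct GroupQueue
{
    atomic_int  group_first;              /* head of pending-group list */
    int         group_next[MAX_PROCS];    /* per-process next links */
    long        member_page[MAX_PROCS];   /* page each member wants updated */
} GroupQueue;

/*
 * Add ourselves to the group.  Returns true if we became the leader,
 * i.e. the list was empty when we pushed ourselves onto it.  (The real
 * code additionally refuses to join, and falls back to a normal lock
 * wait, when the group being formed is for a different page.)
 */
static bool
join_group(GroupQueue *q, int myproc, long mypage)
{
    int         next = atomic_load(&q->group_first);

    q->member_page[myproc] = mypage;
    for (;;)
    {
        q->group_next[myproc] = next;
        if (atomic_compare_exchange_weak(&q->group_first, &next, myproc))
            break;              /* linked in; 'next' holds the old head */
    }
    return (next == INVALID_PROCNO);
}

/*
 * Leader side: run after the leader has acquired the lock protecting
 * its page (with bank-wise locks, that page's bank lock).  Detaching
 * the list first is what lets a new group form regardless of which
 * bank lock that group will eventually need.
 */
static void
lead_group(GroupQueue *q, void (*update_page) (long pageno))
{
    int         member = atomic_exchange(&q->group_first, INVALID_PROCNO);

    while (member != INVALID_PROCNO)
    {
        update_page(q->member_page[member]);
        member = q->group_next[member];
    }
}

In this shape only the process whose push observed an empty head acts
as the leader for that batch, which is essentially the property being
discussed above.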

I agree someone might say that, since there is now a possibility of
different groups getting formed under different bank locks, no other
group can get formed until we get the lock for our bank, because we do
not clear 'procglobal->clogGroupFirst' before we get the lock.  Other
requesters might want to update pages in different banks, so why block
them?  But the group update design is optimized for the case where all
requesters are trying to update the status of xids generated near the
same range.

I suspect the first part of that algorithm, where atomics are used to
create the list without a lock, might work fine. But will each "leader"
process, each of which is potentially using a different bank lock,
coordinate correctly? Maybe this works correctly because only one
process will find the queue head not empty? If this is what happens,
then there needs to be comments about it.

Yes, you got it right; I will try to comment on it better.

Without any explanation, this seems broken and potentially dangerous,
as some transaction commit bits might become lost given high enough
concurrency and bad luck.

Maybe this can be made to work by having one more lwlock that we use
solely to coordinate this task.

Do you mean a different lock for adding/removing entries in the list,
instead of the atomic operation?  I think we would then lose the
benefit we got from the group update, by having contention on another
lock.

[1]: I think we already know about the exception case, and I have
explained in the comments as well that in some cases we might add
different CLOG page update requests to the same group.  For handling
that exceptional case we check the respective bank lock for each page,
and if the exception occurs we release the old bank lock and acquire
the new one.  This case might not be performant, because it is now
possible that after getting the lock the leader has to wait again on
another bank lock, but this is an extremely exceptional case, so we
should not be worried about its performance, and I do not see any
correctness issue here either.
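
As an illustration of that exceptional path, the leader's update loop
could switch bank locks roughly as below.  Again this is only a
sketch: the lock type and helpers (bank_lock_for_page,
lock_acquire/lock_release, set_status_on_page) are invented names, not
the patch's actual functions.

/*
 * Sketch of the exception described in [1]: members normally all
 * target the same page (hence the same bank), but if a member's page
 * maps to another bank, the leader trades the bank lock it holds for
 * the one covering that page before applying the update.
 */
typedef struct slru_bank_lock slru_bank_lock;   /* opaque exclusive lock */

extern slru_bank_lock *bank_lock_for_page(long pageno);
extern void lock_acquire(slru_bank_lock *lock);
extern void lock_release(slru_bank_lock *lock);
extern void set_status_on_page(long pageno, int member);

#define INVALID_MEMBER  (-1)

static void
apply_group_updates(const long *member_page, const int *member_next,
                    int first_member)
{
    slru_bank_lock *held = NULL;
    int         member;

    for (member = first_member; member != INVALID_MEMBER;
         member = member_next[member])
    {
        slru_bank_lock *needed = bank_lock_for_page(member_page[member]);

        if (needed != held)
        {
            /*
             * Exceptional case: this member's page lives in a different
             * bank, so release the currently held bank lock and take the
             * new one.  The leader may block here, which is why this path
             * is expected to be rare rather than performance-critical.
             */
            if (held != NULL)
                lock_release(held);
            lock_acquire(needed);
            held = needed;
        }
        set_status_on_page(member_page[member], member);
    }

    if (held != NULL)
        lock_release(held);
}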

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#54Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Alvaro Herrera (#52)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 12 Dec 2023, at 18:28, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Andrey, do you have any stress tests or anything else that you used to
gain confidence in this code?

We are using only the first two steps of the patchset; these steps do not touch the locking stuff.

We’ve got some confidence after Yura Sokolov’s benchmarks [0]. Thanks!

Best regards, Andrey Borodin.

[0]: /messages/by-id/e46cdea96979545b2d8a13b451d8b1ce61dc7238.camel@postgrespro.ru

#55Amul Sul
sulamul@gmail.com
In reply to: Dilip Kumar (#51)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Dec 11, 2023 at 10:42 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Nov 30, 2023 at 3:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Nov 29, 2023 at 4:58 PM Dilip Kumar <dilipbalaut@gmail.com>

wrote:

Here is the updated patch based on some comments by tender wang (those
comments were sent only to me)

few nitpicks:

+
+   /*
+    * Mask for slotno banks, considering 1GB SLRU buffer pool size and the
+    * SLRU_BANK_SIZE bits16 should be sufficient for the bank mask.
+    */
+   bits16      bank_mask;
 } SlruCtlData;

...
...

+ int bankno = pageno & ctl->bank_mask;

I am a bit uncomfortable seeing it as a mask; why can't it simply be
the number of banks (num_banks), getting the bank number through a
modulus op (pageno % num_banks) instead of the bitwise & operation
(pageno & ctl->bank_mask)?  The latter is a bit more difficult to read
than the modulus op, which is simple, straightforward and common
practice in hashing.

Are there any advantages of using & over % ?
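
For illustration only (not the patch code), here are the two ways of
deriving the bank number; the usual tradeoff is that the mask form
requires the number of banks to be a power of two, while the modulus
form works for any bank count:

#include <stdint.h>

#define SLRU_BANK_SIZE  16

static inline int
bankno_by_mask(int64_t pageno, uint16_t bank_mask)
{
    /* bank_mask is (nbanks - 1); valid only when nbanks is a power of two */
    return (int) (pageno & bank_mask);
}

static inline int
bankno_by_mod(int64_t pageno, int nbanks)
{
    /* works for any nbanks >= 1, but '%' may cost an integer division */
    return (int) (pageno % nbanks);
}

/* Example: 64 buffers => 4 banks; page 7 maps to bank 3 either way. */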

Also, a few places in the 0002 and 0003 patches need the bank number;
it would be better to have a macro for that.
---

 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64
segpage,
                                   void *data);
-
+extern bool check_slru_buffers(const char *name, int *newval);
 #endif                         /* SLRU_H */

Add an empty line after the declaration in the 0002 patch.
---

-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr
lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr
lsn,
+                         int slotno)

Unrelated change for 0003 patch.
---

Regards,
Amul

#56Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amul Sul (#55)
3 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Dec 14, 2023 at 8:43 AM Amul Sul <sulamul@gmail.com> wrote:

On Mon, Dec 11, 2023 at 10:42 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Nov 30, 2023 at 3:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Nov 29, 2023 at 4:58 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Here is the updated patch based on some comments by tender wang (those
comments were sent only to me)

few nitpicks:

+
+   /*
+    * Mask for slotno banks, considering 1GB SLRU buffer pool size and the
+    * SLRU_BANK_SIZE bits16 should be sufficient for the bank mask.
+    */
+   bits16      bank_mask;
} SlruCtlData;

...
...

+ int bankno = pageno & ctl->bank_mask;

I am a bit uncomfortable seeing it as a mask, why can't it be simply a number
of banks (num_banks) and get the bank number through modulus op (pageno %
num_banks) instead of bitwise & operation (pageno & ctl->bank_mask) which is a
bit difficult to read compared to modulus op which is quite simple,
straightforward and much common practice in hashing.

Are there any advantages of using & over % ?

I am not sure either, but since this change in 0002 is by Andrey, I
will let him comment on it before we change anything or take a decision.

Also, a few places in 0002 and 0003 patch, need the bank number, it is better
to have a macro for that.
---

extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
void *data);
-
+extern bool check_slru_buffers(const char *name, int *newval);
#endif                         /* SLRU_H */

Add an empty line after the declaration, in 0002 patch.
---

-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+                         int slotno)

Unrelated change for 0003 patch.

Fixed

Thanks for your review, PFA updated version.

I have added @Amit Kapila to the list to get his opinion on
whether anything can break in the clog group update with our change
to bank-wise SLRU locks.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v11-0002-Divide-SLRU-buffers-into-banks.patchapplication/octet-stream; name=v11-0002-Divide-SLRU-buffers-into-banks.patchDownload
From 5209ccd60c1fe3c2118bbcb2ad2550c94676b1c2 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:41:50 +0530
Subject: [PATCH v11 2/3] Divide SLRU buffers into banks

As we have made slru buffer pool configurable, we want to
eliminate linear search within whole SLRU buffer pool.  To
do so we divide SLRU buffers into banks. Each bank holds 16
buffers. Each SLRU pageno may reside only in one bank.
Adjacent pagenos reside in different banks. Along with this
also ensure that the number of slru buffers are given in
multiples of bank size.

Andrey M. Borodin and Dilip Kumar based on fedback by Alvaro Herrera
---
 src/backend/access/transam/clog.c      | 10 ++++++
 src/backend/access/transam/commit_ts.c | 10 ++++++
 src/backend/access/transam/multixact.c | 19 +++++++++++
 src/backend/access/transam/slru.c      | 44 +++++++++++++++++++++++---
 src/backend/access/transam/subtrans.c  | 10 ++++++
 src/backend/commands/async.c           | 10 ++++++
 src/backend/storage/lmgr/predicate.c   | 10 ++++++
 src/backend/utils/misc/guc_tables.c    | 14 ++++----
 src/include/access/slru.h              | 15 +++++++++
 src/include/utils/guc_hooks.h          | 11 +++++++
 10 files changed, 141 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 6e6b73a877..fc70b91bc9 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -1029,3 +1030,12 @@ clogsyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(XactCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for xact_buffers
+ */
+bool
+check_xact_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("xact_buffers", newval);
+}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index a323fab4ff..10e378f846 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -1027,3 +1028,12 @@ committssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(CommitTsCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for commit_ts_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_ts_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 89e6bafb27..65739b2f9c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -3421,3 +3422,21 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 7a371d9034..ce589493e4 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -284,7 +285,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -293,6 +297,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -524,12 +529,18 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in a buffer pool.  The buffer pool is
+	 * divided into banks of buffers and each pageno may reside only in one
+	 * bank so limit the search within the bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1056,9 +1067,15 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		/*
+		 * See if the page is already in a buffer pool.  The buffer pool is
+		 * divided into banks of buffers and each pageno may reside only in one
+		 * bank so limit the search within the bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1093,7 +1110,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
@@ -1666,3 +1683,20 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Value upper and lower hard limits are inclusive */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	/* Value does not fall within any allowable range */
+	GUC_check_errdetail("\"%s\" must be in multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 2259f882ef..3f2444a37e 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -383,3 +384,12 @@ SubTransPagePrecedes(int64 page1, int64 page2)
 	return (TransactionIdPrecedes(xid1, xid2) &&
 			TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
 }
+
+/*
+ * GUC check_hook for subtrans_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtrans_buffers", newval);
+}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b80f75193..87082b8f86 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -2378,3 +2379,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 02eb2c9822..9175aaabd1 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -5012,3 +5013,12 @@ AttachSerializableXact(SerializableXactHandle handle)
 	if (MySerializableXact != InvalidSerializableXact)
 		CreateLocalPredicateLockHash();
 }
+
+/*
+ * GUC check_hook for serial_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serial_buffers", newval);
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 75e5725d9c..ef4ed8e8c4 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2295,7 +2295,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_offsets_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_offsets_buffers, NULL, NULL
 	},
 
 	{
@@ -2306,7 +2306,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_members_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_members_buffers, NULL, NULL
 	},
 
 	{
@@ -2317,7 +2317,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&subtrans_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_subtrans_buffers, NULL, NULL
 	},
 	{
 		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
@@ -2327,7 +2327,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&notify_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_notify_buffers, NULL, NULL
 	},
 
 	{
@@ -2338,7 +2338,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&serial_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_serial_buffers, NULL, NULL
 	},
 
 	{
@@ -2349,7 +2349,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&xact_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_xact_buffers
+		check_xact_buffers, NULL, show_xact_buffers
 	},
 
 	{
@@ -2360,7 +2360,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&commit_ts_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_commit_ts_buffers
+		check_commit_ts_buffers, NULL, show_commit_ts_buffers
 	},
 
 	{
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index be047e3032..bd682d6368 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,14 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks.  Limit the bank size to 16 because we
+ * perform sequential search within a bank (while looking for a target page or
+ * while doing victim buffer search) and if we keep this size big then it may
+ * affect the performance.
+ */
+#define SLRU_BANK_SIZE		16
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -147,6 +155,12 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for slotno banks, considering 1GB SLRU buffer pool size and the
+	 * SLRU_BANK_SIZE bits16 should be sufficient for the bank mask.
+	 */
+	bits16		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
@@ -184,5 +198,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
+extern bool check_slru_buffers(const char *name, int *newval);
 
 #endif							/* SLRU_H */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 7b95acf36e..0edd59f867 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -130,6 +130,17 @@ extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
+extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
 extern void assign_synchronous_commit(int newval, void *extra);
 extern void assign_syslog_facility(int newval, void *extra);
-- 
2.39.2 (Apple Git-143)

v11-0001-Make-all-SLRU-buffer-sizes-configurable.patchapplication/octet-stream; name=v11-0001-Make-all-SLRU-buffer-sizes-configurable.patchDownload
From 0d5ff13df547d0b84735de339ab843ee738c0f5d Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:32:01 +0530
Subject: [PATCH v11 1/3] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Default sizes are also set to 64 as sizes much larger than the old
limits have been shown to be useful on modern systems.

Patch by Andrey M. Borodin, Dilip Kumar
Reviewed By Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  19 +--
 src/backend/access/transam/commit_ts.c        |   7 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  25 ++++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 18 files changed, 293 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 44cada2b40..db776bd3c9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 256, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 7dca1df61b..6e6b73a877 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -673,23 +673,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(16, xact_buffers);
+	return Min(SLRU_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index e6fd9b3349..a323fab4ff 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -502,11 +502,16 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 4MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(16, commit_ts_buffers);
+	return Min(256, Max(16, NBuffers / 256));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index db3423f12e..89e6bafb27 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,14 +1851,14 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 1b3b3ad720..2259f882ef 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -193,14 +194,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
 				  false);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 264f25a8f9..8b80f75193 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -234,7 +234,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -492,7 +492,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,7 +541,7 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE, true);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index c361bb2079..452369d56d 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,29 @@ show_timezone(void)
 	return "unknown";
 }
 
+/*
+ * GUC show_hook for xact_buffers
+ */
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+/*
+ * GUC show_hook for commit_ts_buffers
+ */
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 1129b8e4f2..02eb2c9822 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
 				  false);
 #ifdef USE_ASSERT_CHECKING
@@ -1348,7 +1348,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..96d480325b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 64;
+int			multixact_members_buffers = 64;
+int			subtrans_buffers = 64;
+int			notify_buffers = 64;
+int			serial_buffers = 64;
+int			xact_buffers = 64;
+int			commit_ts_buffers = 64;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f7c9882f7c..75e5725d9c 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2286,6 +2287,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cf9f283cfe..5dd49d7294 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 64			# memory for pg_subtrans
+#multixact_offsets_buffers = 64		# memory for pg_multixact/offsets
+#multixact_members_buffers = 64		# memory for pg_multixact/members
+#notify_buffers = 64			# memory for pg_notify
+#serial_buffers = 64			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 0be1355892..18d7ba4ca9 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 091e2202c9..be047e3032 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..147dc4acc3 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index a44472b352..351382d3e0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1043a4d782..bec72875c1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..7b68c8f1c7 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 3d74483f44..7b95acf36e 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -163,4 +163,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)
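
To make the new GUCs concrete, here is a minimal postgresql.conf sketch for
the configurable-buffers patch above.  The values are illustrative only, not
tuning advice: each setting is in BLCKSZ pages (GUC_UNIT_BLOCKS), takes
effect only at server start (PGC_POSTMASTER), and the illustrative values are
kept at multiples of SLRU_BANK_SIZE so they satisfy the check_slru_buffers()
helper the patch set adds.

    xact_buffers = 0                   # 0 = auto, per the sample conf above
    subtrans_buffers = 128             # illustrative value
    multixact_offsets_buffers = 64
    multixact_members_buffers = 128
    notify_buffers = 64
    serial_buffers = 64
    commit_ts_buffers = 0              # 0 = auto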

Attachment: v11-0003-Remove-the-centralized-control-lock-and-LRU-coun.patch (application/octet-stream)
From 5f349456825006eeb72786cd0019e38034367d0f Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 14 Dec 2023 13:43:38 +0530
Subject: [PATCH v11 3/3] Remove the centralized control lock and LRU counter

The previous patch divided the SLRU buffer pool into associative
banks.  This patch optimizes it further by introducing multiple
SLRU locks instead of a single centralized lock, which reduces
contention on the SLRU control lock.  There are at most 128 bank
locks: if the number of banks is <= 128, each lock covers exactly
one bank; otherwise a lock covers multiple banks, with the
bank-to-lock mapping computed as (bankno % 128).  This patch also
replaces the centralized LRU counter with bank-wise LRU counters,
which avoids the cache invalidation caused by frequently updating
a single shared counter.

Dilip Kumar based on design inputs from Robert Haas, Andrey M. Borodin,
and Alvaro Herrera
---
 src/backend/access/transam/clog.c        | 151 ++++++++++----
 src/backend/access/transam/commit_ts.c   |  42 ++--
 src/backend/access/transam/multixact.c   | 173 +++++++++++-----
 src/backend/access/transam/slru.c        | 245 +++++++++++++++++------
 src/backend/access/transam/subtrans.c    |  58 ++++--
 src/backend/commands/async.c             |  43 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  34 ++--
 src/include/access/slru.h                |  64 ++++--
 src/include/storage/lwlock.h             |   7 +
 src/test/modules/test_slru/test_slru.c   |  35 ++--
 12 files changed, 635 insertions(+), 245 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index fc70b91bc9..513efdedbc 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -285,15 +285,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on the Xact SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * updates for multiple backends so that the number of times the Xact SLRU
+	 * lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -311,17 +316,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire the SLRU lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -334,10 +339,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -356,7 +361,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -407,14 +413,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -428,6 +433,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -442,6 +449,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	proc->clogGroupMemberPage = pageno;
 	proc->clogGroupMemberLsn = lsn;
 
+	/*
+	 * Despite splitting the centralized SLRU lock into bank-wise locks, the
+	 * group update logic should still work as explained below.
+	 *
+	 * Bank-wise locks can have requesters who don't wait for the same locks,
+	 * but it is fine because before adding to the group, we ensure that all
+	 * requesters have requests to update the same clog page, and if they are
+	 * trying to update the same clog page, it means they are contending on the
+	 * same bank lock.  However, with existing code unless the leader of the
+	 * current group gets the centralized SLRU lock, we do not allow other
+	 * requesters that want to update the different page to form another group
+	 * because 1) that is not the most common case 2) If the leader of the
+	 * first group is still waiting on the lock then the same problem will come
+	 * for another group as well and if we try to form multiple groups that
+	 * will make the design of group update more complex.  Now with a bank-wise
+	 * lock, the 2nd point is not completely true because the other requesters
+	 * for different pages might fall all together in a different bank and they
+	 * might get the lock now.  But other points still hold that this is not
+	 * the most common case and also it would make design more complex.
+	 *
+	 * So more or less the design with the bank-wise lock will also work in the
+	 * same way it is working with the centralized lock, that at a time only
+	 * one group will be formed, and only after the leader gets the lock it
+	 * will reset the 'procglobal->clogGroupFirst' variable and detach the
+	 * group and after that other requesters can form another group and the
+	 * mechanisms are not different from existing one.  The only difference is
+	 * that after the first group is detached from the procglobal, the other
+	 * group is formed the leader of the other group might also get the lock
+	 * and start performing the group update concurrently as those two groups
+	 * might be working on completely different banks.
+	 */
 	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
 
 	while (true)
@@ -508,8 +546,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group before we
+	 * close the group by setting procglobal->clogGroupFirst to
+	 * INVALID_PGPROCNO.  Otherwise we would stop accepting new entries into
+	 * the group before we even hold the lock, defeating the whole purpose
+	 * of the group update.
+	 */
+	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -526,6 +573,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page is not the same as the
+		 * lock for the previous page, then we need to release the lock on
+		 * the previous bank and acquire the lock on the bank of the page we
+		 * are going to update now.
+		 *
+		 * Although we try, on a best-effort basis, to keep all the requests
+		 * within a group on the same clog page, it is possible for a group
+		 * to contain requests for more than one page (for details, see the
+		 * comment above the previous while loop).  That scenario may not be
+		 * very performant, because while switching locks the group leader
+		 * might need to wait on the new lock if the pages belong to
+		 * different SLRU banks, but it is safe because a) we release the
+		 * old lock before acquiring the new one, so there should be no
+		 * deadlock, and b) we always modify a page while holding the
+		 * correct SLRU bank lock.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -545,7 +623,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -574,7 +653,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
 TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
@@ -666,7 +745,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -700,8 +779,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -715,8 +794,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -725,7 +805,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -760,14 +840,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -778,8 +854,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -811,7 +888,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -843,6 +920,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -853,13 +931,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -997,16 +1076,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 10e378f846..ed65f2e910 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -228,8 +228,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -239,13 +240,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -346,7 +347,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -536,8 +537,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -695,9 +696,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -724,12 +723,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -778,9 +778,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -812,6 +812,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -832,12 +833,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -991,16 +994,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1012,7 +1017,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 65739b2f9c..fd4c7baf6e 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -193,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the bank locks of the MultiXactOffset
+ * and MultiXactMember SLRUs to guard accesses to the two sets of SLRU
+ * buffers.  For concurrency's sake, we avoid holding more than one of these
+ * locks at a time.)
  */
 typedef struct MultiXactStateData
 {
@@ -871,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -892,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -917,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so if this new page
+			 * falls into a different SLRU bank, release the old bank's lock
+			 * and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -937,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1240,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1343,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1380,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1389,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1398,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1420,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1443,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1853,15 +1910,15 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1897,8 +1954,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1907,9 +1966,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1918,7 +1978,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1978,10 +2038,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1996,7 +2058,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2018,13 +2080,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2049,13 +2113,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2070,7 +2134,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2078,18 +2144,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2101,7 +2166,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2116,10 +2183,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2407,6 +2473,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2417,13 +2484,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2456,15 +2524,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock     *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2762,7 +2832,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3244,31 +3314,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3334,7 +3408,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index ce589493e4..01ffdf3cb7 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -97,6 +97,21 @@ SlruFileName(SlruCtl ctl, char *path, int64 segno)
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index, within SlruSharedData's bank_locks array, of the
+ * lock that protects a given slotno.
+ *
+ * The SLRU buffer pool is divided into banks of buffers, and there are at
+ * most SLRU_MAX_BANKLOCKS locks protecting access to the buffers in those
+ * banks.  Since there is a maximum limit on the number of locks, we cannot
+ * always have one lock for each bank.  As long as the number of banks is
+ * <= SLRU_MAX_BANKLOCKS there is one lock protecting each bank; otherwise
+ * one lock may protect multiple banks, depending on the number of banks in
+ * the pool.
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -118,34 +133,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -173,6 +160,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -183,6 +171,8 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -192,6 +182,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -208,16 +200,19 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -229,18 +224,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -261,6 +254,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -272,7 +269,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -281,6 +278,15 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
@@ -335,7 +341,7 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -374,12 +380,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -430,10 +437,14 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -476,9 +487,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -487,7 +499,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -531,9 +543,10 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	int			slotno;
 	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
 	/*
 	 * See if the page is already in a buffer pool.  The buffer pool is
@@ -557,8 +570,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -580,6 +593,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -608,7 +622,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -623,7 +637,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1067,13 +1081,14 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
 		/*
 		 * See if the page is already in a buffer pool.  The buffer pool is
-		 * divided into banks of buffers and each pageno may reside only in one
-		 * bank so limit the search within the bank.
+		 * divided into banks of buffers and each pageno may reside only in
+		 * one bank so limit the search within the bank.
 		 */
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
@@ -1109,7 +1124,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
@@ -1131,7 +1146,8 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number ==
+				pg_atomic_read_u64(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1205,6 +1221,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int64		pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1215,10 +1232,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * then release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1232,7 +1262,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1272,6 +1302,7 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1282,25 +1313,38 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * then release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1330,10 +1374,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1374,15 +1420,29 @@ SlruDeleteSegment(SlruCtl ctl, int64 segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * then release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1416,7 +1476,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1684,6 +1744,37 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	return result;
 }
 
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
 /*
  * Helper function for GUC check_hook to check whether slru buffers are in
  * multiples of SLRU_BANK_SIZE.
@@ -1700,3 +1791,37 @@ check_slru_buffers(const char *name, int *newval)
 						SLRU_BANK_SIZE);
 	return false;
 }
+
+/*
+ * Function to acquire all the bank locks of the given SlruCtl
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Function to release all bank locks of the given SlruCtl
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 3f2444a37e..90544fb007 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -87,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -110,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -140,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -203,9 +205,8 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -223,8 +224,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -233,7 +235,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -263,6 +265,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock     *prevlock;
+	LWLock     *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -270,23 +274,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If the new page is in a different bank, release the lock on the old
+		 * bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the last page is in a different bank, release the lock on the old
+	 * bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -320,6 +348,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -331,12 +360,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 87082b8f86..33acb60c9d 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -267,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and a bank-wise SLRU lock is
+ * used as the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -543,7 +544,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1357,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1367,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1390,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1435,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1461,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1932,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1954,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1995,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2148,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+# 26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+# 38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 9175aaabd1..79c419c698 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -809,9 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -848,12 +848,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -903,7 +905,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -921,10 +923,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -936,13 +938,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -955,7 +957,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -967,7 +969,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -985,7 +987,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -994,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1008,12 +1010,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1073,7 +1075,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index bd682d6368..5779a07a95 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -25,6 +25,14 @@
  */
 #define SLRU_BANK_SIZE		16
 
+/*
+ * Number of bank locks protecting in-memory buffer slot access within the
+ * SLRU banks.  If the number of banks is <= SLRU_MAX_BANKLOCKS then there is
+ * one lock per bank; otherwise each lock protects multiple banks, depending
+ * on the number of banks.
+ */
+#define	SLRU_MAX_BANKLOCKS	128
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -65,8 +73,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -79,8 +85,30 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protect the I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
+	/* Locks to protect in-memory buffer slot access within each SLRU bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because we do a victim buffer
+	 * search within a bank. Furthermore, manipulating an individual bank
+	 * counter avoids frequent cache invalidation since we update it every time
+	 * we access the page.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -92,23 +120,12 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int64		latest_page_number;
+	pg_atomic_uint64 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -165,11 +182,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock needs to be acquired to access the slru buffer slots in the
+ * respective bank.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int		banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -199,5 +229,7 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
 extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index b038e599c0..87cb812b84 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index d0fb9444e8..6b084f8dc0 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
-- 
2.39.2 (Apple Git-143)

#57Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Amul Sul (#55)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 14 Dec 2023, at 08:12, Amul Sul <sulamul@gmail.com> wrote:

+ int bankno = pageno & ctl->bank_mask;

I am a bit uncomfortable seeing it as a mask, why can't it be simply a number
of banks (num_banks) and get the bank number through modulus op (pageno %
num_banks) instead of bitwise & operation (pageno & ctl->bank_mask) which is a
bit difficult to read compared to modulus op which is quite simple,
straightforward and much common practice in hashing.

Are there any advantages of using & over % ?

The instruction AND is ~20 times faster than IDIV [0]. This is a relatively hot function, so it is worth sacrificing some readability to save ~ten nanoseconds on each check of a transaction's status.

[0]: https://www.agner.org/optimize/instruction_tables.pdf

#58tender wang
tndrwang@gmail.com
In reply to: Andrey M. Borodin (#57)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Andrey M. Borodin <x4mmm@yandex-team.ru> wrote on Thu, 14 Dec 2023 at 17:02:

On 14 Dec 2023, at 08:12, Amul Sul <sulamul@gmail.com> wrote:

+ int bankno = pageno & ctl->bank_mask;

I am a bit uncomfortable seeing it as a mask, why can't it be simply a number
of banks (num_banks) and get the bank number through modulus op (pageno %
num_banks) instead of bitwise & operation (pageno & ctl->bank_mask) which is a
bit difficult to read compared to modulus op which is quite simple,
straightforward and much common practice in hashing.

Are there any advantages of using & over % ?

Using the Compiler Explorer [1] tool, '%' generates more assembly instructions than '&'.
int GetBankno1(int pageno) {
    return pageno & 127;
}

int GetBankno2(int pageno) {
    return pageno % 127;
}

under clang 13.0:

GetBankno1:                             # @GetBankno1
        push    rbp
        mov     rbp, rsp
        mov     dword ptr [rbp - 4], edi
        mov     eax, dword ptr [rbp - 4]
        and     eax, 127
        pop     rbp
        ret
GetBankno2:                             # @GetBankno2
        push    rbp
        mov     rbp, rsp
        mov     dword ptr [rbp - 4], edi
        mov     eax, dword ptr [rbp - 4]
        mov     ecx, 127
        cdq
        idiv    ecx
        mov     eax, edx
        pop     rbp
        ret

under gcc 13.2:

GetBankno1:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     eax, DWORD PTR [rbp-4]
        and     eax, 127
        pop     rbp
        ret
GetBankno2:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     eax, DWORD PTR [rbp-4]
        movsx   rdx, eax
        imul    rdx, rdx, -2130574327
        shr     rdx, 32
        add     edx, eax
        mov     ecx, edx
        sar     ecx, 6
        cdq
        sub     ecx, edx
        mov     edx, ecx
        sal     edx, 7
        sub     edx, ecx
        sub     eax, edx
        mov     ecx, eax
        mov     eax, ecx
        pop     rbp
        ret

[1]: https://godbolt.org/

The instruction AND is ~20 times faster than IDIV [0]. This is relatively

hot function, worth sacrificing some readability to save ~ten nanoseconds
on each check of a status of a transaction.

Now that AND is faster, can we replace the '% SLRU_MAX_BANKLOCKS' operation
in SimpleLruGetBankLock() with '& 127'?

SimpleLruGetBankLock()
{
    int banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
                       /* use '&' instead of '%' above */
    return &(ctl->shared->bank_locks[banklockno].lock);
}

Thoughts?


[0] https://www.agner.org/optimize/instruction_tables.pdf

#59Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: tender wang (#58)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 14 Dec 2023, at 14:28, tender wang <tndrwang@gmail.com> wrote:

Now that AND is more faster, Can we replace the '% SLRU_MAX_BANKLOCKS' operation in SimpleLruGetBankLock() with '& 127'

unsigned int GetBankno1(unsigned int pageno) {
    return pageno & 127;
}

unsigned int GetBankno2(unsigned int pageno) {
    return pageno % 128;
}

Generates with -O2:

GetBankno1(unsigned int):
        mov     eax, edi
        and     eax, 127
        ret
GetBankno2(unsigned int):
        mov     eax, edi
        and     eax, 127
        ret

Compiler is smart enough with constants.

Best regards, Andrey Borodin.

#60Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andrey M. Borodin (#54)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Dec 13, 2023 at 5:49 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 12 Dec 2023, at 18:28, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Andrey, do you have any stress tests or anything else that you used to
gain confidence in this code?

We are using only first two steps of the patchset, these steps do not touch locking stuff.

We’ve got some confidence after Yura Sokolov’s benchmarks [0]. Thanks!

I have run this test [1]. Instead of comparing against the master, I have
compared the effect of patch-1 ((0001+0002) SLRU buffer banks) vs patch-2
((0001+0002+0003) SLRU buffer banks + bank-wise locks). Here are the results
of benchmark-1 and benchmark-2; I have noticed a very good improvement with
the addition of patch 0003.

Machine information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127

configurations:

max_wal_size=20GB
shared_buffers=20GB
checkpoint_timeout=40min
max_connections=700
maintenance_work_mem=1GB

subtrans_buffers=$variable
multixact_offsets_buffers=$variable
multixact_members_buffers=$variable

benchmark-1
version | subtrans | multixact | tps
        | buffers  | offs/memb | func+ballast
--------+----------+-----------+--------------
patch-1 |       64 |    64/128 |  87 + 58
patch-2 |       64 |    64/128 | 128 + 83
patch-1 |     1024 |  512/1024 |  96 + 64
patch-2 |     1024 |  512/1024 | 163 + 108

benchmark-2

version | subtrans | multixact | tps
        | buffers  | offs/memb | func
--------+----------+-----------+------
patch-1 |       64 |    64/128 |   10
patch-2 |       64 |    64/128 |   12
patch-1 |     1024 |  512/1024 |   44
patch-2 |     1024 |  512/1024 |   72

[1]: /messages/by-id/e46cdea96979545b2d8a13b451d8b1ce61dc7238.camel@postgrespro.ru

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#61Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Dilip Kumar (#60)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 14 Dec 2023, at 16:06, Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have noticed
a very good improvement with the addition of patch 0003.

Indeed, very impressive results! It’s almost a 2x performance gain in the high-contention scenario, on top of the previous improvements.

Best regards, Andrey Borodin.

#62tender wang
tndrwang@gmail.com
In reply to: Andrey M. Borodin (#59)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Andrey M. Borodin <x4mmm@yandex-team.ru> wrote on Thu, 14 Dec 2023 at 17:02:

On 14 Dec 2023, at 14:28, tender wang <tndrwang@gmail.com> wrote:

Now that AND is more faster, Can we replace the '% SLRU_MAX_BANKLOCKS' operation in SimpleLruGetBankLock() with '& 127'

unsigned int GetBankno1(unsigned int pageno) {
return pageno & 127;
}

unsigned int GetBankno2(unsigned int pageno) {
return pageno % 128;
}

Generates with -O2

GetBankno1(unsigned int):
mov eax, edi
and eax, 127
ret
GetBankno2(unsigned int):
mov eax, edi
and eax, 127
ret

Compiler is smart enough with constants.

Yeah, that's true.

int GetBankno(long pageno) {
    unsigned short bank_mask = 128;
    int bankno = (pageno & bank_mask) % 128;
    return bankno;
}

With -O2 enabled, this compiles down to a single instruction:

        xor     eax, eax

But if we use '%' in both places, things change as below:

int GetBankno(long pageno) {
    unsigned short bank_mask = 128;
    int bankno = (pageno % bank_mask) % 128;
    return bankno;
}

        mov     rdx, rdi
        sar     rdx, 63
        shr     rdx, 57
        lea     rax, [rdi+rdx]
        and     eax, 127
        sub     eax, edx


#63Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: tender wang (#62)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 14 Dec 2023, at 16:32, tender wang <tndrwang@gmail.com> wrote:

enable -O2, only one instruction:
xor eax, eax

This is not fast code. It is the friendly C compiler's way of telling you that the mask must be 127, not 128.
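
To spell the point out: masking can stand in for a modulus only when the
divisor is a power of two, and the mask must then be the divisor minus one.
A minimal sketch (not from the patch; the helper name is made up for
illustration):

#include <assert.h>
#include <stdint.h>

static inline int
GetBankno(int64_t pageno, int nbanks)
{
	/* the mask trick is only valid for a power-of-two bank count */
	assert(nbanks > 0 && (nbanks & (nbanks - 1)) == 0);

	/* same result as pageno % nbanks for non-negative pageno */
	return (int) (pageno & (nbanks - 1));
}

So with 128 banks the mask must be 127; a mask of 128 keeps only a single
bit, which is why the quoted version collapses to a constant zero.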

Best regards, Andrey Borodin.

#64Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#60)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Dec 14, 2023 at 4:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Dec 13, 2023 at 5:49 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 12 Dec 2023, at 18:28, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Andrey, do you have any stress tests or anything else that you used to
gain confidence in this code?

I have done some more testing of the clog group update. The attached test
file runs two concurrent scripts with pgbench: the first is a slow script
that runs 10-second-long transactions, and the second is a very fast script
doing ~10000 transactions per second. Along with that, I have also changed
the bank size so that each bank contains just 1 page, i.e. 32k transactions
per bank. I did it this way so that we do not need to keep the long-running
transactions open for very long in order to get transactions from different
banks committed at the same time. With this test I got that behavior, and
the logs below show that multiple transaction ranges that live in different
SLRU banks (considering 32k transactions per bank) are doing group updates
at the same time. For example, in the logs below we can see the xid ranges
around 70600, 70548, and 70558, and the xid range around 755 and 752,
getting group updates by different leaders at nearly the same time.

It runs fine over a long duration, but I am not sure how to validate the
sanity of this kind of test.

2023-12-14 14:43:31.813 GMT [3306] LOG: group leader procno 606
updated status of procno 606 xid 70600
2023-12-14 14:43:31.816 GMT [3326] LOG: procno 586 for xid 70548
added for group update
2023-12-14 14:43:31.816 GMT [3326] LOG: procno 586 is group leader
and got the lock
2023-12-14 14:43:31.816 GMT [3326] LOG: group leader procno 586
updated status of procno 586 xid 70548
2023-12-14 14:43:31.818 GMT [3327] LOG: procno 585 for xid 70558
added for group update
2023-12-14 14:43:31.818 GMT [3327] LOG: procno 585 is group leader
and got the lock
2023-12-14 14:43:31.818 GMT [3327] LOG: group leader procno 585
updated status of procno 585 xid 70558
2023-12-14 14:43:31.829 GMT [3155] LOG: procno 687 for xid 752 added
for group update
2023-12-14 14:43:31.829 GMT [3207] LOG: procno 669 for xid 755 added
for group update
2023-12-14 14:43:31.829 GMT [3155] LOG: procno 687 is group leader
and got the lock
2023-12-14 14:43:31.829 GMT [3155] LOG: group leader procno 687
updated status of procno 669 xid 755
2023-12-14 14:43:31.829 GMT [3155] LOG: group leader procno 687
updated status of procno 687 xid 752
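
For reference, a rough back-of-the-envelope sketch (not part of the attached
test script or the patch; the constants assume the usual clog math of 2
status bits per xid, i.e. 32768 xids per 8kB page) of why those xids land in
different banks when each bank holds a single page:

#include <stdio.h>

#define CLOG_XACTS_PER_PAGE	32768	/* BLCKSZ (8192) * 4 xids per byte */
#define PAGES_PER_BANK		1	/* bank size used in this test */

int
main(void)
{
	unsigned int xids[] = {70600, 70548, 70558, 755, 752};

	for (int i = 0; i < 5; i++)
	{
		unsigned int pageno = xids[i] / CLOG_XACTS_PER_PAGE;

		/* xids 70548..70600 map to page 2, xids 752 and 755 to page 0 */
		printf("xid %u -> page %u -> bank %u\n",
			   xids[i], pageno, pageno / PAGES_PER_BANK);
	}
	return 0;
}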

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

test_clog_group_commit.sh (text/x-sh; charset=US-ASCII)
#65Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#56)
3 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Dec 14, 2023 at 1:53 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Dec 14, 2023 at 8:43 AM Amul Sul <sulamul@gmail.com> wrote:

On Mon, Dec 11, 2023 at 10:42 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Nov 30, 2023 at 3:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Nov 29, 2023 at 4:58 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Here is the updated patch based on some comments by tender wang (those
comments were sent only to me)

few nitpicks:

+
+   /*
+    * Mask for slotno banks, considering 1GB SLRU buffer pool size and the
+    * SLRU_BANK_SIZE bits16 should be sufficient for the bank mask.
+    */
+   bits16      bank_mask;
} SlruCtlData;

...
...

+ int bankno = pageno & ctl->bank_mask;

I am a bit uncomfortable seeing it as a mask, why can't it be simply a number
of banks (num_banks) and get the bank number through modulus op (pageno %
num_banks) instead of bitwise & operation (pageno & ctl->bank_mask) which is a
bit difficult to read compared to modulus op which is quite simple,
straightforward and much common practice in hashing.

Are there any advantages of using & over % ?

I am not sure either, but since this change in 0002 is by Andrey, I will
let him comment on this before we change it or make any decision.

Also, a few places in 0002 and 0003 patch, need the bank number, it is better
to have a macro for that.
---

extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
void *data);
-
+extern bool check_slru_buffers(const char *name, int *newval);
#endif                         /* SLRU_H */

Add an empty line after the declaration, in 0002 patch.
---

-TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn,
+                         int slotno)

Unrelated change for 0003 patch.

Fixed

Thanks for your review, PFA updated version.

I have added Amit Kapila to the CC list to get his opinion on whether
anything can break in the clog group update with our change to bank-wise
SLRU locks.

Updated the comments about group commit safety based on the off-list
suggestion by Alvaro.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v12-0001-Make-all-SLRU-buffer-sizes-configurable.patch (application/octet-stream)
From 0d5ff13df547d0b84735de339ab843ee738c0f5d Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:32:01 +0530
Subject: [PATCH v12 1/3] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

The default sizes are also raised to 64, since sizes much larger than the
old limits have been shown to be useful on modern systems.

Patch by Andrey M. Borodin, Dilip Kumar
Reviewed By Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  19 +--
 src/backend/access/transam/commit_ts.c        |   7 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  25 ++++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 18 files changed, 293 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 44cada2b40..db776bd3c9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 256, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 7dca1df61b..6e6b73a877 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -673,23 +673,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(16, xact_buffers);
+	return Min(SLRU_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index e6fd9b3349..a323fab4ff 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -502,11 +502,16 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 4MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(16, commit_ts_buffers);
+	return Min(256, Max(16, NBuffers / 256));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index db3423f12e..89e6bafb27 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,14 +1851,14 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 1b3b3ad720..2259f882ef 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -193,14 +194,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
 				  false);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 264f25a8f9..8b80f75193 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -234,7 +234,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -492,7 +492,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,7 +541,7 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE, true);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index c361bb2079..452369d56d 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,29 @@ show_timezone(void)
 	return "unknown";
 }
 
+/*
+ * GUC show_hook for xact_buffers
+ */
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+/*
+ * GUC show_hook for commit_ts_buffers
+ */
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 1129b8e4f2..02eb2c9822 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
 				  false);
 #ifdef USE_ASSERT_CHECKING
@@ -1348,7 +1348,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 60bc1217fb..96d480325b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 64;
+int			multixact_members_buffers = 64;
+int			subtrans_buffers = 64;
+int			notify_buffers = 64;
+int			serial_buffers = 64;
+int			xact_buffers = 64;
+int			commit_ts_buffers = 64;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f7c9882f7c..75e5725d9c 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2286,6 +2287,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cf9f283cfe..5dd49d7294 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 64			# memory for pg_subtrans
+#multixact_offsets_buffers = 64		# memory for pg_multixact/offsets
+#multixact_members_buffers = 64		# memory for pg_multixact/members
+#notify_buffers = 64			# memory for pg_notify
+#serial_buffers = 64			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 0be1355892..18d7ba4ca9 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 091e2202c9..be047e3032 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index 46a473c77f..147dc4acc3 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index a44472b352..351382d3e0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1043a4d782..bec72875c1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index cd48afa17b..7b68c8f1c7 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 3d74483f44..7b95acf36e 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -163,4 +163,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)

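For scale: these GUCs use GUC_UNIT_BLOCKS, so each unit is one BLCKSZ-sized
page.  With the default 8kB block size, the default of 64 buffers per SLRU is
512kB of shared memory, and SLRU_MAX_ALLOWED_BUFFERS caps any single SLRU
buffer pool at 1GB.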
Attachment: v12-0002-Divide-SLRU-buffers-into-banks.patch (application/octet-stream)
From 5209ccd60c1fe3c2118bbcb2ad2550c94676b1c2 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:41:50 +0530
Subject: [PATCH v12 2/3] Divide SLRU buffers into banks

Now that the SLRU buffer pool size is configurable, we want to
eliminate the linear search across the whole SLRU buffer pool.
To do so, divide the SLRU buffers into banks.  Each bank holds 16
buffers, each SLRU pageno may reside in only one bank, and
adjacent pagenos reside in different banks.  Along with this,
ensure that the number of SLRU buffers is given in multiples of
the bank size.

Andrey M. Borodin and Dilip Kumar, based on feedback from Alvaro Herrera
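
For illustration, a minimal standalone sketch (not taken from the patch; the
values are only examples) of the pageno-to-bank mapping described above:

    #include <stdio.h>
    #include <stdint.h>

    #define SLRU_BANK_SIZE 16

    int
    main(void)
    {
        int     nslots = 64;    /* e.g. xact_buffers = 64, i.e. 4 banks */
        int     bank_mask = (nslots / SLRU_BANK_SIZE) - 1; /* as in SimpleLruInit */
        int64_t pageno;

        for (pageno = 0; pageno < 8; pageno++)
        {
            /*
             * pageno & bank_mask behaves like pageno % nbanks for a
             * power-of-two number of banks; the buffer search is then
             * limited to the 16 slots of that bank.
             */
            int bankstart = (int) (pageno & bank_mask) * SLRU_BANK_SIZE;

            printf("pageno %lld -> slots %d..%d\n",
                   (long long) pageno, bankstart,
                   bankstart + SLRU_BANK_SIZE - 1);
        }
        return 0;
    }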
---
 src/backend/access/transam/clog.c      | 10 ++++++
 src/backend/access/transam/commit_ts.c | 10 ++++++
 src/backend/access/transam/multixact.c | 19 +++++++++++
 src/backend/access/transam/slru.c      | 44 +++++++++++++++++++++++---
 src/backend/access/transam/subtrans.c  | 10 ++++++
 src/backend/commands/async.c           | 10 ++++++
 src/backend/storage/lmgr/predicate.c   | 10 ++++++
 src/backend/utils/misc/guc_tables.c    | 14 ++++----
 src/include/access/slru.h              | 15 +++++++++
 src/include/utils/guc_hooks.h          | 11 +++++++
 10 files changed, 141 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 6e6b73a877..fc70b91bc9 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -1029,3 +1030,12 @@ clogsyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(XactCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for xact_buffers
+ */
+bool
+check_xact_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("xact_buffers", newval);
+}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index a323fab4ff..10e378f846 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -1027,3 +1028,12 @@ committssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(CommitTsCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for commit_ts_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_ts_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 89e6bafb27..65739b2f9c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -3421,3 +3422,21 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 7a371d9034..ce589493e4 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -284,7 +285,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -293,6 +297,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -524,12 +529,18 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in the buffer pool.  The buffer pool is
+	 * divided into banks of buffers, and each pageno may reside in only one
+	 * bank, so limit the search to that bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1056,9 +1067,15 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		/*
+		 * See if the page is already in the buffer pool.  The buffer pool is
+		 * divided into banks of buffers, and each pageno may reside in only one
+		 * bank, so limit the search to that bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1093,7 +1110,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
@@ -1666,3 +1683,20 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Helper function for GUC check_hooks to verify that the number of SLRU
+ * buffers is a multiple of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Valid values must be a multiple of the bank size */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	/* Not a multiple of the bank size, so reject the value */
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 2259f882ef..3f2444a37e 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -383,3 +384,12 @@ SubTransPagePrecedes(int64 page1, int64 page2)
 	return (TransactionIdPrecedes(xid1, xid2) &&
 			TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
 }
+
+/*
+ * GUC check_hook for subtrans_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtrans_buffers", newval);
+}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b80f75193..87082b8f86 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -2378,3 +2379,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 02eb2c9822..9175aaabd1 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -5012,3 +5013,12 @@ AttachSerializableXact(SerializableXactHandle handle)
 	if (MySerializableXact != InvalidSerializableXact)
 		CreateLocalPredicateLockHash();
 }
+
+/*
+ * GUC check_hook for serial_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serial_buffers", newval);
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 75e5725d9c..ef4ed8e8c4 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2295,7 +2295,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_offsets_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_offsets_buffers, NULL, NULL
 	},
 
 	{
@@ -2306,7 +2306,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_members_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_members_buffers, NULL, NULL
 	},
 
 	{
@@ -2317,7 +2317,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&subtrans_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_subtrans_buffers, NULL, NULL
 	},
 	{
 		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
@@ -2327,7 +2327,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&notify_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_notify_buffers, NULL, NULL
 	},
 
 	{
@@ -2338,7 +2338,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&serial_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_serial_buffers, NULL, NULL
 	},
 
 	{
@@ -2349,7 +2349,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&xact_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_xact_buffers
+		check_xact_buffers, NULL, show_xact_buffers
 	},
 
 	{
@@ -2360,7 +2360,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&commit_ts_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_commit_ts_buffers
+		check_commit_ts_buffers, NULL, show_commit_ts_buffers
 	},
 
 	{
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index be047e3032..bd682d6368 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,14 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks.  We limit the bank size to 16 because
+ * we perform a sequential search within a bank (both when looking for a target
+ * page and when searching for a victim buffer), so making the bank larger
+ * could hurt the performance of that search.
+ */
+#define SLRU_BANK_SIZE		16
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -147,6 +155,12 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for slotno banks.  Given the 1GB cap on the SLRU buffer pool size
+	 * and the SLRU_BANK_SIZE, bits16 is sufficient for the bank mask.
+	 */
+	bits16		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
@@ -184,5 +198,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
+extern bool check_slru_buffers(const char *name, int *newval);
 
 #endif							/* SLRU_H */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 7b95acf36e..0edd59f867 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -130,6 +130,17 @@ extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
+extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
 extern void assign_synchronous_commit(int newval, void *extra);
 extern void assign_syslog_facility(int newval, void *extra);
-- 
2.39.2 (Apple Git-143)

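As a concrete example of the new check hooks: with SLRU_BANK_SIZE being 16, a
setting like notify_buffers = 100 is rejected at startup because it is not a
multiple of 16, while 96 or 128 is accepted, giving 6 or 8 banks respectively.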
Attachment: v12-0003-Remove-the-centralized-control-lock-and-LRU-coun.patch (application/octet-stream)
From b7572b3068bddbe3d8a678f4ce831fd460635593 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 14 Dec 2023 13:43:38 +0530
Subject: [PATCH v12 3/3] Remove the centralized control lock and LRU counter

The previous patch divided the SLRU buffer pool into associative
banks.  This patch optimizes it further by introducing multiple
SLRU locks instead of a single centralized lock, which reduces
contention on the SLRU control lock.  We have at most 128 bank
locks: if the number of banks is <= 128, each lock covers exactly
one bank; otherwise a lock covers multiple banks, and the
bank-to-lock mapping is (bankno % 128).  This patch also replaces
the centralized LRU counter with bank-wise LRU counters, which
avoids the frequent cache invalidation caused by all backends
modifying a single shared counter.

Dilip Kumar, based on design inputs from Robert Haas, Andrey M. Borodin,
and Alvaro Herrera
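
For illustration, a minimal standalone sketch of the bank-to-lock mapping
described above (SLRU_MAX_BANKLOCKS is assumed to be 128 here, matching the
description; the values are only examples):

    #include <stdio.h>

    #define SLRU_BANK_SIZE      16
    #define SLRU_MAX_BANKLOCKS  128   /* assumed from the description above */

    int
    main(void)
    {
        int nslots = 4096;                    /* 256 banks: more banks than locks */
        int nbanks = nslots / SLRU_BANK_SIZE;
        int bankno;

        for (bankno = 0; bankno < nbanks; bankno++)
        {
            /*
             * Same formula as the bank-to-lock mapping: one lock per bank
             * while there are <= 128 banks, after which locks are shared
             * round-robin among banks.
             */
            int banklockno = bankno % SLRU_MAX_BANKLOCKS;

            if (bankno == 0 || bankno == 127 || bankno == 128 || bankno == 255)
                printf("bank %3d -> lock %3d\n", bankno, banklockno);
        }
        return 0;
    }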
---
 src/backend/access/transam/clog.c        | 155 ++++++++++----
 src/backend/access/transam/commit_ts.c   |  42 ++--
 src/backend/access/transam/multixact.c   | 173 +++++++++++-----
 src/backend/access/transam/slru.c        | 245 +++++++++++++++++------
 src/backend/access/transam/subtrans.c    |  58 ++++--
 src/backend/commands/async.c             |  43 ++--
 src/backend/storage/lmgr/lwlock.c        |  14 ++
 src/backend/storage/lmgr/lwlocknames.txt |  14 +-
 src/backend/storage/lmgr/predicate.c     |  34 ++--
 src/include/access/slru.h                |  64 ++++--
 src/include/storage/lwlock.h             |   7 +
 src/test/modules/test_slru/test_slru.c   |  35 ++--
 12 files changed, 639 insertions(+), 245 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index fc70b91bc9..1a4727714c 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -285,15 +285,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on the Xact SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * updates for multiple backends so that the number of times the Xact SLRU
+	 * lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -311,17 +316,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire the SLRU lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -334,10 +339,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -356,7 +361,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -407,14 +413,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -428,6 +433,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -442,6 +449,41 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	proc->clogGroupMemberPage = pageno;
 	proc->clogGroupMemberLsn = lsn;
 
+	/*
+	 * The underlying SLRU uses bank-wise locks, so the requesters arriving
+	 * here may be contending on different SLRU bank locks.  However, we try
+	 * to add to a group only requesters that want to update the same page,
+	 * i.e. requesters that would be asking for the same SLRU bank lock as
+	 * well.  The main reasons for not allowing requesters of different pages
+	 * in the same group are: 1) once the leader acquires the lock, it does
+	 * not need to fetch multiple pages and perform multiple I/Os under that
+	 * lock, 2) the leader need not switch the SLRU bank lock when the pages
+	 * belong to different SLRU banks, and 3) most importantly, most of the
+	 * contention arises while a highly concurrent OLTP workload is running;
+	 * at such times most transactions are generated around the same time,
+	 * and most of them fall on the same clog page, since each page can hold
+	 * the status of 32k transactions.  However, under some extreme
+	 * conditions requests for different pages may still be added to the
+	 * same group; we handle that by switching the bank lock.  That is not
+	 * the most performant way, but it is not the common case either, so we
+	 * are fine with it.
+	 *
+	 * Also note that until the leader of the current group acquires the
+	 * lock, we do not clear 'procglobal->clogGroupFirst'.  That means if we
+	 * concurrently get requesters for different SLRU pages, they have to
+	 * fall back to the normal update instead of the group update, which is
+	 * fine because that is not the common case.  As soon as the leader of
+	 * the current group gets the lock for the required bank, we clear this
+	 * value, and other requesters (which might want to update a different
+	 * page, possibly falling into a different bank) are allowed to form a
+	 * new group, since the first group is now detached.  If the new group
+	 * requests a different SLRU bank lock, its leader might also acquire
+	 * that lock while the first group is still performing its update, and
+	 * the two groups can perform their group updates concurrently.  That is
+	 * completely safe, because the two leaders operate on completely
+	 * different SLRU pages and each of them holds its respective SLRU bank
+	 * lock.
+	 */
 	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
 
 	while (true)
@@ -508,8 +550,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group before we
+	 * close the group by setting procglobal->clogGroupFirst to
+	 * INVALID_PGPROCNO.  Otherwise we would close the group to new entries
+	 * before we even have the lock, defeating the whole purpose of the
+	 * group update.
+	 */
+	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -526,6 +577,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page is not the same as the
+		 * lock for the previous page, then we need to release the lock on
+		 * the previous bank and acquire the lock on the bank of the page we
+		 * are going to update now.
+		 *
+		 * Although on a best-effort basis we try to keep all the requests
+		 * within a group on the same clog page, it is possible for a group
+		 * to contain requests for more than one page (for details refer to
+		 * the comment in the previous while loop).  That scenario may not
+		 * be very performant, because while switching locks the group
+		 * leader might need to wait on the new lock if the pages belong to
+		 * different SLRU banks.  It is safe nonetheless, because a) we
+		 * release the old lock before acquiring the new one, so there
+		 * should not be any deadlock, and b) we always modify the page
+		 * under the correct SLRU bank lock.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -545,7 +627,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -574,7 +657,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with the SLRU bank lock of the relevant slot held
  */
 static void
 TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
@@ -666,7 +749,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -700,8 +783,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -715,8 +798,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -725,7 +809,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -760,14 +844,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -778,8 +858,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -811,7 +892,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -843,6 +924,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -853,13 +935,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -997,16 +1080,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 10e378f846..ed65f2e910 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -228,8 +228,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -239,13 +240,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with the SLRU bank lock of the relevant slot held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -346,7 +347,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -536,8 +537,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -695,9 +696,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -724,12 +723,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -778,9 +778,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -812,6 +812,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -832,12 +833,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -991,16 +994,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1012,7 +1017,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 65739b2f9c..fd4c7baf6e 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -193,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the SLRU bank locks of MultiXactOffset and
+ * MultiXactMember to guard accesses to the two sets of SLRU buffers.  For
+ * concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -871,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -892,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -917,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether this
+			 * new page falls into a different SLRU bank; if so, release the
+			 * old bank's lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -937,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1240,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1343,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1380,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1389,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1398,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1420,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1443,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1853,15 +1910,15 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1897,8 +1954,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1907,9 +1966,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1918,7 +1978,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1978,10 +2038,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1996,7 +2058,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2018,13 +2080,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2049,13 +2113,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2070,7 +2134,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2078,18 +2144,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2101,7 +2166,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2116,10 +2183,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2407,6 +2473,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2417,13 +2484,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2456,15 +2524,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock     *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2762,7 +2832,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3244,31 +3314,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3334,7 +3408,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index ce589493e4..01ffdf3cb7 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -97,6 +97,21 @@ SlruFileName(SlruCtl ctl, char *path, int64 segno)
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index into the bank_locks array in SlruSharedData for a
+ * given slotno.
+ *
+ * The SLRU buffer pool is divided into banks of buffers, and there are at
+ * most SLRU_MAX_BANKLOCKS locks protecting access to the buffers in those
+ * banks.  Because the number of locks is capped, we cannot always have one
+ * lock per bank: as long as the number of banks is <= SLRU_MAX_BANKLOCKS,
+ * each lock protects exactly one bank; otherwise a single lock may protect
+ * multiple banks, with the bank-to-lock mapping determined by the number of
+ * banks.
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -118,34 +133,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -173,6 +160,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -183,6 +171,8 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -192,6 +182,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -208,16 +200,19 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -229,18 +224,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -261,6 +254,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -272,7 +269,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -281,6 +278,15 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
@@ -335,7 +341,7 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -374,12 +380,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -430,10 +437,14 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -476,9 +487,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -487,7 +499,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -531,9 +543,10 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	int			slotno;
 	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
 	/*
 	 * See if the page is already in a buffer pool.  The buffer pool is
@@ -557,8 +570,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -580,6 +593,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -608,7 +622,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -623,7 +637,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1067,13 +1081,14 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
 		/*
 		 * See if the page is already in a buffer pool.  The buffer pool is
-		 * divided into banks of buffers and each pageno may reside only in one
-		 * bank so limit the search within the bank.
+		 * divided into banks of buffers and each pageno may reside only in
+		 * one bank so limit the search within the bank.
 		 */
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
@@ -1109,7 +1124,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
@@ -1131,7 +1146,8 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number ==
+				pg_atomic_read_u64(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1205,6 +1221,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int64		pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1215,10 +1232,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1232,7 +1262,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1272,6 +1302,7 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1282,25 +1313,38 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1330,10 +1374,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1374,15 +1420,29 @@ SlruDeleteSegment(SlruCtl ctl, int64 segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1416,7 +1476,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1684,6 +1744,37 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	return result;
 }
 
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
 /*
  * Helper function for GUC check_hook to check whether slru buffers are in
  * multiples of SLRU_BANK_SIZE.
@@ -1700,3 +1791,37 @@ check_slru_buffers(const char *name, int *newval)
 						SLRU_BANK_SIZE);
 	return false;
 }
+
+/*
+ * Acquire all the bank locks of the given SlruCtl.
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Release all the bank locks of the given SlruCtl.
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 3f2444a37e..90544fb007 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -87,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -110,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -140,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -203,9 +205,8 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -223,8 +224,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -233,7 +235,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -263,6 +265,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock     *prevlock;
+	LWLock     *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -270,23 +274,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If this page falls in a bank protected by a different lock, release
+		 * the lock on the old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If this page falls in a bank protected by a different lock, release the
+	 * lock on the old bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -320,6 +348,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -331,12 +360,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 87082b8f86..33acb60c9d 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -267,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the bank-wise SLRU locks
+ * are used as the control locks for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -543,7 +544,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1357,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1367,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1390,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1435,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1461,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1932,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1954,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1995,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2148,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 315a78cda9..1261af0548 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index f72f2906ce..9e66ecd1ed 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+#38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 9175aaabd1..79c419c698 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -809,9 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -848,12 +848,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -903,7 +905,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -921,10 +923,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -936,13 +938,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -955,7 +957,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -967,7 +969,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -985,7 +987,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -994,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1008,12 +1010,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1073,7 +1075,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index bd682d6368..5779a07a95 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -25,6 +25,14 @@
  */
 #define SLRU_BANK_SIZE		16
 
+/*
+ * Number of bank locks protecting in-memory buffer slot access within the
+ * SLRU banks.  If the number of banks is <= SLRU_MAX_BANKLOCKS, there is one
+ * lock per bank; otherwise, each lock protects multiple banks, depending on
+ * the number of banks.
+ */
+#define	SLRU_MAX_BANKLOCKS	128
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -65,8 +73,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -79,8 +85,30 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* buffer_locks protects the I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
+	/* Locks to protect in-memory buffer slot access within each SLRU bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because the victim buffer search
+	 * is done within a bank.  Furthermore, using per-bank counters (instead of
+	 * a single global one) avoids frequent cache-line invalidation, since the
+	 * counter is updated every time we access a page.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -92,23 +120,12 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int64		latest_page_number;
+	pg_atomic_uint64 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -165,11 +182,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for given SlruCtl and the pageno.
+ *
+ * This lock needs to be acquired to access the slru buffer slots in the
+ * respective bank.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int		banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -199,5 +229,7 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
 extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index b038e599c0..87cb812b84 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index d0fb9444e8..6b084f8dc0 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
-- 
2.39.2 (Apple Git-143)

#66Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#52)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Tue, Dec 12, 2023 at 8:29 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

The problem I see is that the group update mechanism is designed around
contention of the global xact-SLRU control lock; it uses atomics to
coordinate a single queue when the lock is contended. So if we split up
the global SLRU control lock using banks, then multiple processes using
different bank locks might not contend. OK, this is fine, but what
happens if two separate groups of processes encounter contention on two
different bank locks? I think they will both try to update the same
queue, and coordinate access to that *using different bank locks*. I
don't see how can this work correctly.

I suspect the first part of that algorithm, where atomics are used to
create the list without a lock, might work fine. But will each "leader"
process, each of which is potentially using a different bank lock,
coordinate correctly? Maybe this works correctly because only one
process will find the queue head not empty? If this is what happens,
then there needs to be comments about it. Without any explanation,
this seems broken and potentially dangerous, as some transaction commit
bits might become lost given high enough concurrency and bad luck.

I don't want to be dismissive of this concern, but I took a look at
this part of the patch set and I don't see a correctness problem. I
think the idea of the mechanism is that we have a single linked list
in shared memory that can accumulate those waiters. At some point a
process pops the entire list of waiters, which allows a new group of
waiters to begin accumulating. The process that pops the list must
perform the updates for every process in the just-popped list without
failing, else updates would be lost. In theory, there can be any
number of lists that some process has popped and is currently working
its way through at the same time, although in practice I bet it's
quite rare for there to be more than one. The only correctness problem
is if it's possible for a process that popped the list to error out
before it finishes doing the updates that it "promised" to do by
popping the list.
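
To make that concrete, here is a minimal sketch of the accumulate-and-pop
pattern described above. It uses C11 atomics and made-up names, not the
actual clog code; in particular, real followers sleep until the leader
wakes them, while this sketch simply returns.

#include <stdatomic.h>
#include <stddef.h>

typedef struct Waiter
{
    int            xid_status;    /* status this waiter wants recorded */
    struct Waiter *next;          /* next entry in the pending list */
} Waiter;

/* Shared head of the pending-waiters list (NULL when empty). */
static _Atomic(Waiter *) pending_head;

/* Placeholder: apply one waiter's update under the appropriate lock. */
extern void apply_update(Waiter *w);

void
group_update(Waiter *me)
{
    Waiter *head = atomic_load(&pending_head);

    /* Push ourselves onto the shared list with a CAS loop. */
    do
    {
        me->next = head;
    } while (!atomic_compare_exchange_weak(&pending_head, &head, me));

    if (head != NULL)
        return;     /* someone was already queued; a leader handles us */

    /*
     * We were first, so we act as the leader: detach the whole list and
     * apply every queued update.  New waiters can immediately start
     * accumulating on the now-empty head, which is why more than one
     * popped list can be in flight at a time.
     */
    head = atomic_exchange(&pending_head, NULL);
    while (head != NULL)
    {
        Waiter *next = head->next;

        apply_update(head);     /* must not fail once the list is popped */
        head = next;
    }
}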

Having individual bank locks doesn't really change anything here.
That's just a matter of what lock has to be held in order to perform
the update that was promised, and the algorithm described in the
previous paragraph doesn't really care about that. Nor is there a
deadlock hazard here as long as processes only take one lock at a
time, which I believe is the case. The only potential issue that I see
here is with performance. I've heard some questions about whether this
machinery performs well even as it stands, but certainly if we divide
up the lock into a bunch of bankwise locks then that should tend in
the direction of making a mechanism like this less valuable, because
both mechanisms are trying to alleviate contention, and so in a
certain sense they are competing for the same job. However, they do
aim to alleviate different TYPES of contention: the group XID update
stuff should be most valuable when lots of processes are trying to
update the same page, and the banks should be most valuable when there
is simultaneous access to a bunch of different pages. So I'm not
convinced that this patch is a reason to remove the group XID update
mechanism, but someone might argue otherwise.

A related concern is that, if by chance we do end up with multiple
updaters from different pages in the same group, it will now be more
expensive to sort that out because we'll have to potentially keep
switching locks. So that could make the group XID update mechanism
less performant than it is currently. I think that's probably not an
issue because I think it should be a rare occurrence, as the comments
say. A bit more cost in a code path that is almost never taken won't
matter. But if that path is more commonly taken than I think, then
maybe making it more expensive could hurt. It might be worth adding
some debugging to see how often we actually go down that path in a
highly stressed system.

BTW:

+ * as well. The main reason for now allowing requesters of
different pages

now -> not

--
Robert Haas
EDB: http://www.enterprisedb.com

#67Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#66)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Dec 18, 2023 at 12:04 PM Robert Haas <robertmhaas@gmail.com> wrote:

certain sense they are competing for the same job. However, they do
aim to alleviate different TYPES of contention: the group XID update
stuff should be most valuable when lots of processes are trying to
update the same page, and the banks should be most valuable when there
is simultaneous access to a bunch of different pages. So I'm not
convinced that this patch is a reason to remove the group XID update
mechanism, but someone might argue otherwise.

Hmm, but, on the other hand:

Currently all readers and writers are competing for the same LWLock.
But with this change, the readers will (mostly) no longer be competing
with the writers. So, in theory, that might reduce lock contention
enough to make the group update mechanism pointless.

--
Robert Haas
EDB: http://www.enterprisedb.com

#68Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Robert Haas (#67)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 18 Dec 2023, at 22:30, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Dec 18, 2023 at 12:04 PM Robert Haas <robertmhaas@gmail.com> wrote:

certain sense they are competing for the same job. However, they do
aim to alleviate different TYPES of contention: the group XID update
stuff should be most valuable when lots of processes are trying to
update the same page, and the banks should be most valuable when there
is simultaneous access to a bunch of different pages. So I'm not
convinced that this patch is a reason to remove the group XID update
mechanism, but someone might argue otherwise.

Hmm, but, on the other hand:

Currently all readers and writers are competing for the same LWLock.
But with this change, the readers will (mostly) no longer be competing
with the writers. So, in theory, that might reduce lock contention
enough to make the group update mechanism pointless.

One page still accommodates 32K transaction statuses under one lock (8192 bytes * 8 bits / 2 status bits per transaction = 32,768). It feels like a lot: about 1 second of transactions on a typical installation.

When the group update mechanism was committed, did we have a benchmark to estimate its efficiency? Can we repeat that test?

Best regards, Andrey Borodin.

#69Robert Haas
robertmhaas@gmail.com
In reply to: Andrey M. Borodin (#68)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Dec 18, 2023 at 12:53 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

One page still accommodates 32K transaction statuses under one lock. It feels like a lot. About 1 second of transactions on a typical installation.

When the group commit was committed did we have a benchmark to estimate efficiency of this technology? Can we repeat that test again?

I think we did, but it might take some research to find it in the
archives. If we can, I agree that repeating it feels like a good idea.

--
Robert Haas
EDB: http://www.enterprisedb.com

#70Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#67)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Dec 18, 2023 at 11:00 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Dec 18, 2023 at 12:04 PM Robert Haas <robertmhaas@gmail.com> wrote:

certain sense they are competing for the same job. However, they do
aim to alleviate different TYPES of contention: the group XID update
stuff should be most valuable when lots of processes are trying to
update the same page, and the banks should be most valuable when there
is simultaneous access to a bunch of different pages. So I'm not
convinced that this patch is a reason to remove the group XID update
mechanism, but someone might argue otherwise.

Hmm, but, on the other hand:

Currently all readers and writers are competing for the same LWLock.
But with this change, the readers will (mostly) no longer be competing
with the writers. So, in theory, that might reduce lock contention
enough to make the group update mechanism pointless.

Thanks for your feedback. I agree that with a bank-wise lock we might
not need the group update for some of the use cases you mention, where
readers and writers are contending on the centralized lock, because in
most cases the readers will be distributed across different banks.
OTOH, there are use cases where the writer commit is the bottleneck (on
the SLRU lock), such as pgbench simple-update or TPC-B, and there we
will still benefit from the group update. During group update testing
we have seen benefits in such a scenario [1] with high client counts.
So, as per my understanding, by distributing the SLRU locks there are
scenarios where we will no longer need the group update, but OTOH there
are also scenarios where we will still benefit from it.

[1]: /messages/by-id/CAFiTN-u-XEzhd=hNGW586fmQwdTy6Qy6_SXe09tNB=gBcVzZ_A@mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#71Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Dilip Kumar (#70)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 19 Dec 2023, at 10:34, Dilip Kumar <dilipbalaut@gmail.com> wrote:

Just a side note.
It seems like the commit log is kind of an antipattern for data contention: even when we build a super-optimized SLRU, nearby **bits** are written by different CPUs.
I think that banks and locks are a good thing. But we could also reorganize the log so that (see the sketch after the list):
status of transaction 0 is on a page 0 at bit offset 0
status of transaction 1 is on a page 1 at bit offset 0
status of transaction 2 is on a page 2 at bit offset 0
status of transaction 3 is on a page 3 at bit offset 0
status of transaction 4 is on a page 0 at bit offset 2
status of transaction 5 is on a page 1 at bit offset 2
status of transaction 6 is on a page 2 at bit offset 2
status of transaction 7 is on a page 3 at bit offset 2
etc...

And it would be even better if the page for transaction statuses were determined by the backend id somehow, or at least the cache line. Can we allocate a range (sizeof(cacheline)) of xids\subxids\multixacts\whatever for each backend?

This does not matter much because
0. Patch set in current thread produces robust SLRU anyway
1. One day we are going to throw away SLRU anyway

Best regards, Andrey Borodin.

#72Robert Haas
robertmhaas@gmail.com
In reply to: Andrey M. Borodin (#71)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Fri, Dec 22, 2023 at 8:14 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Just a side node.
It seems like commit log is kind of antipattern of data contention. Even when we build a super-optimized SLRU. Nearby **bits** are written by different CPUs.
I think that banks and locks are good thing. But also we could reorganize log so that
status of transaction 0 is on a page 0 at bit offset 0
status of transaction 1 is on a page 1 at bit offset 0
status of transaction 2 is on a page 2 at bit offset 0
status of transaction 3 is on a page 3 at bit offset 0
status of transaction 4 is on a page 0 at bit offset 2
status of transaction 5 is on a page 1 at bit offset 2
status of transaction 6 is on a page 2 at bit offset 2
status of transaction 7 is on a page 3 at bit offset 2
etc...

This is an interesting idea. A variant would be to stripe across
cachelines within the same page rather than across pages. If we do
stripe across pages as proposed here, we'd probably want to rethink
the way the SLRU is extended -- page at a time wouldn't really make
sense, but preallocating an entire file might.

And it would be even better if page for transaction statuses would be determined by backend id somehow. Or at least cache line. Can we allocate a range (sizeof(cacheline)) of xids\subxids\multixacts\whatever for each backend?

I don't understand how this could work. We need to be able to look up
transaction status by XID, not backend ID.

--
Robert Haas
EDB: http://www.enterprisedb.com

#73Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Robert Haas (#72)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2 Jan 2024, at 19:23, Robert Haas <robertmhaas@gmail.com> wrote:

And it would be even better if page for transaction statuses would be determined by backend id somehow. Or at least cache line. Can we allocate a range (sizeof(cacheline)) of xids\subxids\multixacts\whatever for each backend?

I don't understand how this could work. We need to be able to look up
transaction status by XID, not backend ID.

When GetNewTransactionId() is called, we can reserve 256 xids in backend-local memory. These values will be reused by transactions or subtransactions of this backend. Here 256 == sizeof(CacheLine).
This would ensure that different backends touch different cache lines.
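
A rough sketch of the reservation idea (illustrative names only; it
ignores wraparound, and the xmin-horizon and xid-leak concerns raised in
the replies):

#include <stdint.h>

#define XID_BLOCK_SIZE 256      /* xids whose status bits share one cache line */

typedef struct BackendXidCache
{
    uint32_t    next;           /* next unused xid in the reserved block */
    uint32_t    end;            /* first xid past the reserved block */
} BackendXidCache;

static BackendXidCache my_xids; /* backend-local state */

/* Assumed primitive that atomically advances a shared counter by n xids. */
extern uint32_t ReserveXidBlock(uint32_t n);

static uint32_t
GetNewTransactionIdSketch(void)
{
    if (my_xids.next == my_xids.end)
    {
        /* Reserve a fresh, cache-line-sized block of xids for this backend. */
        my_xids.next = ReserveXidBlock(XID_BLOCK_SIZE);
        my_xids.end = my_xids.next + XID_BLOCK_SIZE;
    }
    return my_xids.next++;
}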

But this approach would dramatically increase the xid consumption rate for patterns where a client reconnects after a few transactions. So we could keep unused xids in the procarray for future reuse.

I doubt we can find a significant performance improvement here, because false cache-line sharing cannot be _that_ bad.

Best regards, Andrey Borodin.

#74Robert Haas
robertmhaas@gmail.com
In reply to: Andrey M. Borodin (#73)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Tue, Jan 2, 2024 at 1:10 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 2 Jan 2024, at 19:23, Robert Haas <robertmhaas@gmail.com> wrote:

And it would be even better if page for transaction statuses would be determined by backend id somehow. Or at least cache line. Can we allocate a range (sizeof(cacheline)) of xids\subxids\multixacts\whatever for each backend?

I don't understand how this could work. We need to be able to look up
transaction status by XID, not backend ID.

When GetNewTransactionId() is called we can reserve 256 xids in backend local memory. This values will be reused by transactions or subtransactions of this backend. Here 256 == sizeof(CacheLine).
This would ensure that different backends touch different cache lines.

But this approach would dramatically increase xid consumption speed on patterns where client reconnects after several transactions. So we can keep unused xids in procarray for future reuse.

I doubt we can find significant performance improvement here, because false cache line sharing cannot be _that_ bad.

Yeah, this seems way too complicated for what we'd potentially gain
from it. An additional problem is that the xmin horizon computation
assumes that XIDs are assigned in monotonically increasing fashion;
breaking that would be messy. And even an occasional leak of XIDs
could precipitate enough additional vacuuming to completely outweigh
any gains we could hope to achieve here.

--
Robert Haas
EDB: http://www.enterprisedb.com

#75Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#72)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Tue, Jan 2, 2024 at 7:53 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Dec 22, 2023 at 8:14 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Just a side note.
It seems like the commit log is kind of an antipattern for data contention, even when we build a super-optimized SLRU: nearby **bits** are written by different CPUs.
I think that banks and locks are a good thing. But we could also reorganize the log so that
status of transaction 0 is on a page 0 at bit offset 0
status of transaction 1 is on a page 1 at bit offset 0
status of transaction 2 is on a page 2 at bit offset 0
status of transaction 3 is on a page 3 at bit offset 0
status of transaction 4 is on a page 0 at bit offset 2
status of transaction 5 is on a page 1 at bit offset 2
status of transaction 6 is on a page 2 at bit offset 2
status of transaction 7 is on a page 3 at bit offset 2
etc...

This is an interesting idea. A variant would be to stripe across
cachelines within the same page rather than across pages. If we do
stripe across pages as proposed here, we'd probably want to rethink
the way the SLRU is extended -- page at a time wouldn't really make
sense, but preallocating an entire file might.

Yeah, this is indeed an interesting idea. But if we are interested in
working in this direction, maybe it can be taken up in a separate
thread, IMHO.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#76Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#75)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Jan 3, 2024 at 12:08 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Yeah, this is indeed an interesting idea. But if we are interested in
working in this direction, maybe it can be taken up in a separate
thread, IMHO.

Yeah, that's something quite different from the patch before us.

--
Robert Haas
EDB: http://www.enterprisedb.com

#77Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#51)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

The more I look at TransactionGroupUpdateXidStatus, the more I think
it's broken, and while we do have some tests, I don't have confidence
that they cover all possible cases.

Or, at least, if this code is good, then it hasn't been sufficiently
explained.

If we have multiple processes trying to write bits to clog, and they are
using different banks, then the LWLockConditionalAcquire will be able to
acquire the bank lock

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"The saddest aspect of life right now is that science gathers knowledge faster
than society gathers wisdom." (Isaac Asimov)

#78Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#77)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Jan 8, 2024 at 4:55 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

The more I look at TransactionGroupUpdateXidStatus, the more I think
it's broken, and while we do have some tests, I don't have confidence
that they cover all possible cases.

Or, at least, if this code is good, then it hasn't been sufficiently
explained.

Do you have any thoughts about a case in which it might be broken? Even
a vague idea of where you think it might not work as expected would
help, so that I can also think in that direction. It is possible that I
am missing some perspective that you have in mind, and that the
comments are lacking from that point of view.

If we have multiple processes trying to write bits to clog, and they are
using different banks, then the LWLockConditionalAcquire will be able to
acquire the bank lock

Do you think there is a problem with multiple processes getting the
lock? I mean, they are modifying different CLOG pages, so that can be
done concurrently, right?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#79Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#78)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Jan-08, Dilip Kumar wrote:

On Mon, Jan 8, 2024 at 4:55 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

The more I look at TransactionGroupUpdateXidStatus, the more I think
it's broken, and while we do have some tests, I don't have confidence
that they cover all possible cases.

Or, at least, if this code is good, then it hasn't been sufficiently
explained.

Do you have any thoughts about a case in which it might be broken? Even
a vague idea of where you think it might not work as expected would
help, so that I can also think in that direction. It is possible that I
am missing some perspective that you have in mind, and that the
comments are lacking from that point of view.

Eh, apologies. This email was an unfinished draft that I had lying
around from before the holidays, which I intended to discard but somehow
kept around, and just now I happened to press the wrong key combination
and it ended up being sent instead. We had some further discussion,
after which I no longer think that there is a problem here, so please
ignore this email.

I'll come back to this patch later this week.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")

#80Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#79)
3 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Jan 8, 2024 at 9:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Eh, apologies. This email was an unfinished draft that I had lying
around from before the holidays, which I intended to discard but somehow
kept around, and just now I happened to press the wrong key combination
and it ended up being sent instead. We had some further discussion,
after which I no longer think that there is a problem here, so please
ignore this email.

I'll come back to this patch later this week.

No problem

The patch had some compilation issues after some recent commits, so I
have updated it. Reported by Julien Tachoires (off-list).

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v13-0002-Divide-SLRU-buffers-into-banks.patchapplication/octet-stream; name=v13-0002-Divide-SLRU-buffers-into-banks.patchDownload
From 5fc4579e34fc417129e697542b4ac71c93523d1c Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:41:50 +0530
Subject: [PATCH v13 2/3] Divide SLRU buffers into banks

As we have made the SLRU buffer pool configurable, we want to
eliminate the linear search across the whole SLRU buffer pool.  To
do so we divide the SLRU buffers into banks.  Each bank holds 16
buffers.  Each SLRU pageno may reside in only one bank.
Adjacent pagenos reside in different banks.  Along with this,
also ensure that the number of SLRU buffers is given in
multiples of the bank size.

Andrey M. Borodin and Dilip Kumar, based on feedback by Alvaro Herrera
---
 src/backend/access/transam/clog.c      | 10 ++++++
 src/backend/access/transam/commit_ts.c | 10 ++++++
 src/backend/access/transam/multixact.c | 19 +++++++++++
 src/backend/access/transam/slru.c      | 44 +++++++++++++++++++++++---
 src/backend/access/transam/subtrans.c  | 10 ++++++
 src/backend/commands/async.c           | 10 ++++++
 src/backend/storage/lmgr/predicate.c   | 10 ++++++
 src/backend/utils/misc/guc_tables.c    | 14 ++++----
 src/include/access/slru.h              | 15 +++++++++
 src/include/utils/guc_hooks.h          | 11 +++++++
 10 files changed, 141 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 5d96195c53..7d349d2213 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -1029,3 +1030,12 @@ clogsyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(XactCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for xact_buffers
+ */
+bool
+check_xact_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("xact_buffers", newval);
+}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 27aab51162..41337471e2 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -1027,3 +1028,12 @@ committssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(CommitTsCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for commit_ts_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_ts_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1957845f58..f8eceeac30 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -3421,3 +3422,21 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ac4790f16..211527b075 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -284,7 +285,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -293,6 +297,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -524,12 +529,18 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in a buffer pool.  The buffer pool is
+	 * divided into banks of buffers and each pageno may reside only in one
+	 * bank so limit the search within the bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1056,9 +1067,15 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		/*
+		 * See if the page is already in a buffer pool.  The buffer pool is
+		 * divided into banks of buffers and each pageno may reside only in one
+		 * bank so limit the search within the bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1093,7 +1110,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
@@ -1666,3 +1683,20 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Value upper and lower hard limits are inclusive */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	/* Value does not fall within any allowable range */
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6059999a3c..82243c2728 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -383,3 +384,12 @@ SubTransPagePrecedes(int64 page1, int64 page2)
 	return (TransactionIdPrecedes(xid1, xid2) &&
 			TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
 }
+
+/*
+ * GUC check_hook for subtrans_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtrans_buffers", newval);
+}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 20a4dfec2a..9059c0a202 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -2378,3 +2379,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 7fc34720bf..10c51e2883 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -5012,3 +5013,12 @@ AttachSerializableXact(SerializableXactHandle handle)
 	if (MySerializableXact != InvalidSerializableXact)
 		CreateLocalPredicateLockHash();
 }
+
+/*
+ * GUC check_hook for serial_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serial_buffers", newval);
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index e56c14b78f..8aad05eaf5 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2319,7 +2319,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_offsets_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_offsets_buffers, NULL, NULL
 	},
 
 	{
@@ -2330,7 +2330,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_members_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_members_buffers, NULL, NULL
 	},
 
 	{
@@ -2341,7 +2341,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&subtrans_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_subtrans_buffers, NULL, NULL
 	},
 	{
 		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
@@ -2351,7 +2351,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&notify_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_notify_buffers, NULL, NULL
 	},
 
 	{
@@ -2362,7 +2362,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&serial_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_serial_buffers, NULL, NULL
 	},
 
 	{
@@ -2373,7 +2373,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&xact_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_xact_buffers
+		check_xact_buffers, NULL, show_xact_buffers
 	},
 
 	{
@@ -2384,7 +2384,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&commit_ts_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_commit_ts_buffers
+		check_commit_ts_buffers, NULL, show_commit_ts_buffers
 	},
 
 	{
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 72b30bba7f..2b74e11d42 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,14 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks.  Limit the bank size to 16 because we
+ * perform sequential search within a bank (while looking for a target page or
+ * while doing victim buffer search), and keeping this size too big could
+ * hurt performance.
+ */
+#define SLRU_BANK_SIZE		16
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -147,6 +155,12 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for the slotno banks; considering a 1GB SLRU buffer pool size and
+	 * the SLRU_BANK_SIZE, bits16 should be sufficient for the bank mask.
+	 */
+	bits16		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
@@ -184,5 +198,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
+extern bool check_slru_buffers(const char *name, int *newval);
 
 #endif							/* SLRU_H */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index a7bcb6b42a..f458da88ac 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -130,6 +130,17 @@ extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
+extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
 extern void assign_synchronous_commit(int newval, void *extra);
 extern void assign_syslog_facility(int newval, void *extra);
-- 
2.39.2 (Apple Git-143)

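As a reading aid for the patch above, here is a stand-alone sketch of
how a page number is mapped to a bank and how the lookup is confined to
that bank's slots.  It mirrors the bank_mask/bankstart computation in
v13-0002, but the function itself is illustrative only:

#include <stdint.h>

#define SLRU_BANK_SIZE	16

/*
 * nslots must be a multiple of SLRU_BANK_SIZE; the mask form additionally
 * assumes a power-of-two number of banks, which is how bank_mask is used in
 * the patch.
 */
static int
find_slot_in_bank(const int64_t *page_number, int nslots, int64_t pageno)
{
	uint16_t	bank_mask = (uint16_t) ((nslots / SLRU_BANK_SIZE) - 1);
	int			bankstart = (int) (pageno & bank_mask) * SLRU_BANK_SIZE;
	int			bankend = bankstart + SLRU_BANK_SIZE;

	/* Only this bank's 16 slots are scanned, however large the pool is. */
	for (int slotno = bankstart; slotno < bankend; slotno++)
	{
		if (page_number[slotno] == pageno)
			return slotno;		/* page already resident in its bank */
	}

	return -1;					/* miss: pick a victim within this same bank */
}

For example, with 128 buffers there are 8 banks, and pages 0, 8, 16, ...
all map to bank 0 (slots 0-15).
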
v13-0003-Remove-the-centralized-control-lock-and-LRU-coun.patchapplication/octet-stream; name=v13-0003-Remove-the-centralized-control-lock-and-LRU-coun.patchDownload
From 84d8a0eb7afbdb63ea49925c7c2bb02384a9c418 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 14 Dec 2023 13:43:38 +0530
Subject: [PATCH v13 3/3] Remove the centralized control lock and LRU counter

The previous patch divided the SLRU buffer pool into associative
banks.  This patch further optimizes it by introducing multiple
SLRU locks instead of a common centralized lock, which reduces
contention on the SLRU control lock.  Basically, we will have at
most 128 bank locks; if the number of banks is <= 128 then each
lock will cover exactly one bank, otherwise each lock will cover
multiple banks, and we find the bank-to-lock mapping by
(bankno % 128).  This patch also removes the centralized LRU
counter; instead we have bank-wise LRU counters, which avoids
frequent cache invalidation caused by modifying a single shared
counter.

Dilip Kumar based on design inputs from Robert Haas, Andrey M. Borodin,
and Alvaro Herrera
---
 src/backend/access/transam/clog.c             | 155 ++++++++---
 src/backend/access/transam/commit_ts.c        |  42 +--
 src/backend/access/transam/multixact.c        | 173 +++++++++----
 src/backend/access/transam/slru.c             | 245 +++++++++++++-----
 src/backend/access/transam/subtrans.c         |  58 ++++-
 src/backend/commands/async.c                  |  43 ++-
 src/backend/storage/lmgr/lwlock.c             |  14 +
 src/backend/storage/lmgr/lwlocknames.txt      |  14 +-
 src/backend/storage/lmgr/predicate.c          |  34 +--
 .../utils/activity/wait_event_names.txt       |   8 +-
 src/include/access/slru.h                     |  64 +++--
 src/include/storage/lwlock.h                  |   7 +
 src/test/modules/test_slru/test_slru.c        |  35 +--
 13 files changed, 640 insertions(+), 252 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 7d349d2213..d9a18ad35c 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -285,15 +285,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on Xact SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * updates for multiple backends so that the number of times the Xact SLRU
+	 * lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -311,17 +316,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire SLRU lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -334,10 +339,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -356,7 +361,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -407,14 +413,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -428,6 +433,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -442,6 +449,41 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	proc->clogGroupMemberPage = pageno;
 	proc->clogGroupMemberLsn = lsn;
 
+	/*
+	 * The underlying SLRU is using bank-wise lock so it is possible that here
+	 * we might get requesters who are contending on different SLRU-bank locks.
+	 * But in the group, we try to only add the requesters who want to update
+	 * the same page i.e. they would be requesting for the same SLRU-bank lock
+	 * as well.  The main reason for not allowing requesters of different pages
+	 * together is 1) Once the leader acquires the lock they don't need to
+	 * fetch multiple pages and do multiple I/O under the same lock 2) The
+	 * leader need not switch the SLRU-bank lock if the different pages are
+	 * from different SLRU banks 3) And the most important reason is that most
+	 * of the time the contention will occur in high concurrent OLTP workload
+	 * is going on and at that time most of the transactions would be generated
+	 * during the same time and most of them would fall in same clog page as
+	 * each page can hold status of 32k transactions.  However, there is an
+	 * exception where in some extreme conditions we might get different page
+	 * requests added in the same group but we have handled that by switching
+	 * the bank lock, although that is not the most performant way that's not
+	 * the common case either so we are fine with that.
+	 *
+	 * Also note that until the leader of the current group gets the lock, we
+	 * don't clear 'procglobal->clogGroupFirst'.  That means that if we
+	 * concurrently get requesters for different SLRU pages, those will have
+	 * to do a normal update instead of a group update, and
+	 * that's fine as that is not the common case.  As soon as the leader of
+	 * the current group gets the lock for the required bank, we clear
+	 * this value and now other requesters (which might want to update a
+	 * different page and that might fall into the different bank as well) are
+	 * allowed to form a new group as the first group is now detached.  So if
+	 * the new group has a request for a different SLRU-bank lock then the
+	 * group leader of this group might also get the lock while the first group
+	 * is performing the update and these two groups can perform the group
+	 * update concurrently but it is completely safe as these two leaders are
+	 * operating on completely different SLRU pages and they both are holding
+	 * their respective SLRU locks.
+	 */
 	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
 
 	while (true)
@@ -508,8 +550,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group before we
+	 * close this group by setting procglobal->clogGroupFirst as
+	 * INVALID_PGPROCNO, so that we do not shut out new entries from joining
+	 * the group before we even have the lock, which would defeat the whole
+	 * purpose of the group update.
+	 */
+	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -526,6 +577,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page is not the same as that
+		 * of the last page then we need to release the lock on the previous
+		 * bank and acquire the lock on the bank for the page we are going to
+		 * update now.
+		 *
+		 * Although on the best effort basis we try that all the requests
+		 * within a group are for the same clog page there are some
+		 * possibilities that there are request for more than one page in the
+		 * same group (for details refer to the comment in the previous while
+		 * loop).  That scenario might not be very performant because while
+		 * switching the lock the group leader might need to wait on the new
+		 * lock if the pages are from different SLRU bank but it is safe
+		 * because a) we are releasing the old lock before acquiring the new
+		 * lock so there should not be any deadlock situation b) and, we
+		 * are always modifying the page under the correct SLRU lock.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -545,7 +627,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -574,7 +657,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with slot specific SLRU bank's lock held
  */
 static void
 TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
@@ -666,7 +749,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -700,8 +783,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -715,8 +798,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -725,7 +809,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -760,14 +844,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -778,8 +858,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -811,7 +892,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -843,6 +924,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -853,13 +935,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -997,16 +1080,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 41337471e2..9e932a161b 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -228,8 +228,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -239,13 +240,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with slot specific SLRU bank's Lock held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -346,7 +347,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -536,8 +537,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -695,9 +696,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -724,12 +723,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -778,9 +778,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -812,6 +812,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -832,12 +833,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -991,16 +994,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1012,7 +1017,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index f8eceeac30..dbabc187b9 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -193,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use SLRU bank's lock of MultiXactOffset and
+ * MultiXactMember to guard accesses to the two sets of SLRU buffers.  For
+ * concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -871,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -892,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -917,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether this
+			 * new page falls into a different SLRU bank; if so, release the old
+			 * bank's lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -937,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1240,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1343,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1380,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1389,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1398,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1420,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1443,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1853,15 +1910,15 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1897,8 +1954,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1907,9 +1966,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1918,7 +1978,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1978,10 +2038,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1996,7 +2058,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2018,13 +2080,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2049,13 +2113,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2070,7 +2134,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2078,18 +2144,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2101,7 +2166,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2116,10 +2183,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2407,6 +2473,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2417,13 +2484,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2456,15 +2524,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock     *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2762,7 +2832,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3244,31 +3314,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3334,7 +3408,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 211527b075..33670f7cfe 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -97,6 +97,21 @@ SlruFileName(SlruCtl ctl, char *path, int64 segno)
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index of lock for a given slotno in bank_lock array in
+ * SlruSharedData.
+ *
+ * Basically, the SLRU buffer pool is divided into banks of buffers, and there
+ * are at most SLRU_MAX_BANKLOCKS locks to protect access to the buffers in the
+ * banks.  Since we have a maximum limit on the number of locks, we cannot
+ * always have one lock for each bank.  As long as the number of banks is
+ * <= SLRU_MAX_BANKLOCKS there is one lock protecting each bank; otherwise one
+ * lock may protect multiple banks, depending on the number of
+ * banks.
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -118,34 +133,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -173,6 +160,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -183,6 +171,8 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -192,6 +182,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -208,16 +200,19 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -229,18 +224,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -261,6 +254,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -272,7 +269,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -281,6 +278,15 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
@@ -335,7 +341,7 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -374,12 +380,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -430,10 +437,14 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -476,9 +487,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -487,7 +499,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -531,9 +543,10 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	int			slotno;
 	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
 	/*
 	 * See if the page is already in a buffer pool.  The buffer pool is
@@ -557,8 +570,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -580,6 +593,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -608,7 +622,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -623,7 +637,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1067,13 +1081,14 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
 		/*
 		 * See if the page is already in a buffer pool.  The buffer pool is
-		 * divided into banks of buffers and each pageno may reside only in one
-		 * bank so limit the search within the bank.
+		 * divided into banks of buffers and each pageno may reside only in
+		 * one bank so limit the search within the bank.
 		 */
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
@@ -1109,7 +1124,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
@@ -1131,7 +1146,8 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number ==
+				pg_atomic_read_u64(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1205,6 +1221,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int64		pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1215,10 +1232,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * then release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1232,7 +1262,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1272,6 +1302,7 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1282,25 +1313,38 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * then release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1330,10 +1374,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1374,15 +1420,29 @@ SlruDeleteSegment(SlruCtl ctl, int64 segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * then release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1416,7 +1476,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1684,6 +1744,37 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	return result;
 }
 
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
 /*
  * Helper function for GUC check_hook to check whether slru buffers are in
  * multiples of SLRU_BANK_SIZE.
@@ -1700,3 +1791,37 @@ check_slru_buffers(const char *name, int *newval)
 						SLRU_BANK_SIZE);
 	return false;
 }
+
+/*
+ * Function to acquire all banks' locks of the given SlruCtl
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Function to release all banks' locks of the given SlruCtl
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 82243c2728..c55d709846 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -87,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -110,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -140,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -203,9 +205,8 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -223,8 +224,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -233,7 +235,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -263,6 +265,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock     *prevlock;
+	LWLock     *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -270,23 +274,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If the new page falls under a different bank lock, release the lock
+		 * on the old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the final page falls under a different bank lock, release the lock
+	 * on the old bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -320,6 +348,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -331,12 +360,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 9059c0a202..0c2ac60946 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -267,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the per-bank SLRU locks
+ * serve as the control locks for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -543,7 +544,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1357,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1367,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1390,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1435,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1461,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1932,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1954,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1995,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2148,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index b4b989ac56..a4f881f34d 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -190,6 +190,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"LogicalRepLauncherDSA",
 	/* LWTRANCHE_LAUNCHER_HASH: */
 	"LogicalRepLauncherHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU"
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index d621f5507f..bcfe82a359 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+#38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 10c51e2883..ea4392ab15 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -809,9 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -848,12 +848,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -903,7 +905,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -921,10 +923,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -936,13 +938,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding the SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -955,7 +957,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -967,7 +969,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -985,7 +987,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -994,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1008,12 +1010,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1073,7 +1075,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index f625473ad4..afa18ce54a 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -292,11 +292,7 @@ SInvalWrite	"Waiting to add a message to the shared catalog invalidation queue."
 WALBufMapping	"Waiting to replace a page in WAL buffers."
 WALWrite	"Waiting for WAL buffers to be written to disk."
 ControlFile	"Waiting to read or update the <filename>pg_control</filename> file or create a new WAL file."
-XactSLRU	"Waiting to access the transaction status SLRU cache."
-SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
 MultiXactGen	"Waiting to read or update shared multixact state."
-MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
-MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
 RelCacheInit	"Waiting to read or update a <filename>pg_internal.init</filename> relation cache initialization file."
 CheckpointerComm	"Waiting to manage fsync requests."
 TwoPhaseState	"Waiting to read or update the state of prepared transactions."
@@ -307,19 +303,17 @@ Autovacuum	"Waiting to read or update the current state of autovacuum workers."
 AutovacuumSchedule	"Waiting to ensure that a table selected for autovacuum still needs vacuuming."
 SyncScan	"Waiting to select the starting location of a synchronized table scan."
 RelationMapping	"Waiting to read or update a <filename>pg_filenode.map</filename> file (used to track the filenode assignments of certain system catalogs)."
-NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
 NotifyQueue	"Waiting to read or update <command>NOTIFY</command> messages."
 SerializableXactHash	"Waiting to read or update information about serializable transactions."
 SerializableFinishedList	"Waiting to access the list of finished serializable transactions."
 SerializablePredicateList	"Waiting to access the list of predicate locks held by serializable transactions."
-SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
+SerialControl	"Waiting to read or update shared <filename>pg_serial</filename> state."
 SyncRep	"Waiting to read or update information about the state of synchronous replication."
 BackgroundWorker	"Waiting to read or update background worker state."
 DynamicSharedMemoryControl	"Waiting to read or update dynamic shared memory allocation information."
 AutoFile	"Waiting to update the <filename>postgresql.auto.conf</filename> file."
 ReplicationSlotAllocation	"Waiting to allocate or free a replication slot."
 ReplicationSlotControl	"Waiting to read or update replication slot state."
-CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
 CommitTs	"Waiting to read or update the last value set for a transaction commit timestamp."
 ReplicationOrigin	"Waiting to create, drop or use a replication origin."
 MultiXactTruncation	"Waiting to read or truncate multixact information."
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 2b74e11d42..46767f6f84 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -25,6 +25,14 @@
  */
 #define SLRU_BANK_SIZE		16
 
+/*
+ * Maximum number of bank locks to protect in-memory buffer slot access within
+ * the SLRU banks.  If the number of banks is <= SLRU_MAX_BANKLOCKS then there
+ * will be one lock per bank; otherwise each lock protects multiple banks,
+ * depending on the number of banks.
+ */
+#define	SLRU_MAX_BANKLOCKS	128
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -65,8 +73,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -79,8 +85,30 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protect the I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
+	/* Locks to protect in-memory buffer slot access within each SLRU bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because we do a victim buffer
+	 * search within a bank.  Furthermore, updating an individual bank's
+	 * counter avoids frequent cache invalidation, since the counter is
+	 * updated on every page access.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -92,23 +120,12 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int64		latest_page_number;
+	pg_atomic_uint64 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -165,11 +182,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock needs to be acquired to access the SLRU buffer slots in the
+ * respective bank.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int		banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -199,5 +229,7 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
 extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 167ae34208..d6bfcb5aa3 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -207,6 +207,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_PGSTATS_DATA,
 	LWTRANCHE_LAUNCHER_DSA,
 	LWTRANCHE_LAUNCHER_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index 4b31f331ca..068a21f125 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
-- 
2.39.2 (Apple Git-143)

Attachment: v13-0001-Make-all-SLRU-buffer-sizes-configurable.patch (application/octet-stream)
From 5c34c8fd74a9220d2434bf00743ace6f772af615 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:32:01 +0530
Subject: [PATCH v13 1/3] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Default sizes are also set to 64 as sizes much larger than the old
limits have been shown to be useful on modern systems.

Patch by Andrey M. Borodin, Dilip Kumar
Reviewed By Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  19 +--
 src/backend/access/transam/commit_ts.c        |   7 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  25 ++++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 18 files changed, 293 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 61038472c5..bb5bae95df 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 256, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc..5d96195c53 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -673,23 +673,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(16, xact_buffers);
+	return Min(SLRU_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 61b82385f3..27aab51162 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -502,11 +502,16 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 4MB for every 1GB of shared buffers, up to a
+ * maximum of 256 buffers, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(16, commit_ts_buffers);
+	return Min(256, Max(16, NBuffers / 256));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 59523be901..1957845f58 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,14 +1851,14 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index b2ed82ac56..6059999a3c 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -193,14 +194,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
 				  false);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b24b22293..20a4dfec2a 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -234,7 +234,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -492,7 +492,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,7 +541,7 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE, true);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 30efcd554a..2924bae915 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,29 @@ show_timezone(void)
 	return "unknown";
 }
 
+/*
+ * GUC show_hook for xact_buffers
+ */
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+/*
+ * GUC show_hook for commit_ts_buffers
+ */
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index ee5ea1175c..7fc34720bf 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
 				  false);
 #ifdef USE_ASSERT_CHECKING
@@ -1348,7 +1348,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 88b03e8fa3..504293229c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 64;
+int			multixact_members_buffers = 64;
+int			subtrans_buffers = 64;
+int			notify_buffers = 64;
+int			serial_buffers = 64;
+int			xact_buffers = 64;
+int			commit_ts_buffers = 64;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index e53ebc6dc2..e56c14b78f 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2310,6 +2311,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b2809c711a..7cd747e3d9 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 64			# memory for pg_subtrans
+#multixact_offsets_buffers = 64		# memory for pg_multixact/offsets
+#multixact_members_buffers = 64		# memory for pg_multixact/members
+#notify_buffers = 64			# memory for pg_notify
+#serial_buffers = 64			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 233f67dbcc..7ffd256c74 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b05f6bc71d..72b30bba7f 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index b0d2ad57e5..e2213cf3fd 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 80b8583421..78daa25fa0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0b01c1f093..6192365bc8 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index a7edd38fa9..14ee9b94a2 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 5300c44f3b..a7bcb6b42a 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -163,4 +163,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)

#81Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#80)
3 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Jan 10, 2024 at 6:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Jan 8, 2024 at 9:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Eh, apologies. This email was an unfinished draft that I had lying
around before the holidays, which I intended to discard but somehow
kept around, and just now I happened to press the wrong key combination
and it ended up being sent instead. We had some further discussion,
after which I no longer think that there is a problem here, so please
ignore this email.

I'll come back to this patch later this week.

No problem

The patch was facing some compilation issues after some recent
commits, so I have updated it; the issue was reported by Julien
Tachoires (offlist).

The last patch conflicted with some of the recent commits, so here is
the updated version of the patch.  I also noticed that the SLRU bank
lock wait event details were missing from the wait_event_names.txt
file, so I have added those as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v14-0002-Divide-SLRU-buffers-into-banks.patch (application/octet-stream)
From d5ec64733cb1689f9263028e858d8c49ca684814 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:41:50 +0530
Subject: [PATCH v14 2/3] Divide SLRU buffers into banks

As we have made the SLRU buffer pool configurable, we want to
eliminate the linear search within the whole SLRU buffer pool.
To do so we divide the SLRU buffers into banks.  Each bank holds
16 buffers, each SLRU pageno may reside in only one bank, and
adjacent pagenos reside in different banks.  Along with this, we
also ensure that the number of SLRU buffers is a multiple of the
bank size.
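
A minimal sketch (the helper name is invented for illustration) of the
page-to-bank mapping that the slru.c changes below implement:

    /* bank_mask is (nslots / SLRU_BANK_SIZE) - 1, nslots a multiple of 16 */
    static inline int
    SlruBankStartSlot(int64 pageno, bits16 bank_mask)
    {
        /* the low bits of the pageno select the bank */
        return (int) (pageno & bank_mask) * SLRU_BANK_SIZE;
    }

Adjacent pagenos therefore map to different banks, and a lookup scans
only the 16 slots of one bank instead of the whole pool.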

Andrey M. Borodin and Dilip Kumar, based on feedback from Alvaro Herrera
---
 src/backend/access/transam/clog.c      | 10 ++++++
 src/backend/access/transam/commit_ts.c | 10 ++++++
 src/backend/access/transam/multixact.c | 19 +++++++++++
 src/backend/access/transam/slru.c      | 44 +++++++++++++++++++++++---
 src/backend/access/transam/subtrans.c  | 10 ++++++
 src/backend/commands/async.c           | 10 ++++++
 src/backend/storage/lmgr/predicate.c   | 10 ++++++
 src/backend/utils/misc/guc_tables.c    | 14 ++++----
 src/include/access/slru.h              | 15 +++++++++
 src/include/utils/guc_hooks.h          | 11 +++++++
 10 files changed, 141 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 5d96195c53..7d349d2213 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -1029,3 +1030,12 @@ clogsyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(XactCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for xact_buffers
+ */
+bool
+check_xact_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("xact_buffers", newval);
+}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 27aab51162..41337471e2 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -1027,3 +1028,12 @@ committssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(CommitTsCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for commit_ts_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_ts_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1957845f58..f8eceeac30 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -3421,3 +3422,21 @@ multixactmemberssyncfiletag(const FileTag *ftag, char *path)
 {
 	return SlruSyncFileTag(MultiXactMemberCtl, ftag, path);
 }
+
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
\ No newline at end of file
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ac4790f16..211527b075 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -284,7 +285,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -293,6 +297,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -524,12 +529,18 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
 	LWLockAcquire(shared->ControlLock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in the buffer pool.  The buffer pool is
+	 * divided into banks of buffers and each pageno may reside in only one
+	 * bank, so limit the search to that bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -1056,9 +1067,15 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		/*
+		 * See if the page is already in the buffer pool.  The buffer pool is
+		 * divided into banks of buffers and each pageno may reside in only one
+		 * bank, so limit the search to that bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1093,7 +1110,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * multiple pages with the same lru_count.
 		 */
 		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
@@ -1666,3 +1683,20 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* The value must be a multiple of the SLRU bank size */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	/* Value is not a multiple of the bank size */
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6059999a3c..82243c2728 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -383,3 +384,12 @@ SubTransPagePrecedes(int64 page1, int64 page2)
 	return (TransactionIdPrecedes(xid1, xid2) &&
 			TransactionIdPrecedes(xid1, xid2 + SUBTRANS_XACTS_PER_PAGE - 1));
 }
+
+/*
+ * GUC check_hook for subtrans_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtrans_buffers", newval);
+}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 20a4dfec2a..9059c0a202 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -2378,3 +2379,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 7fc34720bf..10c51e2883 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -5012,3 +5013,12 @@ AttachSerializableXact(SerializableXactHandle handle)
 	if (MySerializableXact != InvalidSerializableXact)
 		CreateLocalPredicateLockHash();
 }
+
+/*
+ * GUC check_hook for serial_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serial_buffers", newval);
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index ea15f478fc..938ce3cd47 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2329,7 +2329,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_offsets_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_offsets_buffers, NULL, NULL
 	},
 
 	{
@@ -2340,7 +2340,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&multixact_members_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_multixact_members_buffers, NULL, NULL
 	},
 
 	{
@@ -2351,7 +2351,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&subtrans_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_subtrans_buffers, NULL, NULL
 	},
 	{
 		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
@@ -2361,7 +2361,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&notify_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_notify_buffers, NULL, NULL
 	},
 
 	{
@@ -2372,7 +2372,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&serial_buffers,
 		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, NULL
+		check_serial_buffers, NULL, NULL
 	},
 
 	{
@@ -2383,7 +2383,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&xact_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_xact_buffers
+		check_xact_buffers, NULL, show_xact_buffers
 	},
 
 	{
@@ -2394,7 +2394,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&commit_ts_buffers,
 		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
-		NULL, NULL, show_commit_ts_buffers
+		check_commit_ts_buffers, NULL, show_commit_ts_buffers
 	},
 
 	{
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 72b30bba7f..2b74e11d42 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,14 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks.  Limit the bank size to 16 because we
+ * perform sequential search within a bank (while looking for a target page or
+ * while doing victim buffer search) and if we keep this size big then it may
+ * affect the performance.
+ */
+#define SLRU_BANK_SIZE		16
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -147,6 +155,12 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for slotno banks, considering 1GB SLRU buffer pool size and the
+	 * SLRU_BANK_SIZE bits16 should be sufficient for the bank mask.
+	 */
+	bits16		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
@@ -184,5 +198,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
+extern bool check_slru_buffers(const char *name, int *newval);
 
 #endif							/* SLRU_H */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index a7bcb6b42a..f458da88ac 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -130,6 +130,17 @@ extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
+extern bool check_xact_buffers(int *newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
 extern void assign_synchronous_commit(int newval, void *extra);
 extern void assign_syslog_facility(int newval, void *extra);
-- 
2.39.2 (Apple Git-143)

v14-0001-Make-all-SLRU-buffer-sizes-configurable.patch (application/octet-stream)
From 8b9b7fbd213c664445fea329c64e3fd9c7addaf5 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Thu, 30 Nov 2023 13:32:01 +0530
Subject: [PATCH v14 1/3] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Default sizes are also set to 64 as sizes much larger than the old
limits have been shown to be useful on modern systems.

Patch by Andrey M. Borodin, Dilip Kumar
Reviewed By Anastasia Lubennikova, Tomas Vondra, Alexander Korotkov,
Gilles Darold, Thomas Munro
---
 doc/src/sgml/config.sgml                      | 135 ++++++++++++++++++
 src/backend/access/transam/clog.c             |  19 +--
 src/backend/access/transam/commit_ts.c        |   7 +-
 src/backend/access/transam/multixact.c        |   8 +-
 src/backend/access/transam/subtrans.c         |   5 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/variable.c               |  25 ++++
 src/backend/storage/lmgr/predicate.c          |   4 +-
 src/backend/utils/init/globals.c              |   8 ++
 src/backend/utils/misc/guc_tables.c           |  77 ++++++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |   5 +
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |   2 +
 18 files changed, 293 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 61038472c5..bb5bae95df 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,141 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+    <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-subtrans-buffers" xreflabel="subtrans_buffers">
+      <term><varname>subtrans_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtrans_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-serial-buffers" xreflabel="serial_buffers">
+      <term><varname>serial_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serial_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>64</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-xact-buffers" xreflabel="xact_buffers">
+      <term><varname>xact_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>xact_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 512, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+    <varlistentry id="guc-commit-ts-buffers" xreflabel="commit_ts_buffers">
+      <term><varname>commit_ts_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_ts_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname> / 256, but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc..5d96195c53 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -673,23 +673,16 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * By default, we'll use 2MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* Use configured value if provided. */
+	if (xact_buffers > 0)
+		return Max(16, xact_buffers);
+	return Min(SLRU_MAX_ALLOWED_BUFFERS, Max(16, NBuffers / 512));
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 61b82385f3..27aab51162 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -502,11 +502,16 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
  * We use a very similar logic as for the number of CLOG buffers (except we
  * scale up twice as fast with shared buffers, and the maximum is twice as
  * high); see comments in CLOGShmemBuffers.
+ * By default, we'll use 4MB for every 1GB of shared buffers, up to the
+ * maximum value that slru.c will allow, but always at least 16 buffers.
  */
 Size
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* Use configured value if provided. */
+	if (commit_ts_buffers > 0)
+		return Max(16, commit_ts_buffers);
+	return Min(256, Max(16, NBuffers / 256));
 }
 
 /*
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 59523be901..1957845f58 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1834,8 +1834,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,14 +1851,14 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "MultiXactMember", multixact_members_buffers, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index b2ed82ac56..6059999a3c 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,6 +31,7 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/snapmgr.h"
 
@@ -193,14 +194,14 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(subtrans_buffers, 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
 				  false);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b24b22293..20a4dfec2a 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -234,7 +234,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -492,7 +492,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,7 +541,7 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE, true);
 
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 30efcd554a..2924bae915 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -18,6 +18,8 @@
 
 #include <ctype.h>
 
+#include "access/clog.h"
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/parallel.h"
 #include "access/xact.h"
@@ -400,6 +402,29 @@ show_timezone(void)
 	return "unknown";
 }
 
+/*
+ * GUC show_hook for xact_buffers
+ */
+const char *
+show_xact_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CLOGShmemBuffers());
+	return nbuf;
+}
+
+/*
+ * GUC show_hook for commit_ts_buffers
+ */
+const char *
+show_commit_ts_buffers(void)
+{
+	static char nbuf[16];
+
+	snprintf(nbuf, sizeof(nbuf), "%zu", CommitTsShmemBuffers());
+	return nbuf;
+}
 
 /*
  * LOG_TIMEZONE
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index ee5ea1175c..7fc34720bf 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -808,7 +808,7 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
+				  serial_buffers, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
 				  false);
 #ifdef USE_ASSERT_CHECKING
@@ -1348,7 +1348,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serial_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 88b03e8fa3..504293229c 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,11 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			multixact_offsets_buffers = 64;
+int			multixact_members_buffers = 64;
+int			subtrans_buffers = 64;
+int			notify_buffers = 64;
+int			serial_buffers = 64;
+int			xact_buffers = 64;
+int			commit_ts_buffers = 64;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 7fe58518d7..ea15f478fc 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2320,6 +2321,82 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact offset SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the MultiXact member SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"subtrans_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the sub-transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtrans_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the NOTIFY message SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"serial_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the serializable transaction SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serial_buffers,
+		64, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"xact_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the number of shared memory buffers used for the transaction status SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&xact_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_xact_buffers
+	},
+
+	{
+		{"commit_ts_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp SLRU cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_ts_buffers,
+		64, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		NULL, NULL, show_commit_ts_buffers
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index da10b43dac..31e8f7fbf7 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#xact_buffers = 0			# memory for pg_xact (0 = auto)
+#subtrans_buffers = 64			# memory for pg_subtrans
+#multixact_offsets_buffers = 64		# memory for pg_multixact/offsets
+#multixact_members_buffers = 64		# memory for pg_multixact/members
+#notify_buffers = 64			# memory for pg_notify
+#serial_buffers = 64			# memory for pg_serial
+#commit_ts_buffers = 0			# memory for pg_commit_ts (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 233f67dbcc..7ffd256c74 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b05f6bc71d..72b30bba7f 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index b0d2ad57e5..e2213cf3fd 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 80b8583421..78daa25fa0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0b01c1f093..6192365bc8 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -177,6 +177,13 @@ extern PGDLLIMPORT int MaxBackends;
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int subtrans_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serial_buffers;
+extern PGDLLIMPORT int xact_buffers;
+extern PGDLLIMPORT int commit_ts_buffers;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index a7edd38fa9..14ee9b94a2 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 5300c44f3b..a7bcb6b42a 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -163,4 +163,6 @@ extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern bool check_wal_segment_size(int *newval, void **extra, GucSource source);
 extern void assign_wal_sync_method(int new_wal_sync_method, void *extra);
 
+extern const char *show_xact_buffers(void);
+extern const char *show_commit_ts_buffers(void);
 #endif							/* GUC_HOOKS_H */
-- 
2.39.2 (Apple Git-143)
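
To make the auto-sizing default in the patch above concrete
(illustrative numbers): with shared_buffers = 8GB and the default 8kB
BLCKSZ, NBuffers is 1048576, so leaving xact_buffers = 0 gives
CLOGShmemBuffers() = Min(131072, Max(16, 1048576 / 512)) = 2048
buffers, i.e. 16MB of CLOG cache.  SLRU_MAX_ALLOWED_BUFFERS itself
works out to 1GB / 8kB = 131072 buffers.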

v14-0003-Remove-the-centralized-control-lock-and-LRU-coun.patch (application/octet-stream)
From 16f1d07e62e0487345b2e07b614fb8e406a729af Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Tue, 23 Jan 2024 10:41:26 +0530
Subject: [PATCH v14 3/3] Remove the centralized control lock and LRU counter

The previous patch divided the SLRU buffer pool into associative
banks.  This patch optimizes it further by introducing multiple
SLRU locks instead of a single centralized lock, which reduces
contention on the SLRU control lock.  Basically, we will have at
most 128 bank locks; if the number of banks is <= 128 then each
lock covers exactly one bank, otherwise each lock covers multiple
banks, and the bank-to-lock mapping is found by (bankno % 128).
This patch also removes the centralized LRU counter and introduces
bank-wise LRU counters, which avoids the frequent cache invalidation
caused by updating a single shared counter.
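
A rough sketch of that mapping (constant and helper names assumed for
illustration; in the patch itself the lookup goes through
SimpleLruGetBankLock()):

    #define SLRU_MAX_BANKLOCKS 128

    static inline int
    SlruBankLockIndex(int bankno)
    {
        /* with <= 128 banks each lock covers exactly one bank;
         * beyond that, several banks share the same lock */
        return bankno % SLRU_MAX_BANKLOCKS;
    }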

Dilip Kumar based on design inputs from Robert Haas, Andrey M. Borodin,
and Alvaro Herrera
---
 src/backend/access/transam/clog.c             | 155 ++++++++---
 src/backend/access/transam/commit_ts.c        |  42 +--
 src/backend/access/transam/multixact.c        | 173 +++++++++----
 src/backend/access/transam/slru.c             | 245 +++++++++++++-----
 src/backend/access/transam/subtrans.c         |  58 ++++-
 src/backend/commands/async.c                  |  43 ++-
 src/backend/storage/lmgr/lwlock.c             |  14 +
 src/backend/storage/lmgr/lwlocknames.txt      |  14 +-
 src/backend/storage/lmgr/predicate.c          |  34 +--
 .../utils/activity/wait_event_names.txt       |  15 +-
 src/include/access/slru.h                     |  64 +++--
 src/include/storage/lwlock.h                  |   7 +
 src/test/modules/test_slru/test_slru.c        |  35 +--
 13 files changed, 647 insertions(+), 252 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 7d349d2213..d9a18ad35c 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -285,15 +285,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on Xact SLRU lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * updates for multiple backends so that the number of times the Xact SLRU
+	 * lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -311,17 +316,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
+		 * If we can immediately acquire SLRU lock, we update the status of
 		 * our own XID and release the lock.  If not, try use group XID
 		 * update.  If that doesn't work out, fall back to waiting for the
 		 * lock to perform an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -334,10 +339,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -356,7 +361,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -407,14 +413,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -428,6 +433,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
@@ -442,6 +449,41 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	proc->clogGroupMemberPage = pageno;
 	proc->clogGroupMemberLsn = lsn;
 
+	/*
+	 * The underlying SLRU uses bank-wise locks, so requesters arriving here
+	 * may be contending for different SLRU bank locks.  Within one group,
+	 * however, we only try to gather requesters that want to update the same
+	 * page, i.e. that need the same SLRU bank lock.  The reasons for not
+	 * mixing requests for different pages in one group are: 1) once the
+	 * leader acquires the lock, it does not have to fetch multiple pages and
+	 * perform multiple I/Os under that lock; 2) the leader does not need to
+	 * switch bank locks when the pages belong to different SLRU banks; and
+	 * 3) most importantly, the contention this optimization targets arises
+	 * in highly concurrent OLTP workloads, where most transactions are
+	 * generated around the same time and therefore fall on the same clog
+	 * page, since each page holds the status of 32k transactions.  In some
+	 * extreme conditions requests for different pages can still end up in
+	 * the same group; we handle that by switching the bank lock, which is
+	 * not the most performant path but is also not the common case, so that
+	 * is acceptable.
+	 *
+	 * Note also that until the leader of the current group acquires the
+	 * lock, we do not clear 'procglobal->clogGroupFirst'.  This means that
+	 * concurrent requesters for different SLRU pages arriving in the
+	 * meantime fall back to the normal (non-group) update, which is fine
+	 * because that is not the common case.  As soon as the leader of the
+	 * current group acquires the lock for the required bank, we clear this
+	 * value, and other requesters (which may want to update a different page
+	 * that may fall into a different bank) are allowed to form a new group,
+	 * because the first group is now detached.  If the new group requests a
+	 * different SLRU bank lock, its leader may acquire that lock while the
+	 * first group is still performing its update, and the two groups can
+	 * then perform their group updates concurrently.  That is completely
+	 * safe, because the two leaders operate on different SLRU pages and each
+	 * holds its own SLRU bank lock.
+	 */
 	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
 
 	while (true)
@@ -508,8 +550,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group before we
+	 * close the group by setting procglobal->clogGroupFirst to
+	 * INVALID_PGPROCNO.  Otherwise we would stop accepting new entries into
+	 * the group before we have even acquired the lock, defeating the whole
+	 * purpose of the group update.
+	 */
+	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -526,6 +577,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page differs from that of
+		 * the previous page, release the lock on the previous bank and
+		 * acquire the lock on the bank containing the page we are about to
+		 * update.
+		 *
+		 * Although we try, on a best-effort basis, to keep all requests
+		 * within a group on the same clog page, it is possible for a group
+		 * to contain requests for more than one page (see the comment above
+		 * the previous while loop for details).  That scenario may not be
+		 * very performant, because while switching locks the group leader
+		 * might have to wait on the new lock if the pages come from
+		 * different SLRU banks, but it is safe because a) we release the
+		 * old lock before acquiring the new one, so there can be no
+		 * deadlock, and b) we always modify the page under the correct SLRU
+		 * bank lock.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -545,7 +627,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -574,7 +657,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
 TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
@@ -666,7 +749,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -700,8 +783,8 @@ CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
@@ -715,8 +798,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -725,7 +809,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -760,14 +844,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -778,8 +858,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -811,7 +892,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -843,6 +924,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -853,13 +935,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -997,16 +1080,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 41337471e2..9e932a161b 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -228,8 +228,9 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 {
 	int			slotno;
 	int			i;
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -239,13 +240,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Must be called with the slot's SLRU bank lock held
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -346,7 +347,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -536,8 +537,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -695,9 +696,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -724,12 +723,13 @@ ActivateCommitTs(void)
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
 		int			slotno;
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -778,9 +778,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -812,6 +812,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -832,12 +833,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -991,16 +994,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1012,7 +1017,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index f8eceeac30..dbabc187b9 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -193,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the SLRU bank locks of MultiXactOffset
+ * and MultiXactMember to guard accesses to the two sets of SLRU buffers.
+ * For concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -871,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -892,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -917,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so if this new page
+			 * falls into a different SLRU bank, release the old bank's lock
+			 * and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -937,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1240,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1343,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1380,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1389,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1398,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1420,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1443,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1853,15 +1910,15 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
 				  "MultiXactOffset", multixact_offsets_buffers, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "MultiXactMember", multixact_members_buffers, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1897,8 +1954,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1907,9 +1966,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1918,7 +1978,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1978,10 +2038,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1996,7 +2058,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2018,13 +2080,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2049,13 +2113,13 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2070,7 +2134,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2078,18 +2144,17 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2101,7 +2166,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2116,10 +2183,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2407,6 +2473,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2417,13 +2484,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2456,15 +2524,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock     *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2762,7 +2832,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3244,31 +3314,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock     *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3334,7 +3408,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 211527b075..33670f7cfe 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -97,6 +97,21 @@ SlruFileName(SlruCtl ctl, char *path, int64 segno)
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index into the bank_locks array in SlruSharedData for a
+ * given slotno.
+ *
+ * The SLRU buffer pool is divided into banks of buffers, and at most
+ * SLRU_MAX_BANKLOCKS locks protect access to the buffers in those banks.
+ * Because of that upper limit we cannot always have one lock per bank: as
+ * long as the number of banks is <= SLRU_MAX_BANKLOCKS there is exactly one
+ * lock protecting each bank, otherwise a single lock may protect multiple
+ * banks, depending on the number of banks.
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -118,34 +133,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -173,6 +160,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -183,6 +171,8 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -192,6 +182,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -208,16 +200,19 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -229,18 +224,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -261,6 +254,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -272,7 +269,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -281,6 +278,15 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
@@ -335,7 +341,7 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -374,12 +380,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -430,10 +437,14 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -476,9 +487,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -487,7 +499,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -531,9 +543,10 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	int			slotno;
 	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
 	/*
 	 * See if the page is already in a buffer pool.  The buffer pool is
@@ -557,8 +570,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -580,6 +593,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -608,7 +622,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -623,7 +637,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1067,13 +1081,14 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
-		int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
 		/*
 		 * See if the page is already in a buffer pool.  The buffer pool is
-		 * divided into banks of buffers and each pageno may reside only in one
-		 * bank so limit the search within the bank.
+		 * divided into banks of buffers and each pageno may reside only in
+		 * one bank so limit the search within the bank.
 		 */
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
@@ -1109,7 +1124,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
 		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
@@ -1131,7 +1146,8 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number ==
+				pg_atomic_read_u64(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1205,6 +1221,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int64		pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1215,10 +1232,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1232,7 +1262,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1272,6 +1302,7 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1282,25 +1313,38 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
 	 * While we are holding the lock, make an important safety check: the
 	 * current endpoint page must not be eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1330,10 +1374,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1374,15 +1420,29 @@ SlruDeleteSegment(SlruCtl ctl, int64 segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new lock.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1416,7 +1476,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1684,6 +1744,37 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	return result;
 }
 
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
 /*
  * Helper function for GUC check_hook to check whether slru buffers are in
  * multiples of SLRU_BANK_SIZE.
@@ -1700,3 +1791,37 @@ check_slru_buffers(const char *name, int *newval)
 						SLRU_BANK_SIZE);
 	return false;
 }
+
+/*
+ * Function to acquire all bank locks of the given SlruCtl
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Function to release all bank locks of the given SlruCtl
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 82243c2728..c55d709846 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -87,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -110,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -140,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -203,9 +205,8 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "Subtrans", subtrans_buffers, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
@@ -223,8 +224,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -233,7 +235,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -263,6 +265,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock     *prevlock;
+	LWLock     *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -270,23 +274,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If the new page requires a different bank lock, release the lock
+		 * on the old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the new page requires a different bank lock, release the lock on
+	 * the old bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -320,6 +348,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock     *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -331,12 +360,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 9059c0a202..0c2ac60946 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -267,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the bank-wise SLRU locks
+ * serve as the control locks for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -543,7 +544,7 @@ AsyncShmemInit(void)
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1357,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1367,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1390,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1435,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1461,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1932,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1954,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1995,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2148,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 2f2de5a562..b2433b6f21 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -194,6 +194,20 @@ static const char *const BuiltinTrancheNames[] = {
 	"DSMRegistryDSA",
 	/* LWTRANCHE_DSM_REGISTRY_HASH: */
 	"DSMRegistryHash",
+	/* LWTRANCHE_XACT_SLRU: */
+	"XactSLRU",
+	/* LWTRANCHE_SUBTRANS_SLRU: */
+	"SubtransSLRU",
+	/* LWTRANCHE_COMMITTS_SLRU: */
+	"CommitTSSLRU",
+	/* LWTRANCHE_MULTIXACTOFFSET_SLRU: */
+	"MultixactOffsetSLRU",
+	/* LWTRANCHE_MULTIXACTMEMBER_SLRU: */
+	"MultixactMemberSLRU",
+	/* LWTRANCHE_NOTIFY_SLRU: */
+	"NotifySLRU",
+	/* LWTRANCHE_SERIAL_SLRU: */
+	"SerialSLRU",
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index a0163b2187..e4aa3d91c6 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
# 38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 10c51e2883..ea4392ab15 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -809,9 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  serial_buffers, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serial_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -848,12 +848,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -903,7 +905,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -921,10 +923,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -936,13 +938,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -955,7 +957,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -967,7 +969,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -985,7 +987,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -994,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1008,12 +1010,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1073,7 +1075,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index a5df835dd4..e6235d5056 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -292,11 +292,7 @@ SInvalWrite	"Waiting to add a message to the shared catalog invalidation queue."
 WALBufMapping	"Waiting to replace a page in WAL buffers."
 WALWrite	"Waiting for WAL buffers to be written to disk."
 ControlFile	"Waiting to read or update the <filename>pg_control</filename> file or create a new WAL file."
-XactSLRU	"Waiting to access the transaction status SLRU cache."
-SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
 MultiXactGen	"Waiting to read or update shared multixact state."
-MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
-MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
 RelCacheInit	"Waiting to read or update a <filename>pg_internal.init</filename> relation cache initialization file."
 CheckpointerComm	"Waiting to manage fsync requests."
 TwoPhaseState	"Waiting to read or update the state of prepared transactions."
@@ -307,19 +303,17 @@ Autovacuum	"Waiting to read or update the current state of autovacuum workers."
 AutovacuumSchedule	"Waiting to ensure that a table selected for autovacuum still needs vacuuming."
 SyncScan	"Waiting to select the starting location of a synchronized table scan."
 RelationMapping	"Waiting to read or update a <filename>pg_filenode.map</filename> file (used to track the filenode assignments of certain system catalogs)."
-NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
 NotifyQueue	"Waiting to read or update <command>NOTIFY</command> messages."
 SerializableXactHash	"Waiting to read or update information about serializable transactions."
 SerializableFinishedList	"Waiting to access the list of finished serializable transactions."
 SerializablePredicateList	"Waiting to access the list of predicate locks held by serializable transactions."
-SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
+SerialControl	"Waiting to read or update shared <filename>pg_serial</filename> state."
 SyncRep	"Waiting to read or update information about the state of synchronous replication."
 BackgroundWorker	"Waiting to read or update background worker state."
 DynamicSharedMemoryControl	"Waiting to read or update dynamic shared memory allocation information."
 AutoFile	"Waiting to update the <filename>postgresql.auto.conf</filename> file."
 ReplicationSlotAllocation	"Waiting to allocate or free a replication slot."
 ReplicationSlotControl	"Waiting to read or update replication slot state."
-CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
 CommitTs	"Waiting to read or update the last value set for a transaction commit timestamp."
 ReplicationOrigin	"Waiting to create, drop or use a replication origin."
 MultiXactTruncation	"Waiting to read or truncate multixact information."
@@ -371,6 +365,13 @@ LogicalRepLauncherDSA	"Waiting to access logical replication launcher's dynamic
 LogicalRepLauncherHash	"Waiting to access logical replication launcher's shared hash table."
 DSMRegistryDSA	"Waiting to access dynamic shared memory registry's dynamic shared memory allocator."
 DSMRegistryHash	"Waiting to access dynamic shared memory registry's shared hash table."
+XactSLRU	"Waiting to access the transaction status SLRU cache."
+SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
+CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
+MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
+MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
+NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
+SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
 
 #
 # Wait Events - Lock
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 2b74e11d42..46767f6f84 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -25,6 +25,14 @@
  */
 #define SLRU_BANK_SIZE		16
 
+/*
+ * Maximum number of bank locks that protect in-memory buffer slot access
+ * within the SLRU banks.  If the number of banks is <= SLRU_MAX_BANKLOCKS,
+ * there will be one lock per bank; otherwise each lock will protect multiple
+ * banks, depending upon the number of banks.
+ */
+#define	SLRU_MAX_BANKLOCKS	128
+
 /*
  * To avoid overflowing internal arithmetic and the size_t data type, the
  * number of buffers should not exceed this number.
@@ -65,8 +73,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -79,8 +85,30 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protect the I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
+	/* Locks to protect the in-memory buffer slot access within an SLRU bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because we do a victim buffer
+	 * search within a bank. Furthermore, manipulating an individual bank
+	 * counter avoids frequent cache invalidation since we update it every time
+	 * we access the page.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -92,23 +120,12 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int64		latest_page_number;
+	pg_atomic_uint64 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -165,11 +182,24 @@ typedef struct SlruCtlData
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for given SlruCtl and the pageno.
+ *
+ * This lock needs to be acquired to access the slru buffer slots in the
+ * respective bank.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int		banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -199,5 +229,7 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
 extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 
 #endif							/* SLRU_H */
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 50a65e046d..408b5dd19a 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -209,6 +209,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_LAUNCHER_HASH,
 	LWTRANCHE_DSM_REGISTRY_DSA,
 	LWTRANCHE_DSM_REGISTRY_HASH,
+	LWTRANCHE_XACT_SLRU,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index 4b31f331ca..068a21f125 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
-- 
2.39.2 (Apple Git-143)

#82Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#81)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Here's a touched-up version of this patch.

First, PROC_GLOBAL->clogGroupFirst and SlruCtl->latest_page_number
change from being protected by locks to being atomics, but there's no
discussion of what memory barriers they might need. Looking
at the code, I think the former doesn't need any additional barriers,
but latest_page_number is missing some, which I have added. This
deserves another careful look.
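
To make the barrier question concrete, here is a minimal sketch of the
store/load pairing I have in mind, using the existing pg_atomic_* and
barrier primitives; the function and variable names are illustrative
only and are not taken from the patch:

#include "postgres.h"
#include "port/atomics.h"

/* shared counter, assumed to be set up elsewhere with pg_atomic_init_u64() */
static pg_atomic_uint64 latest_page_number;

static void
publish_latest_page(uint64 pageno)
{
	/*
	 * Writer: publish the value, then issue a write barrier so that this
	 * store is ordered before any later stores (the patch does this after
	 * the write in commit_ts_redo).
	 */
	pg_atomic_write_u64(&latest_page_number, pageno);
	pg_write_barrier();
}

static uint64
read_latest_page(void)
{
	/*
	 * Reader: issue a read barrier so that loads issued before this point
	 * complete before we read the published value.
	 */
	pg_read_barrier();
	return pg_atomic_read_u64(&latest_page_number);
}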

Second, and most user-visibly, the GUC names seem to have been chosen
based on the source-code variables, which were never meant to be
user-visible. So I renamed a few:

xact_buffers -> transaction_buffers
subtrans_buffers -> subtransaction_buffers
serial_buffers -> serializable_buffers
commit_ts_buffers -> commit_timestamp_buffers

(unchanged: multixact_offsets_buffers, multixact_members_buffers,
notify_buffers)
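
(So, for example, a postgresql.conf line that previously said
"xact_buffers = 32" would now be spelled "transaction_buffers = 32".)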

I did this explicitly trying to avoid using the term SLRU in a
user-visible manner, because what do users care? But immediately after
doing this I realized that we already have pg_stat_slru, so maybe the
cat is already out of the bag, and so perhaps we should name these GUCs
as, say, slru_transaction_buffers? That may make the connection between
these things a little more explicit. (I do think we need to add
cross-links in the documentation of those GUCs to the pg_stat_slru
docs and vice-versa.)

Another thing that bothered me a bit is that we have auto-tuning for
transaction_buffers and commit_timestamp_buffers, but not for
subtransaction_buffers. (Autotuning means you set the GUC to 0 and it
scales with shared_buffers.) I don't quite understand the reason for
the omission, so I added it for subtrans too. I think it may make
sense to do likewise for the multixact ones too, but I'm not sure. It doesn't
seem worth having that for pg_serial and notify.
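For example, with this patch "subtransaction_buffers = 0" now sizes the
pg_subtrans cache the same way as transaction_buffers: roughly
shared_buffers/512 blocks, clamped to the 16..1024 range.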

While messing about with these GUCs I realized that using the
show_hook to print the current number, when autotuning is used,
was bogus: SHOW would print the number of blocks for (say)
transaction_buffers, but if you asked it to print (say)
multixact_offsets_buffers, it would give a size in MB or kB. I'm sure
such an inconsistency would bite us. So, digging around I found that a
good way to handle this is to remove the show_hook, and instead call
SetConfigOption() at the time when the ShmemInit function is called,
with the correct number of buffers determined. This is pretty much what
is used for XLOGbuffers, and it works correctly as far as my testing
shows.
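(See the CLOGShmemInit and CommitTsShmemInit hunks in the attached
patch for what that ends up looking like.)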

Still with these auto-tuning GUCs, I noticed that the auto-tuning code
would continue to grow the buffer sizes with shared_buffers to
arbitrarily large values. I added an arbitrary maximum of 1024 (8 MB),
which is much higher than the current value of 128; but if you have
(say) 30 GB of shared_buffers (not uncommon these days), do you really
need 30MB of pg_clog cache? It seems mostly unnecessary ... and you can
still set it manually that way if you need it. So, largely I just
rewrote those small functions completely.
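(To put numbers on it: with 30 GB of shared_buffers and 8 kB blocks,
NBuffers/512 is about 7680, so the new Min(1024, Max(16, NBuffers / 512))
formula clamps the cache at 1024 blocks, i.e. 8 MB.)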

I also made the SGML documentation and postgresql.sample.conf all match
what the code actually does. The whole thing wasn't particularly
consistent.

I rewrote a bunch of code comments and moved stuff around to appear in
alphabetical order, etc.

More comment rewriting still pending.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

Attachments:

v15-slru-optimization.patch (text/x-diff; charset=utf-8)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 61038472c5..3e3119865a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,145 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-commit-timestamp-buffers" xreflabel="commit_timestamp_buffers">
+      <term><varname>commit_timestamp_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_timestamp_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-serializable-buffers" xreflabel="serializable_buffers">
+      <term><varname>serializable_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serializable_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-subtransaction-buffers" xreflabel="subtransaction_buffers">
+      <term><varname>subtransaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtransaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-transaction-buffers" xreflabel="transaction_buffers">
+      <term><varname>transaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>transaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc..fb22bc2068 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -62,6 +63,15 @@
 #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
 #define CLOG_XACT_BITMASK	((1 << CLOG_BITS_PER_XACT) - 1)
 
+/*
+ * Because space used in CLOG by each transaction is so small, we place a
+ * smaller limit on the number of CLOG buffers than SLRU allows.  No other
+ * SLRU needs this.
+ */
+#define CLOG_MAX_ALLOWED_BUFFERS \
+	Min(SLRU_MAX_ALLOWED_BUFFERS, \
+		(((MaxTransactionId / 2) + (CLOG_XACTS_PER_PAGE - 1)) / CLOG_XACTS_PER_PAGE))
+
 
 /*
  * Although we return an int64 the actual value can't currently exceed
@@ -284,14 +294,19 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on the bank lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
+	 * updates for multiple backends so that the number of times the bank lock
 	 * needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
@@ -310,17 +325,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
-		 * our own XID and release the lock.  If not, try use group XID
-		 * update.  If that doesn't work out, fall back to waiting for the
-		 * lock to perform an update for this transaction only.
+		 * If we can immediately acquire the lock, we update the status of our
+		 * own XID and release the lock.  If not, try use group XID update. If
+		 * that doesn't work out, fall back to waiting for the lock to perform
+		 * an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -333,10 +348,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -355,7 +370,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -406,14 +422,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -427,13 +442,15 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
 
 	/*
-	 * Add ourselves to the list of processes needing a group XID status
-	 * update.
+	 * Prepare to add ourselves to the list of processes needing a group XID
+	 * status update.
 	 */
 	proc->clogGroupMember = true;
 	proc->clogGroupMemberXid = xid;
@@ -441,6 +458,41 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	proc->clogGroupMemberPage = pageno;
 	proc->clogGroupMemberLsn = lsn;
 
+	/*
+	 * The underlying SLRU uses bank-wise locks, so it is possible that here
+	 * we get requesters contending on different SLRU-bank locks.  But within
+	 * a group we try to only add requesters that want to update the same
+	 * page, i.e. ones that would be requesting the same SLRU-bank lock as
+	 * well.  The main reasons for not allowing requesters of different pages
+	 * together are 1) once the leader acquires the lock it doesn't need to
+	 * fetch multiple pages and do multiple I/Os under the same lock, 2) the
+	 * leader need not switch the SLRU-bank lock when the different pages are
+	 * from different SLRU banks, and 3) most importantly, the contention
+	 * mostly arises when a highly concurrent OLTP workload is running, in
+	 * which case most transactions are generated around the same time and
+	 * therefore fall on the same clog page, since each page can hold the
+	 * status of 32k transactions.  However, there is an exception: under
+	 * some extreme conditions we might get requests for different pages
+	 * added to the same group, but we handle that by switching the bank
+	 * lock.  That is not the most performant way, but it is not the common
+	 * case either, so we are fine with it.
+	 *
+	 * Also note that we don't clear 'procglobal->clogGroupFirst' until the
+	 * leader of the current group has acquired the lock; until then, any
+	 * concurrent requesters for different SLRU pages have to perform the
+	 * normal update instead of the group update, which is fine since that
+	 * is not the common case.  As soon as the leader of the current group
+	 * gets the lock for the required bank, we clear this value, and other
+	 * requesters (which might want to update a different page, possibly
+	 * falling into a different bank as well) are allowed to form a new
+	 * group because the first group is now detached.  If the new group
+	 * requests a different SLRU-bank lock, its leader might acquire that
+	 * lock while the first group is still performing its update, and the
+	 * two groups can then perform their group updates concurrently.  That
+	 * is completely safe because the two leaders operate on completely
+	 * different SLRU pages and each of them holds its respective SLRU bank
+	 * lock.
+	 */
 	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
 
 	while (true)
@@ -507,8 +559,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group before we
+	 * close this group by setting procglobal->clogGroupFirst to
+	 * INVALID_PGPROCNO; otherwise we would shut out new entries from joining
+	 * the group before we even have the lock, defeating the whole purpose of
+	 * the group update.
+	 */
+	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -525,6 +586,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page is not the same as the
+		 * one for the previous page, then release the lock on the previous
+		 * bank and acquire the lock on the bank for the page we are going to
+		 * update now.
+		 *
+		 * Although, on a best-effort basis, we try to ensure that all the
+		 * requests within a group are for the same clog page, it is possible
+		 * that a group contains requests for more than one page (for details
+		 * refer to the comment in the previous while loop).  That scenario
+		 * might not be very performant, because while switching the lock the
+		 * group leader might need to wait on the new lock if the pages are
+		 * from different SLRU banks, but it is safe because a) we release the
+		 * old lock before acquiring the new one, so there should not be any
+		 * deadlock, and b) we always modify the page under the correct SLRU
+		 * bank lock.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -544,7 +636,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -573,7 +666,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Caller must hold the corresponding SLRU bank lock; it will still be held at exit.
  */
 static void
 TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
@@ -584,6 +677,11 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
 	char		byteval;
 	char		curval;
 
+	Assert(XactCtl->shared->page_number[slotno] == TransactionIdToPage(xid));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl,
+													 XactCtl->shared->page_number[slotno]),
+								LW_EXCLUSIVE));
+
 	byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
 	curval = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
 
@@ -665,7 +763,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -673,23 +771,18 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
  */
-Size
+static int
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* auto-tune based on shared buffers */
+	if (transaction_buffers == 0)
+		return Min(1024, Max(16, NBuffers / 512));
+
+	return Min(Max(16, transaction_buffers), CLOG_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -704,13 +797,36 @@ CLOGShmemSize(void)
 void
 CLOGShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (transaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CLOGShmemBuffers());
+		SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (transaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(transaction_buffers != 0);
+
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for transaction_buffers
+ */
+bool
+check_transaction_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("transaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial CLOG segment.  (The CLOG directory is assumed to
@@ -721,8 +837,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -731,7 +848,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -766,14 +883,8 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
-	/*
-	 * Initialize our idea of the latest page number.
-	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	/* Initialize our idea of the latest page number. */
+	pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -784,8 +895,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -817,7 +929,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -849,6 +961,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -859,13 +972,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -1003,16 +1117,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 61b82385f3..69d359e0fa 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -225,10 +226,11 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 					 TransactionId *subxids, TimestampTz ts,
 					 RepOriginId nodeid, int64 pageno)
 {
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 	int			slotno;
 	int			i;
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -238,22 +240,25 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Caller must hold the correct SLRU bank lock; it will still be held at exit.
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
 						 RepOriginId nodeid, int slotno)
 {
-	int			entryno = TransactionIdToCTsEntry(xid);
+	int			entryno;
 	CommitTimestampEntry entry;
 
-	Assert(TransactionIdIsNormal(xid));
+	if (!TransactionIdIsNormal(xid))
+		return;
+
+	entryno = TransactionIdToCTsEntry(xid);
 
 	entry.time = ts;
 	entry.nodeid = nodeid;
@@ -345,7 +350,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -499,14 +504,18 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
 /*
  * Number of shared CommitTS buffers.
  *
- * We use a very similar logic as for the number of CLOG buffers (except we
- * scale up twice as fast with shared buffers, and the maximum is twice as
- * high); see comments in CLOGShmemBuffers.
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
  */
-Size
+static int
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* auto-tune based on shared buffers */
+	if (commit_timestamp_buffers == 0)
+		return Min(1024, Max(16, NBuffers / 512));
+
+	return Min(Max(16, commit_timestamp_buffers), SLRU_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -528,10 +537,24 @@ CommitTsShmemInit(void)
 {
 	bool		found;
 
+	/* If auto-tuning is requested, now is the time to do it */
+	if (commit_timestamp_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CommitTsShmemBuffers());
+		SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (commit_timestamp_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(commit_timestamp_buffers != 0);
+
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -553,6 +576,15 @@ CommitTsShmemInit(void)
 		Assert(found);
 }
 
+/*
+ * GUC check_hook for commit_timestamp_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_timestamp_buffers", newval);
+}
+
 /*
  * This function must be called ONCE on system install.
  *
@@ -689,9 +721,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_init_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -717,13 +747,14 @@ ActivateCommitTs(void)
 	/* Create the current segment file, if necessary */
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 		int			slotno;
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -772,9 +803,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -806,6 +837,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -826,12 +858,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -985,16 +1019,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1006,7 +1042,9 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
+		pg_write_barrier();
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 59523be901..67b6a6b20a 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -192,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the MultiXactOffset and MultiXactMember
+ * SLRU bank locks to guard accesses to the two sets of SLRU buffers.  For
+ * concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -870,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -891,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -916,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so if the new page
+			 * falls into a different SLRU bank, release the old bank's lock
+			 * and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -936,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1239,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1342,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1379,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1388,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1397,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1419,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1442,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1834,8 +1892,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,16 +1909,16 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "MultiXactMember", multixact_members_buffers, 0,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1887,6 +1945,24 @@ MultiXactShmemInit(void)
 	OldestVisibleMXactId = OldestMemberMXactId + MaxOldestSlot;
 }
 
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates the initial
  * MultiXact segments.  (The MultiXacts directories are assumed to have been
@@ -1896,8 +1972,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1906,9 +1984,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1917,7 +1996,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1977,10 +2056,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1995,7 +2076,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2017,13 +2098,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2048,13 +2131,14 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
+	pg_write_barrier();
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2069,7 +2153,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2077,18 +2163,18 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
+	pg_write_barrier();
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2100,7 +2186,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2115,10 +2203,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2406,6 +2493,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2416,13 +2504,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2455,15 +2544,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2761,7 +2852,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3243,31 +3334,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3333,7 +3428,9 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
+		pg_write_barrier();
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
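
Note: the multixact changes above repeat the same lock hand-off pattern
wherever a loop can cross from one SLRU bank into another -- the previous
bank's lock is released, and the new one acquired, only when the bank lock
actually changes.  Written as a standalone sketch (the patch open-codes this
at each call site; the helper name below is only for illustration):

static LWLock *
switch_bank_lock(SlruCtl ctl, int64 pageno, LWLock *prevlock)
{
	/* Find the bank lock covering pageno (API added by this patch). */
	LWLock	   *lock = SimpleLruGetBankLock(ctl, pageno);

	/* Swap locks only when the new page lands in a different bank. */
	if (lock != prevlock)
	{
		if (prevlock != NULL)
			LWLockRelease(prevlock);
		LWLockAcquire(lock, LW_EXCLUSIVE);
	}
	return lock;
}
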
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ac4790f16..ff3c2d7eec 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -96,6 +97,20 @@ SlruFileName(SlruCtl ctl, char *path, int64 segno)
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index of the lock for the given slot.
+ *
+ * The SLRU buffer pool is divided into banks of buffers, and at most
+ * SLRU_MAX_BANKLOCKS locks protect access to the buffers in those banks.
+ * Because the number of locks is capped, we cannot always have one lock per
+ * bank.  As long as the number of banks is <= SLRU_MAX_BANKLOCKS, each bank
+ * is protected by its own lock; otherwise a single lock protects more than
+ * one bank, with the mapping below spreading the banks across the available
+ * locks.
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -117,34 +132,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -172,6 +159,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -182,6 +170,10 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
+
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -191,6 +183,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -207,16 +201,21 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
+
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -228,18 +227,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -260,6 +257,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -271,7 +272,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -280,11 +281,23 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -293,6 +306,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -330,7 +344,8 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&shared->latest_page_number, pageno);
+	pg_write_barrier();
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -369,12 +384,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -425,10 +441,14 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -471,9 +491,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -482,7 +503,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -524,12 +545,19 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in the buffer pool.  The buffer pool is
+	 * divided into banks of buffers, and each pageno can reside in only one
+	 * bank, so limit the search to that bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -546,8 +574,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -569,6 +597,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -597,7 +626,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -612,7 +641,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1056,9 +1085,16 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		/*
+		 * See if the page is already in the buffer pool.  The buffer pool is
+		 * divided into banks of buffers, and each pageno can reside in only
+		 * one bank, so limit the search to that bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1092,8 +1128,8 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
@@ -1114,7 +1150,10 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+
+			pg_read_barrier();
+			if (this_page_number ==
+				pg_atomic_read_u64(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1188,6 +1227,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int64		pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1198,10 +1238,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1215,7 +1268,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1255,6 +1308,7 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1265,25 +1319,39 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
-	 * While we are holding the lock, make an important safety check: the
-	 * current endpoint page must not be eligible for removal.
+	 * An important safety check: the current endpoint page must not be
+	 * eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	pg_read_barrier();
+	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1313,10 +1381,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1357,15 +1427,29 @@ SlruDeleteSegment(SlruCtl ctl, int64 segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1399,7 +1483,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1666,3 +1750,84 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
+/*
+ * Helper function for GUC check_hooks to verify that the number of SLRU
+ * buffers is a multiple of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Valid values are multiples of SLRU_BANK_SIZE */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
+
+/*
+ * Function to acquire all the bank locks of the given SlruCtl
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Function to release all the bank locks of the given SlruCtl
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
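
Note: to see how a page maps to a bank and to a bank lock in the slru.c
changes above, here is a tiny standalone sketch.  The constants are
illustrative assumptions only; the real SLRU_BANK_SIZE / SLRU_MAX_BANKLOCKS
values come from slru.h elsewhere in the patch, and bank_mask is set up in
SimpleLruInit() as (nslots / SLRU_BANK_SIZE) - 1:

#define EXAMPLE_BANK_SIZE		16	/* buffers per bank (assumed value) */
#define EXAMPLE_MAX_BANKLOCKS	128 /* cap on bank locks (assumed value) */

/* Which bank holds this page?  Mirrors pageno & ctl->bank_mask above. */
static int
example_bankno(int64 pageno, int nslots)
{
	int			nbanks = nslots / EXAMPLE_BANK_SIZE;

	/* the mask arithmetic implies nbanks must be a power of two */
	return (int) (pageno & (nbanks - 1));
}

/* Which lock protects this slot?  Mirrors SLRU_SLOTNO_GET_BANKLOCKNO(). */
static int
example_banklockno(int slotno)
{
	return (slotno / EXAMPLE_BANK_SIZE) % EXAMPLE_MAX_BANKLOCKS;
}
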
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index b2ed82ac56..cee850c9f8 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,7 +31,9 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -85,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -108,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -138,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -186,6 +190,22 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	return previousXid;
 }
 
+/*
+ * Number of shared SUBTRANS buffers.
+ *
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
+ */
+static int
+SUBTRANSShmemBuffers(void)
+{
+	/* auto-tune based on shared buffers */
+	if (subtransaction_buffers == 0)
+		return Min(1024, Max(16, NBuffers / 512));
+
+	return Min(Max(16, subtransaction_buffers), SLRU_MAX_ALLOWED_BUFFERS);
+}
 
 /*
  * Initialization of shared memory for SUBTRANS
@@ -193,20 +213,42 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(SUBTRANSShmemBuffers(), 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (subtransaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", SUBTRANSShmemBuffers());
+		SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (subtransaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(subtransaction_buffers != 0);
+
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+	SimpleLruInit(SubTransCtl, "Subtrans", SUBTRANSShmemBuffers(), 0,
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for subtransaction_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtransaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial SUBTRANS segment.  (The SUBTRANS directory is assumed to
@@ -221,8 +263,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -231,7 +274,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -261,6 +304,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -268,23 +313,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If the new page falls in a different bank, release the lock on the
+		 * old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the final page falls in a different bank, release the lock on the
+	 * old bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -318,6 +387,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -329,12 +399,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
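
As a concrete example of the subtransaction_buffers autotuning above
(assuming the default 8kB block size): with shared_buffers = 1GB, NBuffers is
131072, so NBuffers / 512 = 256 buffers, i.e. 2MB of SUBTRANS cache; the
Min(1024, ...) cap limits very large installations to 8MB, and the
Max(16, ...) floor keeps tiny ones at 16 buffers.  Setting
subtransaction_buffers = 0 requests this autotuning, while an explicit
setting is clamped to between 16 and SLRU_MAX_ALLOWED_BUFFERS and, per
check_subtrans_buffers(), must be a multiple of SLRU_BANK_SIZE.
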
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b24b22293..0c2ac60946 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -234,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -266,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the per-bank SLRU locks
+ * are used as the control locks for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -492,7 +494,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,8 +543,8 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1356,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1366,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1389,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and the SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1434,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1460,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1931,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1953,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1994,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2147,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
@@ -2378,3 +2394,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 98fa6035cc..4a5e05d5e4 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -163,6 +163,13 @@ static const char *const BuiltinTrancheNames[] = {
 	[LWTRANCHE_LAUNCHER_HASH] = "LogicalRepLauncherHash",
 	[LWTRANCHE_DSM_REGISTRY_DSA] = "DSMRegistryDSA",
 	[LWTRANCHE_DSM_REGISTRY_HASH] = "DSMRegistryHash",
+	[LWTRANCHE_COMMITTS_SLRU] = "CommitTSSLRU",
+	[LWTRANCHE_MULTIXACTOFFSET_SLRU] = "MultixactOffsetSLRU",
+	[LWTRANCHE_MULTIXACTMEMBER_SLRU] = "MultixactMemberSLRU",
+	[LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU",
+	[LWTRANCHE_SERIAL_SLRU] = "SerialSLRU",
+	[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
+	[LWTRANCHE_XACT_SLRU] = "XactSLRU",
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index a0163b2187..e4aa3d91c6 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+# 26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+# 38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index ee5ea1175c..7fd1bca7f9 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -808,9 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serializable_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -834,6 +835,15 @@ SerialInit(void)
 	}
 }
 
+/*
+ * GUC check_hook for serializable_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serializable_buffers", newval);
+}
+
 /*
  * Record a committed read write serializable xid and the minimum
  * commitSeqNo of any transactions to which this xid had a rw-conflict out.
@@ -847,12 +857,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -902,7 +914,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -920,10 +932,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -935,13 +947,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding the SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -954,7 +966,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -966,7 +978,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -984,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -993,7 +1005,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1007,12 +1019,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1072,7 +1084,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
@@ -1348,7 +1360,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serializable_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index a5df835dd4..f24950c0c1 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -292,11 +292,7 @@ SInvalWrite	"Waiting to add a message to the shared catalog invalidation queue."
 WALBufMapping	"Waiting to replace a page in WAL buffers."
 WALWrite	"Waiting for WAL buffers to be written to disk."
 ControlFile	"Waiting to read or update the <filename>pg_control</filename> file or create a new WAL file."
-XactSLRU	"Waiting to access the transaction status SLRU cache."
-SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
 MultiXactGen	"Waiting to read or update shared multixact state."
-MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
-MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
 RelCacheInit	"Waiting to read or update a <filename>pg_internal.init</filename> relation cache initialization file."
 CheckpointerComm	"Waiting to manage fsync requests."
 TwoPhaseState	"Waiting to read or update the state of prepared transactions."
@@ -307,19 +303,17 @@ Autovacuum	"Waiting to read or update the current state of autovacuum workers."
 AutovacuumSchedule	"Waiting to ensure that a table selected for autovacuum still needs vacuuming."
 SyncScan	"Waiting to select the starting location of a synchronized table scan."
 RelationMapping	"Waiting to read or update a <filename>pg_filenode.map</filename> file (used to track the filenode assignments of certain system catalogs)."
-NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
 NotifyQueue	"Waiting to read or update <command>NOTIFY</command> messages."
 SerializableXactHash	"Waiting to read or update information about serializable transactions."
 SerializableFinishedList	"Waiting to access the list of finished serializable transactions."
 SerializablePredicateList	"Waiting to access the list of predicate locks held by serializable transactions."
-SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
+SerialControl	"Waiting to read or update shared serializable transaction conflict state."
 SyncRep	"Waiting to read or update information about the state of synchronous replication."
 BackgroundWorker	"Waiting to read or update background worker state."
 DynamicSharedMemoryControl	"Waiting to read or update dynamic shared memory allocation information."
 AutoFile	"Waiting to update the <filename>postgresql.auto.conf</filename> file."
 ReplicationSlotAllocation	"Waiting to allocate or free a replication slot."
 ReplicationSlotControl	"Waiting to read or update replication slot state."
-CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
 CommitTs	"Waiting to read or update the last value set for a transaction commit timestamp."
 ReplicationOrigin	"Waiting to create, drop or use a replication origin."
 MultiXactTruncation	"Waiting to read or truncate multixact information."
@@ -371,6 +365,14 @@ LogicalRepLauncherDSA	"Waiting to access logical replication launcher's dynamic
 LogicalRepLauncherHash	"Waiting to access logical replication launcher's shared hash table."
 DSMRegistryDSA	"Waiting to access dynamic shared memory registry's dynamic shared memory allocator."
 DSMRegistryHash	"Waiting to access dynamic shared memory registry's shared hash table."
+CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
+MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
+MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
+NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
+SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
+SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
+XactSLRU	"Waiting to access the transaction status SLRU cache."
+
 
 #
 # Wait Events - Lock
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 88b03e8fa3..7df342c70d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,12 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+/* configurable SLRU buffer sizes */
+int			commit_timestamp_buffers = 0;
+int			multixact_members_buffers = 32;
+int			multixact_offsets_buffers = 16;
+int			notify_buffers = 16;
+int			serializable_buffers = 32;
+int			subtransaction_buffers = 0;
+int			transaction_buffers = 0;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 7fe58518d7..82d08647d0 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2320,6 +2321,83 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"commit_timestamp_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_timestamp_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_commit_ts_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact member cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_members_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact offset cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		16, 8, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_offsets_buffers, NULL, NULL
+	},
+
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the LISTEN/NOTIFY message cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		16, 8, SLRU_MAX_ALLOWED_BUFFERS,
+		check_notify_buffers, NULL, NULL
+	},
+
+	{
+		{"serializable_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the serializable transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serializable_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_serial_buffers, NULL, NULL
+	},
+
+	{
+		{"subtransaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the sub-transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtransaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_subtrans_buffers, NULL, NULL
+	},
+
+	{
+		{"transaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the transaction status cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&transaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_transaction_buffers, NULL, NULL
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index da10b43dac..8b3a547a5e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#commit_timestamp_buffers = 0			# memory for pg_commit_ts (0 = auto)
+#multixact_offsets_buffers = 16			# memory for pg_multixact/offsets
+#multixact_members_buffers = 32			# memory for pg_multixact/members
+#notify_buffers = 16					# memory for pg_notify
+#serializable_buffers = 32				# memory for pg_serial
+#subtransaction_buffers = 0 			# memory for pg_subtrans (0 = auto)
+#transaction_buffers = 0				# memory for pg_xact (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index becc365cb0..8e62917e49 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -40,7 +40,6 @@ extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
 									   TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
 extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
 
-extern Size CLOGShmemBuffers(void);
 extern Size CLOGShmemSize(void);
 extern void CLOGShmemInit(void);
 extern void BootStrapCLOG(void);
diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 9c6f3a35ca..82d3aa8627 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -27,7 +27,6 @@ extern bool TransactionIdGetCommitTsData(TransactionId xid,
 extern TransactionId GetLatestCommitTsData(TimestampTz *ts,
 										   RepOriginId *nodeid);
 
-extern Size CommitTsShmemBuffers(void);
 extern Size CommitTsShmemSize(void);
 extern void CommitTsShmemInit(void);
 extern void BootStrapCommitTs(void);
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 233f67dbcc..7ffd256c74 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b05f6bc71d..3160980d04 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,27 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks.  We limit the bank size to 16 because
+ * we perform a sequential search within a bank (both while looking for a
+ * target page and while searching for a victim buffer), and a larger bank
+ * would make that search more expensive.
+ */
+#define SLRU_BANK_SIZE		16
+
+/*
+ * Number of bank locks used to protect in-memory buffer slot access within an
+ * SLRU bank.  If the number of banks is <= SLRU_MAX_BANKLOCKS then there is
+ * one lock per bank; otherwise each lock protects multiple banks, depending on
+ * the number of banks.
+ */
+#define	SLRU_MAX_BANKLOCKS	128
+
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers must not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
@@ -52,8 +73,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -66,8 +85,30 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protect the I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
+	/* Locks to protect in-memory buffer slot access within each SLRU bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because the victim buffer search is
+	 * done within a bank.  Furthermore, using an individual counter per bank
+	 * avoids frequent cache invalidation, since the counter is updated every
+	 * time the page is accessed.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -79,23 +120,12 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int64		latest_page_number;
+	pg_atomic_uint64 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -142,15 +172,35 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask used to select the slotno bank.  Given the 1GB cap on the SLRU
+	 * buffer pool size and SLRU_BANK_SIZE, bits16 is wide enough for this mask.
+	 */
+	bits16		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock needs to be acquired to access the slru buffer slots in the
+ * respective bank.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int			banklockno;
+
+	banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -179,5 +229,8 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
+extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 
 #endif							/* SLRU_H */
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index b0d2ad57e5..e2213cf3fd 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 80b8583421..78daa25fa0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0b01c1f093..39b8ed9425 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -178,6 +178,14 @@ extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
 
+extern PGDLLIMPORT int commit_timestamp_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serializable_buffers;
+extern PGDLLIMPORT int subtransaction_buffers;
+extern PGDLLIMPORT int transaction_buffers;
+
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
 extern PGDLLIMPORT TimestampTz MyStartTimestamp;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 50a65e046d..10bea8c595 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -209,6 +209,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_LAUNCHER_HASH,
 	LWTRANCHE_DSM_REGISTRY_DSA,
 	LWTRANCHE_DSM_REGISTRY_HASH,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_XACT_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index a7edd38fa9..14ee9b94a2 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 5300c44f3b..44b0cbf9a1 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -46,6 +46,8 @@ extern bool check_client_connection_check_interval(int *newval, void **extra,
 extern bool check_client_encoding(char **newval, void **extra, GucSource source);
 extern void assign_client_encoding(const char *newval, void *extra);
 extern bool check_cluster_name(char **newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern const char *show_data_directory_mode(void);
 extern bool check_datestyle(char **newval, void **extra, GucSource source);
 extern void assign_datestyle(const char *newval, void *extra);
@@ -91,6 +93,11 @@ extern bool check_max_worker_processes(int *newval, void **extra,
 									   GucSource source);
 extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
 extern void assign_max_stack_depth(int newval, void *extra);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
 extern bool check_primary_slot_name(char **newval, void **extra,
 									GucSource source);
 extern bool check_random_seed(double *newval, void **extra, GucSource source);
@@ -122,12 +129,15 @@ extern void assign_role(const char *newval, void *extra);
 extern const char *show_role(void);
 extern bool check_search_path(char **newval, void **extra, GucSource source);
 extern void assign_search_path(const char *newval, void *extra);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
 extern bool check_session_authorization(char **newval, void **extra, GucSource source);
 extern void assign_session_authorization(const char *newval, void *extra);
 extern void assign_session_replication_role(int newval, void *extra);
 extern void assign_stats_fetch_consistency(int newval, void *extra);
 extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
@@ -152,6 +162,7 @@ extern const char *show_timezone(void);
 extern bool check_timezone_abbreviations(char **newval, void **extra,
 										 GucSource source);
 extern void assign_timezone_abbreviations(const char *newval, void *extra);
+extern bool check_transaction_buffers(int *newval, void **extra, GucSource source);
 extern bool check_transaction_deferrable(bool *newval, void **extra, GucSource source);
 extern bool check_transaction_isolation(int *newval, void **extra, GucSource source);
 extern bool check_transaction_read_only(bool *newval, void **extra, GucSource source);
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index 4b31f331ca..068a21f125 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
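
To make the bank mapping in the slru.h changes above concrete, here is a minimal standalone sketch (not taken from any of the posted patches) of how a page number maps to a bank, to that bank's slot range, and to a bank lock.  It assumes bank_mask is nbanks - 1 with nbanks = nslots / SLRU_BANK_SIZE, which is how SimpleLruInit is expected to set it; the 64-slot pool size is just an example.

#include <stdio.h>

#define SLRU_BANK_SIZE		16
#define SLRU_MAX_BANKLOCKS	128

int
main(void)
{
	int			nslots = 64;	/* hypothetical pool size, multiple of SLRU_BANK_SIZE */
	int			nbanks = nslots / SLRU_BANK_SIZE;
	unsigned int bank_mask = nbanks - 1;	/* assumed to match SimpleLruInit */

	for (long pageno = 0; pageno < 8; pageno++)
	{
		int			bankno = pageno & bank_mask;	/* bank that holds the page */
		int			banklockno = bankno % SLRU_MAX_BANKLOCKS;	/* lock guarding it */
		int			bankstart = bankno * SLRU_BANK_SIZE;	/* first slot of the bank */

		printf("page %ld -> bank %d (slots %d..%d), bank lock %d\n",
			   pageno, bankno, bankstart, bankstart + SLRU_BANK_SIZE - 1,
			   banklockno);
	}
	return 0;
}

With 64 slots there are 4 banks, so pages 0-7 cycle through banks 0..3, and since the number of banks is below SLRU_MAX_BANKLOCKS each bank gets its own lock.
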
#83Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#82)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Jan-25, Alvaro Herrera wrote:

Here's a touched-up version of this patch.

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 98fa6035cc..4a5e05d5e4 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -163,6 +163,13 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_LAUNCHER_HASH] = "LogicalRepLauncherHash",
[LWTRANCHE_DSM_REGISTRY_DSA] = "DSMRegistryDSA",
[LWTRANCHE_DSM_REGISTRY_HASH] = "DSMRegistryHash",
+	[LWTRANCHE_COMMITTS_SLRU] = "CommitTSSLRU",
+	[LWTRANCHE_MULTIXACTOFFSET_SLRU] = "MultixactOffsetSLRU",
+	[LWTRANCHE_MULTIXACTMEMBER_SLRU] = "MultixactMemberSLRU",
+	[LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU",
+	[LWTRANCHE_SERIAL_SLRU] = "SerialSLRU"
+	[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
+	[LWTRANCHE_XACT_SLRU] = "XactSLRU",
};

Eeek. Last minute changes ... Fixed here.
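
For the archives, the fix is essentially just the missing comma after "SerialSLRU", so that each designated initializer stays a separate array entry:

	/* ... with the comma restored after "SerialSLRU": */
	[LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU",
	[LWTRANCHE_SERIAL_SLRU] = "SerialSLRU",
	[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",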

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)

Attachments:

v16-slru-optimization.patch (text/x-diff; charset=utf-8)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 61038472c5..3e3119865a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,145 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-commit-timestamp-buffers" xreflabel="commit_timestamp_buffers">
+      <term><varname>commit_timestamp_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_timestamp_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-serializable-buffers" xreflabel="serializable_buffers">
+      <term><varname>serializable_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serializable_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-subtransaction-buffers" xreflabel="subtransaction_buffers">
+      <term><varname>subtransaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtransaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-transaction-buffers" xreflabel="transaction_buffers">
+      <term><varname>transaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>transaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc..fb22bc2068 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -62,6 +63,15 @@
 #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
 #define CLOG_XACT_BITMASK	((1 << CLOG_BITS_PER_XACT) - 1)
 
+/*
+ * Because space used in CLOG by each transaction is so small, we place a
+ * smaller limit on the number of CLOG buffers than SLRU allows.  No other
+ * SLRU needs this.
+ */
+#define CLOG_MAX_ALLOWED_BUFFERS \
+	Min(SLRU_MAX_ALLOWED_BUFFERS, \
+		(((MaxTransactionId / 2) + (CLOG_XACTS_PER_PAGE - 1)) / CLOG_XACTS_PER_PAGE))
+
 
 /*
  * Although we return an int64 the actual value can't currently exceed
@@ -284,14 +294,19 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
+	 * When there is contention on the bank lock, we try to group multiple
 	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
+	 * updates for multiple backends so that the number of times the bank lock
 	 * needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
@@ -310,17 +325,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
-		 * our own XID and release the lock.  If not, try use group XID
-		 * update.  If that doesn't work out, fall back to waiting for the
-		 * lock to perform an update for this transaction only.
+		 * If we can immediately acquire the lock, we update the status of our
+		 * own XID and release the lock.  If not, try use group XID update. If
+		 * that doesn't work out, fall back to waiting for the lock to perform
+		 * an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -333,10 +348,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -355,7 +370,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -406,14 +422,13 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -427,13 +442,15 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
 
 	/*
-	 * Add ourselves to the list of processes needing a group XID status
-	 * update.
+	 * Prepare to add ourselves to the list of processes needing a group XID
+	 * status update.
 	 */
 	proc->clogGroupMember = true;
 	proc->clogGroupMemberXid = xid;
@@ -441,6 +458,41 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	proc->clogGroupMemberPage = pageno;
 	proc->clogGroupMemberLsn = lsn;
 
+	/*
+	 * The underlying SLRU uses bank-wise locks, so it is possible that the
+	 * requesters that reach here are contending on different SLRU-bank locks.
+	 * Within a group, however, we try to add only the requesters that want to
+	 * update the same page, i.e. those that would be requesting the same
+	 * SLRU-bank lock as well.  The main reasons for not allowing requesters of
+	 * different pages in the same group are: 1) once the leader acquires the
+	 * lock, it does not need to fetch multiple pages and do multiple I/Os
+	 * under the same lock; 2) the leader need not switch the SLRU-bank lock
+	 * when pages come from different SLRU banks; 3) and, most importantly,
+	 * the contention mostly arises under highly concurrent OLTP workloads, in
+	 * which most transactions are generated at around the same time and thus
+	 * fall on the same clog page, since each page can hold the status of 32k
+	 * transactions.  However, there is an exception: under some extreme
+	 * conditions requests for different pages may get added to the same
+	 * group, and we handle that by switching the bank lock.  That is not the
+	 * most performant way, but it is not the common case either, so we are
+	 * fine with it.
+	 *
+	 * Note also that we do not clear 'procglobal->clogGroupFirst' until the
+	 * leader of the current group has acquired the lock.  This means that if
+	 * we concurrently get requesters for different SLRU pages, those have to
+	 * fall back to the normal update instead of the group update, which is
+	 * fine since that is not the common case.  As soon as the leader of the
+	 * current group gets the lock for the required bank, we clear this value,
+	 * and other requesters (which might want to update a different page that
+	 * may fall into a different bank as well) are then allowed to form a new
+	 * group, as the first group is now detached.  So if the new group has a
+	 * request for a different SLRU-bank lock, the leader of that group might
+	 * also get its lock while the first group is still performing its update,
+	 * and the two groups can then perform their group updates concurrently.
+	 * That is completely safe, because the two leaders are operating on
+	 * completely different SLRU pages and each of them is holding its
+	 * respective SLRU bank lock.
+	 */
 	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
 
 	while (true)
@@ -507,8 +559,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group before we
+	 * close the group by setting procglobal->clogGroupFirst to
+	 * INVALID_PGPROCNO.  Otherwise we would close the group to new entries
+	 * before we have even acquired the lock, defeating the whole purpose of
+	 * the group update.
+	 */
+	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -525,6 +586,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page is not the same as that
+		 * of the previous page, we need to release the lock on the previous
+		 * bank and acquire the lock on the bank of the page we are going to
+		 * update now.
+		 *
+		 * Although we try, on a best-effort basis, to keep all the requests
+		 * within a group on the same clog page, it is still possible for a
+		 * group to contain requests for more than one page (for details,
+		 * refer to the comment above the previous while loop).  That scenario
+		 * might not be very performant, because while switching the lock the
+		 * group leader might need to wait on the new lock if the pages belong
+		 * to different SLRU banks, but it is safe because a) we release the
+		 * old lock before acquiring the new one, so there should not be any
+		 * deadlock situation, and b) we always modify the page under the
+		 * correct SLRU bank lock.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -544,7 +636,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -573,7 +666,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Caller must hold the corresponding SLRU bank lock; it will still be held at exit.
  */
 static void
 TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
@@ -584,6 +677,11 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
 	char		byteval;
 	char		curval;
 
+	Assert(XactCtl->shared->page_number[slotno] == TransactionIdToPage(xid));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl,
+													 XactCtl->shared->page_number[slotno]),
+								LW_EXCLUSIVE));
+
 	byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
 	curval = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
 
@@ -665,7 +763,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -673,23 +771,18 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
  */
-Size
+static int
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* auto-tune based on shared buffers */
+	if (transaction_buffers == 0)
+		return Min(1024, Max(16, NBuffers / 512));
+
+	return Min(Max(16, transaction_buffers), CLOG_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -704,13 +797,36 @@ CLOGShmemSize(void)
 void
 CLOGShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (transaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CLOGShmemBuffers());
+		SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (transaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(transaction_buffers != 0);
+
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for transaction_buffers
+ */
+bool
+check_transaction_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("transaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial CLOG segment.  (The CLOG directory is assumed to
@@ -721,8 +837,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -731,7 +848,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -766,14 +883,8 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
-	/*
-	 * Initialize our idea of the latest page number.
-	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	/* Initialize our idea of the latest page number. */
+	pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
@@ -784,8 +895,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -817,7 +929,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -849,6 +961,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -859,13 +972,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -1003,16 +1117,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 61b82385f3..69d359e0fa 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -225,10 +226,11 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 					 TransactionId *subxids, TimestampTz ts,
 					 RepOriginId nodeid, int64 pageno)
 {
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 	int			slotno;
 	int			i;
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -238,22 +240,25 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Caller must hold the correct SLRU bank lock; it will still be held at exit
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
 						 RepOriginId nodeid, int slotno)
 {
-	int			entryno = TransactionIdToCTsEntry(xid);
+	int			entryno;
 	CommitTimestampEntry entry;
 
-	Assert(TransactionIdIsNormal(xid));
+	if (!TransactionIdIsNormal(xid))
+		return;
+
+	entryno = TransactionIdToCTsEntry(xid);
 
 	entry.time = ts;
 	entry.nodeid = nodeid;
@@ -345,7 +350,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -499,14 +504,18 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
 /*
  * Number of shared CommitTS buffers.
  *
- * We use a very similar logic as for the number of CLOG buffers (except we
- * scale up twice as fast with shared buffers, and the maximum is twice as
- * high); see comments in CLOGShmemBuffers.
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
  */
-Size
+static int
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* auto-tune based on shared buffers */
+	if (commit_timestamp_buffers == 0)
+		return Min(1024, Max(16, NBuffers / 512));
+
+	return Min(Max(16, commit_timestamp_buffers), SLRU_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -528,10 +537,24 @@ CommitTsShmemInit(void)
 {
 	bool		found;
 
+	/* If auto-tuning is requested, now is the time to do it */
+	if (commit_timestamp_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CommitTsShmemBuffers());
+		SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (commit_timestamp_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(commit_timestamp_buffers != 0);
+
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -553,6 +576,15 @@ CommitTsShmemInit(void)
 		Assert(found);
 }
 
+/*
+ * GUC check_hook for commit_timestamp_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_timestamp_buffers", newval);
+}
+
 /*
  * This function must be called ONCE on system install.
  *
@@ -689,9 +721,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_init_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -717,13 +747,14 @@ ActivateCommitTs(void)
 	/* Create the current segment file, if necessary */
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 		int			slotno;
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -772,9 +803,9 @@ DeactivateCommitTs(void)
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	SimpleLruAcquireAllBankLock(CommitTsCtl, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+	SimpleLruReleaseAllBankLock(CommitTsCtl);
 }
 
 /*
@@ -806,6 +837,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -826,12 +858,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -985,16 +1019,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
@@ -1006,7 +1042,9 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
+		pg_write_barrier();
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
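
[Editor's note, not part of the patch: a quick illustration of the auto-tuning arithmetic in CommitTsShmemBuffers() above, assuming the usual 8kB block size.  1GB of shared_buffers is 131072 blocks, and 131072 / 512 = 256 SLRU buffers = 2MB; the result is clamped to the range 16..1024 buffers, i.e. 128kB..8MB of SLRU cache.  A standalone sketch:]

#include <stdio.h>

/*
 * Sketch of "2MB per 1GB of shared_buffers, up to 8MB, at least 16 buffers",
 * assuming BLCKSZ = 8192.  Mirrors Min(1024, Max(16, NBuffers / 512)).
 */
static int
autotune_slru_buffers(int nbuffers)
{
	int		n = nbuffers / 512;

	if (n < 16)
		n = 16;
	if (n > 1024)
		n = 1024;
	return n;
}

int
main(void)
{
	/* shared_buffers of 128MB, 1GB and 16GB, expressed in 8kB blocks */
	int		sizes[] = {16384, 131072, 2097152};

	for (int i = 0; i < 3; i++)
		printf("%d blocks -> %d SLRU buffers\n",
			   sizes[i], autotune_slru_buffers(sizes[i]));
	return 0;
}
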
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 59523be901..67b6a6b20a 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -192,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the bank locks of the MultiXactOffset
+ * and MultiXactMember SLRUs to guard accesses to the two sets of SLRU buffers.
+ * For concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -870,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -891,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -916,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so if this new page
+			 * falls into a different SLRU bank, release the old bank's lock
+			 * and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -936,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1239,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1342,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1379,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1388,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1397,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1419,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1442,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1834,8 +1892,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,16 +1909,16 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "MultiXactMember", multixact_members_buffers, 0,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1887,6 +1945,24 @@ MultiXactShmemInit(void)
 	OldestVisibleMXactId = OldestMemberMXactId + MaxOldestSlot;
 }
 
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates the initial
  * MultiXact segments.  (The MultiXacts directories are assumed to have been
@@ -1896,8 +1972,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1906,9 +1984,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1917,7 +1996,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1977,10 +2056,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1995,7 +2076,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2017,13 +2098,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
 }
 
 /*
@@ -2048,13 +2131,14 @@ TrimMultiXact(void)
 	LWLockRelease(MultiXactGenLock);
 
 	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
+	pg_write_barrier();
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2069,7 +2153,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2077,18 +2163,18 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
+	pg_write_barrier();
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -2100,7 +2186,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2115,10 +2203,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2406,6 +2493,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2416,13 +2504,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2455,15 +2544,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2761,7 +2852,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3243,31 +3334,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
@@ -3333,7 +3428,9 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
+		pg_write_barrier();
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
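
[Editor's note, not part of the patch: the "exchange bank locks" pattern above recurs in RecordNewMultiXact(), GetMultiXactIdMembers(), and again later in StartupSUBTRANS().  Distilled into a hypothetical helper (names and placement are assumptions; it would live somewhere slru.h and lwlock.h are already in scope), the invariant is that we hold at most one bank lock at a time and swap only when the next page lands in a different bank:]

/*
 * Hypothetical helper (not in the patch): move from the bank lock held in
 * 'prevlock' (or from no lock, if prevlock is NULL) to the bank lock that
 * covers 'pageno'.  At most one bank lock is held at any time, which keeps
 * the bank-wise locking deadlock-free.
 */
static LWLock *
swap_bank_lock(SlruCtl ctl, int64 pageno, LWLock *prevlock)
{
	LWLock	   *lock = SimpleLruGetBankLock(ctl, pageno);

	if (lock != prevlock)
	{
		if (prevlock != NULL)
			LWLockRelease(prevlock);
		LWLockAcquire(lock, LW_EXCLUSIVE);
	}
	return lock;
}

[The retry loop in GetMultiXactIdMembers(), for instance, could then be read as "prevlock = swap_bank_lock(MultiXactOffsetCtl, pageno, prevlock)" before each SimpleLruReadPage() call.]
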
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ac4790f16..ff3c2d7eec 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -96,6 +97,20 @@ SlruFileName(SlruCtl ctl, char *path, int64 segno)
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * Macro to get the index of the lock for the given slot.
+ *
+ * The SLRU buffer pool is divided into banks of buffers, and there are at
+ * most SLRU_MAX_BANKLOCKS locks to protect access to the buffers in those
+ * banks.  Because there is an upper limit on the number of locks, we cannot
+ * always have one lock for each bank.  As long as the number of banks is
+ * <= SLRU_MAX_BANKLOCKS, there is one lock protecting each bank; otherwise
+ * a single lock may protect multiple banks, depending on the number of
+ * banks.
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	(((slotno) / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS)
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -117,34 +132,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -172,6 +159,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -182,6 +170,10 @@ Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
 	Size		sz;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
+
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
 
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
@@ -191,6 +183,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanklocks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -207,16 +201,20 @@
  * nlsns: number of LSN groups per page (set to zero if not relevant).
- * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+	int			nbanklocks = Min(nbanks, SLRU_MAX_BANKLOCKS);
+
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -228,18 +227,16 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		char	   *ptr;
 		Size		offset;
 		int			slotno;
+		int			bankno;
+		int			banklockno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -260,6 +257,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanklocks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -271,7 +272,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -280,11 +281,23 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -293,6 +306,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -330,7 +344,8 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&shared->latest_page_number, pageno);
+	pg_write_barrier();
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -369,12 +384,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -425,10 +441,14 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -471,9 +491,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -482,7 +503,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -524,12 +545,19 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in the buffer pool.  The buffer pool is
+	 * divided into banks of buffers, and each pageno can reside in only one
+	 * bank, so limit the search to that bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -546,8 +574,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -569,6 +597,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -597,7 +626,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -612,7 +641,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1056,9 +1085,16 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		/*
+		 * See if the page is already in the buffer pool.  The buffer pool is
+		 * divided into banks of buffers, and each pageno can reside in only
+		 * one bank, so limit the search to that bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1092,8 +1128,8 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
@@ -1114,7 +1150,10 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 				this_delta = 0;
 			}
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+
+			pg_read_barrier();
+			if (this_page_number ==
+				pg_atomic_read_u64(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1188,6 +1227,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int64		pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1198,10 +1238,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1215,7 +1268,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1255,6 +1308,7 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1265,25 +1319,39 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
-	 * While we are holding the lock, make an important safety check: the
-	 * current endpoint page must not be eligible for removal.
+	 * An important safety check: the current endpoint page must not be
+	 * eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	pg_read_barrier();
+	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1313,10 +1381,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1357,15 +1427,29 @@ SlruDeleteSegment(SlruCtl ctl, int64 segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1399,7 +1483,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1666,3 +1750,84 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
+/*
+ * Helper function for GUC check_hooks to verify that the number of SLRU
+ * buffers is a multiple of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Valid values are multiples of SLRU_BANK_SIZE */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
+
+/*
+ * Acquire all the bank locks of the given SlruCtl.
+ */
+void
+SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, mode);
+}
+
+/*
+ * Release all the bank locks of the given SlruCtl.
+ */
+void
+SimpleLruReleaseAllBankLock(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			banklockno;
+	int			nbanklocks;
+
+	/* Compute number of bank locks. */
+	nbanklocks = Min(shared->num_slots / SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS);
+
+	for (banklockno = 0; banklockno < nbanklocks; banklockno++)
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
+}
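
[Editor's note, not part of the patch: to make the bank/lock mapping concrete, here is a standalone sketch of the arithmetic used by ctl->bank_mask and SLRU_SLOTNO_GET_BANKLOCKNO().  SLRU_BANK_SIZE = 16 and SLRU_MAX_BANKLOCKS = 128 are assumed values for illustration only; the patch only shows how they are used, not what they are defined as.]

#include <stdio.h>
#include <stdint.h>

#define SLRU_BANK_SIZE		16		/* assumed for illustration */
#define SLRU_MAX_BANKLOCKS	128		/* assumed for illustration */

int
main(void)
{
	int			nslots = 64;	/* a 64-buffer SLRU, i.e. 4 banks */
	int			bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
	int64_t		pageno = 12345;

	/* which bank may hold this page, and which slots belong to that bank */
	int			bankno = (int) (pageno & bank_mask);
	int			bankstart = bankno * SLRU_BANK_SIZE;
	int			bankend = bankstart + SLRU_BANK_SIZE;

	/* which lock protects that bank (one lock per bank while the number of
	 * banks is <= SLRU_MAX_BANKLOCKS) */
	int			banklockno = (bankstart / SLRU_BANK_SIZE) % SLRU_MAX_BANKLOCKS;

	printf("page %lld -> bank %d (slots %d..%d), bank lock %d\n",
		   (long long) pageno, bankno, bankstart, bankend - 1, banklockno);
	return 0;
}
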
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index b2ed82ac56..cee850c9f8 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,7 +31,9 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -85,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -108,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -138,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -186,6 +190,22 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	return previousXid;
 }
 
+/*
+ * Number of shared SUBTRANS buffers.
+ *
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
+ */
+static int
+SUBTRANSShmemBuffers(void)
+{
+	/* auto-tune based on shared buffers */
+	if (subtransaction_buffers == 0)
+		return Min(1024, Max(16, NBuffers / 512));
+
+	return Min(Max(16, subtransaction_buffers), SLRU_MAX_ALLOWED_BUFFERS);
+}
 
 /*
  * Initialization of shared memory for SUBTRANS
@@ -193,20 +213,42 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(SUBTRANSShmemBuffers(), 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (subtransaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", SUBTRANSShmemBuffers());
+		SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (subtransaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(subtransaction_buffers != 0);
+
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+	SimpleLruInit(SubTransCtl, "Subtrans", SUBTRANSShmemBuffers(), 0,
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for subtransaction_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtransaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial SUBTRANS segment.  (The SUBTRANS directory is assumed to
@@ -221,8 +263,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -231,7 +274,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -261,6 +304,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -268,23 +313,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If this page falls into a different bank, release the lock on the
+		 * old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If this page falls into a different bank, release the lock on the old
+	 * bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -318,6 +387,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -329,12 +399,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b24b22293..0c2ac60946 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -234,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -266,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the per-bank SLRU locks
+ * are used as the control locks for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -492,7 +494,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,8 +543,8 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1356,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1366,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1389,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and the SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1434,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1460,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1931,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1953,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1994,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2147,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for each SLRU bank lock but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
@@ -2378,3 +2394,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
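
[Editor's note, not part of the patch: the validation rule applied by check_slru_buffers() to all of the new GUCs (commit_timestamp_buffers, subtransaction_buffers, multixact_offsets_buffers, multixact_members_buffers, notify_buffers, serializable_buffers) is simply "multiple of SLRU_BANK_SIZE".  Note that 0 passes the modulo check, and for commit_timestamp_buffers and subtransaction_buffers 0 means "auto-tune from shared_buffers".  A tiny sketch, again assuming SLRU_BANK_SIZE = 16 for illustration:]

#include <stdio.h>
#include <stdbool.h>

#define SLRU_BANK_SIZE 16	/* assumed for illustration */

/* same rule as check_slru_buffers(): value must be a multiple of the bank size */
static bool
is_valid_slru_buffers(int newval)
{
	return newval % SLRU_BANK_SIZE == 0;
}

int
main(void)
{
	int		candidates[] = {0, 16, 24, 128, 1000, 1024};

	for (int i = 0; i < 6; i++)
		printf("notify_buffers = %d -> %s\n", candidates[i],
			   is_valid_slru_buffers(candidates[i]) ? "accepted" : "rejected");
	return 0;
}
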
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 98fa6035cc..accbe82a8c 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -163,6 +163,13 @@ static const char *const BuiltinTrancheNames[] = {
 	[LWTRANCHE_LAUNCHER_HASH] = "LogicalRepLauncherHash",
 	[LWTRANCHE_DSM_REGISTRY_DSA] = "DSMRegistryDSA",
 	[LWTRANCHE_DSM_REGISTRY_HASH] = "DSMRegistryHash",
+	[LWTRANCHE_COMMITTS_SLRU] = "CommitTSSLRU",
+	[LWTRANCHE_MULTIXACTOFFSET_SLRU] = "MultixactOffsetSLRU",
+	[LWTRANCHE_MULTIXACTMEMBER_SLRU] = "MultixactMemberSLRU",
+	[LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU",
+	[LWTRANCHE_SERIAL_SLRU] = "SerialSLRU",
+	[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
+	[LWTRANCHE_XACT_SLRU] = "XactSLRU",
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index a0163b2187..e4aa3d91c6 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+SerialControlLock					31
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+#38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index ee5ea1175c..7fd1bca7f9 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -208,6 +208,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -808,9 +809,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serializable_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -834,6 +835,15 @@ SerialInit(void)
 	}
 }
 
+/*
+ * GUC check_hook for serializable_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serializable_buffers", newval);
+}
+
 /*
  * Record a committed read write serializable xid and the minimum
  * commitSeqNo of any transactions to which this xid had a rw-conflict out.
@@ -847,12 +857,14 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * If no serializable transactions are active, there shouldn't be anything
@@ -902,7 +914,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -920,10 +932,10 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
-	LWLockAcquire(SerialSLRULock, LW_SHARED);
+	LWLockAcquire(SerialControlLock, LW_SHARED);
 	headXid = serialControl->headXid;
 	tailXid = serialControl->tailXid;
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	if (!TransactionIdIsValid(headXid))
 		return 0;
@@ -935,13 +947,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -954,7 +966,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 static void
 SerialSetActiveSerXmin(TransactionId xid)
 {
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/*
 	 * When no sxacts are active, nothing overlaps, set the xid values to
@@ -966,7 +978,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 	{
 		serialControl->tailXid = InvalidTransactionId;
 		serialControl->headXid = InvalidTransactionId;
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -984,7 +996,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 		{
 			serialControl->tailXid = xid;
 		}
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -993,7 +1005,7 @@ SerialSetActiveSerXmin(TransactionId xid)
 
 	serialControl->tailXid = xid;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 }
 
 /*
@@ -1007,12 +1019,12 @@ CheckPointPredicate(void)
 {
 	int			truncateCutoffPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
 	/* Exit quickly if the SLRU is currently not in use. */
 	if (serialControl->headPage < 0)
 	{
-		LWLockRelease(SerialSLRULock);
+		LWLockRelease(SerialControlLock);
 		return;
 	}
 
@@ -1072,7 +1084,7 @@ CheckPointPredicate(void)
 		serialControl->headPage = -1;
 	}
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SerialControlLock);
 
 	/* Truncate away pages that are no longer required */
 	SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
@@ -1348,7 +1360,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serializable_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index a5df835dd4..f24950c0c1 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -292,11 +292,7 @@ SInvalWrite	"Waiting to add a message to the shared catalog invalidation queue."
 WALBufMapping	"Waiting to replace a page in WAL buffers."
 WALWrite	"Waiting for WAL buffers to be written to disk."
 ControlFile	"Waiting to read or update the <filename>pg_control</filename> file or create a new WAL file."
-XactSLRU	"Waiting to access the transaction status SLRU cache."
-SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
 MultiXactGen	"Waiting to read or update shared multixact state."
-MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
-MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
 RelCacheInit	"Waiting to read or update a <filename>pg_internal.init</filename> relation cache initialization file."
 CheckpointerComm	"Waiting to manage fsync requests."
 TwoPhaseState	"Waiting to read or update the state of prepared transactions."
@@ -307,19 +303,17 @@ Autovacuum	"Waiting to read or update the current state of autovacuum workers."
 AutovacuumSchedule	"Waiting to ensure that a table selected for autovacuum still needs vacuuming."
 SyncScan	"Waiting to select the starting location of a synchronized table scan."
 RelationMapping	"Waiting to read or update a <filename>pg_filenode.map</filename> file (used to track the filenode assignments of certain system catalogs)."
-NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
 NotifyQueue	"Waiting to read or update <command>NOTIFY</command> messages."
 SerializableXactHash	"Waiting to read or update information about serializable transactions."
 SerializableFinishedList	"Waiting to access the list of finished serializable transactions."
 SerializablePredicateList	"Waiting to access the list of predicate locks held by serializable transactions."
-SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
+SerialControl	"Waiting to access the serializable transaction conflict SLRU cache."
 SyncRep	"Waiting to read or update information about the state of synchronous replication."
 BackgroundWorker	"Waiting to read or update background worker state."
 DynamicSharedMemoryControl	"Waiting to read or update dynamic shared memory allocation information."
 AutoFile	"Waiting to update the <filename>postgresql.auto.conf</filename> file."
 ReplicationSlotAllocation	"Waiting to allocate or free a replication slot."
 ReplicationSlotControl	"Waiting to read or update replication slot state."
-CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
 CommitTs	"Waiting to read or update the last value set for a transaction commit timestamp."
 ReplicationOrigin	"Waiting to create, drop or use a replication origin."
 MultiXactTruncation	"Waiting to read or truncate multixact information."
@@ -371,6 +365,14 @@ LogicalRepLauncherDSA	"Waiting to access logical replication launcher's dynamic
 LogicalRepLauncherHash	"Waiting to access logical replication launcher's shared hash table."
 DSMRegistryDSA	"Waiting to access dynamic shared memory registry's dynamic shared memory allocator."
 DSMRegistryHash	"Waiting to access dynamic shared memory registry's shared hash table."
+CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
+MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
+MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
+NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
+SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
+SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
+XactSLRU	"Waiting to access the transaction status SLRU cache."
+
 
 #
 # Wait Events - Lock
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 88b03e8fa3..7df342c70d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,12 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+/* configurable SLRU buffer sizes */
+int			commit_timestamp_buffers = 0;
+int			multixact_members_buffers = 32;
+int			multixact_offsets_buffers = 16;
+int			notify_buffers = 16;
+int			serializable_buffers = 32;
+int			subtransaction_buffers = 0;
+int			transaction_buffers = 0;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 7fe58518d7..82d08647d0 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2320,6 +2321,83 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"commit_timestamp_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_timestamp_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_commit_ts_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact member cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_members_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact offset cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		16, 8, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_offsets_buffers, NULL, NULL
+	},
+
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the LISTEN/NOTIFY message cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		16, 8, SLRU_MAX_ALLOWED_BUFFERS,
+		check_notify_buffers, NULL, NULL
+	},
+
+	{
+		{"serializable_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the serializable transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serializable_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_serial_buffers, NULL, NULL
+	},
+
+	{
+		{"subtransaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the sub-transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtransaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_subtrans_buffers, NULL, NULL
+	},
+
+	{
+		{"transaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the transaction status cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&transaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_transaction_buffers, NULL, NULL
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index da10b43dac..8b3a547a5e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#commit_timestamp_buffers = 0			# memory for pg_commit_ts (0 = auto)
+#multixact_offsets_buffers = 16			# memory for pg_multixact/offsets
+#multixact_members_buffers = 32			# memory for pg_multixact/members
+#notify_buffers = 16					# memory for pg_notify
+#serializable_buffers = 32				# memory for pg_serial
+#subtransaction_buffers = 0 			# memory for pg_subtrans (0 = auto)
+#transaction_buffers = 0				# memory for pg_xact (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index becc365cb0..8e62917e49 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -40,7 +40,6 @@ extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
 									   TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
 extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
 
-extern Size CLOGShmemBuffers(void);
 extern Size CLOGShmemSize(void);
 extern void CLOGShmemInit(void);
 extern void BootStrapCLOG(void);
diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 9c6f3a35ca..82d3aa8627 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -27,7 +27,6 @@ extern bool TransactionIdGetCommitTsData(TransactionId xid,
 extern TransactionId GetLatestCommitTsData(TimestampTz *ts,
 										   RepOriginId *nodeid);
 
-extern Size CommitTsShmemBuffers(void);
 extern Size CommitTsShmemSize(void);
 extern void CommitTsShmemInit(void);
 extern void BootStrapCommitTs(void);
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 233f67dbcc..7ffd256c74 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b05f6bc71d..3160980d04 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,27 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks.  Limit the bank size to 16 because we
+ * perform sequential search within a bank (while looking for a target page or
+ * while doing victim buffer search), and keeping the bank size large would
+ * hurt the performance of those searches.
+ */
+#define SLRU_BANK_SIZE		16
+
+/*
+ * Number of bank locks to protect in-memory buffer slot access within an
+ * SLRU bank.  If the number of banks is <= SLRU_MAX_BANKLOCKS then there will
+ * be one lock per bank; otherwise each lock will protect multiple banks,
+ * depending upon the number of banks.
+ */
+#define	SLRU_MAX_BANKLOCKS	128
+
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers must not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
@@ -52,8 +73,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -66,8 +85,30 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protect the I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
+	/* Locks to protect in-memory buffer slot access within an SLRU bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because we do a victim buffer
+	 * search within a bank. Furthermore, manipulating an individual bank
+	 * counter avoids frequent cache invalidation since we update it every time
+	 * we access the page.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -79,23 +120,12 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int64		latest_page_number;
+	pg_atomic_uint64 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
@@ -142,15 +172,35 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for slotno banks; considering the maximum 1GB SLRU buffer pool
+	 * size and SLRU_BANK_SIZE, bits16 is sufficient for the bank mask.
+	 */
+	bits16		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock needs to be acquired to access the slru buffer slots in the
+ * respective bank.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int			banklockno;
+
+	banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -179,5 +229,8 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
+extern bool check_slru_buffers(const char *name, int *newval);
+extern void SimpleLruAcquireAllBankLock(SlruCtl ctl, LWLockMode mode);
+extern void SimpleLruReleaseAllBankLock(SlruCtl ctl);
 
 #endif							/* SLRU_H */
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index b0d2ad57e5..e2213cf3fd 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 80b8583421..78daa25fa0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0b01c1f093..39b8ed9425 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -178,6 +178,14 @@ extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
 
+extern PGDLLIMPORT int commit_timestamp_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serializable_buffers;
+extern PGDLLIMPORT int subtransaction_buffers;
+extern PGDLLIMPORT int transaction_buffers;
+
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
 extern PGDLLIMPORT TimestampTz MyStartTimestamp;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 50a65e046d..10bea8c595 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -209,6 +209,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_LAUNCHER_HASH,
 	LWTRANCHE_DSM_REGISTRY_DSA,
 	LWTRANCHE_DSM_REGISTRY_HASH,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_XACT_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index a7edd38fa9..14ee9b94a2 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 5300c44f3b..44b0cbf9a1 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -46,6 +46,8 @@ extern bool check_client_connection_check_interval(int *newval, void **extra,
 extern bool check_client_encoding(char **newval, void **extra, GucSource source);
 extern void assign_client_encoding(const char *newval, void *extra);
 extern bool check_cluster_name(char **newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern const char *show_data_directory_mode(void);
 extern bool check_datestyle(char **newval, void **extra, GucSource source);
 extern void assign_datestyle(const char *newval, void *extra);
@@ -91,6 +93,11 @@ extern bool check_max_worker_processes(int *newval, void **extra,
 									   GucSource source);
 extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
 extern void assign_max_stack_depth(int newval, void *extra);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
 extern bool check_primary_slot_name(char **newval, void **extra,
 									GucSource source);
 extern bool check_random_seed(double *newval, void **extra, GucSource source);
@@ -122,12 +129,15 @@ extern void assign_role(const char *newval, void *extra);
 extern const char *show_role(void);
 extern bool check_search_path(char **newval, void **extra, GucSource source);
 extern void assign_search_path(const char *newval, void *extra);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
 extern bool check_session_authorization(char **newval, void **extra, GucSource source);
 extern void assign_session_authorization(const char *newval, void *extra);
 extern void assign_session_replication_role(int newval, void *extra);
 extern void assign_stats_fetch_consistency(int newval, void *extra);
 extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
@@ -152,6 +162,7 @@ extern const char *show_timezone(void);
 extern bool check_timezone_abbreviations(char **newval, void **extra,
 										 GucSource source);
 extern void assign_timezone_abbreviations(const char *newval, void *extra);
+extern bool check_transaction_buffers(int *newval, void **extra, GucSource source);
 extern bool check_transaction_deferrable(bool *newval, void **extra, GucSource source);
 extern bool check_transaction_isolation(int *newval, void **extra, GucSource source);
 extern bool check_transaction_read_only(bool *newval, void **extra, GucSource source);
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index 4b31f331ca..068a21f125 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
#84Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#82)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Jan 25, 2024 at 11:22 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Still with these auto-tuning GUCs, I noticed that the auto-tuning code
would continue to grow the buffer sizes with shared_buffers to
arbitrarily large values. I added an arbitrary maximum of 1024 (8 MB),
which is much higher than the current value of 128; but if you have
(say) 30 GB of shared_buffers (not uncommon these days), do you really
need 30MB of pg_clog cache? It seems mostly unnecessary ... and you can
still set it manually that way if you need it. So, largely I just
rewrote those small functions completely.

Yeah, I think that if we're going to scale with shared_buffers, it
should be capped.
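
For illustration, a capped rule of that shape could look roughly like the
sketch below; the helper name is hypothetical, and the numbers simply mirror
the documented default of shared_buffers/512 clamped to at most 1024 and at
least 16 blocks.

/*
 * Hypothetical sketch (not the patch's actual function): auto-tune an SLRU
 * buffer GUC left at 0 by scaling with shared_buffers (NBuffers) and
 * clamping the result to the range [16, 1024] blocks.
 */
static int
SLRUAutoTuneBuffers(void)
{
	return Min(Max(NBuffers / 512, 16), 1024);
}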

--
Robert Haas
EDB: http://www.enterprisedb.com

#85Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#81)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

I've continued to review this and decided that I don't like the mess
this patch proposes in order to support pg_commit_ts's deletion of all
files. (Yes, I know that I was the one that proposed this idea. It's
still a bad one). I'd like to change that code by removing the limit
that we can only have 128 bank locks in a SLRU. To recap, the reason we
did this is that commit_ts sometimes wants to delete all files while
running (DeactivateCommitTs), and for this it needs to acquire all bank
locks. Since going above the MAX_SIMUL_LWLOCKS limit is disallowed, we
added the SLRU limit making multiple banks share lwlocks.

I propose two alternative solutions:

1. The easiest is to have DeactivateCommitTs continue to hold
CommitTsLock until the end, including while SlruScanDirectory does its
thing. This sounds terrible, but considering that this code only runs
when the module is being deactivated, I don't think it's really all that
bad in practice. I mean, if you deactivate the commit_ts module and
then try to read commit timestamp data, you deserve to wait for a few
seconds just as a punishment for your stupidity. AFAICT the cases where
anything is done in the module mostly check without locking that
commitTsActive is set, so we're not slowing down any critical
operations. Also, if you don't like to be delayed for a couple of
seconds, just don't deactivate the module.

2. If we want some refinement, the other idea is to change
SlruScanDirCbDeleteAll (the callback that SlruScanDirectory uses in this
case) so that it acquires the bank lock appropriate for all the slots
used by the file that's going to be deleted. This is OK because in the
default compilation each file only has 32 segments, so that requires
only 32 lwlocks held at once while the file is being deleted. I think
we don't need to bother with this, but it's an option if we see the
above as unworkable for whatever reason.

The only other user of SlruScanDirCbDeleteAll is async.c (the LISTEN/
NOTIFY code), and what that does is delete all the files at server
start, where nothing is running concurrently anyway. So whatever we do
for commit_ts, it won't affect async.c.

So, if we do away with the SLRU_MAX_BANKLOCKS idea, then we're assured
one LWLock per bank (instead of sharing some lwlocks among multiple banks),
and that makes the code simpler to reason about.
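
To make that concrete, the patch's SimpleLruGetBankLock() could then collapse
to a direct array index; the following is only a sketch reusing the field
names from the posted patch.

/*
 * Sketch: with exactly one LWLock per bank, the bank number derived from
 * the page number indexes bank_locks directly, and the modulo over
 * SLRU_MAX_BANKLOCKS disappears.
 */
static inline LWLock *
SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
{
	int			bankno = pageno & ctl->bank_mask;

	return &(ctl->shared->bank_locks[bankno].lock);
}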

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle." (Larry Wall, Apocalypse 6)

#86Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Alvaro Herrera (#85)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 26 Jan 2024, at 22:38, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

This is OK because in the
default compilation each segment file only has 32 pages, so that requires
only 32 lwlocks held at once while the file is being deleted.

Do we somehow account for different subsystems accumulating locks toward MAX_SIMUL_LWLOCKS together?
E.g. GiST during a split can combine 75 locks, and commit_ts could somehow be deactivated by the same backend at that moment and add 32 more locks :)
I understand that this sounds far-fetched, since these subsystems do not interfere. But it is far-fetched only until something like that actually happens.
If possible, I'd prefer one lock at a time, and maybe sometimes two or three, with some guarantee that this is safe.
So, from my POV the first solution that you proposed seems much better to me.

Thanks for working on this!

Best regards, Andrey Borodin.

#87Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#83)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Jan 25, 2024 at 10:03 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Jan-25, Alvaro Herrera wrote:

Here's a touched-up version of this patch.

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 98fa6035cc..4a5e05d5e4 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -163,6 +163,13 @@ static const char *const BuiltinTrancheNames[] = {
[LWTRANCHE_LAUNCHER_HASH] = "LogicalRepLauncherHash",
[LWTRANCHE_DSM_REGISTRY_DSA] = "DSMRegistryDSA",
[LWTRANCHE_DSM_REGISTRY_HASH] = "DSMRegistryHash",
+     [LWTRANCHE_COMMITTS_SLRU] = "CommitTSSLRU",
+     [LWTRANCHE_MULTIXACTOFFSET_SLRU] = "MultixactOffsetSLRU",
+     [LWTRANCHE_MULTIXACTMEMBER_SLRU] = "MultixactMemberSLRU",
+     [LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU",
+     [LWTRANCHE_SERIAL_SLRU] = "SerialSLRU"
+     [LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
+     [LWTRANCHE_XACT_SLRU] = "XactSLRU",
};

Eeek. Last minute changes ... Fixed here.

Thank you for working on this. There is one thing that I feel is
problematic. We have kept the allowed values for these GUCs as
multiples of SLRU_BANK_SIZE, i.e. 16, and that's why the minimum
values were changed to 16; but in this refactoring patch you have
changed the minimum back to 8 for some of the buffers, which I don't
think is good.

+ {
+ {"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+ gettext_noop("Sets the size of the dedicated buffer pool used for
the MultiXact offset cache."),
+ NULL,
+ GUC_UNIT_BLOCKS
+ },
+ &multixact_offsets_buffers,
+ 16, 8, SLRU_MAX_ALLOWED_BUFFERS,
+ check_multixact_offsets_buffers, NULL, NULL
+ },

Other than this patch looks good to me.
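
For reference, a check hook enforcing that multiple-of-SLRU_BANK_SIZE rule
could look roughly like the sketch below; the actual check_slru_buffers() in
the patch may differ, for example in how it treats 0 (auto-tune).

/*
 * Sketch of a GUC check hook rejecting values that are not a multiple of
 * SLRU_BANK_SIZE (16).
 */
bool
check_slru_buffers(const char *name, int *newval)
{
	if (*newval % SLRU_BANK_SIZE == 0)
		return true;

	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
						SLRU_BANK_SIZE);
	return false;
}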

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#88Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#85)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Fri, Jan 26, 2024 at 11:08 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I've continued to review this and decided that I don't like the mess
this patch proposes in order to support pg_commit_ts's deletion of all
files. (Yes, I know that I was the one that proposed this idea. It's
still a bad one). I'd like to change that code by removing the limit
that we can only have 128 bank locks in a SLRU. To recap, the reason we
did this is that commit_ts sometimes wants to delete all files while
running (DeactivateCommitTs), and for this it needs to acquire all bank
locks. Since going above the MAX_SIMUL_LWLOCKS limit is disallowed, we
added the SLRU limit making multiple banks share lwlocks.

I propose two alternative solutions:

1. The easiest is to have DeactivateCommitTs continue to hold
CommitTsLock until the end, including while SlruScanDirectory does its
thing. This sounds terrible, but considering that this code only runs
when the module is being deactivated, I don't think it's really all that
bad in practice. I mean, if you deactivate the commit_ts module and
then try to read commit timestamp data, you deserve to wait for a few
seconds just as a punishment for your stupidity.

I think this idea looks reasonable. I agree that if we are trying to
read commit_ts after deactivating it then it's fine to make it wait.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#89Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#87)
2 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Jan-29, Dilip Kumar wrote:

Thank you for working on this. There is one thing that I feel is
problematic. We have kept the allowed values for these GUCs to be in
multiple of SLRU_BANK_SIZE i.e. 16 and that's the reason the min
values were changed to 16 but in this refactoring patch for some of
the buffers you have changed that to 8 so I think that's not good.

Oh, absolutely, you're right. Restored the minimum to 16.

So, here's the patchset as two pieces. 0001 converts
SlruSharedData->latest_page_number to use atomics. I don't see any
reason to mix this in with the rest of the patch, and though it likely
won't have any performance advantage by itself (since the lock
acquisition is pretty much the same), it seems better to get it in ahead
of the rest -- I think that simplifies matters for the second patch,
which is large enough.

So, 0002 introduces the rest of the feature. I removed the use of a
different number of bank locks than banks, and I made commit_ts hold
its lwlock for longer at truncation time, rather than forcing
acquisition of all the bank locks.

The more I look at 0002, the more I notice that some comments badly need
updating, so please don't read too much into it yet. But I wanted to
post it anyway for archives and cfbot purposes.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

Attachments:

0001-Use-atomics-for-SlruSharedData-latest_page_number.patchtext/x-diff; charset=utf-8Download
From 464a996b85c333ffc781086263c2e491758b248f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 31 Jan 2024 12:27:51 +0100
Subject: [PATCH 1/2] Use atomics for SlruSharedData->latest_page_number

---
 src/backend/access/transam/clog.c      |  7 ++----
 src/backend/access/transam/commit_ts.c |  7 +++---
 src/backend/access/transam/multixact.c | 30 ++++++++++++++++----------
 src/backend/access/transam/slru.c      | 19 ++++++++++------
 src/include/access/slru.h              |  5 ++++-
 5 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc..245fd21c8d 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -766,14 +766,11 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
+	pg_write_barrier();
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 61b82385f3..f68705989d 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -689,9 +689,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_init_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -1006,7 +1004,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 59523be901..a886c29892 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2017,13 +2017,17 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+					   pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
+					   pageno);
+
+	pg_write_barrier();
 }
 
 /*
@@ -2047,14 +2051,15 @@ TrimMultiXact(void)
 	oldestMXactDB = MultiXactState->oldestMultiXactDB;
 	LWLockRelease(MultiXactGenLock);
 
-	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
+
+	/* Clean up offsets state */
+	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2081,14 +2086,16 @@ TrimMultiXact(void)
 
 	LWLockRelease(MultiXactOffsetSLRULock);
 
-	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	/*
+	 * And the same for members.
+	 *
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
+
+	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -3333,7 +3340,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ac4790f16..57949fbab4 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -17,7 +17,8 @@
  * per-buffer LWLocks that synchronize I/O for each buffer.  The control lock
  * must be held to examine or modify any shared state.  A process that is
  * reading in or writing out a page buffer does not hold the control lock,
- * only the per-buffer lock for the buffer it is working on.
+ * only the per-buffer lock for the buffer it is working on.  One exception
+ * is latest_page_number, which is read and written using atomic ops.
  *
  * "Holding the control lock" means exclusive lock in all cases except for
  * SimpleLruReadPage_ReadOnly(); see comments for SlruRecentlyUsed() for
@@ -330,7 +331,8 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SimpleLruZeroLSNs(ctl, slotno);
 
 	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&shared->latest_page_number, pageno);
+	pg_write_barrier();
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -1113,8 +1115,11 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 				shared->page_lru_count[slotno] = cur_count;
 				this_delta = 0;
 			}
+
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			pg_read_barrier();
+			if (this_page_number ==
+				pg_atomic_read_u64(&shared->latest_page_number))
 				continue;
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
@@ -1270,10 +1275,12 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 restart:
 
 	/*
-	 * While we are holding the lock, make an important safety check: the
-	 * current endpoint page must not be eligible for removal.
+	 * An important safety check: the current endpoint page must not be
+	 * eligible for removal.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	pg_read_barrier();
+	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+						  cutoffPage))
 	{
 		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b05f6bc71d..2109488654 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -49,6 +49,9 @@ typedef enum
 
 /*
  * Shared-memory state
+ *
+ * ControlLock is used to protect access to the other fields, except
+ * latest_page_number, which uses atomics; see comment in slru.c.
  */
 typedef struct SlruSharedData
 {
@@ -95,7 +98,7 @@ typedef struct SlruSharedData
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int64		latest_page_number;
+	pg_atomic_uint64 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
-- 
2.39.2

0002-Enlarge-SLRU-buffer-caches-and-improve-concurrency.patchtext/x-diff; charset=utf-8Download
From 1b7c1f628a5f67260eded1b4b22930913d61f43f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 31 Jan 2024 17:06:19 +0100
Subject: [PATCH 2/2] Enlarge SLRU buffer caches and improve concurrency

---
 doc/src/sgml/config.sgml                      | 139 ++++++++++
 src/backend/access/transam/clog.c             | 224 +++++++++++----
 src/backend/access/transam/commit_ts.c        |  88 ++++--
 src/backend/access/transam/multixact.c        | 190 +++++++++----
 src/backend/access/transam/slru.c             | 257 +++++++++++++-----
 src/backend/access/transam/subtrans.c         | 103 +++++--
 src/backend/commands/async.c                  |  61 +++--
 src/backend/storage/lmgr/lwlock.c             |   9 +-
 src/backend/storage/lmgr/lwlocknames.txt      |  14 +-
 src/backend/storage/lmgr/predicate.c          |  38 ++-
 .../utils/activity/wait_event_names.txt       |  15 +-
 src/backend/utils/init/globals.c              |   9 +
 src/backend/utils/misc/guc_tables.c           |  78 ++++++
 src/backend/utils/misc/postgresql.conf.sample |   9 +
 src/include/access/clog.h                     |   1 -
 src/include/access/commit_ts.h                |   1 -
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |  81 +++++-
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   8 +
 src/include/storage/lwlock.h                  |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |  11 +
 src/test/modules/test_slru/test_slru.c        |  35 +--
 25 files changed, 1088 insertions(+), 306 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 61038472c5..3e3119865a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,145 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-commit-timestamp-buffers" xreflabel="commit_timestamp_buffers">
+      <term><varname>commit_timestamp_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_timestamp_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-serializable-buffers" xreflabel="serializable_buffers">
+      <term><varname>serializable_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serializable_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-subtransaction-buffers" xreflabel="subtransaction_buffers">
+      <term><varname>subtransaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtransaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-transaction-buffers" xreflabel="transaction_buffers">
+      <term><varname>transaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>transaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 245fd21c8d..01e153ee14 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -62,6 +63,15 @@
 #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
 #define CLOG_XACT_BITMASK	((1 << CLOG_BITS_PER_XACT) - 1)
 
+/*
+ * Because space used in CLOG by each transaction is so small, we place a
+ * smaller limit on the number of CLOG buffers than SLRU allows.  No other
+ * SLRU needs this.
+ */
+#define CLOG_MAX_ALLOWED_BUFFERS \
+	Min(SLRU_MAX_ALLOWED_BUFFERS, \
+		(((MaxTransactionId / 2) + (CLOG_XACTS_PER_PAGE - 1)) / CLOG_XACTS_PER_PAGE))
+
 
 /*
  * Although we return an int64 the actual value can't currently exceed
@@ -284,15 +294,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
-	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * When there is contention on the SLRU bank lock we need, we try to group
+	 * multiple updates; a single leader process will perform transaction
+	 * status updates for multiple backends so that the number of times the
+	 * bank lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -310,17 +325,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
-		 * our own XID and release the lock.  If not, try use group XID
-		 * update.  If that doesn't work out, fall back to waiting for the
-		 * lock to perform an update for this transaction only.
+		 * If we can immediately acquire the lock, we update the status of our
+		 * own XID and release the lock.  If not, try use group XID update. If
+		 * that doesn't work out, fall back to waiting for the lock to perform
+		 * an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -333,10 +348,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -355,7 +370,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -406,14 +422,15 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * Subroutine for TransactionIdSetPageStatus, q.v.
+ *
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -427,13 +444,15 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
 
 	/*
-	 * Add ourselves to the list of processes needing a group XID status
-	 * update.
+	 * Prepare to add ourselves to the list of processes needing a group XID
+	 * status update.
 	 */
 	proc->clogGroupMember = true;
 	proc->clogGroupMemberXid = xid;
@@ -441,6 +460,41 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	proc->clogGroupMemberPage = pageno;
 	proc->clogGroupMemberLsn = lsn;
 
+	/*
+	 * The underlying SLRU uses bank-wise locks, so the requesters arriving
+	 * here may be contending on different SLRU bank locks.  Within a group,
+	 * however, we try to add only requesters that want to update the same
+	 * page, i.e. requesters that need the same SLRU bank lock.  The reasons
+	 * for not mixing requesters of different pages in one group are:
+	 * 1) once the leader acquires the lock, it does not have to fetch
+	 * multiple pages and perform multiple I/Os under that lock; 2) the
+	 * leader does not have to switch SLRU bank locks when the pages belong
+	 * to different banks; and 3) most importantly, contention mainly occurs
+	 * in highly concurrent OLTP workloads, where most transactions are
+	 * generated around the same time and therefore fall on the same clog
+	 * page, since each page holds the status of 32K transactions.  In some
+	 * extreme cases requests for different pages may still end up in the
+	 * same group; we handle that by switching the bank lock.  That is not
+	 * the most performant path, but it is not the common case either, so we
+	 * are fine with it.
+	 *
+	 * Note also that we do not clear 'procglobal->clogGroupFirst' until the
+	 * leader of the current group has acquired the lock.  Any concurrent
+	 * requesters for different SLRU pages therefore have to perform the
+	 * normal update instead of joining a group, which is fine because that
+	 * is not the common case.  As soon as the leader of the current group
+	 * gets the lock for the required bank, we clear this value, and other
+	 * requesters (possibly wanting to update a different page, perhaps in a
+	 * different bank) may form a new group because the first group is now
+	 * detached.  If the new group needs a different SLRU bank lock, its
+	 * leader may acquire that lock while the first group is still
+	 * performing its update, so the two groups can do their group updates
+	 * concurrently.  That is completely safe because the two leaders operate
+	 * on different SLRU pages while holding their respective bank locks.
+	 */
 	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
 
 	while (true)
@@ -507,8 +561,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group before we
+	 * close the group by setting procglobal->clogGroupFirst to
+	 * INVALID_PGPROCNO.  Otherwise we would stop admitting new entries into
+	 * the group before we even hold the lock, defeating the whole purpose of
+	 * the group update.
+	 */
+	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -525,6 +588,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page is not the same as the
+		 * one for the previous page, release the previous bank's lock and
+		 * acquire the lock on the bank of the page we are about to update.
+		 *
+		 * Although we try, on a best-effort basis, to keep all the requests
+		 * within a group on the same clog page, a group can occasionally
+		 * contain requests for more than one page (for details refer to the
+		 * comment above the previous while loop).  That scenario may be less
+		 * performant because the group leader might have to wait for the new
+		 * lock when the pages are in different SLRU banks, but it is safe
+		 * because a) we release the old lock before acquiring the new one,
+		 * so there is no deadlock risk, and b) we always modify the page
+		 * under the correct SLRU bank lock.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -544,7 +638,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -573,7 +668,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Caller must hold the corresponding SLRU bank lock, will be held at exit.
  */
 static void
 TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
@@ -584,6 +679,11 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
 	char		byteval;
 	char		curval;
 
+	Assert(XactCtl->shared->page_number[slotno] == TransactionIdToPage(xid));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl,
+													 XactCtl->shared->page_number[slotno]),
+								LW_EXCLUSIVE));
+
 	byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
 	curval = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
 
@@ -665,7 +765,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -673,23 +773,18 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
  */
-Size
+static int
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* auto-tune based on shared buffers */
+	if (transaction_buffers == 0)
+		return Min(1024, Max(16, NBuffers / 512));
+
+	return Min(Max(16, transaction_buffers), CLOG_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -704,13 +799,36 @@ CLOGShmemSize(void)
 void
 CLOGShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (transaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CLOGShmemBuffers());
+		SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (transaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(transaction_buffers != 0);
+
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for transaction_buffers
+ */
+bool
+check_transaction_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("transaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial CLOG segment.  (The CLOG directory is assumed to
@@ -721,8 +839,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -731,7 +850,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -781,8 +900,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -814,7 +934,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -846,6 +966,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -856,13 +977,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -1000,16 +1122,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index f68705989d..9590aa3ea0 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -225,10 +226,11 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 					 TransactionId *subxids, TimestampTz ts,
 					 RepOriginId nodeid, int64 pageno)
 {
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 	int			slotno;
 	int			i;
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -238,22 +240,25 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Caller must hold the correct SLRU bank lock, will be held at exit
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
 						 RepOriginId nodeid, int slotno)
 {
-	int			entryno = TransactionIdToCTsEntry(xid);
+	int			entryno;
 	CommitTimestampEntry entry;
 
-	Assert(TransactionIdIsNormal(xid));
+	if (!TransactionIdIsNormal(xid))
+		return;
+
+	entryno = TransactionIdToCTsEntry(xid);
 
 	entry.time = ts;
 	entry.nodeid = nodeid;
@@ -345,7 +350,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -499,14 +504,18 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
 /*
  * Number of shared CommitTS buffers.
  *
- * We use a very similar logic as for the number of CLOG buffers (except we
- * scale up twice as fast with shared buffers, and the maximum is twice as
- * high); see comments in CLOGShmemBuffers.
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
  */
-Size
+static int
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* auto-tune based on shared buffers */
+	if (commit_timestamp_buffers == 0)
+		return Min(1024, Max(16, NBuffers / 512));
+
+	return Min(Max(16, commit_timestamp_buffers), SLRU_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -528,10 +537,24 @@ CommitTsShmemInit(void)
 {
 	bool		found;
 
+	/* If auto-tuning is requested, now is the time to do it */
+	if (commit_timestamp_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CommitTsShmemBuffers());
+		SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (commit_timestamp_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(commit_timestamp_buffers != 0);
+
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -553,6 +576,15 @@ CommitTsShmemInit(void)
 		Assert(found);
 }
 
+/*
+ * GUC check_hook for commit_timestamp_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_timestamp_buffers", newval);
+}
+
 /*
  * This function must be called ONCE on system install.
  *
@@ -715,13 +747,14 @@ ActivateCommitTs(void)
 	/* Create the current segment file, if necessary */
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 		int			slotno;
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -760,8 +793,6 @@ DeactivateCommitTs(void)
 	TransamVariables->oldestCommitTsXid = InvalidTransactionId;
 	TransamVariables->newestCommitTsXid = InvalidTransactionId;
 
-	LWLockRelease(CommitTsLock);
-
 	/*
 	 * Remove *all* files.  This is necessary so that there are no leftover
 	 * files; in the case where this feature is later enabled after running
@@ -769,10 +800,16 @@ DeactivateCommitTs(void)
 	 * (We can probably tolerate out-of-sequence files, as they are going to
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
+	 *
+	 * Note that we do this with CommitTsLock acquired in exclusive mode.
+	 * This is very heavy-handed, but since this routine can only be called
+	 * on a replica and should happen very rarely, we don't worry too much
+	 * about it.  Note also that no process should be consulting this SLRU
+	 * if we have just deactivated it.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+
+	LWLockRelease(CommitTsLock);
 }
 
 /*
@@ -804,6 +841,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -824,12 +862,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -983,16 +1023,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a886c29892..6143fdaaab 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -192,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the bank locks of the MultiXactOffset
+ * and MultiXactMember SLRUs to guard accesses to the two sets of SLRU
+ * buffers.  For concurrency's sake, we avoid holding more than one of these
+ * locks at a time.)
  */
 typedef struct MultiXactStateData
 {
@@ -870,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -891,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -916,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so if the new page
+			 * falls into a different SLRU bank, release the old bank's lock
+			 * and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -936,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1239,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1342,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1379,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1388,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1397,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1419,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1442,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1834,8 +1892,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,16 +1909,16 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "MultiXactMember", multixact_members_buffers, 0,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1887,6 +1945,24 @@ MultiXactShmemInit(void)
 	OldestVisibleMXactId = OldestMemberMXactId + MaxOldestSlot;
 }
 
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates the initial
  * MultiXact segments.  (The MultiXacts directories are assumed to have been
@@ -1896,8 +1972,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1906,9 +1984,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1917,7 +1996,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1977,10 +2056,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1995,7 +2076,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2051,6 +2132,8 @@ TrimMultiXact(void)
 	oldestMXactDB = MultiXactState->oldestMultiXactDB;
 	LWLockRelease(MultiXactGenLock);
 
+	/* Clean up offsets state */
+
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
@@ -2058,9 +2141,6 @@ TrimMultiXact(void)
 	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
 						pageno);
 
-	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
 	 * TrimCLOG() for background.  Unlike CLOG, some WAL record covers every
@@ -2074,7 +2154,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2082,10 +2164,9 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/*
 	 * And the same for members.
 	 *
@@ -2095,8 +2176,6 @@ TrimMultiXact(void)
 	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
 						pageno);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
 	 * TrimCLOG() for motivation.
@@ -2107,7 +2186,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2122,10 +2203,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2413,6 +2493,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2423,13 +2504,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2462,15 +2544,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2768,7 +2852,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3250,31 +3334,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 57949fbab4..7974d904ec 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -60,6 +60,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -106,6 +107,12 @@ typedef struct SlruWriteAllData
 
 typedef struct SlruWriteAllData *SlruWriteAll;
 
+/*
+ * Macro to get the lock index that protects the given slot.
+ */
+#define SLRU_SLOTNO_GET_BANKLOCKNO(slotno) \
+	((slotno) / SLRU_BANK_SIZE)
+
 /*
  * Populate a file tag describing a segment file.  We only use the segment
  * number, since we can derive everything else we need by having separate
@@ -118,34 +125,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -173,6 +152,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -182,8 +162,11 @@ static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
 Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
+	int			nbanks = nslots / SLRU_BANK_SIZE;
 	Size		sz;
 
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
+
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
 	sz += MAXALIGN(nslots * sizeof(char *));	/* page_buffer[] */
@@ -192,6 +175,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -208,16 +193,20 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -228,19 +217,14 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize locks and shared memory area */
 		char	   *ptr;
 		Size		offset;
-		int			slotno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
-
 		/* shared->latest_page_number will be set later */
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -261,6 +245,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -269,10 +257,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		}
 
 		ptr += BUFFERALIGN(offset);
-		for (slotno = 0; slotno < nslots; slotno++)
+		for (int slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -281,11 +269,23 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the bank locks. */
+		for (int banklockno = 0; banklockno < nbanks; banklockno++)
+			LWLockInitialize(&shared->bank_locks[banklockno].lock,
+							 bank_tranche_id);
+
+		/* Initialize the bank lru counters. */
+		for (int bankno = 0; bankno < nbanks; bankno++)
+			shared->bank_cur_lru_count[bankno] = 0;
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -294,6 +294,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -371,12 +372,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -419,7 +421,7 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
  * Return value is the shared-buffer slot number now holding the page.
  * The buffer's LRU access info is updated.
  *
- * Control lock must be held at entry, and will be held at exit.
+ * The correct bank lock must be held at entry, and will be held at exit.
  */
 int
 SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -427,10 +429,14 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			banklockno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -473,9 +479,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -484,7 +491,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -526,12 +533,19 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankstart = (pageno & ctl->bank_mask) * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(bankstart);
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in the buffer pool.  The buffer pool is
+	 * divided into banks of buffers, and each pageno can reside in only one
+	 * bank, so limit the search to that bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -548,8 +562,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -571,6 +585,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
 	bool		ok;
+	int			banklockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
@@ -599,7 +614,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[banklockno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -614,7 +629,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[banklockno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1030,17 +1045,17 @@ SlruReportIOError(SlruCtl ctl, int64 pageno, TransactionId xid)
 }
 
 /*
- * Select the slot to re-use when we need a free slot.
+ * Select the slot to re-use when we need a free slot for the given page.
  *
- * The target page number is passed because we need to consider the
- * possibility that some other process reads in the target page while
- * we are doing I/O to free a slot.  Hence, check or recheck to see if
- * any slot already holds the target page, and return that slot if so.
- * Thus, the returned slot is *either* a slot already holding the pageno
- * (could be any state except EMPTY), *or* a freeable slot (state EMPTY
- * or CLEAN).
+ * The target page number is passed not only because we need to know the
+ * correct bank to use, but also because we need to consider the possibility
+ * that some other process reads in the target page while we are doing I/O to
+ * free a slot.  Hence, check or recheck to see if any slot already holds the
+ * target page, and return that slot if so.  Thus, the returned slot is
+ * *either* a slot already holding the pageno (could be any state except
+ * EMPTY), *or* a freeable slot (state EMPTY or CLEAN).
  *
- * Control lock must be held at entry, and will be held at exit.
+ * The correct bank lock must be held at entry, and will be held at exit.
  */
 static int
 SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
@@ -1058,9 +1073,18 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		Assert(LWLockHeldByMe(&shared->bank_locks[bankno].lock));
+
+		/*
+		 * See if the page is already in the buffer pool.  The buffer pool is
+		 * divided into banks of buffers, and each pageno can reside in only
+		 * one bank, so limit the search to that bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1094,8 +1118,8 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
@@ -1117,6 +1141,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 			}
 
 			this_page_number = shared->page_number[slotno];
+
 			pg_read_barrier();
 			if (this_page_number ==
 				pg_atomic_read_u64(&shared->latest_page_number))
@@ -1193,6 +1218,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int64		pageno = 0;
 	int			i;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1203,10 +1229,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1220,7 +1259,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1260,6 +1299,7 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			prevlockno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1270,8 +1310,6 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
@@ -1282,15 +1320,29 @@ restart:
 	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
 						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1320,10 +1372,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevlockno].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1364,15 +1418,29 @@ SlruDeleteSegment(SlruCtl ctl, int64 segno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 	bool		did_write;
+	int			prevlockno = SLRU_SLOTNO_GET_BANKLOCKNO(0);
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevlockno].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curlockno = SLRU_SLOTNO_GET_BANKLOCKNO(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curlockno != prevlockno)
+		{
+			LWLockRelease(&shared->bank_locks[prevlockno].lock);
+			LWLockAcquire(&shared->bank_locks[curlockno].lock, LW_EXCLUSIVE);
+			prevlockno = curlockno;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1406,7 +1474,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevlockno].lock);
 }
 
 /*
@@ -1673,3 +1741,50 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = slotno / SLRU_BANK_SIZE;
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Valid values are multiples of SLRU_BANK_SIZE */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index b2ed82ac56..cee850c9f8 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,7 +31,9 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -85,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -108,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -138,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -186,6 +190,22 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	return previousXid;
 }
 
+/*
+ * Number of shared SUBTRANS buffers.
+ *
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
+ */
+static int
+SUBTRANSShmemBuffers(void)
+{
+	/* auto-tune based on shared buffers */
+	if (subtransaction_buffers == 0)
+		return Min(1024, Max(16, NBuffers / 512));
+
+	return Min(Max(16, subtransaction_buffers), SLRU_MAX_ALLOWED_BUFFERS);
+}
 
 /*
  * Initialization of shared memory for SUBTRANS
@@ -193,20 +213,42 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(SUBTRANSShmemBuffers(), 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (subtransaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", SUBTRANSShmemBuffers());
+		SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (subtransaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(subtransaction_buffers != 0);
+
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+	SimpleLruInit(SubTransCtl, "Subtrans", SUBTRANSShmemBuffers(), 0,
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for subtransaction_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtransaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial SUBTRANS segment.  (The SUBTRANS directory is assumed to
@@ -221,8 +263,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -231,7 +274,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -261,6 +304,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -268,23 +313,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If this page lives in a different bank, release the lock on the old
+		 * bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the final page lives in a different bank, release the lock on the
+	 * old bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -318,6 +387,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -329,12 +399,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b24b22293..0c2ac60946 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -234,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -266,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the bank-wise SLRU lock
+ * is used as the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -492,7 +494,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,8 +543,8 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1356,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1366,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1389,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1434,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1460,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1931,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1953,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1994,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2147,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
@@ -2378,3 +2394,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 71677cf031..79778b5813 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -163,6 +163,13 @@ static const char *const BuiltinTrancheNames[] = {
 	[LWTRANCHE_LAUNCHER_HASH] = "LogicalRepLauncherHash",
 	[LWTRANCHE_DSM_REGISTRY_DSA] = "DSMRegistryDSA",
 	[LWTRANCHE_DSM_REGISTRY_HASH] = "DSMRegistryHash",
+	[LWTRANCHE_COMMITTS_SLRU] = "CommitTSSLRU",
+	[LWTRANCHE_MULTIXACTOFFSET_SLRU] = "MultixactOffsetSLRU",
+	[LWTRANCHE_MULTIXACTMEMBER_SLRU] = "MultixactMemberSLRU",
+	[LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU",
+	[LWTRANCHE_SERIAL_SLRU] = "SerialSLRU",
+	[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
+	[LWTRANCHE_XACT_SLRU] = "XactSLRU",
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
@@ -776,7 +783,7 @@ GetLWLockIdentifier(uint32 classId, uint16 eventId)
  * in mode.
  *
  * This function will not block waiting for a lock to become free - that's the
- * callers job.
+ * caller's job.
  *
  * Returns true if the lock isn't free and we need to wait.
  */
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 3d59d3646e..284d168f77 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+# 31 was SerialSLRULock
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+#38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index eed63a05ed..1a7ff92bff 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -137,8 +137,8 @@
  *	SerialControlLock
  *		- Protects SerialControlData members
  *
- *	SerialSLRULock
- *		- Protects SerialSlruCtl
+ *	SLRU bank locks
+ *		- Protect the pg_serial SLRU banks
  *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -213,6 +213,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -813,9 +814,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serializable_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -841,6 +842,15 @@ SerialInit(void)
 	}
 }
 
+/*
+ * GUC check_hook for serializable_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serializable_buffers", newval);
+}
+
 /*
  * Record a committed read write serializable xid and the minimum
  * commitSeqNo of any transactions to which this xid had a rw-conflict out.
@@ -854,15 +864,17 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
 	/*
-	 * In this routine, we must hold both SerialControlLock and SerialSLRULock
-	 * simultaneously while making the SLRU data catch up with the new state
-	 * that we determine.
+	 * In this routine, we must hold both SerialControlLock and the SLRU
+	 * bank lock simultaneously while making the SLRU data catch up with
+	 * the new state that we determine.
 	 */
 	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
@@ -898,7 +910,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	if (isNewPage)
 		serialControl->headPage = targetPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (isNewPage)
 	{
@@ -916,7 +928,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 	LWLockRelease(SerialControlLock);
 }
 
@@ -950,13 +962,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding the SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -1367,7 +1379,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serializable_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index cd22dca702..e896f8310c 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -292,11 +292,7 @@ SInvalWrite	"Waiting to add a message to the shared catalog invalidation queue."
 WALBufMapping	"Waiting to replace a page in WAL buffers."
 WALWrite	"Waiting for WAL buffers to be written to disk."
 ControlFile	"Waiting to read or update the <filename>pg_control</filename> file or create a new WAL file."
-XactSLRU	"Waiting to access the transaction status SLRU cache."
-SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
 MultiXactGen	"Waiting to read or update shared multixact state."
-MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
-MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
 RelCacheInit	"Waiting to read or update a <filename>pg_internal.init</filename> relation cache initialization file."
 CheckpointerComm	"Waiting to manage fsync requests."
 TwoPhaseState	"Waiting to read or update the state of prepared transactions."
@@ -307,19 +303,16 @@ Autovacuum	"Waiting to read or update the current state of autovacuum workers."
 AutovacuumSchedule	"Waiting to ensure that a table selected for autovacuum still needs vacuuming."
 SyncScan	"Waiting to select the starting location of a synchronized table scan."
 RelationMapping	"Waiting to read or update a <filename>pg_filenode.map</filename> file (used to track the filenode assignments of certain system catalogs)."
-NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
 NotifyQueue	"Waiting to read or update <command>NOTIFY</command> messages."
 SerializableXactHash	"Waiting to read or update information about serializable transactions."
 SerializableFinishedList	"Waiting to access the list of finished serializable transactions."
 SerializablePredicateList	"Waiting to access the list of predicate locks held by serializable transactions."
-SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
 SyncRep	"Waiting to read or update information about the state of synchronous replication."
 BackgroundWorker	"Waiting to read or update background worker state."
 DynamicSharedMemoryControl	"Waiting to read or update dynamic shared memory allocation information."
 AutoFile	"Waiting to update the <filename>postgresql.auto.conf</filename> file."
 ReplicationSlotAllocation	"Waiting to allocate or free a replication slot."
 ReplicationSlotControl	"Waiting to read or update replication slot state."
-CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
 CommitTs	"Waiting to read or update the last value set for a transaction commit timestamp."
 ReplicationOrigin	"Waiting to create, drop or use a replication origin."
 MultiXactTruncation	"Waiting to read or truncate multixact information."
@@ -372,6 +365,14 @@ LogicalRepLauncherDSA	"Waiting to access logical replication launcher's dynamic
 LogicalRepLauncherHash	"Waiting to access logical replication launcher's shared hash table."
 DSMRegistryDSA	"Waiting to access dynamic shared memory registry's dynamic shared memory allocator."
 DSMRegistryHash	"Waiting to access dynamic shared memory registry's shared hash table."
+CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
+MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
+MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
+NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
+SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
+SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
+XactSLRU	"Waiting to access the transaction status SLRU cache."
+
 
 #
 # Wait Events - Lock
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 88b03e8fa3..7df342c70d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,12 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+/* configurable SLRU buffer sizes */
+int			commit_timestamp_buffers = 0;
+int			multixact_members_buffers = 32;
+int			multixact_offsets_buffers = 16;
+int			notify_buffers = 16;
+int			serializable_buffers = 32;
+int			subtransaction_buffers = 0;
+int			transaction_buffers = 0;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 7fe58518d7..502fd51939 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2320,6 +2321,83 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"commit_timestamp_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_timestamp_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_commit_ts_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact member cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_members_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact offset cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		16, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_offsets_buffers, NULL, NULL
+	},
+
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the LISTEN/NOTIFY message cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		16, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_notify_buffers, NULL, NULL
+	},
+
+	{
+		{"serializable_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the serializable transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serializable_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_serial_buffers, NULL, NULL
+	},
+
+	{
+		{"subtransaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the sub-transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtransaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_subtrans_buffers, NULL, NULL
+	},
+
+	{
+		{"transaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the transaction status cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&transaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_transaction_buffers, NULL, NULL
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index da10b43dac..8b3a547a5e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#commit_timestamp_buffers = 0			# memory for pg_commit_ts (0 = auto)
+#multixact_offsets_buffers = 16			# memory for pg_multixact/offsets
+#multixact_members_buffers = 32			# memory for pg_multixact/members
+#notify_buffers = 16					# memory for pg_notify
+#serializable_buffers = 32				# memory for pg_serial
+#subtransaction_buffers = 0 			# memory for pg_subtrans (0 = auto)
+#transaction_buffers = 0				# memory for pg_xact (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index becc365cb0..8e62917e49 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -40,7 +40,6 @@ extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
 									   TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
 extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
 
-extern Size CLOGShmemBuffers(void);
 extern Size CLOGShmemSize(void);
 extern void CLOGShmemInit(void);
 extern void BootStrapCLOG(void);
diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 9c6f3a35ca..82d3aa8627 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -27,7 +27,6 @@ extern bool TransactionIdGetCommitTsData(TransactionId xid,
 extern TransactionId GetLatestCommitTsData(TimestampTz *ts,
 										   RepOriginId *nodeid);
 
-extern Size CommitTsShmemBuffers(void);
 extern Size CommitTsShmemSize(void);
 extern void CommitTsShmemInit(void);
 extern void BootStrapCommitTs(void);
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 233f67dbcc..7ffd256c74 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 2109488654..19217f7db6 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,27 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * SLRU bank size for slotno hash banks.  The bank size is limited to 16
+ * because we perform a sequential search within a bank (both when looking
+ * for a target page and when doing the victim buffer search), so a larger
+ * bank size could hurt performance.
+ */
+#define SLRU_BANK_SIZE		16
+
+/*
+ * Number of bank locks used to protect in-memory buffer slot access within
+ * an SLRU bank.  If the number of banks is <= SLRU_MAX_BANKLOCKS then there
+ * is one lock per bank; otherwise each lock protects multiple banks,
+ * depending on the number of banks.
+ */
+#define	SLRU_MAX_BANKLOCKS	128
+
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers must not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
@@ -55,8 +76,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -69,8 +88,30 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protect the I/O on individual buffer slots */
 	LWLockPadded *buffer_locks;
 
+	/* Locks to protect in-memory buffer slot access within an SLRU bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because the victim buffer search
+	 * is done within a bank.  Furthermore, keeping a per-bank counter avoids
+	 * frequent cache invalidation, since the counter is updated every time a
+	 * page is accessed.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -82,17 +123,6 @@ typedef struct SlruSharedData
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
@@ -145,15 +175,35 @@ typedef struct SlruCtlData
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
+	/*
+	 * Mask for slotno banks.  Given the maximum 1GB SLRU buffer pool size and
+	 * SLRU_BANK_SIZE, a bits16 is sufficient for the bank mask.
+	 */
+	bits16		bank_mask;
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and pageno.
+ *
+ * This lock must be acquired to access the SLRU buffer slots in the
+ * corresponding bank.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int			banklockno;
+
+	banklockno = (pageno & ctl->bank_mask) % SLRU_MAX_BANKLOCKS;
+	return &(ctl->shared->bank_locks[banklockno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -182,5 +232,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
+extern bool check_slru_buffers(const char *name, int *newval);
 
 #endif							/* SLRU_H */
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index b0d2ad57e5..e2213cf3fd 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 80b8583421..78daa25fa0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0b01c1f093..39b8ed9425 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -178,6 +178,14 @@ extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
 
+extern PGDLLIMPORT int commit_timestamp_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serializable_buffers;
+extern PGDLLIMPORT int subtransaction_buffers;
+extern PGDLLIMPORT int transaction_buffers;
+
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
 extern PGDLLIMPORT TimestampTz MyStartTimestamp;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 50a65e046d..10bea8c595 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -209,6 +209,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_LAUNCHER_HASH,
 	LWTRANCHE_DSM_REGISTRY_DSA,
 	LWTRANCHE_DSM_REGISTRY_HASH,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_XACT_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index a7edd38fa9..14ee9b94a2 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 5300c44f3b..44b0cbf9a1 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -46,6 +46,8 @@ extern bool check_client_connection_check_interval(int *newval, void **extra,
 extern bool check_client_encoding(char **newval, void **extra, GucSource source);
 extern void assign_client_encoding(const char *newval, void *extra);
 extern bool check_cluster_name(char **newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern const char *show_data_directory_mode(void);
 extern bool check_datestyle(char **newval, void **extra, GucSource source);
 extern void assign_datestyle(const char *newval, void *extra);
@@ -91,6 +93,11 @@ extern bool check_max_worker_processes(int *newval, void **extra,
 									   GucSource source);
 extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
 extern void assign_max_stack_depth(int newval, void *extra);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
 extern bool check_primary_slot_name(char **newval, void **extra,
 									GucSource source);
 extern bool check_random_seed(double *newval, void **extra, GucSource source);
@@ -122,12 +129,15 @@ extern void assign_role(const char *newval, void *extra);
 extern const char *show_role(void);
 extern bool check_search_path(char **newval, void **extra, GucSource source);
 extern void assign_search_path(const char *newval, void *extra);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
 extern bool check_session_authorization(char **newval, void **extra, GucSource source);
 extern void assign_session_authorization(const char *newval, void *extra);
 extern void assign_session_replication_role(int newval, void *extra);
 extern void assign_stats_fetch_consistency(int newval, void *extra);
 extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
@@ -152,6 +162,7 @@ extern const char *show_timezone(void);
 extern bool check_timezone_abbreviations(char **newval, void **extra,
 										 GucSource source);
 extern void assign_timezone_abbreviations(const char *newval, void *extra);
+extern bool check_transaction_buffers(int *newval, void **extra, GucSource source);
 extern bool check_transaction_deferrable(bool *newval, void **extra, GucSource source);
 extern bool check_transaction_isolation(int *newval, void **extra, GucSource source);
 extern bool check_transaction_read_only(bool *newval, void **extra, GucSource source);
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index 4b31f331ca..068a21f125 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
-- 
2.39.2

#90Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#89)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Hah:

postgres -c lc_messages=C -c shared_buffers=$((512*17))

2024-02-01 10:48:13.548 CET [1535379] FATAL: invalid value for parameter "transaction_buffers": 17
2024-02-01 10:48:13.548 CET [1535379] DETAIL: "transaction_buffers" must be a multiple of 16

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
<Schwern> It does it in a really, really complicated way
<crab> why does it need to be complicated?
<Schwern> Because it's MakeMaker.

#91Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#90)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Feb 1, 2024 at 3:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Hah:

postgres -c lc_messages=C -c shared_buffers=$((512*17))

2024-02-01 10:48:13.548 CET [1535379] FATAL: invalid value for parameter "transaction_buffers": 17
2024-02-01 10:48:13.548 CET [1535379] DETAIL: "transaction_buffers" must be a multiple of 16

Maybe we should resize it to the next multiple of the SLRU_BANK_SIZE
instead of giving an error?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#92Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#91)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-01, Dilip Kumar wrote:

On Thu, Feb 1, 2024 at 3:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

postgres -c lc_messages=C -c shared_buffers=$((512*17))

2024-02-01 10:48:13.548 CET [1535379] FATAL: invalid value for parameter "transaction_buffers": 17
2024-02-01 10:48:13.548 CET [1535379] DETAIL: "transaction_buffers" must be a multiple of 16

Maybe we should resize it to the next multiple of the SLRU_BANK_SIZE
instead of giving an error?

Since this is the auto-tuning feature, I think it should use the
previous multiple rather than the next, but yeah, something like that.
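
For illustration, here is a minimal sketch of that rounding (a hypothetical
helper, not part of any posted patch; it assumes we simply clamp the value
down to the previous bank boundary, but never below a single bank):

/* hypothetical helper; SLRU_BANK_SIZE is 16 in the proposed patches */
static int
slru_round_down_to_bank_size(int nbuffers)
{
	/* drop the remainder so the result is a multiple of SLRU_BANK_SIZE */
	nbuffers -= nbuffers % SLRU_BANK_SIZE;

	/* but keep at least one full bank */
	return Max(SLRU_BANK_SIZE, nbuffers);
}

With shared_buffers = 512*17 as in the example above, the auto-tuned value
of 17 would then silently become 16 instead of failing the check_hook.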

While I have your attention -- if you could give a look to the 0001
patch I posted, I would appreciate it.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Los trabajadores menos efectivos son sistematicamente llevados al lugar
donde pueden hacer el menor daño posible: gerencia." (El principio Dilbert)

#93Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#92)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Feb 1, 2024 at 3:44 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Feb-01, Dilip Kumar wrote:

On Thu, Feb 1, 2024 at 3:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

postgres -c lc_messages=C -c shared_buffers=$((512*17))

2024-02-01 10:48:13.548 CET [1535379] FATAL: invalid value for parameter "transaction_buffers": 17
2024-02-01 10:48:13.548 CET [1535379] DETAIL: "transaction_buffers" must be a multiple of 16

Maybe we should resize it to the next multiple of the SLRU_BANK_SIZE
instead of giving an error?

Since this is the auto-tuning feature, I think it should use the
previous multiple rather than the next, but yeah, something like that.

Okay.

While I have your attention -- if you could give a look to the 0001
patch I posted, I would appreciate it.

I will look into it. Thanks.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#94Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#93)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Feb 1, 2024 at 4:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Feb 1, 2024 at 3:44 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Okay.

While I have your attention -- if you could give a look to the 0001
patch I posted, I would appreciate it.

I will look into it. Thanks.

Some quick observations,

Do we need the two write barriers below at the end of the function,
given that the next instruction is separated by the function boundary?

@@ -766,14 +766,11 @@ StartupCLOG(void)
  ..
- XactCtl->shared->latest_page_number = pageno;
-
- LWLockRelease(XactSLRULock);
+ pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
+ pg_write_barrier();
 }
/*
  * Initialize member's idea of the latest page number.
  */
  pageno = MXOffsetToMemberPage(offset);
- MultiXactMemberCtl->shared->latest_page_number = pageno;
+ pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
+    pageno);
+
+ pg_write_barrier();
 }

I am looking more into this from the concurrency point of view and
will update you soon.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#95Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#94)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Thu, Feb 1, 2024 at 4:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Feb 1, 2024 at 4:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Feb 1, 2024 at 3:44 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Okay.

While I have your attention -- if you could give a look to the 0001
patch I posted, I would appreciate it.

I will look into it. Thanks.

Some quick observations,

Do we need the two write barriers below at the end of the function,
given that the next instruction is separated by the function boundary?

@@ -766,14 +766,11 @@ StartupCLOG(void)
..
- XactCtl->shared->latest_page_number = pageno;
-
- LWLockRelease(XactSLRULock);
+ pg_atomic_init_u64(&XactCtl->shared->latest_page_number, pageno);
+ pg_write_barrier();
}
/*
* Initialize member's idea of the latest page number.
*/
pageno = MXOffsetToMemberPage(offset);
- MultiXactMemberCtl->shared->latest_page_number = pageno;
+ pg_atomic_init_u64(&MultiXactMemberCtl->shared->latest_page_number,
+    pageno);
+
+ pg_write_barrier();
}

I have checked the patch and it looks fine to me, other than the above
question related to memory barrier usage. One more question about the
same: instances 1 and 2 below look similar, but in 1 you are not using
the memory write_barrier whereas in 2 you are using the write_barrier.
Why is that? I mean, why can the reordering not happen in 1 but may
happen in 2?

1.
+ pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+ trunc->pageno);

SimpleLruTruncate(CommitTsCtl, trunc->pageno);

vs
2.

  - shared->latest_page_number = pageno;
+ pg_atomic_write_u64(&shared->latest_page_number, pageno);
+ pg_write_barrier();

/* update the stats counter of zeroed pages */
pgstat_count_slru_page_zeroed(shared->slru_stats_idx);

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#96Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#95)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-02, Dilip Kumar wrote:

I have checked the patch and it looks fine to me, other than the above
question related to memory barrier usage. One more question about the
same: instances 1 and 2 below look similar, but in 1 you are not using
the memory write_barrier whereas in 2 you are using the write_barrier.
Why is that? I mean, why can the reordering not happen in 1 but may
happen in 2?

What I was thinking is that there's an lwlock operation just below, which
acts as a barrier. But I realized something more important: there are
only two places that matter, which are SlruSelectLRUPage and
SimpleLruZeroPage. The others are all initialization code that runs at a
point where there's not going to be any concurrency in SLRU access, so we
don't need barriers anyway. In SlruSelectLRUPage we definitely don't
want to evict the page that SimpleLruZeroPage has initialized, starting
from the point where it returns that new page to its caller.

But if you consider the code of those two routines, you realize that the
only time an equality between latest_page_number and "this_page_number"
is going to occur, is when both pages are in the same bank ... and both
routines are required to be holding the bank lock while they run, so in
practice this is never a problem.

We need the atomic write and atomic read so that multiple processes
processing pages in different banks can update latest_page_number
simultaneously. But the equality condition that we're looking for?
it can never happen concurrently.

In other words, these barriers are fully useless.

(We also have SimpleLruTruncate, but I think it's not as critical to
have a barrier there anyhow: accessing a slightly outdated page number
could only be a problem if a bug elsewhere causes us to try to truncate
in the current page. I think we only have this code there because we
did have such bugs in the past, but IIUC this shouldn't happen anymore.)
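
To sketch the invariant being relied on here (simplified fragments, not the
actual patch text): both sides touch latest_page_number only with plain
atomic reads and writes, and the one comparison that matters can only come
out equal when the two routines are serialized on the same bank lock.

/* SimpleLruZeroPage(), called with the bank lock for 'pageno' held */
pg_atomic_write_u64(&shared->latest_page_number, pageno);

/* SlruSelectLRUPage(), called with the bank lock for its page held */
if (pg_atomic_read_u64(&shared->latest_page_number) == this_page_number)
	continue;			/* never evict the most recently zeroed page */

So the atomics only need to keep concurrent updates from different banks
well-defined; no ordering guarantee (and hence no barrier) is required for
the equality test to be correct.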

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#97Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#96)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

In short, I propose the attached.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

Attachments:

v2-0001-Use-atomics-for-SlruSharedData-latest_page_number.patchtext/x-diff; charset=utf-8Download
From b4ba8135f8044e0077a27fcf6ad18451380cbcb3 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 31 Jan 2024 12:27:51 +0100
Subject: [PATCH v2] Use atomics for SlruSharedData->latest_page_number

---
 src/backend/access/transam/clog.c      |  6 +---
 src/backend/access/transam/commit_ts.c |  7 ++---
 src/backend/access/transam/multixact.c | 28 +++++++++++-------
 src/backend/access/transam/slru.c      | 40 +++++++++++++++++++-------
 src/include/access/slru.h              |  5 +++-
 5 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc..f8aa91eb0a 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -766,14 +766,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_write_u64(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 61b82385f3..6bfe60343e 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -689,9 +689,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -1006,7 +1004,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 59523be901..febc429f72 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2017,13 +2017,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 }
 
 /*
@@ -2047,14 +2049,15 @@ TrimMultiXact(void)
 	oldestMXactDB = MultiXactState->oldestMultiXactDB;
 	LWLockRelease(MultiXactGenLock);
 
-	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
+
+	/* Clean up offsets state */
+	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2081,14 +2084,16 @@ TrimMultiXact(void)
 
 	LWLockRelease(MultiXactOffsetSLRULock);
 
-	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	/*
+	 * And the same for members.
+	 *
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
+
+	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -3333,7 +3338,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ac4790f16..c1d0dfc73b 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -17,7 +17,8 @@
  * per-buffer LWLocks that synchronize I/O for each buffer.  The control lock
  * must be held to examine or modify any shared state.  A process that is
  * reading in or writing out a page buffer does not hold the control lock,
- * only the per-buffer lock for the buffer it is working on.
+ * only the per-buffer lock for the buffer it is working on.  One exception
+ * is latest_page_number, which is read and written using atomic ops.
  *
  * "Holding the control lock" means exclusive lock in all cases except for
  * SimpleLruReadPage_ReadOnly(); see comments for SlruRecentlyUsed() for
@@ -239,8 +240,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		shared->lsn_groups_per_page = nlsns;
 
 		shared->cur_lru_count = 0;
-
-		/* shared->latest_page_number will be set later */
+		pg_atomic_write_u64(&shared->latest_page_number, 0);
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
 
@@ -329,8 +329,16 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	/* Set the LSNs for this new page to zero */
 	SimpleLruZeroLSNs(ctl, slotno);
 
-	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	/*
+	 * Assume this page is now the latest active page.
+	 *
+	 * Note that because both this routine and SlruSelectLRUPage run with
+	 * a bank lock held, it is not possible for this to be zeroing a page
+	 * that SlruSelectLRUPage is going to evict simultaneously -- they would
+	 * both have to hold the same bank lock!  Therefore, there's no memory
+	 * barrier here.
+	 */
+	pg_atomic_write_u64(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -1113,9 +1121,17 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 				shared->page_lru_count[slotno] = cur_count;
 				this_delta = 0;
 			}
+
+			/*
+			 * If this page is the one most recently zeroed, don't consider it
+			 * an eviction candidate. See comments in SimpleLruZeroPage for an
+			 * explanation about the lack of a memory barrier here.
+			 */
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number ==
+				pg_atomic_read_u64(&shared->latest_page_number))
 				continue;
+
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
 				if (this_delta > best_valid_delta ||
@@ -1254,7 +1270,6 @@ void
 SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
-	int			slotno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1270,10 +1285,13 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 restart:
 
 	/*
-	 * While we are holding the lock, make an important safety check: the
-	 * current endpoint page must not be eligible for removal.
+	 * An important safety check: the current endpoint page must not be
+	 * eligible for removal.  Like SlruSelectLRUPage, we don't need a
+	 * memory barrier here because for the affected page to be relevant,
+	 * we'd have to have the same bank lock as SimpleLruZeroPage.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+						  cutoffPage))
 	{
 		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
@@ -1282,7 +1300,7 @@ restart:
 		return;
 	}
 
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	for (int slotno = 0; slotno < shared->num_slots; slotno++)
 	{
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b05f6bc71d..2109488654 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -49,6 +49,9 @@ typedef enum
 
 /*
  * Shared-memory state
+ *
+ * ControlLock is used to protect access to the other fields, except
+ * latest_page_number, which uses atomics; see comment in slru.c.
  */
 typedef struct SlruSharedData
 {
@@ -95,7 +98,7 @@ typedef struct SlruSharedData
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int64		latest_page_number;
+	pg_atomic_uint64 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
-- 
2.39.2

#98Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#97)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Sorry, brown paper bag bug there. Here's the correct one.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu. Five minutes later I realize that it's also talking
about food" (Donald Knuth)

Attachments:

v3-0001-Use-atomics-for-SlruSharedData-latest_page_number.patch (text/x-diff; charset=utf-8)
From 99cadfdf7475146953e9846c20c4a708a3527937 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 31 Jan 2024 12:27:51 +0100
Subject: [PATCH v3] Use atomics for SlruSharedData->latest_page_number

---
 src/backend/access/transam/clog.c      |  6 +---
 src/backend/access/transam/commit_ts.c |  7 ++---
 src/backend/access/transam/multixact.c | 28 +++++++++++-------
 src/backend/access/transam/slru.c      | 40 +++++++++++++++++++-------
 src/include/access/slru.h              |  5 +++-
 5 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc..06fc2989ba 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -766,14 +766,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_write_u64(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 61b82385f3..6bfe60343e 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -689,9 +689,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -1006,7 +1004,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+							trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 59523be901..febc429f72 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2017,13 +2017,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
 }
 
 /*
@@ -2047,14 +2049,15 @@ TrimMultiXact(void)
 	oldestMXactDB = MultiXactState->oldestMultiXactDB;
 	LWLockRelease(MultiXactGenLock);
 
-	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+						pageno);
+
+	/* Clean up offsets state */
+	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2081,14 +2084,16 @@ TrimMultiXact(void)
 
 	LWLockRelease(MultiXactOffsetSLRULock);
 
-	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	/*
+	 * And the same for members.
+	 *
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+						pageno);
+
+	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -3333,7 +3338,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ac4790f16..c1d0dfc73b 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -17,7 +17,8 @@
  * per-buffer LWLocks that synchronize I/O for each buffer.  The control lock
  * must be held to examine or modify any shared state.  A process that is
  * reading in or writing out a page buffer does not hold the control lock,
- * only the per-buffer lock for the buffer it is working on.
+ * only the per-buffer lock for the buffer it is working on.  One exception
+ * is latest_page_number, which is read and written using atomic ops.
  *
  * "Holding the control lock" means exclusive lock in all cases except for
  * SimpleLruReadPage_ReadOnly(); see comments for SlruRecentlyUsed() for
@@ -239,8 +240,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		shared->lsn_groups_per_page = nlsns;
 
 		shared->cur_lru_count = 0;
-
-		/* shared->latest_page_number will be set later */
+		pg_atomic_write_u64(&shared->latest_page_number, 0);
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
 
@@ -329,8 +329,16 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	/* Set the LSNs for this new page to zero */
 	SimpleLruZeroLSNs(ctl, slotno);
 
-	/* Assume this page is now the latest active page */
-	shared->latest_page_number = pageno;
+	/*
+	 * Assume this page is now the latest active page.
+	 *
+	 * Note that because both this routine and SlruSelectLRUPage run with
+	 * a bank lock held, it is not possible for this to be zeroing a page
+	 * that SlruSelectLRUPage is going to evict simultaneously -- they would
+	 * both have to hold the same bank lock!  Therefore, there's no memory
+	 * barrier here.
+	 */
+	pg_atomic_write_u64(&shared->latest_page_number, pageno);
 
 	/* update the stats counter of zeroed pages */
 	pgstat_count_slru_page_zeroed(shared->slru_stats_idx);
@@ -1113,9 +1121,17 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 				shared->page_lru_count[slotno] = cur_count;
 				this_delta = 0;
 			}
+
+			/*
+			 * If this page is the one most recently zeroed, don't consider it
+			 * an eviction candidate. See comments in SimpleLruZeroPage for an
+			 * explanation about the lack of a memory barrier here.
+			 */
 			this_page_number = shared->page_number[slotno];
-			if (this_page_number == shared->latest_page_number)
+			if (this_page_number ==
+				pg_atomic_read_u64(&shared->latest_page_number))
 				continue;
+
 			if (shared->page_status[slotno] == SLRU_PAGE_VALID)
 			{
 				if (this_delta > best_valid_delta ||
@@ -1254,7 +1270,6 @@ void
 SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
-	int			slotno;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1270,10 +1285,13 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 restart:
 
 	/*
-	 * While we are holding the lock, make an important safety check: the
-	 * current endpoint page must not be eligible for removal.
+	 * An important safety check: the current endpoint page must not be
+	 * eligible for removal.  Like SlruSelectLRUPage, we don't need a
+	 * memory barrier here because for the affected page to be relevant,
+	 * we'd have to have the same bank lock as SimpleLruZeroPage.
 	 */
-	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+						  cutoffPage))
 	{
 		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
@@ -1282,7 +1300,7 @@ restart:
 		return;
 	}
 
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	for (int slotno = 0; slotno < shared->num_slots; slotno++)
 	{
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b05f6bc71d..2109488654 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -49,6 +49,9 @@ typedef enum
 
 /*
  * Shared-memory state
+ *
+ * ControlLock is used to protect access to the other fields, except
+ * latest_page_number, which uses atomics; see comment in slru.c.
  */
 typedef struct SlruSharedData
 {
@@ -95,7 +98,7 @@ typedef struct SlruSharedData
 	 * this is not critical data, since we use it only to avoid swapping out
 	 * the latest page.
 	 */
-	int64		latest_page_number;
+	pg_atomic_uint64 latest_page_number;
 
 	/* SLRU's index for statistics purposes (might not be unique) */
 	int			slru_stats_idx;
-- 
2.39.2

#99Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Alvaro Herrera (#96)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 4 Feb 2024, at 18:38, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

In other words, these barriers are fully useless.

+1. I've tried to understand the ideas behind the barriers, but latest_page_number is a heuristic that does not need any guarantees at all. It's also used in a safety check which can fire only when everything is already broken beyond repair. (Though using atomic access seems a good idea anyway.)

This patch uses the wording "banks" in comments before banks start to exist, but as far as I understand it is expected to be committed before the "banks" patch.

Besides that, the patch looks good to me.

Best regards, Andrey Borodin.

#100Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#96)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Sun, Feb 4, 2024 at 7:10 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Feb-02, Dilip Kumar wrote:

I have checked the patch and it looks fine to me. Other than the above
question related to memory barrier usage, one more question about the
same: the two instances below, 1 and 2, look similar, but in 1 you are
not using the memory write_barrier whereas in 2 you are using the
write_barrier. Why is that? I mean, why can the reordering not happen
in 1 while it may happen in 2?

What I was thinking is that there's a lwlock operation just below, which
acts as a barrier. But I realized something more important: there are
only two places that matter, which are SlruSelectLRUPage and
SimpleLruZeroPage. The others are all initialization code that runs at a
point where there's not going to be any concurrency in SLRU access, so we
don't need barriers anyway. In SlruSelectLRUPage we definitely don't
want to evict the page that SimpleLruZeroPage has initialized, starting
from the point where it returns that new page to its caller.
But if you consider the code of those two routines, you realize that the
only time an equality between latest_page_number and "this_page_number"
is going to occur, is when both pages are in the same bank ... and both
routines are required to be holding the bank lock while they run, so in
practice this is never a problem.

Right. In fact, when I first converted 'latest_page_number' to an
atomic, the thinking was to protect it against the value being set
concurrently in SimpleLruZeroPage(), and so that a concurrent read in
SlruSelectLRUPage() would not see a corrupted value. All other usages
are during the initialization phase, where we do not need any
protection.
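
To make that concrete, here is a minimal standalone sketch of the access
pattern under discussion, written with C11 atomics instead of the
pg_atomic_* wrappers used in the patch (the function names are only
illustrative). The point is that the store and the load must be atomic,
so a concurrent reader can never see a torn 64-bit value, but no memory
barrier is needed because the only equality check that matters runs under
the same bank lock:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static _Atomic uint64_t latest_page_number;

/* Writer side, as in SimpleLruZeroPage(): atomic store, no barrier. */
static void
zero_page(uint64_t pageno)
{
	atomic_store_explicit(&latest_page_number, pageno, memory_order_relaxed);
}

/* Reader side, as in SlruSelectLRUPage(): atomic load, no barrier. */
static int
is_latest_page(uint64_t this_page_number)
{
	return this_page_number ==
		atomic_load_explicit(&latest_page_number, memory_order_relaxed);
}

int
main(void)
{
	zero_page(42);
	printf("page 42 is latest: %d\n", is_latest_page(42));
	return 0;
}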

We need the atomic write and atomic read so that multiple processes
processing pages in different banks can update latest_page_number
simultaneously. But the equality condition that we're looking for
can never happen concurrently.

Yeah, that's right; after you pointed it out I also realized that this
case is protected by the bank lock. Earlier I hadn't thought about it.

In other words, these barriers are fully useless.

(We also have SimpleLruTruncate, but I think it's not as critical to
have a barrier there anyhow: accessing a slightly outdated page number
could only be a problem if a bug elsewhere causes us to try to truncate
in the current page. I think we only have this code there because we
did have such bugs in the past, but IIUC this shouldn't happen anymore.)

+1, I agree with this theory in general. But the comment below in
SimpleLruTruncate in your v3 patch doesn't seem correct. Here we are
checking whether latest_page_number is smaller than the cutoff, and if
so we log it as a wraparound and skip the whole thing; that is fine
even though we read the atomic variable and might get a slightly
outdated value. But the comment claims that this is safe because we
hold the same bank lock as SimpleLruZeroPage(), and that's not true
here: we will be acquiring different bank locks one by one based on
which slotno we are checking. Am I missing something?

+ * An important safety check: the current endpoint page must not be
+ * eligible for removal.  Like SlruSelectLRUPage, we don't need a
+ * memory barrier here because for the affected page to be relevant,
+ * we'd have to have the same bank lock as SimpleLruZeroPage.
  */
- if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+ if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+   cutoffPage))

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#101Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#100)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-04, Andrey M. Borodin wrote:

This patch uses wording "banks" in comments before banks start to
exist. But as far as I understand, it is expected to be committed
before "banks" patch.

True -- changed that to use ControlLock.

Besides this patch looks good to me.

Many thanks for reviewing.

On 2024-Feb-05, Dilip Kumar wrote:

(We also have SimpleLruTruncate, but I think it's not as critical to
have a barrier there anyhow: accessing a slightly outdated page number
could only be a problem if a bug elsewhere causes us to try to truncate
in the current page. I think we only have this code there because we
did have such bugs in the past, but IIUC this shouldn't happen anymore.)

+1, I agree with this theory in general. But the comment below in
SimpleLruTruncate in your v3 patch doesn't seem correct. Here we are
checking whether latest_page_number is smaller than the cutoff, and if
so we log it as a wraparound and skip the whole thing; that is fine
even though we read the atomic variable and might get a slightly
outdated value. But the comment claims that this is safe because we
hold the same bank lock as SimpleLruZeroPage(), and that's not true
here: we will be acquiring different bank locks one by one based on
which slotno we are checking. Am I missing something?

I think you're correct. I reworded this comment, so now it says this:

/*
* An important safety check: the current endpoint page must not be
* eligible for removal. This check is just a backstop against wraparound
* bugs elsewhere in SLRU handling, so we don't care if we read a slightly
* outdated value; therefore we don't add a memory barrier.
*/

Pushed with those changes. Thank you!

Now I'll go rebase the rest of the patch on top.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)

#102Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#101)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Tue, Feb 6, 2024 at 4:23 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

(We also have SimpleLruTruncate, but I think it's not as critical to
have a barrier there anyhow: accessing a slightly outdated page number
could only be a problem if a bug elsewhere causes us to try to truncate
in the current page. I think we only have this code there because we
did have such bugs in the past, but IIUC this shouldn't happen anymore.)

+1, I agree with this theory in general. But the comment below in
SimpleLruTruncate in your v3 patch doesn't seem correct. Here we are
checking whether latest_page_number is smaller than the cutoff, and if
so we log it as a wraparound and skip the whole thing; that is fine
even though we read the atomic variable and might get a slightly
outdated value. But the comment claims that this is safe because we
hold the same bank lock as SimpleLruZeroPage(), and that's not true
here: we will be acquiring different bank locks one by one based on
which slotno we are checking. Am I missing something?

I think you're correct. I reworded this comment, so now it says this:

/*
* An important safety check: the current endpoint page must not be
* eligible for removal. This check is just a backstop against wraparound
* bugs elsewhere in SLRU handling, so we don't care if we read a slightly
* outdated value; therefore we don't add a memory barrier.
*/

Pushed with those changes. Thank you!

Yeah, this looks perfect, thanks.

Now I'll go rebase the rest of the patch on top.

Okay, I will review and test after that.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#103Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#102)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Here's the rest of it rebased on top of current master. I think it
makes sense to have this as one individual commit.

I made CLOGShmemBuffers, CommitTsShmemBuffers and SUBTRANSShmemBuffers
compute a number that's a multiple of SLRU_BANK_SIZE. But it's a crock,
because we don't have that macro at that point, so I just used the
constant 16. Obviously we need a better solution for this.
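
For clarity, the autotuning arithmetic in the patch amounts to roughly
the following standalone sketch (the 16 and 1024 here are the hardcoded
stand-ins for the bank size and the cap, as noted above):

#include <stdio.h>

/*
 * Sketch of the autotuning done in CLOGShmemBuffers() and friends when
 * the GUC is left at 0: take NBuffers / 512, round down to a multiple of
 * 16 (the bank size), and clamp the result to the range [16, 1024].
 */
static int
autotune_slru_buffers(int NBuffers)
{
	int		nbufs = NBuffers / 512;

	nbufs -= nbufs % 16;	/* round down to a multiple of the bank size */
	if (nbufs < 16)
		nbufs = 16;
	if (nbufs > 1024)
		nbufs = 1024;
	return nbufs;
}

int
main(void)
{
	/* shared_buffers = 1GB is 131072 8kB blocks -> 256 buffers, i.e. 2MB */
	printf("%d\n", autotune_slru_buffers(131072));
	return 0;
}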

I also changed the location of bank_mask in SlruCtlData for better
packing, as advised by pahole; and renamed SLRU_SLOTNO_GET_BANKLOCKNO()
to SlotGetBankNumber().
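
For readers who haven't looked at the patch yet, the bank mapping these
names refer to works roughly like this (a sketch with assumed
definitions, not the patch's exact code; in particular the power-of-two
bank count behind bank_mask is an assumption here, and the second helper
is hypothetical -- the patch exposes the lookup via SimpleLruGetBankLock()):

#include <stdint.h>

#define SLRU_BANK_SIZE	16		/* buffers per bank, as used in the patch */

/* Which bank a buffer slot belongs to (the renamed SlotGetBankNumber()). */
static inline int
SlotGetBankNumber(int slotno)
{
	return slotno / SLRU_BANK_SIZE;
}

/*
 * Hypothetical helper: which bank a page maps to.  With a power-of-two
 * number of banks, masking the page number with bank_mask (nbanks - 1)
 * picks the bank, and hence the bank lock to acquire.
 */
static inline int
PageGetBankNumber(uint64_t pageno, uint64_t bank_mask)
{
	return (int) (pageno & bank_mask);
}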

Some very critical comments still need to be updated to the new design,
particularly anything that mentions "control lock"; but also the overall
model needs to be explained in some central location, rather than
incongruously, with some pieces here and other pieces there. I'll see
about this later. But at least this is code you should be able to play
with.

I've been wondering whether we should add "slru" to the names of the
GUCs:

commit_timestamp_slru_buffers
transaction_slru_buffers
etc

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)

Attachments:

v18-enlarge-slru-buffers.patch (text/x-diff; charset=utf-8)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 61038472c5..3e3119865a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,145 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-commit-timestamp-buffers" xreflabel="commit_timestamp_buffers">
+      <term><varname>commit_timestamp_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_timestamp_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-serializable-buffers" xreflabel="serializable_buffers">
+      <term><varname>serializable_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serializable_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-subtransaction-buffers" xreflabel="subtransaction_buffers">
+      <term><varname>subtransaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtransaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-transaction-buffers" xreflabel="transaction_buffers">
+      <term><varname>transaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>transaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 06fc2989ba..27bf318564 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -43,6 +43,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -62,6 +63,15 @@
 #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
 #define CLOG_XACT_BITMASK	((1 << CLOG_BITS_PER_XACT) - 1)
 
+/*
+ * Because space used in CLOG by each transaction is so small, we place a
+ * smaller limit on the number of CLOG buffers than SLRU allows.  No other
+ * SLRU needs this.
+ */
+#define CLOG_MAX_ALLOWED_BUFFERS \
+	Min(SLRU_MAX_ALLOWED_BUFFERS, \
+		(((MaxTransactionId / 2) + (CLOG_XACTS_PER_PAGE - 1)) / CLOG_XACTS_PER_PAGE))
+
 
 /*
  * Although we return an int64 the actual value can't currently exceed
@@ -284,15 +294,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
-	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * When there is contention on the SLRU bank lock we need, we try to group
+	 * multiple updates; a single leader process will perform transaction
+	 * status updates for multiple backends so that the number of times the
+	 * bank lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -310,17 +325,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
-		 * our own XID and release the lock.  If not, try use group XID
-		 * update.  If that doesn't work out, fall back to waiting for the
-		 * lock to perform an update for this transaction only.
+		 * If we can immediately acquire the lock, we update the status of our
+		 * own XID and release the lock.  If not, try use group XID update. If
+		 * that doesn't work out, fall back to waiting for the lock to perform
+		 * an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -333,10 +348,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -355,7 +370,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -406,14 +422,15 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * Subroutine for TransactionIdSetPageStatus, q.v.
+ *
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -427,13 +444,15 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	PGPROC	   *proc = MyProc;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
 
 	/*
-	 * Add ourselves to the list of processes needing a group XID status
-	 * update.
+	 * Prepare to add ourselves to the list of processes needing a group XID
+	 * status update.
 	 */
 	proc->clogGroupMember = true;
 	proc->clogGroupMemberXid = xid;
@@ -441,6 +460,41 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	proc->clogGroupMemberPage = pageno;
 	proc->clogGroupMemberLsn = lsn;
 
+	/*
+	 * The underlying SLRU is using bank-wise lock so it is possible that here
+	 * we might get requesters who are contending on different SLRU-bank
+	 * locks. But in the group, we try to only add the requesters who want to
+	 * update the same page i.e. they would be requesting for the same
+	 * SLRU-bank lock as well.  The main reason for not allowing requesters of
+	 * different pages together is 1) Once the leader acquires the lock they
+	 * don't need to fetch multiple pages and do multiple I/O under the same
+	 * lock 2) The leader need not switch the SLRU-bank lock if the different
+	 * pages are from different SLRU banks 3) And the most important reason is
+	 * that most of the time the contention will occur while a highly
+	 * concurrent OLTP workload is going on, and most of the transactions
+	 * generated at that time would fall on the same clog
+	 * page as each page can hold status of 32k transactions.  However, there
+	 * is an exception where in some extreme conditions we might get different
+	 * page requests added in the same group but we have handled that by
+	 * switching the bank lock, although that is not the most performant way
+	 * that's not the common case either so we are fine with that.
+	 *
+	 * Also to be noted that until the leader of the current group gets the
+	 * lock, we don't clear 'procglobal->clogGroupFirst'; that means that
+	 * concurrently if we get the requesters for different SLRU pages then
+	 * those will have to go for the normal update instead of group update and
+	 * that's fine as that is not the common case.  As soon as the leader of
+	 * the current group gets the lock for the required bank that time we
+	 * clear this value and now other requesters (which might want to update a
+	 * different page and that might fall into the different bank as well) are
+	 * allowed to form a new group as the first group is now detached.  So if
+	 * the new group has a request for a different SLRU-bank lock then the
+	 * group leader of this group might also get the lock while the first
+	 * group is performing the update and these two groups can perform the
+	 * group update concurrently but it is completely safe as these two
+	 * leaders are operating on completely different SLRU pages and they both
+	 * are holding their respective SLRU locks.
+	 */
 	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
 
 	while (true)
@@ -507,8 +561,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock for the first page in the group before we
+	 * close this group by setting procglobal->clogGroupFirst as
+	 * INVALID_PGPROCNO, so that we do not close the group to new entries
+	 * even before getting the lock, defeating the whole purpose of the group
+	 * update.
+	 */
+	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
+	prevpageno = ProcGlobal->allProcs[nextidx].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
@@ -525,6 +588,37 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the SLRU bank lock for the current page is not the same as that
+		 * of the last page then we need to release the lock on the previous
+		 * bank and acquire the lock on the bank for the page we are going to
+		 * update now.
+		 *
+		 * Although on the best effort basis we try that all the requests
+		 * within a group are for the same clog page there are some
+		 * possibilities that there are request for more than one page in the
+		 * same group (for details refer to the comment in the previous while
+		 * loop).  That scenario might not be very performant because while
+		 * switching the lock the group leader might need to wait on the new
+		 * lock if the pages are from different SLRU bank but it is safe
+		 * because a) we are releasing the old lock before acquiring the new
+		 * lock so there should not be any deadlock situation, and b) we
+		 * are always modifying the page under the correct SLRU lock.
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -544,7 +638,8 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
@@ -573,7 +668,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Caller must hold the corresponding SLRU bank lock, will be held at exit.
  */
 static void
 TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
@@ -584,6 +679,11 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
 	char		byteval;
 	char		curval;
 
+	Assert(XactCtl->shared->page_number[slotno] == TransactionIdToPage(xid));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl,
+													 XactCtl->shared->page_number[slotno]),
+								LW_EXCLUSIVE));
+
 	byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
 	curval = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
 
@@ -665,7 +765,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -673,23 +773,19 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
  */
-Size
+static int
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* auto-tune based on shared buffers */
+	if (transaction_buffers == 0)
+		return Min(1024, Max(16,
+							 NBuffers / 512 - (NBuffers / 512) % 16));
+
+	return Min(Max(16, transaction_buffers), CLOG_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -704,13 +800,36 @@ CLOGShmemSize(void)
 void
 CLOGShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (transaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CLOGShmemBuffers());
+		SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (transaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(transaction_buffers != 0);
+
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for transaction_buffers
+ */
+bool
+check_transaction_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("transaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial CLOG segment.  (The CLOG directory is assumed to
@@ -721,8 +840,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -731,7 +851,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -780,8 +900,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -813,7 +934,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -845,6 +966,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -855,13 +977,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -999,16 +1122,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 6bfe60343e..58e05dc0b9 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -225,10 +226,11 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 					 TransactionId *subxids, TimestampTz ts,
 					 RepOriginId nodeid, int64 pageno)
 {
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 	int			slotno;
 	int			i;
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -238,22 +240,25 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Caller must hold the correct SLRU bank lock, will be held at exit
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
 						 RepOriginId nodeid, int slotno)
 {
-	int			entryno = TransactionIdToCTsEntry(xid);
+	int			entryno;
 	CommitTimestampEntry entry;
 
-	Assert(TransactionIdIsNormal(xid));
+	if (!TransactionIdIsNormal(xid))
+		return;
+
+	entryno = TransactionIdToCTsEntry(xid);
 
 	entry.time = ts;
 	entry.nodeid = nodeid;
@@ -345,7 +350,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -499,14 +504,19 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
 /*
  * Number of shared CommitTS buffers.
  *
- * We use a very similar logic as for the number of CLOG buffers (except we
- * scale up twice as fast with shared buffers, and the maximum is twice as
- * high); see comments in CLOGShmemBuffers.
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
  */
-Size
+static int
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* auto-tune based on shared buffers */
+	if (commit_timestamp_buffers == 0)
+		return Min(1024, Max(16,
+							 NBuffers / 512 - (NBuffers / 512) % 16));
+
+	return Min(Max(16, commit_timestamp_buffers), SLRU_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -528,10 +538,24 @@ CommitTsShmemInit(void)
 {
 	bool		found;
 
+	/* If auto-tuning is requested, now is the time to do it */
+	if (commit_timestamp_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CommitTsShmemBuffers());
+		SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (commit_timestamp_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(commit_timestamp_buffers != 0);
+
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -553,6 +577,15 @@ CommitTsShmemInit(void)
 		Assert(found);
 }
 
+/*
+ * GUC check_hook for commit_timestamp_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_timestamp_buffers", newval);
+}
+
 /*
  * This function must be called ONCE on system install.
  *
@@ -715,13 +748,14 @@ ActivateCommitTs(void)
 	/* Create the current segment file, if necessary */
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 		int			slotno;
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -760,8 +794,6 @@ DeactivateCommitTs(void)
 	TransamVariables->oldestCommitTsXid = InvalidTransactionId;
 	TransamVariables->newestCommitTsXid = InvalidTransactionId;
 
-	LWLockRelease(CommitTsLock);
-
 	/*
 	 * Remove *all* files.  This is necessary so that there are no leftover
 	 * files; in the case where this feature is later enabled after running
@@ -769,10 +801,16 @@ DeactivateCommitTs(void)
 	 * (We can probably tolerate out-of-sequence files, as they are going to
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
+	 *
+	 * Note that we do this with CommitTsLock acquired in exclusive mode.
+	 * This is very heavy-handed, but since this routine can only be called
+	 * in the replica and should happen very rarely, we don't worry too much
+	 * about it.  Note also that no process should be consulting this SLRU
+	 * if we have just deactivated it.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+
+	LWLockRelease(CommitTsLock);
 }
 
 /*
@@ -804,6 +842,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -824,12 +863,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -983,16 +1024,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index febc429f72..311fdb2b21 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -192,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the SLRU bank locks of MultiXactOffset
+ * and MultiXactMember to guard accesses to the two sets of SLRU buffers.  For
+ * concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -870,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -891,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -916,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether the
+			 * new page falls into a different SLRU bank; if so, release the
+			 * old bank's lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -936,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1239,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1342,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1379,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1388,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1397,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1419,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1442,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1834,8 +1892,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,16 +1909,16 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "MultiXactMember", multixact_members_buffers, 0,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1887,6 +1945,24 @@ MultiXactShmemInit(void)
 	OldestVisibleMXactId = OldestMemberMXactId + MaxOldestSlot;
 }
 
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates the initial
  * MultiXact segments.  (The MultiXacts directories are assumed to have been
@@ -1896,8 +1972,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1906,9 +1984,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1917,7 +1996,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1977,10 +2056,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1995,7 +2076,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2049,6 +2130,8 @@ TrimMultiXact(void)
 	oldestMXactDB = MultiXactState->oldestMultiXactDB;
 	LWLockRelease(MultiXactGenLock);
 
+	/* Clean up offsets state */
+
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
@@ -2056,9 +2139,6 @@ TrimMultiXact(void)
 	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
 						pageno);
 
-	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
 	 * TrimCLOG() for background.  Unlike CLOG, some WAL record covers every
@@ -2072,7 +2152,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2080,10 +2162,9 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/*
 	 * And the same for members.
 	 *
@@ -2093,8 +2174,6 @@ TrimMultiXact(void)
 	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
 						pageno);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
 	 * TrimCLOG() for motivation.
@@ -2105,7 +2184,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2120,10 +2201,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2411,6 +2491,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2421,13 +2502,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2460,15 +2542,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2766,7 +2850,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3248,31 +3332,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
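
Illustrative sketch, not part of the patch: several loops in the multixact.c
changes above (RecordNewMultiXact(), GetMultiXactIdMembers()) follow the same
pattern of exchanging bank locks as the iteration crosses from one SLRU bank
into another.  Condensed, with hypothetical walk_pages()/touch_page() names,
the pattern looks like this:

static void
walk_pages(SlruCtl ctl, int64 first_pageno, int64 last_pageno)
{
	LWLock	   *prevlock = NULL;

	for (int64 pageno = first_pageno; pageno <= last_pageno; pageno++)
	{
		LWLock	   *lock = SimpleLruGetBankLock(ctl, pageno);

		/* Crossing into a different bank: swap which bank lock we hold. */
		if (lock != prevlock)
		{
			if (prevlock != NULL)
				LWLockRelease(prevlock);
			LWLockAcquire(lock, LW_EXCLUSIVE);
			prevlock = lock;
		}

		touch_page(ctl, pageno);	/* hypothetical per-page work */
	}

	if (prevlock != NULL)
		LWLockRelease(prevlock);
}

At no point does the loop hold more than one bank lock, which keeps these code
paths out of lock-ordering trouble among the bank locks.
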
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index e1c468861f..5b2b6e46db 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -60,6 +60,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -106,6 +107,23 @@ typedef struct SlruWriteAllData
 
 typedef struct SlruWriteAllData *SlruWriteAll;
 
+
+/*
+ * Bank size for the slot array.  Pages are assigned a bank according to their
+ * page number, with each bank being this size.  We want a power of 2 so that
+ * we can determine the bank number for a page with just bit shifting; we also
+ * want to keep the bank size small so that the LRU victim search is fast.  16
+ * buffers per bank seems like a good number.
+ */
+#define SLRU_BANK_BITSHIFT		4
+#define SLRU_BANK_SIZE			(1 << SLRU_BANK_BITSHIFT)
+
+/*
+ * Macro to get the bank number to which the slot belongs.
+ */
+#define SlotGetBankNumber(slotno)	((slotno) >> SLRU_BANK_BITSHIFT)
+
+
 /*
  * Populate a file tag describing a segment file.  We only use the segment
  * number, since we can derive everything else we need by having separate
@@ -118,34 +136,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -173,6 +163,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -182,8 +173,11 @@ static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
 Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
+	int			nbanks = nslots / SLRU_BANK_SIZE;
 	Size		sz;
 
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
+
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
 	sz += MAXALIGN(nslots * sizeof(char *));	/* page_buffer[] */
@@ -192,6 +186,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -208,16 +204,20 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -228,18 +228,14 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize locks and shared memory area */
 		char	   *ptr;
 		Size		offset;
-		int			slotno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
 		pg_atomic_init_u64(&shared->latest_page_number, 0);
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -260,6 +256,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -268,10 +268,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		}
 
 		ptr += BUFFERALIGN(offset);
-		for (slotno = 0; slotno < nslots; slotno++)
+		for (int slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -280,11 +280,21 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the slot banks. */
+		for (int bankno = 0; bankno < nbanks; bankno++)
+		{
+			LWLockInitialize(&shared->bank_locks[bankno].lock, bank_tranche_id);
+			shared->bank_cur_lru_count[bankno] = 0;
+		}
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -293,6 +303,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -376,12 +387,13 @@ static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			bankno = SlotGetBankNumber(slotno);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -424,7 +436,7 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
  * Return value is the shared-buffer slot number now holding the page.
  * The buffer's LRU access info is updated.
  *
- * Control lock must be held at entry, and will be held at exit.
+ * The correct bank lock must be held at entry, and will be held at exit.
  */
 int
 SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -432,10 +444,14 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	/* Caller must hold the bank lock for the input page. */
+	Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			bankno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -478,9 +494,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		bankno = SlotGetBankNumber(slotno);
 
-		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		/* Release bank lock while doing I/O */
+		LWLockRelease(&shared->bank_locks[bankno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -489,7 +506,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -531,12 +548,19 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankno = pageno & ctl->bank_mask;
+	int			bankstart = bankno * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in the buffer pool.  The buffer pool is
+	 * divided into banks of buffers, and each pageno can reside in only one
+	 * bank, so we limit the search to that bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -553,8 +577,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -575,6 +599,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 {
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
+	int			bankno = SlotGetBankNumber(slotno);
 	bool		ok;
 
 	/* If a write is in progress, wait for it to finish */
@@ -604,7 +629,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
 	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -619,7 +644,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	}
 
 	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -1035,17 +1060,17 @@ SlruReportIOError(SlruCtl ctl, int64 pageno, TransactionId xid)
 }
 
 /*
- * Select the slot to re-use when we need a free slot.
+ * Select the slot to re-use when we need a free slot for the given page.
  *
- * The target page number is passed because we need to consider the
- * possibility that some other process reads in the target page while
- * we are doing I/O to free a slot.  Hence, check or recheck to see if
- * any slot already holds the target page, and return that slot if so.
- * Thus, the returned slot is *either* a slot already holding the pageno
- * (could be any state except EMPTY), *or* a freeable slot (state EMPTY
- * or CLEAN).
+ * The target page number is passed not only because we need to know the
+ * correct bank to use, but also because we need to consider the possibility
+ * that some other process reads in the target page while we are doing I/O to
+ * free a slot.  Hence, check or recheck to see if any slot already holds the
+ * target page, and return that slot if so.  Thus, the returned slot is
+ * *either* a slot already holding the pageno (could be any state except
+ * EMPTY), *or* a freeable slot (state EMPTY or CLEAN).
  *
- * Control lock must be held at entry, and will be held at exit.
+ * The correct bank lock must be held at entry, and will be held at exit.
  */
 static int
 SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
@@ -1063,9 +1088,18 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		Assert(LWLockHeldByMe(&shared->bank_locks[bankno].lock));
+
+		/*
+		 * See if the page is already in the buffer pool.  The buffer pool is
+		 * divided into banks of buffers, and each pageno can reside in only
+		 * one bank, so we limit the search to that bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1099,8 +1133,8 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
@@ -1203,6 +1237,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int64		pageno = 0;
 	int			i;
+	int			prevbank = SlotGetBankNumber(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1213,10 +1248,23 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbank = SlotGetBankNumber(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous one,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1230,7 +1278,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1269,6 +1317,7 @@ void
 SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
+	int			prevbank;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1279,8 +1328,6 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
@@ -1292,15 +1339,29 @@ restart:
 	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
 						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevbank = SlotGetBankNumber(0);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 	for (int slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbank = SlotGetBankNumber(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous one,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1330,10 +1391,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevbank].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1372,17 +1435,31 @@ void
 SlruDeleteSegment(SlruCtl ctl, int64 segno)
 {
 	SlruShared	shared = ctl->shared;
+	int			prevbank = SlotGetBankNumber(0);
 	int			slotno;
 	bool		did_write;
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curbank = SlotGetBankNumber(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous one,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1416,7 +1493,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 }
 
 /*
@@ -1683,3 +1760,50 @@ SlruSyncFileTag(SlruCtl ctl, const FileTag *ftag, char *path)
 	errno = save_errno;
 	return result;
 }
+
+/*
+ * Function to mark a buffer slot "most recently used".
+ *
+ * The reason for the if-test is that there are often many consecutive
+ * accesses to the same page (particularly the latest page).  By suppressing
+ * useless increments of bank_cur_lru_count, we reduce the probability that old
+ * pages' counts will "wrap around" and make them appear recently used.
+ *
+ * We allow this code to be executed concurrently by multiple processes within
+ * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
+ * this should not cause any completely-bogus values to enter the computation.
+ * However, it is possible for either bank_cur_lru_count or individual
+ * page_lru_count entries to be "reset" to lower values than they should have,
+ * in case a process is delayed while it executes this function.  With care in
+ * SlruSelectLRUPage(), this does little harm, and in any case the absolute
+ * worst possible consequence is a nonoptimal choice of page to evict.  The
+ * gain from allowing concurrent reads of SLRU pages seems worth it.
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = SlotGetBankNumber(slotno);
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
+/*
+ * Helper function for GUC check_hooks to verify that the number of SLRU
+ * buffers is a multiple of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Valid values are multiples of SLRU_BANK_SIZE */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
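
Illustrative arithmetic, not part of the patch: with SLRU_BANK_SIZE fixed at
16, a pool of nslots buffers has nslots / 16 banks and
bank_mask = nslots / 16 - 1, so a page's bank and slot range work out as in
this small sketch:

static int
bank_of_page_example(void)
{
	int			nslots = 64;	/* passes check_slru_buffers(): 64 % 16 == 0 */
	int			bank_mask = nslots / SLRU_BANK_SIZE - 1;	/* 4 banks, mask 0x3 */
	int64		pageno = 123;
	int			bankno = pageno & bank_mask;	/* 123 & 3 = bank 3 */

	/* bank 3 owns slots 48..63, so only those 16 slots are ever searched */
	return bankno * SLRU_BANK_SIZE; /* bankstart = 48 */
}

A setting such as 60 would be rejected by check_slru_buffers() because it is
not a multiple of SLRU_BANK_SIZE.
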
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index b2ed82ac56..bbc0aecc99 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,7 +31,9 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -85,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -108,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -138,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -186,6 +190,23 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	return previousXid;
 }
 
+/*
+ * Number of shared SUBTRANS buffers.
+ *
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise, clamp the configured amount
+ * between 16 and the maximum allowed.
+ */
+static int
+SUBTRANSShmemBuffers(void)
+{
+	/* auto-tune based on shared buffers */
+	if (subtransaction_buffers == 0)
+		return Min(1024, Max(16,
+							 NBuffers / 512 - (NBuffers / 512) % 16));
+
+	return Min(Max(16, subtransaction_buffers), SLRU_MAX_ALLOWED_BUFFERS);
+}
 
 /*
  * Initialization of shared memory for SUBTRANS
@@ -193,20 +214,42 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(SUBTRANSShmemBuffers(), 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (subtransaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", SUBTRANSShmemBuffers());
+		SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (subtransaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(subtransaction_buffers != 0);
+
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+	SimpleLruInit(SubTransCtl, "Subtrans", SUBTRANSShmemBuffers(), 0,
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for subtransaction_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtransaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial SUBTRANS segment.  (The SUBTRANS directory is assumed to
@@ -221,8 +264,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -231,7 +275,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -261,6 +305,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -268,23 +314,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If the new page falls into a different bank, release the lock
+		 * on the old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the new page falls into a different bank, release the lock on the
+	 * old bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -318,6 +388,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -329,12 +400,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
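
Illustrative arithmetic, not part of the patch: the autotuning in
SUBTRANSShmemBuffers() above works out as follows, assuming 8kB blocks.  A
helper mirroring the formula:

static int
subtrans_autotune_example(int nbuffers)
{
	/* same formula as SUBTRANSShmemBuffers() when subtransaction_buffers == 0 */
	return Min(1024, Max(16, nbuffers / 512 - (nbuffers / 512) % 16));
}

For shared_buffers = 1GB (NBuffers = 131072) this yields 256 buffers, i.e.
2MB of SUBTRANS cache; for shared_buffers = 16GB (NBuffers = 2097152) it hits
the 1024-buffer cap, i.e. 8MB, matching the comment above.
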
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b24b22293..0c2ac60946 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -234,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -266,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the per-bank SLRU locks
+ * are used as the control locks for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -492,7 +494,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,8 +543,8 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1356,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1366,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1389,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and the SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1434,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1460,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1931,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1953,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1994,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2147,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
@@ -2378,3 +2394,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 71677cf031..79778b5813 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -163,6 +163,13 @@ static const char *const BuiltinTrancheNames[] = {
 	[LWTRANCHE_LAUNCHER_HASH] = "LogicalRepLauncherHash",
 	[LWTRANCHE_DSM_REGISTRY_DSA] = "DSMRegistryDSA",
 	[LWTRANCHE_DSM_REGISTRY_HASH] = "DSMRegistryHash",
+	[LWTRANCHE_COMMITTS_SLRU] = "CommitTSSLRU",
+	[LWTRANCHE_MULTIXACTOFFSET_SLRU] = "MultixactOffsetSLRU",
+	[LWTRANCHE_MULTIXACTMEMBER_SLRU] = "MultixactMemberSLRU",
+	[LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU",
+	[LWTRANCHE_SERIAL_SLRU] = "SerialSLRU",
+	[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
+	[LWTRANCHE_XACT_SLRU] = "XactSLRU",
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
@@ -776,7 +783,7 @@ GetLWLockIdentifier(uint32 classId, uint16 eventId)
  * in mode.
  *
  * This function will not block waiting for a lock to become free - that's the
- * callers job.
+ * caller's job.
  *
  * Returns true if the lock isn't free and we need to wait.
  */
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 3d59d3646e..284d168f77 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+# 26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+# 31 was SerialSLRULock
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+# 38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index eed63a05ed..1fe7e8c383 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -213,6 +213,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -813,9 +814,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serializable_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -841,6 +842,15 @@ SerialInit(void)
 	}
 }
 
+/*
+ * GUC check_hook for serializable_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serializable_buffers", newval);
+}
+
 /*
  * Record a committed read write serializable xid and the minimum
  * commitSeqNo of any transactions to which this xid had a rw-conflict out.
@@ -854,15 +864,17 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
 	/*
-	 * In this routine, we must hold both SerialControlLock and SerialSLRULock
-	 * simultaneously while making the SLRU data catch up with the new state
-	 * that we determine.
+	 * In this routine, we must hold both SerialControlLock and the SLRU
+	 * bank lock simultaneously while making the SLRU data catch up with
+	 * the new state that we determine.
 	 */
 	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
@@ -898,7 +910,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	if (isNewPage)
 		serialControl->headPage = targetPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (isNewPage)
 	{
@@ -916,7 +928,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 	LWLockRelease(SerialControlLock);
 }
 
@@ -950,13 +962,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding the SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -1367,7 +1379,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serializable_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 6464386b77..5188b60709 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -293,11 +293,7 @@ SInvalWrite	"Waiting to add a message to the shared catalog invalidation queue."
 WALBufMapping	"Waiting to replace a page in WAL buffers."
 WALWrite	"Waiting for WAL buffers to be written to disk."
 ControlFile	"Waiting to read or update the <filename>pg_control</filename> file or create a new WAL file."
-XactSLRU	"Waiting to access the transaction status SLRU cache."
-SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
 MultiXactGen	"Waiting to read or update shared multixact state."
-MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
-MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
 RelCacheInit	"Waiting to read or update a <filename>pg_internal.init</filename> relation cache initialization file."
 CheckpointerComm	"Waiting to manage fsync requests."
 TwoPhaseState	"Waiting to read or update the state of prepared transactions."
@@ -308,19 +304,16 @@ Autovacuum	"Waiting to read or update the current state of autovacuum workers."
 AutovacuumSchedule	"Waiting to ensure that a table selected for autovacuum still needs vacuuming."
 SyncScan	"Waiting to select the starting location of a synchronized table scan."
 RelationMapping	"Waiting to read or update a <filename>pg_filenode.map</filename> file (used to track the filenode assignments of certain system catalogs)."
-NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
 NotifyQueue	"Waiting to read or update <command>NOTIFY</command> messages."
 SerializableXactHash	"Waiting to read or update information about serializable transactions."
 SerializableFinishedList	"Waiting to access the list of finished serializable transactions."
 SerializablePredicateList	"Waiting to access the list of predicate locks held by serializable transactions."
-SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
 SyncRep	"Waiting to read or update information about the state of synchronous replication."
 BackgroundWorker	"Waiting to read or update background worker state."
 DynamicSharedMemoryControl	"Waiting to read or update dynamic shared memory allocation information."
 AutoFile	"Waiting to update the <filename>postgresql.auto.conf</filename> file."
 ReplicationSlotAllocation	"Waiting to allocate or free a replication slot."
 ReplicationSlotControl	"Waiting to read or update replication slot state."
-CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
 CommitTs	"Waiting to read or update the last value set for a transaction commit timestamp."
 ReplicationOrigin	"Waiting to create, drop or use a replication origin."
 MultiXactTruncation	"Waiting to read or truncate multixact information."
@@ -373,6 +366,14 @@ LogicalRepLauncherDSA	"Waiting to access logical replication launcher's dynamic
 LogicalRepLauncherHash	"Waiting to access logical replication launcher's shared hash table."
 DSMRegistryDSA	"Waiting to access dynamic shared memory registry's dynamic shared memory allocator."
 DSMRegistryHash	"Waiting to access dynamic shared memory registry's shared hash table."
+CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
+MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
+MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
+NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
+SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
+SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
+XactSLRU	"Waiting to access the transaction status SLRU cache."
+
 
 #
 # Wait Events - Lock
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 88b03e8fa3..7df342c70d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -156,3 +156,12 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+/* configurable SLRU buffer sizes */
+int			commit_timestamp_buffers = 0;
+int			multixact_members_buffers = 32;
+int			multixact_offsets_buffers = 16;
+int			notify_buffers = 16;
+int			serializable_buffers = 32;
+int			subtransaction_buffers = 0;
+int			transaction_buffers = 0;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 7fe58518d7..502fd51939 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2320,6 +2321,83 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"commit_timestamp_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_timestamp_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_commit_ts_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact member cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_members_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact offset cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		16, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_offsets_buffers, NULL, NULL
+	},
+
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the LISTEN/NOTIFY message cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		16, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_notify_buffers, NULL, NULL
+	},
+
+	{
+		{"serializable_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the serializable transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serializable_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_serial_buffers, NULL, NULL
+	},
+
+	{
+		{"subtransaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the sub-transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtransaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_subtrans_buffers, NULL, NULL
+	},
+
+	{
+		{"transaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the transaction status cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&transaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_transaction_buffers, NULL, NULL
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index da10b43dac..8b3a547a5e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#commit_timestamp_buffers = 0			# memory for pg_commit_ts (0 = auto)
+#multixact_offsets_buffers = 16			# memory for pg_multixact/offsets
+#multixact_members_buffers = 32			# memory for pg_multixact/members
+#notify_buffers = 16					# memory for pg_notify
+#serializable_buffers = 32				# memory for pg_serial
+#subtransaction_buffers = 0 			# memory for pg_subtrans (0 = auto)
+#transaction_buffers = 0				# memory for pg_xact (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index becc365cb0..8e62917e49 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -40,7 +40,6 @@ extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
 									   TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
 extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
 
-extern Size CLOGShmemBuffers(void);
 extern Size CLOGShmemSize(void);
 extern void CLOGShmemInit(void);
 extern void BootStrapCLOG(void);
diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 9c6f3a35ca..82d3aa8627 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -27,7 +27,6 @@ extern bool TransactionIdGetCommitTsData(TransactionId xid,
 extern TransactionId GetLatestCommitTsData(TimestampTz *ts,
 										   RepOriginId *nodeid);
 
-extern Size CommitTsShmemBuffers(void);
 extern Size CommitTsShmemSize(void);
 extern void CommitTsShmemInit(void);
 extern void BootStrapCommitTs(void);
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 233f67dbcc..7ffd256c74 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 2109488654..5e5b8339e9 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers must not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
@@ -55,8 +60,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -69,8 +72,30 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protects the I/O on each buffer slots */
 	LWLockPadded *buffer_locks;
 
+	/* Locks to protect the in memory buffer slot access in SLRU bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because we do a victim buffer
+	 * search within a bank. Furthermore, manipulating an individual bank
+	 * counter avoids frequent cache invalidation since we update it every time
+	 * we access the page.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
@@ -78,21 +103,12 @@ typedef struct SlruSharedData
 	 * has lsn_groups_per_page entries per buffer slot, each containing the
 	 * highest LSN known for a contiguous group of SLRU entries on that slot's
 	 * page.
+	 *
+	 * XXX could we make the LSNs to be bank-based?
 	 */
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
@@ -114,6 +130,19 @@ typedef struct SlruCtlData
 {
 	SlruShared	shared;
 
+	/*
+	 * Bitmask to determine bank number from page number.
+	 */
+	bits16		bank_mask;
+
+	/*
+	 * If true, use long segment filenames formed from lower 48 bits of the
+	 * segment number, e.g. pg_xact/000000001234. Otherwise, use short
+	 * filenames formed from lower 16 bits of the segment number e.g.
+	 * pg_xact/1234.
+	 */
+	bool		long_segment_names;
+
 	/*
 	 * Which sync handler function to use when handing sync requests over to
 	 * the checkpointer.  SYNC_HANDLER_NONE to disable fsync (eg pg_notify).
@@ -132,28 +161,35 @@ typedef struct SlruCtlData
 	 */
 	bool		(*PagePrecedes) (int64, int64);
 
-	/*
-	 * If true, use long segment filenames formed from lower 48 bits of the
-	 * segment number, e.g. pg_xact/000000001234. Otherwise, use short
-	 * filenames formed from lower 16 bits of the segment number e.g.
-	 * pg_xact/1234.
-	 */
-	bool		long_segment_names;
-
 	/*
 	 * Dir is set during SimpleLruInit and does not change thereafter. Since
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for given SlruCtl and the pageno.
+ *
+ * This lock needs to be acquired to access the slru buffer slots in the
+ * respective bank.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int			bankno;
+
+	bankno = pageno & ctl->bank_mask;
+	return &(ctl->shared->bank_locks[bankno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -182,5 +218,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
+extern bool check_slru_buffers(const char *name, int *newval);
 
 #endif							/* SLRU_H */
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index b0d2ad57e5..e2213cf3fd 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 80b8583421..78daa25fa0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 0b01c1f093..39b8ed9425 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -178,6 +178,14 @@ extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
 
+extern PGDLLIMPORT int commit_timestamp_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serializable_buffers;
+extern PGDLLIMPORT int subtransaction_buffers;
+extern PGDLLIMPORT int transaction_buffers;
+
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
 extern PGDLLIMPORT TimestampTz MyStartTimestamp;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 50a65e046d..10bea8c595 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -209,6 +209,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_LAUNCHER_HASH,
 	LWTRANCHE_DSM_REGISTRY_DSA,
 	LWTRANCHE_DSM_REGISTRY_HASH,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_XACT_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index a7edd38fa9..14ee9b94a2 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 5300c44f3b..44b0cbf9a1 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -46,6 +46,8 @@ extern bool check_client_connection_check_interval(int *newval, void **extra,
 extern bool check_client_encoding(char **newval, void **extra, GucSource source);
 extern void assign_client_encoding(const char *newval, void *extra);
 extern bool check_cluster_name(char **newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern const char *show_data_directory_mode(void);
 extern bool check_datestyle(char **newval, void **extra, GucSource source);
 extern void assign_datestyle(const char *newval, void *extra);
@@ -91,6 +93,11 @@ extern bool check_max_worker_processes(int *newval, void **extra,
 									   GucSource source);
 extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
 extern void assign_max_stack_depth(int newval, void *extra);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
 extern bool check_primary_slot_name(char **newval, void **extra,
 									GucSource source);
 extern bool check_random_seed(double *newval, void **extra, GucSource source);
@@ -122,12 +129,15 @@ extern void assign_role(const char *newval, void *extra);
 extern const char *show_role(void);
 extern bool check_search_path(char **newval, void **extra, GucSource source);
 extern void assign_search_path(const char *newval, void *extra);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
 extern bool check_session_authorization(char **newval, void **extra, GucSource source);
 extern void assign_session_authorization(const char *newval, void *extra);
 extern void assign_session_replication_role(int newval, void *extra);
 extern void assign_stats_fetch_consistency(int newval, void *extra);
 extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
@@ -152,6 +162,7 @@ extern const char *show_timezone(void);
 extern bool check_timezone_abbreviations(char **newval, void **extra,
 										 GucSource source);
 extern void assign_timezone_abbreviations(const char *newval, void *extra);
+extern bool check_transaction_buffers(int *newval, void **extra, GucSource source);
 extern bool check_transaction_deferrable(bool *newval, void **extra, GucSource source);
 extern bool check_transaction_isolation(int *newval, void **extra, GucSource source);
 extern bool check_transaction_read_only(bool *newval, void **extra, GucSource source);
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index 4b31f331ca..068a21f125 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
#104Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#103)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Tue, Feb 6, 2024 at 8:55 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Here's the rest of it rebased on top of current master. I think it
makes sense to have this as one individual commit.

I made CLOGShmemBuffers, CommitTsShmemBuffers and SUBTRANSShmemBuffers
compute a number that's a multiple of SLRU_BANK_SIZE. But it's a crock,
because we don't have that macro at that point, so I just used the
constant 16. Obviously we need a better solution for this.

If we define SLRU_BANK_SIZE in slru.h then we can use it here, right?
These files already include slru.h anyway.
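
For example, a rough sketch (assuming slru.h exported SLRU_BANK_SIZE, say
as 16) of how the auto-tuning could avoid the hard-coded constant:

/*
 * Sketch only, not the patch: auto-tune the pg_xact buffer count from
 * shared_buffers, clamp it to [16, 1024], and round it down to a whole
 * number of banks using an SLRU_BANK_SIZE exported by slru.h.
 */
static int
CLOGShmemBuffersSketch(void)
{
	int		nbuffers = Min(1024, Max(16, NBuffers / 512));

	nbuffers -= nbuffers % SLRU_BANK_SIZE;
	return Max(nbuffers, SLRU_BANK_SIZE);
}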

I also changed the location of bank_mask in SlruCtlData for better
packing, as advised by pahole; and renamed SLRU_SLOTNO_GET_BANKLOCKNO()
to SlotGetBankNumber().

Okay.

Some very critical comments still need to be updated to the new design,
particularly anything that mentions "control lock"; but also the overall
model needs to be explained in some central location, rather than
incongruently some pieces here and other pieces there. I'll see about
this later. But at least this is code you should be able to play with.

Okay, I will review and test this

I've been wondering whether we should add a "slru" to the name of the
GUCs:

commit_timestamp_slru_buffers
transaction_slru_buffers
etc

I am not sure we are exposing anything related to SLRU to the user. I
mean, transaction_buffers should make sense to the user as a buffer pool
that stores transaction-related data, but whether that buffer pool is
called SLRU or not doesn't matter much to the user IMHO.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#105Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Dilip Kumar (#104)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 7 Feb 2024, at 10:58, Dilip Kumar <dilipbalaut@gmail.com> wrote:

commit_timestamp_slru_buffers
transaction_slru_buffers
etc

I am not sure we are exposing anything related to SLRU to the user,

I think we already tell the user something about SLRU. I’d rather consider whether “transaction_slru_buffers” is easier to understand than “transaction_buffers”...
IMO transaction_buffers is clearer, but I do not have a strong opinion.

I
mean, transaction_buffers should make sense to the user as a buffer pool
that stores transaction-related data, but whether that buffer pool is
called SLRU or not doesn't matter much to the user IMHO.

+1

Best regards, Andrey Borodin.

#106Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#104)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-07, Dilip Kumar wrote:

On Tue, Feb 6, 2024 at 8:55 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I made CLOGShmemBuffers, CommitTsShmemBuffers and SUBTRANSShmemBuffers
compute a number that's a multiple of SLRU_BANK_SIZE. But it's a crock,
because we don't have that macro at that point, so I just used the
constant 16. Obviously we need a better solution for this.

If we define SLRU_BANK_SIZE in slru.h then we can use it here, right?
These files already include slru.h anyway.

Sure, but is that really what we want?

I've been wondering whether we should add a "slru" to the name of the
GUCs:

commit_timestamp_slru_buffers
transaction_slru_buffers
etc

I am not sure we are exposing anything related to SLRU to the user,

We do -- we have pg_stat_slru already.

I mean, transaction_buffers should make sense to the user as a buffer
pool that stores transaction-related data, but whether that buffer pool
is called SLRU or not doesn't matter much to the user
IMHO.

Yeah, that's exactly what my initial argument was for naming these this
way. But since the term slru already escaped into the wild via the
pg_stat_slru view, perhaps it helps users make the connection between
these things. Alternatively, we can cross-reference each term from the
other's documentation and call it a day.

Another painful point is that pg_stat_slru uses internal names in the
data it outputs, which obviously do not match the new GUCs.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"

#107Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#106)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Feb 7, 2024 at 3:49 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Feb-07, Dilip Kumar wrote:

On Tue, Feb 6, 2024 at 8:55 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I made CLOGShmemBuffers, CommitTsShmemBuffers and SUBTRANSShmemBuffers
compute a number that's a multiple of SLRU_BANK_SIZE. But it's a crock,
because we don't have that macro at that point, so I just used the
constant 16. Obviously we need a better solution for this.

If we define SLRU_BANK_SIZE in slru.h then we can use it here, right?
These files already include slru.h anyway.

Sure, but is that really what we want?

So your question is whether we want these buffers to be a multiple of
SLRU_BANK_SIZE? Maybe we can allow the last bank to be partial; I
don't think that should create any problem logically. We can look
through the patch again to see whether we have made any such
assumptions, but that should be fairly easy to fix, and if we go that
way then maybe we should get rid of the check_slru_buffers() function
as well.
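
A minimal sketch of what the bank-local search would need for that
(illustrative only; SLRU_BANK_SIZE and the SlruSharedData fields are as
in the patch): the slot range of a bank is simply clamped to num_slots,
so a partial last bank falls out naturally.

/*
 * Sketch only: find a free slot within one bank, tolerating a partial
 * last bank by clamping the slot range to shared->num_slots.
 */
static int
SlruBankFindEmptySlot(SlruShared shared, int bankno)
{
	int		bankstart = bankno * SLRU_BANK_SIZE;
	int		bankend = Min(bankstart + SLRU_BANK_SIZE, shared->num_slots);

	for (int slotno = bankstart; slotno < bankend; slotno++)
	{
		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
			return slotno;
	}

	return -1;					/* caller evicts this bank's LRU page */
}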

I've been wondering whether we should add a "slru" to the name of the
GUCs:

commit_timestamp_slru_buffers
transaction_slru_buffers
etc

I am not sure we are exposing anything related to SLRU to the user,

We do -- we have pg_stat_slru already.

I mean, transaction_buffers should make sense to the user as a buffer
pool that stores transaction-related data, but whether that buffer pool
is called SLRU or not doesn't matter much to the user
IMHO.

Yeah, that's exactly what my initial argument was for naming these this
way. But since the term slru already escaped into the wild via the
pg_stat_slru view, perhaps it helps users make the connection between
these things. Alternatively, we can cross-reference each term from the
other's documentation and call it a day.

Yeah, that's true; I forgot this point about pg_stat_slru. From this
point of view, if the configuration name contains "slru", users would
be able to make a better connection between the configured value and
the stats in this view, and based on that they can decide whether they
need to increase the size of these SLRU buffers.

Another painful point is that pg_stat_slru uses internal names in the
data it outputs, which obviously do not match the new GUCs.

Yeah, that's true, but I think this could be explained somewhere; I am
just not sure what the right place for it is.

FYI, I have also repeated all the performance tests I performed in my
first email[1], and I am seeing a similar gain.

Some comments on v18 in my first pass of the review.

1.
@@ -665,7 +765,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
lsnindex = GetLSNIndex(slotno, xid);
*lsn = XactCtl->shared->group_lsn[lsnindex];

- LWLockRelease(XactSLRULock);
+ LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));

Maybe here we can add an assert before releasing the lock for a safety check

Assert(LWLockHeldByMe(SimpleLruGetBankLock(XactCtl, pageno)));

2.
+ *
+ * XXX could we make the LSNs to be bank-based?
  */
  XLogRecPtr *group_lsn;

IMHO, the flush still happens at the page level, so the LSN up to which
WAL must be flushed before writing out a particular clog page should
also be maintained at the page level.
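
That can be seen in the page-write path of slru.c (paraphrased sketch of
the existing logic, not new code): before a page is written out, WAL is
flushed up to the highest LSN recorded for any group on that page, so
the page is the granularity that matters here.

/*
 * Paraphrased sketch of the existing flush-before-write logic in
 * slru.c: flush WAL up to the highest group LSN of the page held in
 * "slotno" before it is written to disk.
 */
if (shared->group_lsn != NULL)
{
	XLogRecPtr	max_lsn = InvalidXLogRecPtr;
	int			lsnindex = slotno * shared->lsn_groups_per_page;

	for (int off = 0; off < shared->lsn_groups_per_page; off++)
	{
		XLogRecPtr	this_lsn = shared->group_lsn[lsnindex + off];

		if (max_lsn < this_lsn)
			max_lsn = this_lsn;
	}

	if (!XLogRecPtrIsInvalid(max_lsn))
		XLogFlush(max_lsn);
}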

[1]: /messages/by-id/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#108Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#107)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-07, Dilip Kumar wrote:

On Wed, Feb 7, 2024 at 3:49 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Sure, but is that really what we want?

So your question is whether we want these buffers to be a multiple of
SLRU_BANK_SIZE? Maybe we can allow the last bank to be partial; I
don't think that should create any problem logically. We can look
through the patch again to see whether we have made any such
assumptions, but that should be fairly easy to fix, and if we go that
way then maybe we should get rid of the check_slru_buffers() function
as well.

Not really, I just don't think the macro should be in slru.h.

Another thing I've been thinking is that perhaps it would be useful to
make the banks smaller, when the total number of buffers is small. For
example, if you have 16 or 32 buffers, it's not really clear to me that
it makes sense to have just 1 bank or 2 banks. It might be more
sensible to have 4 banks with 4 or 8 buffers instead. That should make
the algorithm scale down as well as up ...
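
Just as an illustration of that idea (not part of the attached patch),
the bank size could be chosen from the pool size, keeping in mind that
the number of banks (nslots divided by the bank size) has to remain a
power of two, because the bank number is obtained by masking the page
number:

/*
 * Hypothetical sketch, not in the patch: pick a smaller bank size for
 * small pools so that even a 16- or 32-buffer SLRU gets several
 * independently locked banks.  The caller must still ensure that
 * nslots / banksize is a power of two, since the bank number is
 * derived by masking the page number with bank_mask.
 */
static int
SlruChooseBankSize(int nslots)
{
	if (nslots <= 16)
		return 4;				/* e.g. 16 buffers -> 4 banks of 4 */
	if (nslots <= 64)
		return 8;				/* e.g. 32 buffers -> 4 banks of 8 */
	return 16;					/* the regular bank size */
}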

I haven't done either of those things in the attached v19 version. I
did go over the comments once again and rewrote the parts I was unhappy
with, including some existing ones. I think it's OK now from that point
of view ... at some point I thought about creating a separate README,
but in the end I thought it not necessary.

I did add a bunch of Assert()s to make sure the locks that are supposed
to be held are actually held. This led me to test that the page status
is not EMPTY during SimpleLruWriteAll() before calling
SlruInternalWritePage(), because the assert was firing. The previous
code is not really *buggy*, but to me it's weird to call WritePage() on
a slot with no contents.
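
In other words, the writeback loop now skips slots that hold no page,
roughly like this (paraphrased; the real SimpleLruWriteAll() also
handles the bank locks and the fsync bookkeeping):

/*
 * Paraphrased sketch of the SimpleLruWriteAll() loop after that change:
 * a slot whose status is EMPTY is skipped rather than handed to
 * SlruInternalWritePage().  Locking and fsync bookkeeping are omitted.
 */
for (int slotno = 0; slotno < shared->num_slots; slotno++)
{
	if (shared->page_status[slotno] != SLRU_PAGE_EMPTY)
		SlruInternalWritePage(ctl, slotno, NULL);
}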

Another change was in TransactionGroupUpdateXidStatus: the original code
had the leader doing pg_atomic_read_u32(&procglobal->clogGroupFirst) to
know which bank to lock. I changed it to simply be the page used by the
leader process; this doesn't need an atomic read, and should be the same
page anyway. (If it isn't, it's no big deal). But what's more: even if
we do read ->clogGroupFirst at that point, there's no guarantee that
this is going to be exactly for the same process that ends up being the
first in the list, because we have not yet set it to INVALID by the
time we grab the bank lock, so it is quite possible for more processes
to add themselves to the list.
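
A paraphrased before/after of that spot (the exact code is in the
attached v19; the old coding is only sketched here):

/*
 * Before (paraphrased): derive the first bank lock from the queue head,
 * which needs an atomic read and may no longer match the process that
 * ends up first in the list.
 */
nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
prevpageno = GetPGProcByNumber(nextidx)->clogGroupMemberPage;

/*
 * After: start from the page the leader itself queued; any member whose
 * page falls in a different bank is handled by the lock switch inside
 * the update loop.
 */
prevpageno = MyProc->clogGroupMemberPage;
prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
LWLockAcquire(prevlock, LW_EXCLUSIVE);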

I realized all this while rewriting the comments in a way that would let
me understand what was going on ... so IMO the effort was worthwhile.

Anyway, what I send now should be pretty much final, modulo the
change to the check_slru_buffers routines and documentation additions to
match pg_stat_slru to the new GUC names.

Another painful point is that pg_stat_slru uses internal names in the
data it outputs, which obviously do not match the new GUCs.

Yeah, that's true, but I think this could be explained somewhere not
sure what is the right place for this.

Or we can change those names in the view.

FYI, I have also repeated all the performance tests I performed in my
first email[1], and I am seeing a similar gain.

Excellent, thanks for doing that.

Some comments on v18 in my first pass of the review.

1.
@@ -665,7 +765,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
lsnindex = GetLSNIndex(slotno, xid);
*lsn = XactCtl->shared->group_lsn[lsnindex];

- LWLockRelease(XactSLRULock);
+ LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));

Maybe here we can add an assert before releasing the lock for a safety check

Assert(LWLockHeldByMe(SimpleLruGetBankLock(XactCtl, pageno)));

Hmm, I think this would just throw a warning or error "you don't hold
such lwlock", so it doesn't seem necessary.

2.
+ *
+ * XXX could we make the LSNs to be bank-based?
*/
XLogRecPtr *group_lsn;

IMHO, the flush still happens at the page level so up to which LSN
should be flush before flushing the particular clog page should also
be maintained at the page level.

Yeah, this was a misguided thought, nevermind.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
<Schwern> It does it in a really, really complicated way
<crab> why does it need to be complicated?
<Schwern> Because it's MakeMaker.

Attachments:

v19-0001-Make-SLRU-buffer-sizes-configurable.patch (text/x-diff; charset=utf-8)
From e1aabcbf5ce1417decfe24f513e5cfe8b6de77f2 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu, 22 Feb 2024 18:42:56 +0100
Subject: [PATCH v19] Make SLRU buffer sizes configurable

Also, divide the slot array in banks, so that the LRU algorithm can be
made more scalable.

Also remove the centralized control lock for even better scalability.

Authors: Dilip Kumar, Andrey Borodin
---
 doc/src/sgml/config.sgml                      | 139 +++++++
 src/backend/access/transam/clog.c             | 235 ++++++++----
 src/backend/access/transam/commit_ts.c        |  89 +++--
 src/backend/access/transam/multixact.c        | 190 +++++++---
 src/backend/access/transam/slru.c             | 345 +++++++++++++-----
 src/backend/access/transam/subtrans.c         | 104 +++++-
 src/backend/commands/async.c                  |  61 +++-
 src/backend/storage/lmgr/lwlock.c             |   9 +-
 src/backend/storage/lmgr/lwlocknames.txt      |  14 +-
 src/backend/storage/lmgr/predicate.c          |  34 +-
 .../utils/activity/wait_event_names.txt       |  15 +-
 src/backend/utils/init/globals.c              |   9 +
 src/backend/utils/misc/guc_tables.c           |  78 ++++
 src/backend/utils/misc/postgresql.conf.sample |   9 +
 src/include/access/clog.h                     |   1 -
 src/include/access/commit_ts.h                |   1 -
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |  85 +++--
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   8 +
 src/include/storage/lwlock.h                  |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |  11 +
 src/test/modules/test_slru/test_slru.c        |  35 +-
 25 files changed, 1143 insertions(+), 352 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ff184003fe..aea8c8d69c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,145 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-commit-timestamp-buffers" xreflabel="commit_timestamp_buffers">
+      <term><varname>commit_timestamp_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_timestamp_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-serializable-buffers" xreflabel="serializable_buffers">
+      <term><varname>serializable_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serializable_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-subtransaction-buffers" xreflabel="subtransaction_buffers">
+      <term><varname>subtransaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtransaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-transaction-buffers" xreflabel="transaction_buffers">
+      <term><varname>transaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>transaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 97f7434da3..f03eae05ec 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -3,12 +3,13 @@
  * clog.c
  *		PostgreSQL transaction-commit-log manager
  *
- * This module replaces the old "pg_log" access code, which treated pg_log
- * essentially like a relation, in that it went through the regular buffer
- * manager.  The problem with that was that there wasn't any good way to
- * recycle storage space for transactions so old that they'll never be
- * looked up again.  Now we use specialized access code so that the commit
- * log can be broken into relatively small, independent segments.
+ * This module stores two bits per transaction regarding its commit/abort
+ * status; the status for four transactions fit in a byte.
+ *
+ * This would be a pretty simple abstraction on top of slru.c, except that
+ * for performance reasons we allow multiple transactions that are
+ * committing concurrently to form a queue, so that a single process can
+ * update the status for all of them within a single lock acquisition run.
  *
  * XLOG interactions: this module generates an XLOG record whenever a new
  * CLOG page is initialized to zeroes.  Other writes of CLOG come from
@@ -43,6 +44,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -62,6 +64,15 @@
 #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
 #define CLOG_XACT_BITMASK	((1 << CLOG_BITS_PER_XACT) - 1)
 
+/*
+ * Because space used in CLOG by each transaction is so small, we place a
+ * smaller limit on the number of CLOG buffers than SLRU allows.  No other
+ * SLRU needs this.
+ */
+#define CLOG_MAX_ALLOWED_BUFFERS \
+	Min(SLRU_MAX_ALLOWED_BUFFERS, \
+		(((MaxTransactionId / 2) + (CLOG_XACTS_PER_PAGE - 1)) / CLOG_XACTS_PER_PAGE))
+
 
 /*
  * Although we return an int64 the actual value can't currently exceed
@@ -284,15 +295,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
-	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * When there is contention on the SLRU bank lock we need, we try to group
+	 * multiple updates; a single leader process will perform transaction
+	 * status updates for multiple backends so that the number of times the
+	 * bank lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -310,17 +326,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
-		 * our own XID and release the lock.  If not, try use group XID
-		 * update.  If that doesn't work out, fall back to waiting for the
-		 * lock to perform an update for this transaction only.
+		 * If we can immediately acquire the lock, we update the status of our
+		 * own XID and release the lock.  If not, try use group XID update. If
+		 * that doesn't work out, fall back to waiting for the lock to perform
+		 * an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -333,10 +349,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -355,7 +371,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -406,14 +423,15 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * Subroutine for TransactionIdSetPageStatus, q.v.
+ *
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -425,16 +443,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 {
 	volatile PROC_HDR *procglobal = ProcGlobal;
 	PGPROC	   *proc = MyProc;
-	int			pgprocno = MyProcNumber;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
 
 	/*
-	 * Add ourselves to the list of processes needing a group XID status
-	 * update.
+	 * Prepare to add ourselves to the list of processes needing a group XID
+	 * status update.
 	 */
 	proc->clogGroupMember = true;
 	proc->clogGroupMemberXid = xid;
@@ -442,6 +461,29 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	proc->clogGroupMemberPage = pageno;
 	proc->clogGroupMemberLsn = lsn;
 
+	/*
+	 * We put ourselves in the queue by writing MyProcNumber to
+	 * ProcGlobal->clogGroupFirst.  However, if there's already a process
+	 * listed there, we compare our pageno with that of that process; if it
+	 * differs, we cannot participate in the group, so we return for caller to
+	 * update pg_xact in the normal way.
+	 *
+	 * If we're not the first process in the list, we must follow the leader.
+	 * We do this by storing the data we want updated in our PGPROC entry where
+	 * the leader can find it, then going to sleep.
+	 *
+	 * If no process is already in the list, we're the leader; our first step
+	 * is to "close out the group" by resetting the list pointer from
+	 * ProcGlobal->clogGroupFirst (this lets other processes set up other
+	 * groups later); then we lock the SLRU bank corresponding to our group's
+	 * page, do the SLRU updates, release the SLRU bank lock, and wake up the
+	 * sleeping processes.
+	 *
+	 * If another group starts to update a page in a different SLRU bank, they
+	 * can proceed concurrently, since the bank lock they're going to use is
+	 * different from ours.  If another group starts to update a page in the
+	 * same bank as ours, they wait until we release the lock.
+	 */
 	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
 
 	while (true)
@@ -453,10 +495,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		 * There is a race condition here, which is that after doing the below
 		 * check and before adding this proc's clog update to a group, the
 		 * group leader might have already finished the group update for this
-		 * page and becomes group leader of another group. This will lead to a
-		 * situation where a single group can have different clog page
-		 * updates.  This isn't likely and will still work, just maybe a bit
-		 * less efficiently.
+		 * page and becomes group leader of another group, updating a different
+		 * page.  This will lead to a situation where a single group can have
+		 * different clog page updates.  This isn't likely and will still work,
+		 * just less efficiently -- we handle this case by switching to a
+		 * different bank lock in the loop below.
 		 */
 		if (nextidx != INVALID_PGPROCNO &&
 			GetPGProcByNumber(nextidx)->clogGroupMemberPage != proc->clogGroupMemberPage)
@@ -474,7 +517,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 
 		if (pg_atomic_compare_exchange_u32(&procglobal->clogGroupFirst,
 										   &nextidx,
-										   (uint32) pgprocno))
+										   (uint32) MyProcNumber))
 			break;
 	}
 
@@ -508,13 +551,21 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * Acquire the SLRU bank lock that corresponds to the page we originally
+	 * wanted to modify.
+	 */
+	prevpageno = ProcGlobal->allProcs[MyProcNumber].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
 	 * group XID status update, saving a pointer to the head of the list.
 	 * Trying to pop elements one at a time could lead to an ABA problem.
+	 *
+	 * At this point, any processes trying to do this would create a separate
+	 * group.
 	 */
 	nextidx = pg_atomic_exchange_u32(&procglobal->clogGroupFirst,
 									 INVALID_PGPROCNO);
@@ -526,6 +577,31 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the page to update belongs to a different bank than the previous
+		 * one, exchange bank lock to the new one.  This should be quite rare,
+		 * as described above.
+		 *
+		 * (We could try to optimize this by waking up the processes for which
+		 * we have already updated the status while we exchange the lock, but
+		 * the code doesn't do that at present.  I think it'd require
+		 * additional bookkeeping, making the common path slower in order to
+		 * improve an infrequent case.)
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -545,12 +621,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
 	 * don't do this under the lock so as to keep lock hold times to a
 	 * minimum.
+	 *
+	 * (Perhaps we could do this in two passes, the first setting clogGroupNext
+	 * to invalid while saving the semaphores to an array, then a single write
+	 * barrier, then another pass unlocking the semaphores.)
 	 */
 	while (wakeidx != INVALID_PGPROCNO)
 	{
@@ -574,7 +655,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Caller must hold the corresponding SLRU bank lock, will be held at exit.
  */
 static void
 TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
@@ -585,6 +666,11 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
 	char		byteval;
 	char		curval;
 
+	Assert(XactCtl->shared->page_number[slotno] == TransactionIdToPage(xid));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl,
+													 XactCtl->shared->page_number[slotno]),
+								LW_EXCLUSIVE));
+
 	byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
 	curval = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
 
@@ -666,7 +752,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -674,23 +760,19 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
  */
-Size
+static int
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* auto-tune based on shared buffers */
+	if (transaction_buffers == 0)
+		return Min(1024, Max(16,
+							 NBuffers / 512 - (NBuffers / 512) % 16));
+
+	return Min(Max(16, transaction_buffers), CLOG_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -705,13 +787,36 @@ CLOGShmemSize(void)
 void
 CLOGShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (transaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CLOGShmemBuffers());
+		SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (transaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(transaction_buffers != 0);
+
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for transaction_buffers
+ */
+bool
+check_transaction_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("transaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial CLOG segment.  (The CLOG directory is assumed to
@@ -722,8 +827,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -732,7 +838,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -781,8 +887,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -814,7 +921,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -846,6 +953,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -856,13 +964,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -1000,16 +1109,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 6bfe60343e..58e05dc0b9 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -225,10 +226,11 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 					 TransactionId *subxids, TimestampTz ts,
 					 RepOriginId nodeid, int64 pageno)
 {
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 	int			slotno;
 	int			i;
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -238,22 +240,25 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Caller must hold the correct SLRU bank lock; it is still held at exit.
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
 						 RepOriginId nodeid, int slotno)
 {
-	int			entryno = TransactionIdToCTsEntry(xid);
+	int			entryno;
 	CommitTimestampEntry entry;
 
-	Assert(TransactionIdIsNormal(xid));
+	if (!TransactionIdIsNormal(xid))
+		return;
+
+	entryno = TransactionIdToCTsEntry(xid);
 
 	entry.time = ts;
 	entry.nodeid = nodeid;
@@ -345,7 +350,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -499,14 +504,19 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
 /*
  * Number of shared CommitTS buffers.
  *
- * We use a very similar logic as for the number of CLOG buffers (except we
- * scale up twice as fast with shared buffers, and the maximum is twice as
- * high); see comments in CLOGShmemBuffers.
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
  */
-Size
+static int
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* auto-tune based on shared buffers */
+	if (commit_timestamp_buffers == 0)
+		return Min(1024, Max(16,
+							 NBuffers / 512 - (NBuffers / 512) % 16));
+
+	return Min(Max(16, commit_timestamp_buffers), SLRU_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -528,10 +538,24 @@ CommitTsShmemInit(void)
 {
 	bool		found;
 
+	/* If auto-tuning is requested, now is the time to do it */
+	if (commit_timestamp_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CommitTsShmemBuffers());
+		SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (commit_timestamp_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(commit_timestamp_buffers != 0);
+
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -553,6 +577,15 @@ CommitTsShmemInit(void)
 		Assert(found);
 }
 
+/*
+ * GUC check_hook for commit_timestamp_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_timestamp_buffers", newval);
+}
+
 /*
  * This function must be called ONCE on system install.
  *
@@ -715,13 +748,14 @@ ActivateCommitTs(void)
 	/* Create the current segment file, if necessary */
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 		int			slotno;
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -760,8 +794,6 @@ DeactivateCommitTs(void)
 	TransamVariables->oldestCommitTsXid = InvalidTransactionId;
 	TransamVariables->newestCommitTsXid = InvalidTransactionId;
 
-	LWLockRelease(CommitTsLock);
-
 	/*
 	 * Remove *all* files.  This is necessary so that there are no leftover
 	 * files; in the case where this feature is later enabled after running
@@ -769,10 +801,16 @@ DeactivateCommitTs(void)
 	 * (We can probably tolerate out-of-sequence files, as they are going to
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
+	 *
+	 * Note that we do this with CommitTsLock acquired in exclusive mode.
+	 * This is very heavy-handed, but since this routine can only be called
+	 * on a replica and should happen very rarely, we don't worry too much
+	 * about it.  Note also that no process should be consulting this SLRU
+	 * if we have just deactivated it.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+
+	LWLockRelease(CommitTsLock);
 }
 
 /*
@@ -804,6 +842,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -824,12 +863,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -983,16 +1024,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index febc429f72..311fdb2b21 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -192,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use SLRU bank's lock of MultiXactOffset and
+ * MultiXactMember to guard accesses to the two sets of SLRU buffers.  For
+ * concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -870,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -891,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -916,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether the
+			 * new page falls into a different SLRU bank; if so, release the
+			 * old bank's lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -936,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1239,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1342,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls in a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1379,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls in a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1388,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1397,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1419,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls in a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1442,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1834,8 +1892,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,16 +1909,16 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "MultiXactMember", multixact_members_buffers, 0,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1887,6 +1945,24 @@ MultiXactShmemInit(void)
 	OldestVisibleMXactId = OldestMemberMXactId + MaxOldestSlot;
 }
 
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates the initial
  * MultiXact segments.  (The MultiXacts directories are assumed to have been
@@ -1896,8 +1972,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1906,9 +1984,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1917,7 +1996,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1977,10 +2056,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1995,7 +2076,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2049,6 +2130,8 @@ TrimMultiXact(void)
 	oldestMXactDB = MultiXactState->oldestMultiXactDB;
 	LWLockRelease(MultiXactGenLock);
 
+	/* Clean up offsets state */
+
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
@@ -2056,9 +2139,6 @@ TrimMultiXact(void)
 	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
 						pageno);
 
-	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
 	 * TrimCLOG() for background.  Unlike CLOG, some WAL record covers every
@@ -2072,7 +2152,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2080,10 +2162,9 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/*
 	 * And the same for members.
 	 *
@@ -2093,8 +2174,6 @@ TrimMultiXact(void)
 	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
 						pageno);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
 	 * TrimCLOG() for motivation.
@@ -2105,7 +2184,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2120,10 +2201,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2411,6 +2491,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2421,13 +2502,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2460,15 +2542,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2766,7 +2850,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3248,31 +3332,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index e1c468861f..b72ab48a70 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1,28 +1,38 @@
 /*-------------------------------------------------------------------------
  *
  * slru.c
- *		Simple LRU buffering for transaction status logfiles
+ *		Simple LRU buffering for wrap-around-able permanent metadata
  *
- * We use a simple least-recently-used scheme to manage a pool of page
- * buffers.  Under ordinary circumstances we expect that write
- * traffic will occur mostly to the latest page (and to the just-prior
- * page, soon after a page transition).  Read traffic will probably touch
- * a larger span of pages, but in any case a fairly small number of page
- * buffers should be sufficient.  So, we just search the buffers using plain
- * linear search; there's no need for a hashtable or anything fancy.
- * The management algorithm is straight LRU except that we will never swap
- * out the latest page (since we know it's going to be hit again eventually).
+ * This module is used to maintain various pieces of transaction status
+ * indexed by TransactionId (such as commit status, parent transaction ID,
+ * commit timestamp), as well as storage for multixacts, serializable
+ * isolation locks and NOTIFY traffic.  Extensions can define their own
+ * SLRUs, too.
  *
- * We use a control LWLock to protect the shared data structures, plus
- * per-buffer LWLocks that synchronize I/O for each buffer.  The control lock
- * must be held to examine or modify any shared state.  A process that is
- * reading in or writing out a page buffer does not hold the control lock,
- * only the per-buffer lock for the buffer it is working on.  One exception
- * is latest_page_number, which is read and written using atomic ops.
+ * Under ordinary circumstances we expect that write traffic will occur
+ * mostly to the latest page (and to the just-prior page, soon after a
+ * page transition).  Read traffic will probably touch a larger span of
+ * pages, but a relatively small number of buffers should be sufficient.
  *
- * "Holding the control lock" means exclusive lock in all cases except for
- * SimpleLruReadPage_ReadOnly(); see comments for SlruRecentlyUsed() for
- * the implications of that.
+ * We use a simple least-recently-used scheme to manage a pool of shared
+ * page buffers, split in banks by the lowest bits of the page number, and
+ * the management algorithm only processes the bank to which the desired
+ * page belongs, so a linear search is sufficient; there's no need for a
+ * hashtable or anything fancy.  The algorithm is straight LRU except that
+ * we will never swap out the latest page (since we know it's going to be
+ * hit again eventually).
+ *
+ * We use per-bank control LWLocks to protect the shared data structures,
+ * plus per-buffer LWLocks that synchronize I/O for each buffer.  The
+ * bank's control lock must be held to examine or modify any of the bank's
+ * shared state.  A process that is reading in or writing out a page
+ * buffer does not hold the control lock, only the per-buffer lock for the
+ * buffer it is working on.  One exception is latest_page_number, which is
+ * read and written using atomic ops.
+ *
+ * "Holding the bank control lock" means exclusive lock in all cases
+ * except for SimpleLruReadPage_ReadOnly(); see comments for
+ * SlruRecentlyUsed() for the implications of that.
  *
  * When initiating I/O on a buffer, we acquire the per-buffer lock exclusively
  * before releasing the control lock.  The per-buffer lock is released after
@@ -60,6 +70,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -106,6 +117,23 @@ typedef struct SlruWriteAllData
 
 typedef struct SlruWriteAllData *SlruWriteAll;
 
+
+/*
+ * Bank size for the slot array.  Pages are assigned a bank according to their
+ * page number, with each bank being this size.  We want a power of 2 so that
+ * we can determine the bank number for a page with just bit shifting; we also
+ * want to keep the bank size small so that LRU victim search is fast.  16
+ * buffers per bank seems a good number.
+ */
+#define SLRU_BANK_BITSHIFT		4
+#define SLRU_BANK_SIZE			(1 << SLRU_BANK_BITSHIFT)
+
+/*
+ * Macro to get the bank number to which the slot belongs.
+ */
+#define SlotGetBankNumber(slotno)	((slotno) >> SLRU_BANK_BITSHIFT)
+
+
 /*
  * Populate a file tag describing a segment file.  We only use the segment
  * number, since we can derive everything else we need by having separate
@@ -118,34 +146,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -173,6 +173,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -182,8 +183,11 @@ static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
 Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
+	int			nbanks = nslots / SLRU_BANK_SIZE;
 	Size		sz;
 
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
+
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
 	sz += MAXALIGN(nslots * sizeof(char *));	/* page_buffer[] */
@@ -192,6 +196,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -208,16 +214,20 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -228,18 +238,14 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize locks and shared memory area */
 		char	   *ptr;
 		Size		offset;
-		int			slotno;
 
 		Assert(!found);
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
 		pg_atomic_init_u64(&shared->latest_page_number, 0);
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -260,6 +266,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -268,10 +278,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		}
 
 		ptr += BUFFERALIGN(offset);
-		for (slotno = 0; slotno < nslots; slotno++)
+		for (int slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -280,11 +290,21 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the slot banks. */
+		for (int bankno = 0; bankno < nbanks; bankno++)
+		{
+			LWLockInitialize(&shared->bank_locks[bankno].lock, bank_tranche_id);
+			shared->bank_cur_lru_count[bankno] = 0;
+		}
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -293,16 +313,33 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Valid values are multiples of SLRU_BANK_SIZE */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
+
 /*
  * Initialize (or reinitialize) a page to zeroes.
  *
  * The page is not actually written, just set up in shared memory.
  * The slot number of the new page is returned.
  *
- * Control lock must be held at entry, and will be held at exit.
+ * Bank lock must be held at entry, and will be held at exit.
  */
 int
 SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
@@ -310,6 +347,8 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE));
+
 	/* Find a suitable buffer slot for the page */
 	slotno = SlruSelectLRUPage(ctl, pageno);
 	Assert(shared->page_status[slotno] == SLRU_PAGE_EMPTY ||
@@ -370,18 +409,21 @@ SimpleLruZeroLSNs(SlruCtl ctl, int slotno)
  * guarantee that new I/O hasn't been started before we return, though.
  * In fact the slot might not even contain the same page anymore.)
  *
- * Control lock must be held at entry, and will be held at exit.
+ * Bank lock must be held at entry, and will be held at exit.
  */
 static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			bankno = SlotGetBankNumber(slotno);
+
+	Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -424,7 +466,7 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
  * Return value is the shared-buffer slot number now holding the page.
  * The buffer's LRU access info is updated.
  *
- * Control lock must be held at entry, and will be held at exit.
+ * The correct bank lock must be held at entry, and will be held at exit.
  */
 int
 SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -432,10 +474,13 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			bankno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -478,9 +523,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		bankno = SlotGetBankNumber(slotno);
 
-		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		/* Release bank lock while doing I/O */
+		LWLockRelease(&shared->bank_locks[bankno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -488,8 +534,8 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		/* Set the LSNs for this newly read-in page to zero */
 		SimpleLruZeroLSNs(ctl, slotno);
 
-		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		/* Re-acquire bank control lock and update page state */
+		LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -523,7 +569,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
  * Return value is the shared-buffer slot number now holding the page.
  * The buffer's LRU access info is updated.
  *
- * Control lock must NOT be held at entry, but will be held at exit.
+ * Bank control lock must NOT be held at entry, but will be held at exit.
  * It is unspecified whether the lock will be shared or exclusive.
  */
 int
@@ -531,12 +577,19 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	int			slotno;
+	int			bankno = pageno & ctl->bank_mask;
+	int			bankstart = bankno * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_SHARED);
 
-	/* See if page is already in a buffer */
-	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	/*
+	 * See if the page is already in the buffer pool.  The buffer pool is
+	 * divided into banks of buffers, and each pageno may reside in only
+	 * one bank, so we limit the search to that bank.
+	 */
+	for (slotno = bankstart; slotno < bankend; slotno++)
 	{
 		if (shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
@@ -553,8 +606,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -568,15 +621,19 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
  * the write).  However, we *do* attempt a fresh write even if the page
  * is already being written; this is for checkpoints.
  *
- * Control lock must be held at entry, and will be held at exit.
+ * Bank lock must be held at entry, and will be held at exit.
  */
 static void
 SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 {
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
+	int			bankno = SlotGetBankNumber(slotno);
 	bool		ok;
 
+	Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE));
+
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
 		   shared->page_number[slotno] == pageno)
@@ -603,8 +660,8 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
-	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	/* Release bank lock while doing I/O */
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -618,8 +675,8 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 			CloseTransientFile(fdata->fd[i]);
 	}
 
-	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	/* Re-acquire bank lock and update page state */
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -648,6 +705,8 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 void
 SimpleLruWritePage(SlruCtl ctl, int slotno)
 {
+	Assert(ctl->shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+
 	SlruInternalWritePage(ctl, slotno, NULL);
 }
 
@@ -1035,17 +1094,53 @@ SlruReportIOError(SlruCtl ctl, int64 pageno, TransactionId xid)
 }
 
 /*
- * Select the slot to re-use when we need a free slot.
+ * Mark a buffer slot "most recently used".
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = SlotGetBankNumber(slotno);
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+
+	/*
+	 * The reason for the if-test is that there are often many consecutive
+	 * accesses to the same page (particularly the latest page).  By
+	 * suppressing useless increments of bank_cur_lru_count, we reduce the
+	 * probability that old pages' counts will "wrap around" and make them
+	 * appear recently used.
+	 *
+	 * We allow this code to be executed concurrently by multiple processes
+	 * within SimpleLruReadPage_ReadOnly().  As long as int reads and writes
+	 * are atomic, this should not cause any completely-bogus values to enter
+	 * the computation.  However, it is possible for either bank_cur_lru_count
+	 * or individual page_lru_count entries to be "reset" to lower values than
+	 * they should have, in case a process is delayed while it executes this
+	 * function.  With care in SlruSelectLRUPage(), this does little harm, and
+	 * in any case the absolute worst possible consequence is a nonoptimal
+	 * choice of page to evict.  The gain from allowing concurrent reads of
+	 * SLRU pages seems worth it.
+	 */
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
+/*
+ * Select the slot to re-use when we need a free slot for the given page.
  *
- * The target page number is passed because we need to consider the
- * possibility that some other process reads in the target page while
- * we are doing I/O to free a slot.  Hence, check or recheck to see if
- * any slot already holds the target page, and return that slot if so.
- * Thus, the returned slot is *either* a slot already holding the pageno
- * (could be any state except EMPTY), *or* a freeable slot (state EMPTY
- * or CLEAN).
+ * The target page number is passed not only because we need to know the
+ * correct bank to use, but also because we need to consider the possibility
+ * that some other process reads in the target page while we are doing I/O to
+ * free a slot.  Hence, check or recheck to see if any slot already holds the
+ * target page, and return that slot if so.  Thus, the returned slot is
+ * *either* a slot already holding the pageno (could be any state except
+ * EMPTY), *or* a freeable slot (state EMPTY or CLEAN).
  *
- * Control lock must be held at entry, and will be held at exit.
+ * The correct bank lock must be held at entry, and will be held at exit.
  */
 static int
 SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
@@ -1063,9 +1158,18 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		/* See if page already has a buffer assigned */
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		Assert(LWLockHeldByMe(&shared->bank_locks[bankno].lock));
+
+		/*
+		 * See if the page is already in the buffer pool.  The buffer pool is
+		 * divided into banks of buffers, and each pageno may reside in only
+		 * one bank, so we limit the search to that bank.
+		 */
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			if (shared->page_number[slotno] == pageno &&
 				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
@@ -1099,8 +1203,8 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
-		for (slotno = 0; slotno < shared->num_slots; slotno++)
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
+		for (slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
@@ -1203,6 +1307,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	int			slotno;
 	int64		pageno = 0;
 	int			i;
+	int			prevbank = SlotGetBankNumber(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1213,10 +1318,27 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbank = SlotGetBankNumber(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
+
+		/* Do nothing if slot is unused */
+		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
+			continue;
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1230,7 +1352,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1269,6 +1391,7 @@ void
 SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
+	int			prevbank;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1279,8 +1402,6 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
@@ -1292,15 +1413,29 @@ restart:
 	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
 						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevbank = SlotGetBankNumber(0);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 	for (int slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbank = SlotGetBankNumber(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1330,10 +1465,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevbank].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1372,17 +1509,31 @@ void
 SlruDeleteSegment(SlruCtl ctl, int64 segno)
 {
 	SlruShared	shared = ctl->shared;
+	int			prevbank = SlotGetBankNumber(0);
 	int			slotno;
 	bool		did_write;
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curbank = SlotGetBankNumber(slotno);
 
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
+
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
@@ -1416,7 +1567,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 }
 
 /*
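
To summarize the mapping that slru.c now relies on: a page is assigned to a
bank by masking the low bits of its page number with ctl->bank_mask (so
consecutive pages land in different banks), a slot's bank comes from shifting
the slot number, and each bank covers a fixed window of SLRU_BANK_SIZE slots
guarded by its own lock.  Below is a condensed sketch of those relationships;
show_bank_mapping is an invented name, while the fields and macros it uses
(bank_mask, bank_locks, SLRU_BANK_SIZE, SlotGetBankNumber) are the ones added
by the patch.

static void
show_bank_mapping(SlruCtl ctl, int64 pageno)
{
	int			bankno = pageno & ctl->bank_mask;
	int			bankstart = bankno * SLRU_BANK_SIZE;
	int			bankend = bankstart + SLRU_BANK_SIZE;
	LWLock	   *banklock = &ctl->shared->bank_locks[bankno].lock;

	/*
	 * Only slots bankstart..bankend-1 can ever hold this page, and they are
	 * protected by banklock rather than by one centralized control lock.
	 */
	Assert(SlotGetBankNumber(bankstart) == bankno);
	(void) bankend;
	(void) banklock;
}
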
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index b2ed82ac56..bbc0aecc99 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,7 +31,9 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -85,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -108,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -138,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -186,6 +190,23 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	return previousXid;
 }
 
+/*
+ * Number of shared SUBTRANS buffers.
+ *
+ * If asked to autotune, we use 2MB for every 1GB of shared buffers, up to 8MB,
+ * but always at least 16 buffers.  Otherwise just cap the configured amount to
+ * be between 16 and the maximum allowed.
+ */
+static int
+SUBTRANSShmemBuffers(void)
+{
+	/* auto-tune based on shared buffers */
+	if (subtransaction_buffers == 0)
+		return Min(1024, Max(16,
+							 NBuffers / 512 - (NBuffers / 512) % 16));
+
+	return Min(Max(16, subtransaction_buffers), SLRU_MAX_ALLOWED_BUFFERS);
+}
 
 /*
  * Initialization of shared memory for SUBTRANS
@@ -193,20 +214,42 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(SUBTRANSShmemBuffers(), 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (subtransaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", SUBTRANSShmemBuffers());
+		SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (subtransaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(subtransaction_buffers != 0);
+
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+	SimpleLruInit(SubTransCtl, "Subtrans", SUBTRANSShmemBuffers(), 0,
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for subtransaction_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtransaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial SUBTRANS segment.  (The SUBTRANS directory is assumed to
@@ -221,8 +264,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -231,7 +275,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -261,6 +305,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -268,23 +314,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * Check if we need to acquire the lock on the new bank then release
+		 * the lock on the old bank and acquire on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * Check if we need to acquire the lock on the new bank then release the
+	 * lock on the old bank and acquire on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -318,6 +388,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -329,12 +400,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b24b22293..0c2ac60946 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -234,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -266,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * SLRU buffer pool is divided in banks and bank wise SLRU lock is used as
+ * the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -492,7 +494,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,8 +543,8 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1356,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * page specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1366,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1389,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1434,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1460,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1931,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1953,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1994,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2147,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
@@ -2378,3 +2394,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 997857679e..d405c61b21 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -163,6 +163,13 @@ static const char *const BuiltinTrancheNames[] = {
 	[LWTRANCHE_LAUNCHER_HASH] = "LogicalRepLauncherHash",
 	[LWTRANCHE_DSM_REGISTRY_DSA] = "DSMRegistryDSA",
 	[LWTRANCHE_DSM_REGISTRY_HASH] = "DSMRegistryHash",
+	[LWTRANCHE_COMMITTS_SLRU] = "CommitTSSLRU",
+	[LWTRANCHE_MULTIXACTOFFSET_SLRU] = "MultixactOffsetSLRU",
+	[LWTRANCHE_MULTIXACTMEMBER_SLRU] = "MultixactMemberSLRU",
+	[LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU",
+	[LWTRANCHE_SERIAL_SLRU] = "SerialSLRU",
+	[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
+	[LWTRANCHE_XACT_SLRU] = "XactSLRU",
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
@@ -776,7 +783,7 @@ GetLWLockIdentifier(uint32 classId, uint16 eventId)
  * in mode.
  *
  * This function will not block waiting for a lock to become free - that's the
- * callers job.
+ * caller's job.
  *
  * Returns true if the lock isn't free and we need to wait.
  */
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 3d59d3646e..284d168f77 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+# 31 was SerialSLRULock
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+#38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index d62060d58c..5e20818b83 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -213,6 +213,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -813,9 +814,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serializable_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -841,6 +842,15 @@ SerialInit(void)
 	}
 }
 
+/*
+ * GUC check_hook for serializable_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serializable_buffers", newval);
+}
+
 /*
  * Record a committed read write serializable xid and the minimum
  * commitSeqNo of any transactions to which this xid had a rw-conflict out.
@@ -854,15 +864,17 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
 	/*
-	 * In this routine, we must hold both SerialControlLock and SerialSLRULock
-	 * simultaneously while making the SLRU data catch up with the new state
-	 * that we determine.
+	 * In this routine, we must hold both SerialControlLock and the SLRU
+	 * bank lock simultaneously while making the SLRU data catch up with
+	 * the new state that we determine.
 	 */
 	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
@@ -898,7 +910,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	if (isNewPage)
 		serialControl->headPage = targetPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (isNewPage)
 	{
@@ -916,7 +928,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 	LWLockRelease(SerialControlLock);
 }
 
@@ -950,13 +962,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -1367,7 +1379,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serializable_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 4fffb46625..ec2f31f82a 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -295,11 +295,7 @@ SInvalWrite	"Waiting to add a message to the shared catalog invalidation queue."
 WALBufMapping	"Waiting to replace a page in WAL buffers."
 WALWrite	"Waiting for WAL buffers to be written to disk."
 ControlFile	"Waiting to read or update the <filename>pg_control</filename> file or create a new WAL file."
-XactSLRU	"Waiting to access the transaction status SLRU cache."
-SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
 MultiXactGen	"Waiting to read or update shared multixact state."
-MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
-MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
 RelCacheInit	"Waiting to read or update a <filename>pg_internal.init</filename> relation cache initialization file."
 CheckpointerComm	"Waiting to manage fsync requests."
 TwoPhaseState	"Waiting to read or update the state of prepared transactions."
@@ -310,19 +306,16 @@ Autovacuum	"Waiting to read or update the current state of autovacuum workers."
 AutovacuumSchedule	"Waiting to ensure that a table selected for autovacuum still needs vacuuming."
 SyncScan	"Waiting to select the starting location of a synchronized table scan."
 RelationMapping	"Waiting to read or update a <filename>pg_filenode.map</filename> file (used to track the filenode assignments of certain system catalogs)."
-NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
 NotifyQueue	"Waiting to read or update <command>NOTIFY</command> messages."
 SerializableXactHash	"Waiting to read or update information about serializable transactions."
 SerializableFinishedList	"Waiting to access the list of finished serializable transactions."
 SerializablePredicateList	"Waiting to access the list of predicate locks held by serializable transactions."
-SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
 SyncRep	"Waiting to read or update information about the state of synchronous replication."
 BackgroundWorker	"Waiting to read or update background worker state."
 DynamicSharedMemoryControl	"Waiting to read or update dynamic shared memory allocation information."
 AutoFile	"Waiting to update the <filename>postgresql.auto.conf</filename> file."
 ReplicationSlotAllocation	"Waiting to allocate or free a replication slot."
 ReplicationSlotControl	"Waiting to read or update replication slot state."
-CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
 CommitTs	"Waiting to read or update the last value set for a transaction commit timestamp."
 ReplicationOrigin	"Waiting to create, drop or use a replication origin."
 MultiXactTruncation	"Waiting to read or truncate multixact information."
@@ -375,6 +368,14 @@ LogicalRepLauncherDSA	"Waiting to access logical replication launcher's dynamic
 LogicalRepLauncherHash	"Waiting to access logical replication launcher's shared hash table."
 DSMRegistryDSA	"Waiting to access dynamic shared memory registry's dynamic shared memory allocator."
 DSMRegistryHash	"Waiting to access dynamic shared memory registry's shared hash table."
+CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
+MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
+MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
+NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
+SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
+SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
+XactSLRU	"Waiting to access the transaction status SLRU cache."
+
 
 #
 # Wait Events - Lock
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index f024b1a849..909a0d8ee1 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -157,3 +157,12 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+/* configurable SLRU buffer sizes */
+int			commit_timestamp_buffers = 0;
+int			multixact_members_buffers = 32;
+int			multixact_offsets_buffers = 16;
+int			notify_buffers = 16;
+int			serializable_buffers = 32;
+int			subtransaction_buffers = 0;
+int			transaction_buffers = 0;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 37be0669bb..0a0300dc33 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2330,6 +2331,83 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"commit_timestamp_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_timestamp_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_commit_ts_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact member cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_members_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact offset cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		16, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_offsets_buffers, NULL, NULL
+	},
+
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the LISTEN/NOTIFY message cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		16, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_notify_buffers, NULL, NULL
+	},
+
+	{
+		{"serializable_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the serializable transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serializable_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_serial_buffers, NULL, NULL
+	},
+
+	{
+		{"subtransaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the sub-transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtransaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_subtrans_buffers, NULL, NULL
+	},
+
+	{
+		{"transaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the transaction status cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&transaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_transaction_buffers, NULL, NULL
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c97f9a25f0..e75df5e82d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#commit_timestamp_buffers = 0			# memory for pg_commit_ts (0 = auto)
+#multixact_offsets_buffers = 16			# memory for pg_multixact/offsets
+#multixact_members_buffers = 32			# memory for pg_multixact/members
+#notify_buffers = 16					# memory for pg_notify
+#serializable_buffers = 32				# memory for pg_serial
+#subtransaction_buffers = 0 			# memory for pg_subtrans (0 = auto)
+#transaction_buffers = 0				# memory for pg_xact (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index becc365cb0..8e62917e49 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -40,7 +40,6 @@ extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
 									   TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
 extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
 
-extern Size CLOGShmemBuffers(void);
 extern Size CLOGShmemSize(void);
 extern void CLOGShmemInit(void);
 extern void BootStrapCLOG(void);
diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 9c6f3a35ca..82d3aa8627 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -27,7 +27,6 @@ extern bool TransactionIdGetCommitTsData(TransactionId xid,
 extern TransactionId GetLatestCommitTsData(TimestampTz *ts,
 										   RepOriginId *nodeid);
 
-extern Size CommitTsShmemBuffers(void);
 extern Size CommitTsShmemSize(void);
 extern void CommitTsShmemInit(void);
 extern void BootStrapCommitTs(void);
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 233f67dbcc..7ffd256c74 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 2109488654..09926309cd 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers must not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
@@ -55,8 +60,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -69,30 +72,41 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks protects the I/O on each buffer slots */
 	LWLockPadded *buffer_locks;
 
+	/* Locks to protect the in memory buffer slot access in SLRU bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because we do a victim buffer
+	 * search within a bank. Furthermore, manipulating an individual bank
+	 * counter avoids frequent cache invalidation since we update it every time
+	 * we access the page.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
-	 * for pg_xact, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
-	 * has lsn_groups_per_page entries per buffer slot, each containing the
+	 * for pg_xact, false for everything else).  group_lsn[] has
+	 * lsn_groups_per_page entries per buffer slot, each containing the
 	 * highest LSN known for a contiguous group of SLRU entries on that slot's
 	 * page.
 	 */
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
@@ -114,6 +128,19 @@ typedef struct SlruCtlData
 {
 	SlruShared	shared;
 
+	/*
+	 * Bitmask to determine bank number from page number.
+	 */
+	bits16		bank_mask;
+
+	/*
+	 * If true, use long segment filenames formed from lower 48 bits of the
+	 * segment number, e.g. pg_xact/000000001234. Otherwise, use short
+	 * filenames formed from lower 16 bits of the segment number e.g.
+	 * pg_xact/1234.
+	 */
+	bool		long_segment_names;
+
 	/*
 	 * Which sync handler function to use when handing sync requests over to
 	 * the checkpointer.  SYNC_HANDLER_NONE to disable fsync (eg pg_notify).
@@ -132,28 +159,35 @@ typedef struct SlruCtlData
 	 */
 	bool		(*PagePrecedes) (int64, int64);
 
-	/*
-	 * If true, use long segment filenames formed from lower 48 bits of the
-	 * segment number, e.g. pg_xact/000000001234. Otherwise, use short
-	 * filenames formed from lower 16 bits of the segment number e.g.
-	 * pg_xact/1234.
-	 */
-	bool		long_segment_names;
-
 	/*
 	 * Dir is set during SimpleLruInit and does not change thereafter. Since
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for given SlruCtl and the pageno.
+ *
+ * This lock needs to be acquired to access the slru buffer slots in the
+ * respective bank.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int			bankno;
+
+	bankno = pageno & ctl->bank_mask;
+	return &(ctl->shared->bank_locks[bankno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -182,5 +216,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
+extern bool check_slru_buffers(const char *name, int *newval);
 
 #endif							/* SLRU_H */
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index b0d2ad57e5..e2213cf3fd 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 80b8583421..78daa25fa0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 612fb5f42e..7affd29d3e 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -179,6 +179,14 @@ extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
 
+extern PGDLLIMPORT int commit_timestamp_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serializable_buffers;
+extern PGDLLIMPORT int subtransaction_buffers;
+extern PGDLLIMPORT int transaction_buffers;
+
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
 extern PGDLLIMPORT TimestampTz MyStartTimestamp;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 50a65e046d..10bea8c595 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -209,6 +209,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_LAUNCHER_HASH,
 	LWTRANCHE_DSM_REGISTRY_DSA,
 	LWTRANCHE_DSM_REGISTRY_HASH,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_XACT_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index a7edd38fa9..14ee9b94a2 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 339c490300..876103a7a7 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -46,6 +46,8 @@ extern bool check_client_connection_check_interval(int *newval, void **extra,
 extern bool check_client_encoding(char **newval, void **extra, GucSource source);
 extern void assign_client_encoding(const char *newval, void *extra);
 extern bool check_cluster_name(char **newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern const char *show_data_directory_mode(void);
 extern bool check_datestyle(char **newval, void **extra, GucSource source);
 extern void assign_datestyle(const char *newval, void *extra);
@@ -91,6 +93,11 @@ extern bool check_max_worker_processes(int *newval, void **extra,
 									   GucSource source);
 extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
 extern void assign_max_stack_depth(int newval, void *extra);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
 extern bool check_primary_slot_name(char **newval, void **extra,
 									GucSource source);
 extern bool check_random_seed(double *newval, void **extra, GucSource source);
@@ -122,12 +129,15 @@ extern void assign_role(const char *newval, void *extra);
 extern const char *show_role(void);
 extern bool check_search_path(char **newval, void **extra, GucSource source);
 extern void assign_search_path(const char *newval, void *extra);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
 extern bool check_session_authorization(char **newval, void **extra, GucSource source);
 extern void assign_session_authorization(const char *newval, void *extra);
 extern void assign_session_replication_role(int newval, void *extra);
 extern void assign_stats_fetch_consistency(int newval, void *extra);
 extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
@@ -152,6 +162,7 @@ extern const char *show_timezone(void);
 extern bool check_timezone_abbreviations(char **newval, void **extra,
 										 GucSource source);
 extern void assign_timezone_abbreviations(const char *newval, void *extra);
+extern bool check_transaction_buffers(int *newval, void **extra, GucSource source);
 extern bool check_transaction_deferrable(bool *newval, void **extra, GucSource source);
 extern bool check_transaction_isolation(int *newval, void **extra, GucSource source);
 extern bool check_transaction_read_only(bool *newval, void **extra, GucSource source);
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index 4b31f331ca..068a21f125 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
-- 
2.39.2

#109Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#108)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Fri, Feb 23, 2024 at 1:48 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Feb-07, Dilip Kumar wrote:

On Wed, Feb 7, 2024 at 3:49 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Sure, but is that really what we want?

So your question is do we want these buffers to be in multiple of
SLRU_BANK_SIZE? Maybe we can have the last bank to be partial, I
don't think it should create any problem logically. I mean we can
look again in the patch to see if we have made any such assumptions
but that should be fairly easy to fix, then maybe if we are going in
this way we should get rid of the check_slru_buffers() function as
well.

Not really, I just don't think the macro should be in slru.h.

Okay

Another thing I've been thinking is that perhaps it would be useful to
make the banks smaller, when the total number of buffers is small. For
example, if you have 16 or 32 buffers, it's not really clear to me that
it makes sense to have just 1 bank or 2 banks. It might be more
sensible to have 4 banks with 4 or 8 buffers instead. That should make
the algorithm scale down as well as up ...

It might be helpful to have small-size banks when SLRU buffers are set
to a very low value and we are only accessing a couple of pages at a
time (i.e. no buffer replacement) because in such cases most of the
contention will be on SLRU Bank lock. Although I am not sure how
practical such a use case would be, I mean if someone is using
multi-xact very heavily or creating frequent subtransaction overflow
then wouldn't they should set this buffer limit to some big enough
value? By doing this we would lose some simplicity of the patch I
mean instead of using the simple macro i.e. SLRU_BANK_SIZE we would
need to compute this and store it in SlruShared. Maybe that's not that
bad.

I haven't done either of those things in the attached v19 version. I
did go over the comments once again and rewrote the parts I was unhappy
with, including some existing ones. I think it's OK now from that point
of view ... at some point I thought about creating a separate README,
but in the end I thought it not necessary.

Thanks, I will review those changes.

I did add a bunch of Assert()s to make sure the locks that are supposed
to be held are actually held. This led me to testing the page status to
be not EMPTY during SimpleLruWriteAll() before calling
SlruInternalWritePage(), because the assert was firing. The previous
code is not really *buggy*, but to me it's weird to call WritePage() on
a slot with no contents.

Okay, I mean internally SlruInternalWritePage() will flush only if
the status is SLRU_PAGE_VALID, but it is better the way you have done.

Another change was in TransactionGroupUpdateXidStatus: the original code
had the leader doing pg_atomic_read_u32(&procglobal->clogGroupFirst) to
know which bank to lock. I changed it to simply be the page used by the
leader process; this doesn't need an atomic read, and should be the same
page anyway. (If it isn't, it's no big deal). But what's more: even if
we do read ->clogGroupFirst at that point, there's no guarantee that
this is going to be exactly for the same process that ends up being the
first in the list, because since we have not set it to INVALID by the
time we grab the bank lock, it is quite possible for more processes to
add themselves to the list.

Yeah, this looks better

I realized all this while rewriting the comments in a way that would let
me understand what was going on ... so IMO the effort was worthwhile.

+1

I will review and do some more testing early next week and share my feedback.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#110Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Dilip Kumar (#109)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 23 Feb 2024, at 12:36, Dilip Kumar <dilipbalaut@gmail.com> wrote:

Another thing I've been thinking is that perhaps it would be useful to
make the banks smaller, when the total number of buffers is small. For
example, if you have 16 or 32 buffers, it's not really clear to me that
it makes sense to have just 1 bank or 2 banks. It might be more
sensible to have 4 banks with 4 or 8 buffers instead. That should make
the algorithm scale down as well as up ...

It might be helpful to have small-size banks when SLRU buffers are set
to a very low value and we are only accessing a couple of pages at a
time (i.e. no buffer replacement) because in such cases most of the
contention will be on SLRU Bank lock. Although I am not sure how
practical such a use case would be, I mean if someone is using
multi-xact very heavily or creating frequent subtransaction overflow
then wouldn't they should set this buffer limit to some big enough
value? By doing this we would lose some simplicity of the patch I
mean instead of using the simple macro i.e. SLRU_BANK_SIZE we would
need to compute this and store it in SlruShared. Maybe that's not that
bad.

I'm sure anyone with multiple CPUs should increase, not decrease previous default of 128 buffers (with 512MB shared buffers). Having more CPUs (the only way to benefit from more locks) implies bigger transaction buffers.
IMO making bank size variable adds unneeded computation overhead, bank search loops should be unrollable by compiler etc.
Originally there was a patch set step, that packed bank's page addresses together in one array. It was done to make bank search a SIMD instruction.

Best regards, Andrey Borodin.

#111Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#109)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Fri, Feb 23, 2024 at 1:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Feb 23, 2024 at 1:48 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Feb-07, Dilip Kumar wrote:

On Wed, Feb 7, 2024 at 3:49 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Sure, but is that really what we want?

So your question is do we want these buffers to be in multiple of
SLRU_BANK_SIZE? Maybe we can have the last bank to be partial, I
don't think it should create any problem logically. I mean we can
look again in the patch to see if we have made any such assumptions
but that should be fairly easy to fix, then maybe if we are going in
this way we should get rid of the check_slru_buffers() function as
well.

Not really, I just don't think the macro should be in slru.h.

Okay

Another thing I've been thinking is that perhaps it would be useful to
make the banks smaller, when the total number of buffers is small. For
example, if you have 16 or 32 buffers, it's not really clear to me that
it makes sense to have just 1 bank or 2 banks. It might be more
sensible to have 4 banks with 4 or 8 buffers instead. That should make
the algorithm scale down as well as up ...

It might be helpful to have small-size banks when SLRU buffers are set
to a very low value and we are only accessing a couple of pages at a
time (i.e. no buffer replacement) because in such cases most of the
contention will be on SLRU Bank lock. Although I am not sure how
practical such a use case would be, I mean if someone is using
multi-xact very heavily or creating frequent subtransaction overflow
then wouldn't they should set this buffer limit to some big enough
value? By doing this we would lose some simplicity of the patch I
mean instead of using the simple macro i.e. SLRU_BANK_SIZE we would
need to compute this and store it in SlruShared. Maybe that's not that
bad.

I haven't done either of those things in the attached v19 version. I
did go over the comments once again and rewrote the parts I was unhappy
with, including some existing ones. I think it's OK now from that point
of view ... at some point I thought about creating a separate README,
but in the end I thought it not necessary.

Thanks, I will review those changes.

Few other things I noticed while reading through the patch, I haven't
read it completely yet but this is what I got for now.

1.
+ * If no process is already in the list, we're the leader; our first step
+ * is to "close out the group" by resetting the list pointer from
+ * ProcGlobal->clogGroupFirst (this lets other processes set up other
+ * groups later); then we lock the SLRU bank corresponding to our group's
+ * page, do the SLRU updates, release the SLRU bank lock, and wake up the
+ * sleeping processes.

I think here we are saying that we "close out the group" before
acquiring the SLRU lock but that's not true. We keep the group open
until we gets the lock so that we can get maximum members in while we
are anyway waiting for the lock.

2.
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
  RepOriginId nodeid, int slotno)
 {
- Assert(TransactionIdIsNormal(xid));
+ if (!TransactionIdIsNormal(xid))
+ return;
+
+ entryno = TransactionIdToCTsEntry(xid);

I do not understand why we need this change.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#112Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andrey M. Borodin (#110)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-23, Andrey M. Borodin wrote:

I'm sure anyone with multiple CPUs should increase, not decrease
previous default of 128 buffers (with 512MB shared buffers). Having
more CPUs (the only way to benefit from more locks) implies bigger
transaction buffers.

Sure.

IMO making bank size variable adds unneeded computation overhead, bank
search loops should be unrollable by compiler etc.

Makes sense.

Originally there was a patch set step, that packed bank's page
addresses together in one array. It was done to make bank search a
SIMD instruction.

Ants Aasma had proposed a rework of the LRU code for better performance.
He told me it depended on bank size being 16, so you're right that it's
probably not a good idea to make it variable.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#113Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#111)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-23, Dilip Kumar wrote:

1.
+ * If no process is already in the list, we're the leader; our first step
+ * is to "close out the group" by resetting the list pointer from
+ * ProcGlobal->clogGroupFirst (this lets other processes set up other
+ * groups later); then we lock the SLRU bank corresponding to our group's
+ * page, do the SLRU updates, release the SLRU bank lock, and wake up the
+ * sleeping processes.

I think here we are saying that we "close out the group" before
acquiring the SLRU lock but that's not true. We keep the group open
until we gets the lock so that we can get maximum members in while we
are anyway waiting for the lock.

Absolutely right. Reworded that.

2.
static void
TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
RepOriginId nodeid, int slotno)
{
- Assert(TransactionIdIsNormal(xid));
+ if (!TransactionIdIsNormal(xid))
+ return;
+
+ entryno = TransactionIdToCTsEntry(xid);

I do not understand why we need this change.

Ah yeah, I was bothered by the fact that if you pass Xid values earlier
than NormalXid to this function, we'd reply with some nonsensical values
instead of throwing an error. But you're right that it doesn't belong
in this patch, so I removed that.

Here's a version with these fixes, where I also added some text to the
pg_stat_slru documentation:

+  <para>
+   For each <literal>SLRU</literal> area that's part of the core server,
+   there is a configuration parameter that controls its size, with the suffix
+   <literal>_buffers</literal> appended.  For historical
+   reasons, the names are not exact matches, but <literal>Xact</literal>
+   corresponds to <literal>transaction_buffers</literal> and the rest should
+   be obvious.
+   <!-- Should we edit pgstat_internal.h::slru_names so that the "name" matches
+        the GUC name?? -->
+  </para>

I think I would like to suggest renaming the GUCs to have the _slru_ bit
in the middle:

+# - SLRU Buffers (change requires restart) -
+
+#commit_timestamp_slru_buffers = 0          # memory for pg_commit_ts (0 = auto)
+#multixact_offsets_slru_buffers = 16            # memory for pg_multixact/offsets
+#multixact_members_slru_buffers = 32            # memory for pg_multixact/members
+#notify_slru_buffers = 16                   # memory for pg_notify
+#serializable_slru_buffers = 32             # memory for pg_serial
+#subtransaction_slru_buffers = 0            # memory for pg_subtrans (0 = auto)
+#transaction_slru_buffers = 0               # memory for pg_xact (0 = auto)

and the pgstat_internal.h table:

static const char *const slru_names[] = {
"commit_timestamp",
"multixact_members",
"multixact_offsets",
"notify",
"serializable",
"subtransaction",
"transaction",
"other" /* has to be last */
};

This way they match perfectly.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"All rings of power are equal,
But some rings of power are more equal than others."
(George Orwell's The Lord of the Rings)

Attachments:

v20-0001-Make-SLRU-buffer-sizes-configurable.patchtext/x-diff; charset=utf-8Download
From 4dc139e70feb5e43bbe2689cfb044ef0957761b3 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu, 22 Feb 2024 18:42:56 +0100
Subject: [PATCH v20] Make SLRU buffer sizes configurable

Also, divide the slot array in banks, so that the LRU algorithm can be
made more scalable.

Also remove the centralized control lock for even better scalability.

Authors: Dilip Kumar, Andrey Borodin
---
 doc/src/sgml/config.sgml                      | 139 +++++++
 doc/src/sgml/monitoring.sgml                  |  14 +-
 src/backend/access/transam/clog.c             | 236 ++++++++----
 src/backend/access/transam/commit_ts.c        |  81 ++--
 src/backend/access/transam/multixact.c        | 190 +++++++---
 src/backend/access/transam/slru.c             | 357 +++++++++++++-----
 src/backend/access/transam/subtrans.c         | 103 ++++-
 src/backend/commands/async.c                  |  61 ++-
 src/backend/storage/lmgr/lwlock.c             |   9 +-
 src/backend/storage/lmgr/lwlocknames.txt      |  14 +-
 src/backend/storage/lmgr/predicate.c          |  34 +-
 .../utils/activity/wait_event_names.txt       |  15 +-
 src/backend/utils/init/globals.c              |   9 +
 src/backend/utils/misc/guc_tables.c           |  78 ++++
 src/backend/utils/misc/postgresql.conf.sample |   9 +
 src/include/access/clog.h                     |   1 -
 src/include/access/commit_ts.h                |   1 -
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |  86 +++--
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   8 +
 src/include/storage/lwlock.h                  |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |  11 +
 src/test/modules/test_slru/test_slru.c        |  35 +-
 26 files changed, 1161 insertions(+), 353 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 36a2a5ce43..567aa128b6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,145 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-commit-timestamp-buffers" xreflabel="commit_timestamp_buffers">
+      <term><varname>commit_timestamp_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_timestamp_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-members-buffers" xreflabel="multixact_members_buffers">
+      <term><varname>multixact_members_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_members_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-offsets-buffers" xreflabel="multixact_offsets_buffers">
+      <term><varname>multixact_offsets_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offsets_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-serializable-buffers" xreflabel="serializable_buffers">
+      <term><varname>serializable_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serializable_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-subtransaction-buffers" xreflabel="subtransaction_buffers">
+      <term><varname>subtransaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtransaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-transaction-buffers" xreflabel="transaction_buffers">
+      <term><varname>transaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>transaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5cf9363ac8..581bf2d6b0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4482,12 +4482,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 
   <para>
    <productname>PostgreSQL</productname> accesses certain on-disk information
-   via <firstterm>SLRU</firstterm> (simple least-recently-used) caches.
+   via <literal>SLRU</literal> (<firstterm>simple least-recently-used</firstterm>)
+   caches.
    The <structname>pg_stat_slru</structname> view will contain
    one row for each tracked SLRU cache, showing statistics about access
    to cached pages.
   </para>
 
+  <para>
+   For each <literal>SLRU</literal> area that's part of the core server,
+   there is a configuration parameter that controls its size, with the suffix
+   <literal>_buffers</literal> appended.  For historical
+   reasons, the names are not exact matches, but <literal>Xact</literal>
+   corresponds to <literal>transaction_buffers</literal> and the rest should
+   be obvious.
+   <!-- Should we edit pgstat_internal.h::slru_names so that the "name" matches
+        the GUC name?? -->
+  </para>
+
   <table id="pg-stat-slru-view" xreflabel="pg_stat_slru">
    <title><structname>pg_stat_slru</structname> View</title>
    <tgroup cols="1">
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 97f7434da3..a381d88db3 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -3,12 +3,13 @@
  * clog.c
  *		PostgreSQL transaction-commit-log manager
  *
- * This module replaces the old "pg_log" access code, which treated pg_log
- * essentially like a relation, in that it went through the regular buffer
- * manager.  The problem with that was that there wasn't any good way to
- * recycle storage space for transactions so old that they'll never be
- * looked up again.  Now we use specialized access code so that the commit
- * log can be broken into relatively small, independent segments.
+ * This module stores two bits per transaction regarding its commit/abort
+ * status; the status for four transactions fit in a byte.
+ *
+ * This would be a pretty simple abstraction on top of slru.c, except that
+ * for performance reasons we allow multiple transactions that are
+ * committing concurrently to form a queue, so that a single process can
+ * update the status for all of them within a single lock acquisition run.
  *
  * XLOG interactions: this module generates an XLOG record whenever a new
  * CLOG page is initialized to zeroes.  Other writes of CLOG come from
@@ -43,6 +44,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -62,6 +64,15 @@
 #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
 #define CLOG_XACT_BITMASK	((1 << CLOG_BITS_PER_XACT) - 1)
 
+/*
+ * Because space used in CLOG by each transaction is so small, we place a
+ * smaller limit on the number of CLOG buffers than SLRU allows.  No other
+ * SLRU needs this.
+ */
+#define CLOG_MAX_ALLOWED_BUFFERS \
+	Min(SLRU_MAX_ALLOWED_BUFFERS, \
+		(((MaxTransactionId / 2) + (CLOG_XACTS_PER_PAGE - 1)) / CLOG_XACTS_PER_PAGE))
+
 
 /*
  * Although we return an int64 the actual value can't currently exceed
@@ -284,15 +295,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
-	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * When there is contention on the SLRU bank lock we need, we try to group
+	 * multiple updates; a single leader process will perform transaction
+	 * status updates for multiple backends so that the number of times the
+	 * bank lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -310,17 +326,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
-		 * our own XID and release the lock.  If not, try use group XID
-		 * update.  If that doesn't work out, fall back to waiting for the
-		 * lock to perform an update for this transaction only.
+		 * If we can immediately acquire the lock, we update the status of our
+		 * own XID and release the lock.  If not, try to use group XID update. If
+		 * that doesn't work out, fall back to waiting for the lock to perform
+		 * an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -333,10 +349,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -355,7 +371,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -406,14 +423,15 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * Subroutine for TransactionIdSetPageStatus, q.v.
+ *
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -425,16 +443,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 {
 	volatile PROC_HDR *procglobal = ProcGlobal;
 	PGPROC	   *proc = MyProc;
-	int			pgprocno = MyProcNumber;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
 
 	/*
-	 * Add ourselves to the list of processes needing a group XID status
-	 * update.
+	 * Prepare to add ourselves to the list of processes needing a group XID
+	 * status update.
 	 */
 	proc->clogGroupMember = true;
 	proc->clogGroupMemberXid = xid;
@@ -442,6 +461,29 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	proc->clogGroupMemberPage = pageno;
 	proc->clogGroupMemberLsn = lsn;
 
+	/*
+	 * We put ourselves in the queue by writing MyProcNumber to
+	 * ProcGlobal->clogGroupFirst.  However, if there's already a process
+	 * listed there, we compare our pageno with that of that process; if it
+	 * differs, we cannot participate in the group, so we return for the
+	 * caller to update pg_xact in the normal way.
+	 *
+	 * If we're not the first process in the list, we must follow the leader.
+	 * We do this by storing the data we want updated in our PGPROC entry
+	 * where the leader can find it, then going to sleep.
+	 *
+	 * If no process is already in the list, we're the leader; our first step
+	 * is to lock the SLRU bank to which our page belongs, then we close out
+	 * the group by resetting the list pointer from ProcGlobal->clogGroupFirst
+	 * (this lets other processes set up other groups later); finally we do
+	 * the SLRU updates, release the SLRU bank lock, and wake up the sleeping
+	 * processes.
+	 *
+	 * If another group starts to update a page in a different SLRU bank, they
+	 * can proceed concurrently, since the bank lock they're going to use is
+	 * different from ours.  If another group starts to update a page in the
+	 * same bank as ours, they wait until we release the lock.
+	 */
 	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
 
 	while (true)
@@ -453,10 +495,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		 * There is a race condition here, which is that after doing the below
 		 * check and before adding this proc's clog update to a group, the
 		 * group leader might have already finished the group update for this
-		 * page and becomes group leader of another group. This will lead to a
-		 * situation where a single group can have different clog page
-		 * updates.  This isn't likely and will still work, just maybe a bit
-		 * less efficiently.
+		 * page and becomes group leader of another group, updating a
+		 * different page.  This will lead to a situation where a single group
+		 * can have different clog page updates.  This isn't likely and will
+		 * still work, just less efficiently -- we handle this case by
+		 * switching to a different bank lock in the loop below.
 		 */
 		if (nextidx != INVALID_PGPROCNO &&
 			GetPGProcByNumber(nextidx)->clogGroupMemberPage != proc->clogGroupMemberPage)
@@ -474,7 +517,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 
 		if (pg_atomic_compare_exchange_u32(&procglobal->clogGroupFirst,
 										   &nextidx,
-										   (uint32) pgprocno))
+										   (uint32) MyProcNumber))
 			break;
 	}
 
@@ -508,13 +551,21 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * By here, we know we're the leader process.  Acquire the SLRU bank lock
+	 * that corresponds to the page we originally wanted to modify.
+	 */
+	prevpageno = ProcGlobal->allProcs[MyProcNumber].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
 	 * group XID status update, saving a pointer to the head of the list.
-	 * Trying to pop elements one at a time could lead to an ABA problem.
+	 * (Trying to pop elements one at a time could lead to an ABA problem.)
+	 *
+	 * At this point, any processes trying to do this would create a separate
+	 * group.
 	 */
 	nextidx = pg_atomic_exchange_u32(&procglobal->clogGroupFirst,
 									 INVALID_PGPROCNO);
@@ -526,6 +577,31 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the page to update belongs to a different bank than the previous
+		 * one, exchange bank lock to the new one.  This should be quite rare,
+		 * as described above.
+		 *
+		 * (We could try to optimize this by waking up the processes for which
+		 * we have already updated the status while we exchange the lock, but
+		 * the code doesn't do that at present.  I think it'd require
+		 * additional bookkeeping, making the common path slower in order to
+		 * improve an infrequent case.)
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -545,12 +621,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
 	 * don't do this under the lock so as to keep lock hold times to a
 	 * minimum.
+	 *
+	 * (Perhaps we could do this in two passes, the first setting
+	 * clogGroupNext to invalid while saving the semaphores to an array, then
+	 * a single write barrier, then another pass unlocking the semaphores.)
 	 */
 	while (wakeidx != INVALID_PGPROCNO)
 	{
@@ -574,7 +655,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Caller must hold the corresponding SLRU bank lock; it will still be held at exit.
  */
 static void
 TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
@@ -585,6 +666,11 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
 	char		byteval;
 	char		curval;
 
+	Assert(XactCtl->shared->page_number[slotno] == TransactionIdToPage(xid));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl,
+													 XactCtl->shared->page_number[slotno]),
+								LW_EXCLUSIVE));
+
 	byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
 	curval = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
 
@@ -666,7 +752,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -674,23 +760,18 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * If asked to autotune, use 2MB for every 1GB of shared buffers, up to 8MB.
+ * Otherwise just cap the configured amount to be between 16 and the maximum
+ * allowed.
  */
-Size
+static int
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* auto-tune based on shared buffers */
+	if (transaction_buffers == 0)
+		return SimpleLruAutotuneBuffers(512, 1024);
+
+	return Min(Max(16, transaction_buffers), CLOG_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -705,13 +786,36 @@ CLOGShmemSize(void)
 void
 CLOGShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (transaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CLOGShmemBuffers());
+		SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (transaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(transaction_buffers != 0);
+
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for transaction_buffers
+ */
+bool
+check_transaction_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("transaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial CLOG segment.  (The CLOG directory is assumed to
@@ -722,8 +826,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -732,7 +837,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -781,8 +886,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -814,7 +920,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -846,6 +952,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -856,13 +963,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -1000,16 +1108,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 6bfe60343e..fb13d081ea 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -225,10 +226,11 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 					 TransactionId *subxids, TimestampTz ts,
 					 RepOriginId nodeid, int64 pageno)
 {
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 	int			slotno;
 	int			i;
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -238,13 +240,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Caller must hold the correct SLRU bank lock; it will still be held at exit
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -345,7 +347,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -499,14 +501,18 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
 /*
  * Number of shared CommitTS buffers.
  *
- * We use a very similar logic as for the number of CLOG buffers (except we
- * scale up twice as fast with shared buffers, and the maximum is twice as
- * high); see comments in CLOGShmemBuffers.
+ * If asked to autotune, use 2MB for every 1GB of shared buffers, up to 8MB.
+ * Otherwise just cap the configured amount to be between 16 and the maximum
+ * allowed.
  */
-Size
+static int
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* auto-tune based on shared buffers */
+	if (commit_timestamp_buffers == 0)
+		return SimpleLruAutotuneBuffers(512, 1024);
+
+	return Min(Max(16, commit_timestamp_buffers), SLRU_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -528,10 +534,24 @@ CommitTsShmemInit(void)
 {
 	bool		found;
 
+	/* If auto-tuning is requested, now is the time to do it */
+	if (commit_timestamp_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CommitTsShmemBuffers());
+		SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (commit_timestamp_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(commit_timestamp_buffers != 0);
+
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -553,6 +573,15 @@ CommitTsShmemInit(void)
 		Assert(found);
 }
 
+/*
+ * GUC check_hook for commit_timestamp_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_timestamp_buffers", newval);
+}
+
 /*
  * This function must be called ONCE on system install.
  *
@@ -715,13 +744,14 @@ ActivateCommitTs(void)
 	/* Create the current segment file, if necessary */
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 		int			slotno;
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -760,8 +790,6 @@ DeactivateCommitTs(void)
 	TransamVariables->oldestCommitTsXid = InvalidTransactionId;
 	TransamVariables->newestCommitTsXid = InvalidTransactionId;
 
-	LWLockRelease(CommitTsLock);
-
 	/*
 	 * Remove *all* files.  This is necessary so that there are no leftover
 	 * files; in the case where this feature is later enabled after running
@@ -769,10 +797,16 @@ DeactivateCommitTs(void)
 	 * (We can probably tolerate out-of-sequence files, as they are going to
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
+	 *
+	 * Note that we do this with CommitTsLock acquired in exclusive mode. This
+	 * is very heavy-handed, but since this routine can only be called in the
+	 * replica and should happen very rarely, we don't worry too much about
+	 * it.  Note also that no process should be consulting this SLRU if we
+	 * have just deactivated it.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+
+	LWLockRelease(CommitTsLock);
 }
 
 /*
@@ -804,6 +838,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -824,12 +859,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -983,16 +1020,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index febc429f72..311fdb2b21 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -192,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use the bank locks of the MultiXactOffset
+ * and MultiXactMember SLRUs to guard accesses to the two sets of SLRU
+ * buffers.  For concurrency's sake, we avoid holding more than one of these
+ * locks at a time.)
  */
 typedef struct MultiXactStateData
 {
@@ -870,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -891,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -916,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether the
+			 * new page falls into a different SLRU bank; if so, release the
+			 * old bank's lock and acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -936,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1239,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1342,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1379,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1388,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1397,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1419,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1442,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1834,8 +1892,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offsets_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_members_buffers, 0));
 
 	return size;
 }
@@ -1851,16 +1909,16 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "MultiXactOffset", multixact_offsets_buffers, 0,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "MultiXactMember", multixact_members_buffers, 0,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1887,6 +1945,24 @@ MultiXactShmemInit(void)
 	OldestVisibleMXactId = OldestMemberMXactId + MaxOldestSlot;
 }
 
+/*
+ * GUC check_hook for multixact_offsets_buffers
+ */
+bool
+check_multixact_offsets_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offsets_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_members_buffers
+ */
+bool
+check_multixact_members_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_members_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates the initial
  * MultiXact segments.  (The MultiXacts directories are assumed to have been
@@ -1896,8 +1972,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1906,9 +1984,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1917,7 +1996,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1977,10 +2056,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1995,7 +2076,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2049,6 +2130,8 @@ TrimMultiXact(void)
 	oldestMXactDB = MultiXactState->oldestMultiXactDB;
 	LWLockRelease(MultiXactGenLock);
 
+	/* Clean up offsets state */
+
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
@@ -2056,9 +2139,6 @@ TrimMultiXact(void)
 	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
 						pageno);
 
-	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
 	 * TrimCLOG() for background.  Unlike CLOG, some WAL record covers every
@@ -2072,7 +2152,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2080,10 +2162,9 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/*
 	 * And the same for members.
 	 *
@@ -2093,8 +2174,6 @@ TrimMultiXact(void)
 	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
 						pageno);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
 	 * TrimCLOG() for motivation.
@@ -2105,7 +2184,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2120,10 +2201,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2411,6 +2491,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2421,13 +2502,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2460,15 +2542,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2766,7 +2850,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3248,31 +3332,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 93cefcd10d..f774d285b7 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1,28 +1,38 @@
 /*-------------------------------------------------------------------------
  *
  * slru.c
- *		Simple LRU buffering for transaction status logfiles
+ *		Simple LRU buffering for wrap-around-able permanent metadata
  *
- * We use a simple least-recently-used scheme to manage a pool of page
- * buffers.  Under ordinary circumstances we expect that write
- * traffic will occur mostly to the latest page (and to the just-prior
- * page, soon after a page transition).  Read traffic will probably touch
- * a larger span of pages, but in any case a fairly small number of page
- * buffers should be sufficient.  So, we just search the buffers using plain
- * linear search; there's no need for a hashtable or anything fancy.
- * The management algorithm is straight LRU except that we will never swap
- * out the latest page (since we know it's going to be hit again eventually).
+ * This module is used to maintain various pieces of transaction status
+ * indexed by TransactionId (such as commit status, parent transaction ID,
+ * commit timestamp), as well as storage for multixacts, serializable
+ * isolation locks and NOTIFY traffic.  Extensions can define their own
+ * SLRUs, too.
  *
- * We use a control LWLock to protect the shared data structures, plus
- * per-buffer LWLocks that synchronize I/O for each buffer.  The control lock
- * must be held to examine or modify any shared state.  A process that is
- * reading in or writing out a page buffer does not hold the control lock,
- * only the per-buffer lock for the buffer it is working on.  One exception
- * is latest_page_number, which is read and written using atomic ops.
+ * Under ordinary circumstances we expect that write traffic will occur
+ * mostly to the latest page (and to the just-prior page, soon after a
+ * page transition).  Read traffic will probably touch a larger span of
+ * pages, but a relatively small number of buffers should be sufficient.
  *
- * "Holding the control lock" means exclusive lock in all cases except for
- * SimpleLruReadPage_ReadOnly(); see comments for SlruRecentlyUsed() for
- * the implications of that.
+ * We use a simple least-recently-used scheme to manage a pool of shared
+ * page buffers, split in banks by the lowest bits of the page number, and
+ * the management algorithm only processes the bank to which the desired
+ * page belongs, so a linear search is sufficient; there's no need for a
+ * hashtable or anything fancy.  The algorithm is straight LRU except that
+ * we will never swap out the latest page (since we know it's going to be
+ * hit again eventually).
+ *
+ * We use per-bank control LWLocks to protect the shared data structures,
+ * plus per-buffer LWLocks that synchronize I/O for each buffer.  The
+ * bank's control lock must be held to examine or modify any of the bank's
+ * shared state.  A process that is reading in or writing out a page
+ * buffer does not hold the control lock, only the per-buffer lock for the
+ * buffer it is working on.  One exception is latest_page_number, which is
+ * read and written using atomic ops.
+ *
+ * "Holding the bank control lock" means exclusive lock in all cases
+ * except for SimpleLruReadPage_ReadOnly(); see comments for
+ * SlruRecentlyUsed() for the implications of that.
  *
  * When initiating I/O on a buffer, we acquire the per-buffer lock exclusively
  * before releasing the control lock.  The per-buffer lock is released after
@@ -60,6 +70,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -106,6 +117,23 @@ typedef struct SlruWriteAllData
 
 typedef struct SlruWriteAllData *SlruWriteAll;
 
+
+/*
+ * Bank size for the slot array.  Pages are assigned a bank according to their
+ * page number, with each bank being this size.  We want a power of 2 so that
+ * we can determine the bank number for a page with just bit shifting; we also
+ * want to keep the bank size small so that LRU victim search is fast.  16
+ * buffers per bank seems a good number.
+ */
+#define SLRU_BANK_BITSHIFT		4
+#define SLRU_BANK_SIZE			(1 << SLRU_BANK_BITSHIFT)
+
+/*
+ * Macro to get the bank number to which the slot belongs.
+ */
+#define SlotGetBankNumber(slotno)	((slotno) >> SLRU_BANK_BITSHIFT)
+
+
 /*
  * Populate a file tag describing a segment file.  We only use the segment
  * number, since we can derive everything else we need by having separate
@@ -118,34 +146,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -173,6 +173,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -182,8 +183,12 @@ static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
 Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
+	int			nbanks = nslots / SLRU_BANK_SIZE;
 	Size		sz;
 
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
+	Assert(nslots % SLRU_BANK_SIZE == 0);
+
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
 	sz += MAXALIGN(nslots * sizeof(char *));	/* page_buffer[] */
@@ -192,6 +197,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -199,6 +206,21 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	return BUFFERALIGN(sz) + BLCKSZ * nslots;
 }
 
+/*
+ * Determine a number of SLRU buffers to use.
+ *
+ * We simply divide shared_buffers by the divisor given and cap
+ * that at the maximum given; but always at least SLRU_BANK_SIZE.
+ * Round down to the nearest multiple of SLRU_BANK_SIZE.
+ */
+int
+SimpleLruAutotuneBuffers(int divisor, int max)
+{
+	return Min(max - (max % SLRU_BANK_SIZE),
+			   Max(SLRU_BANK_SIZE,
+				   NBuffers / divisor - (NBuffers / divisor) % SLRU_BANK_SIZE));
+}
+
 /*
  * Initialize, or attach to, a simple LRU cache in shared memory.
  *
@@ -208,16 +230,20 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -233,12 +259,9 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
 		pg_atomic_init_u64(&shared->latest_page_number, 0);
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -259,6 +282,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -270,7 +297,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (int slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -279,11 +306,21 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the slot banks. */
+		for (int bankno = 0; bankno < nbanks; bankno++)
+		{
+			LWLockInitialize(&shared->bank_locks[bankno].lock, bank_tranche_id);
+			shared->bank_cur_lru_count[bankno] = 0;
+		}
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -292,16 +329,33 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Valid values are multiples of SLRU_BANK_SIZE */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
+
 /*
  * Initialize (or reinitialize) a page to zeroes.
  *
  * The page is not actually written, just set up in shared memory.
  * The slot number of the new page is returned.
  *
- * Control lock must be held at entry, and will be held at exit.
+ * Bank lock must be held at entry, and will be held at exit.
  */
 int
 SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
@@ -309,6 +363,8 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE));
+
 	/* Find a suitable buffer slot for the page */
 	slotno = SlruSelectLRUPage(ctl, pageno);
 	Assert(shared->page_status[slotno] == SLRU_PAGE_EMPTY ||
@@ -369,18 +425,21 @@ SimpleLruZeroLSNs(SlruCtl ctl, int slotno)
  * guarantee that new I/O hasn't been started before we return, though.
  * In fact the slot might not even contain the same page anymore.)
  *
- * Control lock must be held at entry, and will be held at exit.
+ * Bank lock must be held at entry, and will be held at exit.
  */
 static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			bankno = SlotGetBankNumber(slotno);
+
+	Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -423,7 +482,7 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
  * Return value is the shared-buffer slot number now holding the page.
  * The buffer's LRU access info is updated.
  *
- * Control lock must be held at entry, and will be held at exit.
+ * The correct bank lock must be held at entry, and will be held at exit.
  */
 int
 SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -431,18 +490,21 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			bankno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
 		slotno = SlruSelectLRUPage(ctl, pageno);
 
 		/* Did we find the page in memory? */
-		if (shared->page_number[slotno] == pageno &&
-			shared->page_status[slotno] != SLRU_PAGE_EMPTY)
+		if (shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
+			shared->page_number[slotno] == pageno)
 		{
 			/*
 			 * If page is still being read in, we must wait for I/O.  Likewise
@@ -477,9 +539,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		bankno = SlotGetBankNumber(slotno);
 
-		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		/* Release bank lock while doing I/O */
+		LWLockRelease(&shared->bank_locks[bankno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -487,8 +550,8 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		/* Set the LSNs for this newly read-in page to zero */
 		SimpleLruZeroLSNs(ctl, slotno);
 
-		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		/* Re-acquire bank control lock and update page state */
+		LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -522,22 +585,25 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
  * Return value is the shared-buffer slot number now holding the page.
  * The buffer's LRU access info is updated.
  *
- * Control lock must NOT be held at entry, but will be held at exit.
+ * Bank control lock must NOT be held at entry, but will be held at exit.
  * It is unspecified whether the lock will be shared or exclusive.
  */
 int
 SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
+	int			bankno = pageno & ctl->bank_mask;
+	int			bankstart = bankno * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_SHARED);
 
 	/* See if page is already in a buffer */
-	for (int slotno = 0; slotno < shared->num_slots; slotno++)
+	for (int slotno = bankstart; slotno < bankend; slotno++)
 	{
-		if (shared->page_number[slotno] == pageno &&
-			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
+		if (shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
+			shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
 		{
 			/* See comments for SlruRecentlyUsed macro */
@@ -551,8 +617,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -566,15 +632,19 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
  * the write).  However, we *do* attempt a fresh write even if the page
  * is already being written; this is for checkpoints.
  *
- * Control lock must be held at entry, and will be held at exit.
+ * Bank lock must be held at entry, and will be held at exit.
  */
 static void
 SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 {
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
+	int			bankno = SlotGetBankNumber(slotno);
 	bool		ok;
 
+	Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE));
+
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
 		   shared->page_number[slotno] == pageno)
@@ -601,8 +671,8 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
-	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	/* Release bank lock while doing I/O */
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -614,8 +684,8 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 			CloseTransientFile(fdata->fd[i]);
 	}
 
-	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	/* Re-acquire bank lock and update page state */
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -644,6 +714,8 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 void
 SimpleLruWritePage(SlruCtl ctl, int slotno)
 {
+	Assert(ctl->shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+
 	SlruInternalWritePage(ctl, slotno, NULL);
 }
 
@@ -1028,17 +1100,53 @@ SlruReportIOError(SlruCtl ctl, int64 pageno, TransactionId xid)
 }
 
 /*
- * Select the slot to re-use when we need a free slot.
+ * Mark a buffer slot "most recently used".
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = SlotGetBankNumber(slotno);
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+
+	/*
+	 * The reason for the if-test is that there are often many consecutive
+	 * accesses to the same page (particularly the latest page).  By
+	 * suppressing useless increments of bank_cur_lru_count, we reduce the
+	 * probability that old pages' counts will "wrap around" and make them
+	 * appear recently used.
+	 *
+	 * We allow this code to be executed concurrently by multiple processes
+	 * within SimpleLruReadPage_ReadOnly().  As long as int reads and writes
+	 * are atomic, this should not cause any completely-bogus values to enter
+	 * the computation.  However, it is possible for either bank_cur_lru_count
+	 * or individual page_lru_count entries to be "reset" to lower values than
+	 * they should have, in case a process is delayed while it executes this
+	 * function.  With care in SlruSelectLRUPage(), this does little harm, and
+	 * in any case the absolute worst possible consequence is a nonoptimal
+	 * choice of page to evict.  The gain from allowing concurrent reads of
+	 * SLRU pages seems worth it.
+	 */
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
+/*
+ * Select the slot to re-use when we need a free slot for the given page.
  *
- * The target page number is passed because we need to consider the
- * possibility that some other process reads in the target page while
- * we are doing I/O to free a slot.  Hence, check or recheck to see if
- * any slot already holds the target page, and return that slot if so.
- * Thus, the returned slot is *either* a slot already holding the pageno
- * (could be any state except EMPTY), *or* a freeable slot (state EMPTY
- * or CLEAN).
+ * The target page number is passed not only because we need to know the
+ * correct bank to use, but also because we need to consider the possibility
+ * that some other process reads in the target page while we are doing I/O to
+ * free a slot.  Hence, check or recheck to see if any slot already holds the
+ * target page, and return that slot if so.  Thus, the returned slot is
+ * *either* a slot already holding the pageno (could be any state except
+ * EMPTY), *or* a freeable slot (state EMPTY or CLEAN).
  *
- * Control lock must be held at entry, and will be held at exit.
+ * The correct bank lock must be held at entry, and will be held at exit.
  */
 static int
 SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
@@ -1055,12 +1163,17 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
+
+		Assert(LWLockHeldByMe(&shared->bank_locks[bankno].lock));
 
 		/* See if page already has a buffer assigned */
 		for (int slotno = 0; slotno < shared->num_slots; slotno++)
 		{
-			if (shared->page_number[slotno] == pageno &&
-				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
+			if (shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
+				shared->page_number[slotno] == pageno)
 				return slotno;
 		}
 
@@ -1091,14 +1204,15 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
-		for (int slotno = 0; slotno < shared->num_slots; slotno++)
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
+		for (int slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
 
 			if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 				return slotno;
+
 			this_delta = cur_count - shared->page_lru_count[slotno];
 			if (this_delta < 0)
 			{
@@ -1193,6 +1307,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	SlruShared	shared = ctl->shared;
 	SlruWriteAllData fdata;
 	int64		pageno = 0;
+	int			prevbank = SlotGetBankNumber(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1203,10 +1318,27 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 
 	for (int slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbank = SlotGetBankNumber(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
+
+		/* Do nothing if slot is unused */
+		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
+			continue;
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1220,7 +1352,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1259,6 +1391,7 @@ void
 SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
+	int			prevbank;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1269,8 +1402,6 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
@@ -1282,15 +1413,29 @@ restart:
 	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
 						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevbank = SlotGetBankNumber(0);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 	for (int slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbank = SlotGetBankNumber(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1320,10 +1465,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevbank].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1362,19 +1509,33 @@ void
 SlruDeleteSegment(SlruCtl ctl, int64 segno)
 {
 	SlruShared	shared = ctl->shared;
+	int			prevbank = SlotGetBankNumber(0);
 	bool		did_write;
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (int slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curbank = SlotGetBankNumber(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank lock,
+		 * release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
 
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
 		/* not the segment we're looking for */
 		if (pagesegno != segno)
 			continue;
@@ -1405,7 +1566,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 }
 
 /*
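Because each bank now has its own lock, any code path that scans buffer slots
across the whole SLRU must hand the lock over whenever it crosses a bank
boundary.  SimpleLruWriteAll(), SimpleLruTruncate() and SlruDeleteSegment()
above all use the same pattern, sketched here for clarity (an illustration
only, not part of the patch; "shared" stands for the SlruShared struct):

    int		prevbank = SlotGetBankNumber(0);

    LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
    for (int slotno = 0; slotno < shared->num_slots; slotno++)
    {
        int		curbank = SlotGetBankNumber(slotno);

        /* Crossed into a new bank: swap bank locks before touching the slot */
        if (curbank != prevbank)
        {
            LWLockRelease(&shared->bank_locks[prevbank].lock);
            LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
            prevbank = curbank;
        }

        /* ... work on slotno while holding its bank's lock ... */
    }
    LWLockRelease(&shared->bank_locks[prevbank].lock);
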
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index b2ed82ac56..2b3bbd54eb 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,7 +31,9 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -85,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -108,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -138,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -186,6 +190,22 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	return previousXid;
 }
 
+/*
+ * Number of shared SUBTRANS buffers.
+ *
+ * If asked to autotune, use 2MB for every 1GB of shared buffers, up to 8MB.
+ * Otherwise just cap the configured amount to be between 16 and the maximum
+ * allowed.
+ */
+static int
+SUBTRANSShmemBuffers(void)
+{
+	/* auto-tune based on shared buffers */
+	if (subtransaction_buffers == 0)
+		return SimpleLruAutotuneBuffers(512, 1024);
+
+	return Min(Max(16, subtransaction_buffers), SLRU_MAX_ALLOWED_BUFFERS);
+}
 
 /*
  * Initialization of shared memory for SUBTRANS
@@ -193,20 +213,42 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(SUBTRANSShmemBuffers(), 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (subtransaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", SUBTRANSShmemBuffers());
+		SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (subtransaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(subtransaction_buffers != 0);
+
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+	SimpleLruInit(SubTransCtl, "Subtrans", SUBTRANSShmemBuffers(), 0,
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for subtransaction_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtransaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial SUBTRANS segment.  (The SUBTRANS directory is assumed to
@@ -221,8 +263,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -231,7 +274,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -261,6 +304,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -268,23 +313,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If this page lives in a different bank, release the lock on the
+		 * old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If this page lives in a different bank, release the lock on the old
+	 * bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -318,6 +387,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -329,12 +399,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b24b22293..0c2ac60946 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -234,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -266,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and a per-bank SLRU lock is
+ * used as the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -492,7 +494,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,8 +543,8 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+	SimpleLruInit(NotifyCtl, "Notify", notify_buffers, 0,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1356,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1366,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1389,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and the relevant SLRU bank lock here */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1434,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1460,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1931,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1953,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1994,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2147,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for SLRU bank locks but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
@@ -2378,3 +2394,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 997857679e..d405c61b21 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -163,6 +163,13 @@ static const char *const BuiltinTrancheNames[] = {
 	[LWTRANCHE_LAUNCHER_HASH] = "LogicalRepLauncherHash",
 	[LWTRANCHE_DSM_REGISTRY_DSA] = "DSMRegistryDSA",
 	[LWTRANCHE_DSM_REGISTRY_HASH] = "DSMRegistryHash",
+	[LWTRANCHE_COMMITTS_SLRU] = "CommitTSSLRU",
+	[LWTRANCHE_MULTIXACTOFFSET_SLRU] = "MultixactOffsetSLRU",
+	[LWTRANCHE_MULTIXACTMEMBER_SLRU] = "MultixactMemberSLRU",
+	[LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU",
+	[LWTRANCHE_SERIAL_SLRU] = "SerialSLRU",
+	[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
+	[LWTRANCHE_XACT_SLRU] = "XactSLRU",
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
@@ -776,7 +783,7 @@ GetLWLockIdentifier(uint32 classId, uint16 eventId)
  * in mode.
  *
  * This function will not block waiting for a lock to become free - that's the
- * callers job.
+ * caller's job.
  *
  * Returns true if the lock isn't free and we need to wait.
  */
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 3d59d3646e..284d168f77 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+# 31 was SerialSLRULock
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+#38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index d62060d58c..633ea8ecec 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -213,6 +213,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -813,9 +814,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "Serial",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serializable_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -841,6 +842,15 @@ SerialInit(void)
 	}
 }
 
+/*
+ * GUC check_hook for serializable_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serializable_buffers", newval);
+}
+
 /*
  * Record a committed read write serializable xid and the minimum
  * commitSeqNo of any transactions to which this xid had a rw-conflict out.
@@ -854,15 +864,17 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
 	/*
-	 * In this routine, we must hold both SerialControlLock and SerialSLRULock
-	 * simultaneously while making the SLRU data catch up with the new state
-	 * that we determine.
+	 * In this routine, we must hold both SerialControlLock and the SLRU bank
+	 * lock simultaneously while making the SLRU data catch up with the new
+	 * state that we determine.
 	 */
 	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
@@ -898,7 +910,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	if (isNewPage)
 		serialControl->headPage = targetPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (isNewPage)
 	{
@@ -916,7 +928,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 	LWLockRelease(SerialControlLock);
 }
 
@@ -950,13 +962,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without the SLRU bank lock held,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -1367,7 +1379,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serializable_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 4fffb46625..ec2f31f82a 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -295,11 +295,7 @@ SInvalWrite	"Waiting to add a message to the shared catalog invalidation queue."
 WALBufMapping	"Waiting to replace a page in WAL buffers."
 WALWrite	"Waiting for WAL buffers to be written to disk."
 ControlFile	"Waiting to read or update the <filename>pg_control</filename> file or create a new WAL file."
-XactSLRU	"Waiting to access the transaction status SLRU cache."
-SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
 MultiXactGen	"Waiting to read or update shared multixact state."
-MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
-MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
 RelCacheInit	"Waiting to read or update a <filename>pg_internal.init</filename> relation cache initialization file."
 CheckpointerComm	"Waiting to manage fsync requests."
 TwoPhaseState	"Waiting to read or update the state of prepared transactions."
@@ -310,19 +306,16 @@ Autovacuum	"Waiting to read or update the current state of autovacuum workers."
 AutovacuumSchedule	"Waiting to ensure that a table selected for autovacuum still needs vacuuming."
 SyncScan	"Waiting to select the starting location of a synchronized table scan."
 RelationMapping	"Waiting to read or update a <filename>pg_filenode.map</filename> file (used to track the filenode assignments of certain system catalogs)."
-NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
 NotifyQueue	"Waiting to read or update <command>NOTIFY</command> messages."
 SerializableXactHash	"Waiting to read or update information about serializable transactions."
 SerializableFinishedList	"Waiting to access the list of finished serializable transactions."
 SerializablePredicateList	"Waiting to access the list of predicate locks held by serializable transactions."
-SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
 SyncRep	"Waiting to read or update information about the state of synchronous replication."
 BackgroundWorker	"Waiting to read or update background worker state."
 DynamicSharedMemoryControl	"Waiting to read or update dynamic shared memory allocation information."
 AutoFile	"Waiting to update the <filename>postgresql.auto.conf</filename> file."
 ReplicationSlotAllocation	"Waiting to allocate or free a replication slot."
 ReplicationSlotControl	"Waiting to read or update replication slot state."
-CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
 CommitTs	"Waiting to read or update the last value set for a transaction commit timestamp."
 ReplicationOrigin	"Waiting to create, drop or use a replication origin."
 MultiXactTruncation	"Waiting to read or truncate multixact information."
@@ -375,6 +368,14 @@ LogicalRepLauncherDSA	"Waiting to access logical replication launcher's dynamic
 LogicalRepLauncherHash	"Waiting to access logical replication launcher's shared hash table."
 DSMRegistryDSA	"Waiting to access dynamic shared memory registry's dynamic shared memory allocator."
 DSMRegistryHash	"Waiting to access dynamic shared memory registry's shared hash table."
+CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
+MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
+MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
+NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
+SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
+SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
+XactSLRU	"Waiting to access the transaction status SLRU cache."
+
 
 #
 # Wait Events - Lock
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index f024b1a849..909a0d8ee1 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -157,3 +157,12 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+/* configurable SLRU buffer sizes */
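+/* For the GUCs that allow it, 0 means "auto-tune from shared_buffers". */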
+int			commit_timestamp_buffers = 0;
+int			multixact_members_buffers = 32;
+int			multixact_offsets_buffers = 16;
+int			notify_buffers = 16;
+int			serializable_buffers = 32;
+int			subtransaction_buffers = 0;
+int			transaction_buffers = 0;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 527a2b2734..ce0ffe80af 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2330,6 +2331,83 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"commit_timestamp_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_timestamp_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_commit_ts_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_members_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact member cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_members_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_members_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_offsets_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact offset cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offsets_buffers,
+		16, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_offsets_buffers, NULL, NULL
+	},
+
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the LISTEN/NOTIFY message cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		16, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_notify_buffers, NULL, NULL
+	},
+
+	{
+		{"serializable_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the serializable transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serializable_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_serial_buffers, NULL, NULL
+	},
+
+	{
+		{"subtransaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the sub-transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtransaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_subtrans_buffers, NULL, NULL
+	},
+
+	{
+		{"transaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the transaction status cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&transaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_transaction_buffers, NULL, NULL
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c97f9a25f0..e75df5e82d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#commit_timestamp_buffers = 0			# memory for pg_commit_ts (0 = auto)
+#multixact_offsets_buffers = 16			# memory for pg_multixact/offsets
+#multixact_members_buffers = 32			# memory for pg_multixact/members
+#notify_buffers = 16					# memory for pg_notify
+#serializable_buffers = 32				# memory for pg_serial
+#subtransaction_buffers = 0 			# memory for pg_subtrans (0 = auto)
+#transaction_buffers = 0				# memory for pg_xact (0 = auto)
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index becc365cb0..8e62917e49 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -40,7 +40,6 @@ extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
 									   TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
 extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
 
-extern Size CLOGShmemBuffers(void);
 extern Size CLOGShmemSize(void);
 extern void CLOGShmemInit(void);
 extern void BootStrapCLOG(void);
diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 9c6f3a35ca..82d3aa8627 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -27,7 +27,6 @@ extern bool TransactionIdGetCommitTsData(TransactionId xid,
 extern TransactionId GetLatestCommitTsData(TimestampTz *ts,
 										   RepOriginId *nodeid);
 
-extern Size CommitTsShmemBuffers(void);
 extern Size CommitTsShmemSize(void);
 extern void CommitTsShmemInit(void);
 extern void BootStrapCommitTs(void);
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 233f67dbcc..7ffd256c74 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 2109488654..8a8d191873 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers must not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
@@ -55,8 +60,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -69,30 +72,41 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks array protects the I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
+	/* Locks protecting in-memory access to the buffer slots of each bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because the victim buffer search
+	 * is done within a single bank.  Keeping a separate counter per bank also
+	 * reduces cache-line invalidation, since the counter is updated on every
+	 * page access.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 * 		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
-	 * for pg_xact, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
-	 * has lsn_groups_per_page entries per buffer slot, each containing the
+	 * for pg_xact, false for everything else).  group_lsn[] has
+	 * lsn_groups_per_page entries per buffer slot, each containing the
 	 * highest LSN known for a contiguous group of SLRU entries on that slot's
 	 * page.
 	 */
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
@@ -114,6 +128,19 @@ typedef struct SlruCtlData
 {
 	SlruShared	shared;
 
+	/*
+	 * Bitmask to determine bank number from page number.
+	 */
+	bits16		bank_mask;
+
+	/*
+	 * If true, use long segment filenames formed from lower 48 bits of the
+	 * segment number, e.g. pg_xact/000000001234. Otherwise, use short
+	 * filenames formed from lower 16 bits of the segment number e.g.
+	 * pg_xact/1234.
+	 */
+	bool		long_segment_names;
+
 	/*
 	 * Which sync handler function to use when handing sync requests over to
 	 * the checkpointer.  SYNC_HANDLER_NONE to disable fsync (eg pg_notify).
@@ -132,28 +159,36 @@ typedef struct SlruCtlData
 	 */
 	bool		(*PagePrecedes) (int64, int64);
 
-	/*
-	 * If true, use long segment filenames formed from lower 48 bits of the
-	 * segment number, e.g. pg_xact/000000001234. Otherwise, use short
-	 * filenames formed from lower 16 bits of the segment number e.g.
-	 * pg_xact/1234.
-	 */
-	bool		long_segment_names;
-
 	/*
 	 * Dir is set during SimpleLruInit and does not change thereafter. Since
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
+
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and page number.
+ *
+ * This lock must be held in order to access the SLRU buffer slots of the
+ * corresponding bank.
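+ *
+ * The bank is selected from the low-order bits of the page number
+ * (pageno & bank_mask), spreading consecutive pages across the banks.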
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int			bankno;
+
+	bankno = pageno & ctl->bank_mask;
+	return &(ctl->shared->bank_locks[bankno].lock);
+}
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
+extern int	SimpleLruAutotuneBuffers(int divisor, int max);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -182,5 +217,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
+extern bool check_slru_buffers(const char *name, int *newval);
 
 #endif							/* SLRU_H */
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index b0d2ad57e5..e2213cf3fd 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 80b8583421..78daa25fa0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 612fb5f42e..7affd29d3e 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -179,6 +179,14 @@ extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
 
+extern PGDLLIMPORT int commit_timestamp_buffers;
+extern PGDLLIMPORT int multixact_members_buffers;
+extern PGDLLIMPORT int multixact_offsets_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serializable_buffers;
+extern PGDLLIMPORT int subtransaction_buffers;
+extern PGDLLIMPORT int transaction_buffers;
+
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
 extern PGDLLIMPORT TimestampTz MyStartTimestamp;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 50a65e046d..10bea8c595 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -209,6 +209,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_LAUNCHER_HASH,
 	LWTRANCHE_DSM_REGISTRY_DSA,
 	LWTRANCHE_DSM_REGISTRY_HASH,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_XACT_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index a7edd38fa9..14ee9b94a2 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 339c490300..876103a7a7 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -46,6 +46,8 @@ extern bool check_client_connection_check_interval(int *newval, void **extra,
 extern bool check_client_encoding(char **newval, void **extra, GucSource source);
 extern void assign_client_encoding(const char *newval, void *extra);
 extern bool check_cluster_name(char **newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern const char *show_data_directory_mode(void);
 extern bool check_datestyle(char **newval, void **extra, GucSource source);
 extern void assign_datestyle(const char *newval, void *extra);
@@ -91,6 +93,11 @@ extern bool check_max_worker_processes(int *newval, void **extra,
 									   GucSource source);
 extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
 extern void assign_max_stack_depth(int newval, void *extra);
+extern bool check_multixact_members_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_multixact_offsets_buffers(int *newval, void **extra,
+											GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
 extern bool check_primary_slot_name(char **newval, void **extra,
 									GucSource source);
 extern bool check_random_seed(double *newval, void **extra, GucSource source);
@@ -122,12 +129,15 @@ extern void assign_role(const char *newval, void *extra);
 extern const char *show_role(void);
 extern bool check_search_path(char **newval, void **extra, GucSource source);
 extern void assign_search_path(const char *newval, void *extra);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
 extern bool check_session_authorization(char **newval, void **extra, GucSource source);
 extern void assign_session_authorization(const char *newval, void *extra);
 extern void assign_session_replication_role(int newval, void *extra);
 extern void assign_stats_fetch_consistency(int newval, void *extra);
 extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
@@ -152,6 +162,7 @@ extern const char *show_timezone(void);
 extern bool check_timezone_abbreviations(char **newval, void **extra,
 										 GucSource source);
 extern void assign_timezone_abbreviations(const char *newval, void *extra);
+extern bool check_transaction_buffers(int *newval, void **extra, GucSource source);
 extern bool check_transaction_deferrable(bool *newval, void **extra, GucSource source);
 extern bool check_transaction_isolation(int *newval, void **extra, GucSource source);
 extern bool check_transaction_read_only(bool *newval, void **extra, GucSource source);
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index 4b31f331ca..068a21f125 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
-- 
2.39.2

#114Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#113)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Mon, Feb 26, 2024 at 9:46 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Feb-23, Dilip Kumar wrote:

+  <para>
+   For each <literal>SLRU</literal> area that's part of the core server,
+   there is a configuration parameter that controls its size, with the suffix
+   <literal>_buffers</literal> appended.  For historical
+   reasons, the names are not exact matches, but <literal>Xact</literal>
+   corresponds to <literal>transaction_buffers</literal> and the rest should
+   be obvious.
+   <!-- Should we edit pgstat_internal.h::slru_names so that the "name" matches
+        the GUC name?? -->
+  </para>

I think I would like to suggest renaming the GUCs to have the _slru_ bit
in the middle:

+# - SLRU Buffers (change requires restart) -
+
+#commit_timestamp_slru_buffers = 0          # memory for pg_commit_ts (0 = auto)
+#multixact_offsets_slru_buffers = 16            # memory for pg_multixact/offsets
+#multixact_members_slru_buffers = 32            # memory for pg_multixact/members
+#notify_slru_buffers = 16                   # memory for pg_notify
+#serializable_slru_buffers = 32             # memory for pg_serial
+#subtransaction_slru_buffers = 0            # memory for pg_subtrans (0 = auto)
+#transaction_slru_buffers = 0               # memory for pg_xact (0 = auto)

and the pgstat_internal.h table:

static const char *const slru_names[] = {
	"commit_timestamp",
	"multixact_members",
	"multixact_offsets",
	"notify",
	"serializable",
	"subtransaction",
	"transaction",
	"other"						/* has to be last */
};

This way they match perfectly.

Yeah, I think this looks fine to me.
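
Just to make the mapping concrete, with the renamed entries the GUC name
becomes derivable from the pg_stat_slru entry name. A quick, purely
illustrative check (it assumes the _slru_buffers suffix proposed above,
which may still change; "other" is the only entry without a GUC):

-- derive the would-be GUC name from the stats entry name
SELECT name, name || '_slru_buffers' AS guc_name
FROM pg_stat_slru
WHERE name <> 'other'
ORDER BY name;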

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#115Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#114)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-27, Dilip Kumar wrote:

static const char *const slru_names[] = {
	"commit_timestamp",
	"multixact_members",
	"multixact_offsets",
	"notify",
	"serializable",
	"subtransaction",
	"transaction",
	"other"						/* has to be last */
};

Here's a patch for the renaming part.
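
With the rename, the per-SLRU reset function takes the new names as well;
for example, this is what the updated regression test exercises:

SELECT pg_stat_reset_slru('commit_timestamp');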

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"No nos atrevemos a muchas cosas porque son difíciles,
pero son difíciles porque no nos atrevemos a hacerlas" (Séneca)

Attachments:

0001-Rename-SLRU-elements-in-pg_stat_slru.patch (text/x-diff; charset=utf-8)
From 91741984cbd77f88e39b6fac8e8c7dc622d2ccf4 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 27 Feb 2024 16:56:00 +0100
Subject: [PATCH] Rename SLRU elements in pg_stat_slru

The new names are intended to match an upcoming patch that adds a few
GUCs to configure the SLRU buffer sizes.

Discussion: https://postgr.es/m/202402261616.dlriae7b6emv@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml            | 14 ++++----
 src/backend/access/transam/clog.c       |  2 +-
 src/backend/access/transam/commit_ts.c  |  2 +-
 src/backend/access/transam/multixact.c  |  4 +--
 src/backend/access/transam/subtrans.c   |  2 +-
 src/backend/commands/async.c            |  2 +-
 src/backend/storage/lmgr/predicate.c    |  2 +-
 src/include/utils/pgstat_internal.h     | 14 ++++----
 src/test/isolation/expected/stats.out   | 44 ++++++++++++-------------
 src/test/isolation/expected/stats_1.out | 44 ++++++++++++-------------
 src/test/isolation/specs/stats.spec     |  4 +--
 src/test/regress/expected/stats.out     | 14 ++++----
 src/test/regress/sql/stats.sql          | 14 ++++----
 13 files changed, 81 insertions(+), 81 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5cf9363ac8..7d92e68572 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4853,13 +4853,13 @@ description | Waiting for a newly initialized WAL file to reach durable storage
         <literal>NULL</literal> or is not specified, all the counters shown in
         the <structname>pg_stat_slru</structname> view for all SLRU caches are
         reset. The argument can be one of
-        <literal>CommitTs</literal>,
-        <literal>MultiXactMember</literal>,
-        <literal>MultiXactOffset</literal>,
-        <literal>Notify</literal>,
-        <literal>Serial</literal>,
-        <literal>Subtrans</literal>, or
-        <literal>Xact</literal>
+        <literal>commit_timestamp</literal>,
+        <literal>multixact_members</literal>,
+        <literal>multixact_offsets</literal>,
+        <literal>notify</literal>,
+        <literal>serializable</literal>,
+        <literal>subtransaction</literal>, or
+        <literal>transaction</literal>
         to reset the counters for only that entry.
         If the argument is <literal>other</literal> (or indeed, any
         unrecognized name), then the counters for all other SLRU caches, such
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 97f7434da3..34f079cbb1 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -706,7 +706,7 @@ void
 CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
-	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
+	SimpleLruInit(XactCtl, "transaction", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
 				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
 				  SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 6bfe60343e..d965db89c7 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -529,7 +529,7 @@ CommitTsShmemInit(void)
 	bool		found;
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
-	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
+	SimpleLruInit(CommitTsCtl, "commit_timestamp", CommitTsShmemBuffers(), 0,
 				  CommitTsSLRULock, "pg_commit_ts",
 				  LWTRANCHE_COMMITTS_BUFFER,
 				  SYNC_HANDLER_COMMIT_TS,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index febc429f72..f8bb83927c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1851,14 +1851,14 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "multixact_offsets", NUM_MULTIXACTOFFSET_BUFFERS, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "multixact_members", NUM_MULTIXACTMEMBER_BUFFERS, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index b2ed82ac56..6aa47af43e 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -200,7 +200,7 @@ void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "subtransaction", NUM_SUBTRANS_BUFFERS, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
 				  false);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b24b22293..490c84dc19 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -541,7 +541,7 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "notify", NUM_NOTIFY_BUFFERS, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE, true);
 
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index d62060d58c..09e11680fc 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -812,7 +812,7 @@ SerialInit(void)
 	 * Set up SLRU management of the pg_serial data.
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
-	SimpleLruInit(SerialSlruCtl, "Serial",
+	SimpleLruInit(SerialSlruCtl, "serializable",
 				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
 				  false);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 0cb8a58cba..1710bf9792 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -269,13 +269,13 @@ typedef struct PgStat_KindInfo
  * definitions.
  */
 static const char *const slru_names[] = {
-	"CommitTs",
-	"MultiXactMember",
-	"MultiXactOffset",
-	"Notify",
-	"Serial",
-	"Subtrans",
-	"Xact",
+	"commit_timestamp",
+	"multixact_members",
+	"multixact_offsets",
+	"notify",
+	"serializable",
+	"subtransaction",
+	"transaction",
 	"other"						/* has to be last */
 };
 
diff --git a/src/test/isolation/expected/stats.out b/src/test/isolation/expected/stats.out
index 61b5a710ec..8c7fe60217 100644
--- a/src/test/isolation/expected/stats.out
+++ b/src/test/isolation/expected/stats.out
@@ -3039,8 +3039,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3093,8 +3093,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s2_big_notify: SELECT pg_notify('stats_test_use',
@@ -3133,8 +3133,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s2_begin: BEGIN;
@@ -3176,8 +3176,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_none: SET stats_fetch_consistency = 'none';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3243,8 +3243,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_cache: SET stats_fetch_consistency = 'cache';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3310,8 +3310,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3377,8 +3377,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_none: SET stats_fetch_consistency = 'none';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3450,8 +3450,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_cache: SET stats_fetch_consistency = 'cache';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3523,8 +3523,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3596,8 +3596,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3653,8 +3653,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
diff --git a/src/test/isolation/expected/stats_1.out b/src/test/isolation/expected/stats_1.out
index 3854320106..6b965bb955 100644
--- a/src/test/isolation/expected/stats_1.out
+++ b/src/test/isolation/expected/stats_1.out
@@ -3063,8 +3063,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3117,8 +3117,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s2_big_notify: SELECT pg_notify('stats_test_use',
@@ -3157,8 +3157,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s2_begin: BEGIN;
@@ -3200,8 +3200,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_none: SET stats_fetch_consistency = 'none';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3267,8 +3267,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_cache: SET stats_fetch_consistency = 'cache';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3334,8 +3334,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3401,8 +3401,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_none: SET stats_fetch_consistency = 'none';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3474,8 +3474,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_cache: SET stats_fetch_consistency = 'cache';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3547,8 +3547,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3620,8 +3620,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3677,8 +3677,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
diff --git a/src/test/isolation/specs/stats.spec b/src/test/isolation/specs/stats.spec
index a7daf2a49a..1d98ac785b 100644
--- a/src/test/isolation/specs/stats.spec
+++ b/src/test/isolation/specs/stats.spec
@@ -107,8 +107,8 @@ step s1_table_stats {
 
 # SLRU stats steps
 step s1_slru_save_stats {
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 }
 step s1_listen { LISTEN stats_test_nothing; }
 step s1_big_notify { SELECT pg_notify('stats_test_use',
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 346e10a3d2..6e08898b18 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -866,21 +866,21 @@ WHERE pg_stat_get_backend_pid(beid) = pg_backend_pid();
 -- Test that resetting stats works for reset timestamp
 -----
 -- Test that reset_slru with a specified SLRU works.
-SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'CommitTs' \gset
-SELECT stats_reset AS slru_notify_reset_ts FROM pg_stat_slru WHERE name = 'Notify' \gset
-SELECT pg_stat_reset_slru('CommitTs');
+SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'commit_timestamp' \gset
+SELECT stats_reset AS slru_notify_reset_ts FROM pg_stat_slru WHERE name = 'notify' \gset
+SELECT pg_stat_reset_slru('commit_timestamp');
  pg_stat_reset_slru 
 --------------------
  
 (1 row)
 
-SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'CommitTs';
+SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'commit_timestamp';
  ?column? 
 ----------
  t
 (1 row)
 
-SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'CommitTs' \gset
+SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'commit_timestamp' \gset
 -- Test that multiple SLRUs are reset when no specific SLRU provided to reset function
 SELECT pg_stat_reset_slru();
  pg_stat_reset_slru 
@@ -888,13 +888,13 @@ SELECT pg_stat_reset_slru();
  
 (1 row)
 
-SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'CommitTs';
+SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'commit_timestamp';
  ?column? 
 ----------
  t
 (1 row)
 
-SELECT stats_reset > :'slru_notify_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'Notify';
+SELECT stats_reset > :'slru_notify_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'notify';
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index e3b4ca96e8..d8ac0d06f4 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -447,16 +447,16 @@ WHERE pg_stat_get_backend_pid(beid) = pg_backend_pid();
 -----
 
 -- Test that reset_slru with a specified SLRU works.
-SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'CommitTs' \gset
-SELECT stats_reset AS slru_notify_reset_ts FROM pg_stat_slru WHERE name = 'Notify' \gset
-SELECT pg_stat_reset_slru('CommitTs');
-SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'CommitTs';
-SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'CommitTs' \gset
+SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'commit_timestamp' \gset
+SELECT stats_reset AS slru_notify_reset_ts FROM pg_stat_slru WHERE name = 'notify' \gset
+SELECT pg_stat_reset_slru('commit_timestamp');
+SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'commit_timestamp';
+SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'commit_timestamp' \gset
 
 -- Test that multiple SLRUs are reset when no specific SLRU provided to reset function
 SELECT pg_stat_reset_slru();
-SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'CommitTs';
-SELECT stats_reset > :'slru_notify_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'Notify';
+SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'commit_timestamp';
+SELECT stats_reset > :'slru_notify_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'notify';
 
 -- Test that reset_shared with archiver specified as the stats type works
 SELECT stats_reset AS archiver_reset_ts FROM pg_stat_archiver \gset
-- 
2.39.2

#116Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Alvaro Herrera (#115)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 27 Feb 2024, at 21:03, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Feb-27, Dilip Kumar wrote:

static const char *const slru_names[] = {
	"commit_timestamp",
	"multixact_members",
	"multixact_offsets",
	"notify",
	"serializable",
	"subtransaction",
	"transaction",
	"other"						/* has to be last */
};

Here's a patch for the renaming part.

Sorry for the late reply, I have one nit. Are you sure that multixact_members and multixact_offsets are plural, while transaction and commit_timestamp are singular?
Maybe multixact_members and multixact_offset? Because there are many members and one offset for a given multixact? Users certainly do not care, though...

Best regards, Andrey Borodin.

#117Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andrey M. Borodin (#116)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-27, Andrey M. Borodin wrote:

Sorry for the late reply, I have one nit. Are you sure that
multixact_members and multixact_offsets are plural, while transaction
and commit_timestamp are singular?
Maybe multixact_members and multixact_offset? Because there are many
members and one offset for a given multixact? Users certainly do not
care, though...

I asked myself the same question, actually, and thought about putting them
both in the singular. I only backed off because I noticed that the
directories themselves are in the plural (an old mistake of mine, evidently).
Maybe we should follow that instinct and use the singular for these.

If we do that, we can rename the directories to also appear in the singular
when/if the patch to add standard page headers to the SLRUs lands --
which is going to need code to rewrite the files during pg_upgrade
anyway, so the rename is not going to be a big deal.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Crear es tan difícil como ser libre" (Elsa Triolet)

#118Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#117)
2 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Here's the complete set, with these two names using the singular.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"

Attachments:

v21-0001-Rename-SLRU-elements-in-pg_stat_slru.patch (text/x-diff; charset=utf-8)
From 225b2403f7bb9990656d18c8339c452fcd6822c5 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 27 Feb 2024 16:56:00 +0100
Subject: [PATCH v21 1/2] Rename SLRU elements in pg_stat_slru

The new names are intended to match an upcoming patch that adds a few
GUCs to configure the SLRU buffer sizes.

Discussion: https://postgr.es/m/202402261616.dlriae7b6emv@alvherre.pgsql
---
 doc/src/sgml/monitoring.sgml            | 14 ++++----
 src/backend/access/transam/clog.c       |  2 +-
 src/backend/access/transam/commit_ts.c  |  2 +-
 src/backend/access/transam/multixact.c  |  4 +--
 src/backend/access/transam/subtrans.c   |  2 +-
 src/backend/commands/async.c            |  2 +-
 src/backend/storage/lmgr/predicate.c    |  2 +-
 src/include/utils/pgstat_internal.h     | 14 ++++----
 src/test/isolation/expected/stats.out   | 44 ++++++++++++-------------
 src/test/isolation/expected/stats_1.out | 44 ++++++++++++-------------
 src/test/isolation/specs/stats.spec     |  4 +--
 src/test/regress/expected/stats.out     | 14 ++++----
 src/test/regress/sql/stats.sql          | 14 ++++----
 13 files changed, 81 insertions(+), 81 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5cf9363ac8..9d73d8c1bb 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4853,13 +4853,13 @@ description | Waiting for a newly initialized WAL file to reach durable storage
         <literal>NULL</literal> or is not specified, all the counters shown in
         the <structname>pg_stat_slru</structname> view for all SLRU caches are
         reset. The argument can be one of
-        <literal>CommitTs</literal>,
-        <literal>MultiXactMember</literal>,
-        <literal>MultiXactOffset</literal>,
-        <literal>Notify</literal>,
-        <literal>Serial</literal>,
-        <literal>Subtrans</literal>, or
-        <literal>Xact</literal>
+        <literal>commit_timestamp</literal>,
+        <literal>multixact_member</literal>,
+        <literal>multixact_offset</literal>,
+        <literal>notify</literal>,
+        <literal>serializable</literal>,
+        <literal>subtransaction</literal>, or
+        <literal>transaction</literal>
         to reset the counters for only that entry.
         If the argument is <literal>other</literal> (or indeed, any
         unrecognized name), then the counters for all other SLRU caches, such
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 97f7434da3..34f079cbb1 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -706,7 +706,7 @@ void
 CLOGShmemInit(void)
 {
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
-	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
+	SimpleLruInit(XactCtl, "transaction", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
 				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
 				  SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 6bfe60343e..d965db89c7 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -529,7 +529,7 @@ CommitTsShmemInit(void)
 	bool		found;
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
-	SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0,
+	SimpleLruInit(CommitTsCtl, "commit_timestamp", CommitTsShmemBuffers(), 0,
 				  CommitTsSLRULock, "pg_commit_ts",
 				  LWTRANCHE_COMMITTS_BUFFER,
 				  SYNC_HANDLER_COMMIT_TS,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index febc429f72..64040d330e 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1851,14 +1851,14 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+				  "multixact_offset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
 				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
 				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+				  "multixact_member", NUM_MULTIXACTMEMBER_BUFFERS, 0,
 				  MultiXactMemberSLRULock, "pg_multixact/members",
 				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index b2ed82ac56..6aa47af43e 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -200,7 +200,7 @@ void
 SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "Subtrans", NUM_SUBTRANS_BUFFERS, 0,
+	SimpleLruInit(SubTransCtl, "subtransaction", NUM_SUBTRANS_BUFFERS, 0,
 				  SubtransSLRULock, "pg_subtrans",
 				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
 				  false);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8b24b22293..490c84dc19 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -541,7 +541,7 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "Notify", NUM_NOTIFY_BUFFERS, 0,
+	SimpleLruInit(NotifyCtl, "notify", NUM_NOTIFY_BUFFERS, 0,
 				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
 				  SYNC_HANDLER_NONE, true);
 
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index d62060d58c..09e11680fc 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -812,7 +812,7 @@ SerialInit(void)
 	 * Set up SLRU management of the pg_serial data.
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
-	SimpleLruInit(SerialSlruCtl, "Serial",
+	SimpleLruInit(SerialSlruCtl, "serializable",
 				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
 				  false);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 0cb8a58cba..dbbca31602 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -269,13 +269,13 @@ typedef struct PgStat_KindInfo
  * definitions.
  */
 static const char *const slru_names[] = {
-	"CommitTs",
-	"MultiXactMember",
-	"MultiXactOffset",
-	"Notify",
-	"Serial",
-	"Subtrans",
-	"Xact",
+	"commit_timestamp",
+	"multixact_member",
+	"multixact_offset",
+	"notify",
+	"serializable",
+	"subtransaction",
+	"transaction",
 	"other"						/* has to be last */
 };
 
diff --git a/src/test/isolation/expected/stats.out b/src/test/isolation/expected/stats.out
index 61b5a710ec..8c7fe60217 100644
--- a/src/test/isolation/expected/stats.out
+++ b/src/test/isolation/expected/stats.out
@@ -3039,8 +3039,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3093,8 +3093,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s2_big_notify: SELECT pg_notify('stats_test_use',
@@ -3133,8 +3133,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s2_begin: BEGIN;
@@ -3176,8 +3176,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_none: SET stats_fetch_consistency = 'none';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3243,8 +3243,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_cache: SET stats_fetch_consistency = 'cache';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3310,8 +3310,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3377,8 +3377,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_none: SET stats_fetch_consistency = 'none';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3450,8 +3450,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_cache: SET stats_fetch_consistency = 'cache';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3523,8 +3523,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3596,8 +3596,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3653,8 +3653,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
diff --git a/src/test/isolation/expected/stats_1.out b/src/test/isolation/expected/stats_1.out
index 3854320106..6b965bb955 100644
--- a/src/test/isolation/expected/stats_1.out
+++ b/src/test/isolation/expected/stats_1.out
@@ -3063,8 +3063,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3117,8 +3117,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s2_big_notify: SELECT pg_notify('stats_test_use',
@@ -3157,8 +3157,8 @@ pg_stat_force_next_flush
 (1 row)
 
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s2_begin: BEGIN;
@@ -3200,8 +3200,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_none: SET stats_fetch_consistency = 'none';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3267,8 +3267,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_cache: SET stats_fetch_consistency = 'cache';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3334,8 +3334,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3401,8 +3401,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_none: SET stats_fetch_consistency = 'none';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3474,8 +3474,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_cache: SET stats_fetch_consistency = 'cache';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3547,8 +3547,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3620,8 +3620,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
@@ -3677,8 +3677,8 @@ pg_stat_force_next_flush
 
 step s1_fetch_consistency_snapshot: SET stats_fetch_consistency = 'snapshot';
 step s1_slru_save_stats: 
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 
 step s1_listen: LISTEN stats_test_nothing;
 step s1_begin: BEGIN;
diff --git a/src/test/isolation/specs/stats.spec b/src/test/isolation/specs/stats.spec
index a7daf2a49a..1d98ac785b 100644
--- a/src/test/isolation/specs/stats.spec
+++ b/src/test/isolation/specs/stats.spec
@@ -107,8 +107,8 @@ step s1_table_stats {
 
 # SLRU stats steps
 step s1_slru_save_stats {
-	INSERT INTO test_slru_stats VALUES('Notify', 'blks_zeroed',
-    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'Notify'));
+	INSERT INTO test_slru_stats VALUES('notify', 'blks_zeroed',
+    (SELECT blks_zeroed FROM pg_stat_slru WHERE name = 'notify'));
 }
 step s1_listen { LISTEN stats_test_nothing; }
 step s1_big_notify { SELECT pg_notify('stats_test_use',
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 346e10a3d2..6e08898b18 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -866,21 +866,21 @@ WHERE pg_stat_get_backend_pid(beid) = pg_backend_pid();
 -- Test that resetting stats works for reset timestamp
 -----
 -- Test that reset_slru with a specified SLRU works.
-SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'CommitTs' \gset
-SELECT stats_reset AS slru_notify_reset_ts FROM pg_stat_slru WHERE name = 'Notify' \gset
-SELECT pg_stat_reset_slru('CommitTs');
+SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'commit_timestamp' \gset
+SELECT stats_reset AS slru_notify_reset_ts FROM pg_stat_slru WHERE name = 'notify' \gset
+SELECT pg_stat_reset_slru('commit_timestamp');
  pg_stat_reset_slru 
 --------------------
  
 (1 row)
 
-SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'CommitTs';
+SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'commit_timestamp';
  ?column? 
 ----------
  t
 (1 row)
 
-SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'CommitTs' \gset
+SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'commit_timestamp' \gset
 -- Test that multiple SLRUs are reset when no specific SLRU provided to reset function
 SELECT pg_stat_reset_slru();
  pg_stat_reset_slru 
@@ -888,13 +888,13 @@ SELECT pg_stat_reset_slru();
  
 (1 row)
 
-SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'CommitTs';
+SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'commit_timestamp';
  ?column? 
 ----------
  t
 (1 row)
 
-SELECT stats_reset > :'slru_notify_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'Notify';
+SELECT stats_reset > :'slru_notify_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'notify';
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index e3b4ca96e8..d8ac0d06f4 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -447,16 +447,16 @@ WHERE pg_stat_get_backend_pid(beid) = pg_backend_pid();
 -----
 
 -- Test that reset_slru with a specified SLRU works.
-SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'CommitTs' \gset
-SELECT stats_reset AS slru_notify_reset_ts FROM pg_stat_slru WHERE name = 'Notify' \gset
-SELECT pg_stat_reset_slru('CommitTs');
-SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'CommitTs';
-SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'CommitTs' \gset
+SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'commit_timestamp' \gset
+SELECT stats_reset AS slru_notify_reset_ts FROM pg_stat_slru WHERE name = 'notify' \gset
+SELECT pg_stat_reset_slru('commit_timestamp');
+SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'commit_timestamp';
+SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'commit_timestamp' \gset
 
 -- Test that multiple SLRUs are reset when no specific SLRU provided to reset function
 SELECT pg_stat_reset_slru();
-SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'CommitTs';
-SELECT stats_reset > :'slru_notify_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'Notify';
+SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'commit_timestamp';
+SELECT stats_reset > :'slru_notify_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'notify';
 
 -- Test that reset_shared with archiver specified as the stats type works
 SELECT stats_reset AS archiver_reset_ts FROM pg_stat_archiver \gset
-- 
2.39.2
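
To illustrate the effect of the renaming above, here is a quick SQL sketch (using only the pg_stat_slru view and the pg_stat_reset_slru() function exercised in the regression tests) of what the new lower-case names look like:

  -- the SLRU caches now show up under GUC-style lower-case names
  SELECT name, stats_reset
  FROM pg_stat_slru
  WHERE name IN ('commit_timestamp', 'notify');

  -- resetting the statistics for a single SLRU uses the same name
  SELECT pg_stat_reset_slru('commit_timestamp');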

v21-0002-Make-SLRU-buffer-sizes-configurable.patch (text/x-diff; charset=utf-8)
From 58f0e91166145b659310471bf359931b9ee0720e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu, 22 Feb 2024 18:42:56 +0100
Subject: [PATCH v21 2/2] Make SLRU buffer sizes configurable

Also, divide the slot array in banks, so that the LRU algorithm can be
made more scalable.

Also remove the centralized control lock for even better scalability.

Authors: Dilip Kumar, Andrey Borodin
---
 doc/src/sgml/config.sgml                      | 139 +++++++
 doc/src/sgml/monitoring.sgml                  |  14 +-
 src/backend/access/transam/clog.c             | 236 ++++++++----
 src/backend/access/transam/commit_ts.c        |  81 ++--
 src/backend/access/transam/multixact.c        | 190 +++++++---
 src/backend/access/transam/slru.c             | 357 +++++++++++++-----
 src/backend/access/transam/subtrans.c         | 103 ++++-
 src/backend/commands/async.c                  |  61 ++-
 src/backend/storage/lmgr/lwlock.c             |   9 +-
 src/backend/storage/lmgr/lwlocknames.txt      |  14 +-
 src/backend/storage/lmgr/predicate.c          |  34 +-
 .../utils/activity/wait_event_names.txt       |  15 +-
 src/backend/utils/init/globals.c              |   9 +
 src/backend/utils/misc/guc_tables.c           |  78 ++++
 src/backend/utils/misc/postgresql.conf.sample |   9 +
 src/include/access/clog.h                     |   1 -
 src/include/access/commit_ts.h                |   1 -
 src/include/access/multixact.h                |   4 -
 src/include/access/slru.h                     |  86 +++--
 src/include/access/subtrans.h                 |   3 -
 src/include/commands/async.h                  |   5 -
 src/include/miscadmin.h                       |   8 +
 src/include/storage/lwlock.h                  |   7 +
 src/include/storage/predicate.h               |   4 -
 src/include/utils/guc_hooks.h                 |  11 +
 src/test/modules/test_slru/test_slru.c        |  35 +-
 26 files changed, 1161 insertions(+), 353 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 36a2a5ce43..43b1a132a2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,145 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-commit-timestamp-buffers" xreflabel="commit_timestamp_buffers">
+      <term><varname>commit_timestamp_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>commit_timestamp_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents of
+        <literal>pg_commit_ts</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-member-buffers" xreflabel="multixact_member_buffers">
+      <term><varname>multixact_member_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_member_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/members</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-multixact-offset-buffers" xreflabel="multixact_offset_buffers">
+      <term><varname>multixact_offset_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>multixact_offset_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_multixact/offsets</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-notify-buffers" xreflabel="notify_buffers">
+      <term><varname>notify_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>notify_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_notify</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>16</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-serializable-buffers" xreflabel="serializable_buffers">
+      <term><varname>serializable_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>serializable_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_serial</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>32</literal>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-subtransaction-buffers" xreflabel="subtransaction_buffers">
+      <term><varname>subtransaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>subtransaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_subtrans</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-transaction-buffers" xreflabel="transaction_buffers">
+      <term><varname>transaction_buffers</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>transaction_buffers</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of shared memory to use to cache the contents
+        of <literal>pg_xact</literal> (see
+        <xref linkend="pgdata-contents-table"/>).
+        If this value is specified without units, it is taken as blocks,
+        that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
+        The default value is <literal>0</literal>, which requests
+        <varname>shared_buffers</varname>/512 up to 1024 blocks,
+        but not fewer than 16 blocks.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-max-stack-depth" xreflabel="max_stack_depth">
       <term><varname>max_stack_depth</varname> (<type>integer</type>)
       <indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9d73d8c1bb..a2fbaede2a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4482,12 +4482,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 
   <para>
    <productname>PostgreSQL</productname> accesses certain on-disk information
-   via <firstterm>SLRU</firstterm> (simple least-recently-used) caches.
+   via <literal>SLRU</literal> (<firstterm>simple least-recently-used</firstterm>)
+   caches.
    The <structname>pg_stat_slru</structname> view will contain
    one row for each tracked SLRU cache, showing statistics about access
    to cached pages.
   </para>
 
+  <para>
+   For each <literal>SLRU</literal> area that's part of the core server,
+   there is a configuration parameter that controls its size, with the suffix
+   <literal>_buffers</literal> appended.  For historical
+   reasons, the names are not exact matches, but <literal>Xact</literal>
+   corresponds to <literal>transaction_buffers</literal> and the rest should
+   be obvious.
+   <!-- Should we edit pgstat_internal.h::slru_names so that the "name" matches
+        the GUC name?? -->
+  </para>
+
   <table id="pg-stat-slru-view" xreflabel="pg_stat_slru">
    <title><structname>pg_stat_slru</structname> View</title>
    <tgroup cols="1">
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 34f079cbb1..6fce0b5ffa 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -3,12 +3,13 @@
  * clog.c
  *		PostgreSQL transaction-commit-log manager
  *
- * This module replaces the old "pg_log" access code, which treated pg_log
- * essentially like a relation, in that it went through the regular buffer
- * manager.  The problem with that was that there wasn't any good way to
- * recycle storage space for transactions so old that they'll never be
- * looked up again.  Now we use specialized access code so that the commit
- * log can be broken into relatively small, independent segments.
+ * This module stores two bits per transaction regarding its commit/abort
+ * status; the status for four transactions fit in a byte.
+ *
+ * This would be a pretty simple abstraction on top of slru.c, except that
+ * for performance reasons we allow multiple transactions that are
+ * committing concurrently to form a queue, so that a single process can
+ * update the status for all of them within a single lock acquisition run.
  *
  * XLOG interactions: this module generates an XLOG record whenever a new
  * CLOG page is initialized to zeroes.  Other writes of CLOG come from
@@ -43,6 +44,7 @@
 #include "pgstat.h"
 #include "storage/proc.h"
 #include "storage/sync.h"
+#include "utils/guc_hooks.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -62,6 +64,15 @@
 #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
 #define CLOG_XACT_BITMASK	((1 << CLOG_BITS_PER_XACT) - 1)
 
+/*
+ * Because space used in CLOG by each transaction is so small, we place a
+ * smaller limit on the number of CLOG buffers than SLRU allows.  No other
+ * SLRU needs this.
+ */
+#define CLOG_MAX_ALLOWED_BUFFERS \
+	Min(SLRU_MAX_ALLOWED_BUFFERS, \
+		(((MaxTransactionId / 2) + (CLOG_XACTS_PER_PAGE - 1)) / CLOG_XACTS_PER_PAGE))
+
 
 /*
  * Although we return an int64 the actual value can't currently exceed
@@ -284,15 +295,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 						   XLogRecPtr lsn, int64 pageno,
 						   bool all_xact_same_page)
 {
+	LWLock	   *lock;
+
 	/* Can't use group update when PGPROC overflows. */
 	StaticAssertDecl(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
 					 "group clog threshold less than PGPROC cached subxids");
 
+	/* Get the SLRU bank lock for the page we are going to access. */
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
+
 	/*
-	 * When there is contention on XactSLRULock, we try to group multiple
-	 * updates; a single leader process will perform transaction status
-	 * updates for multiple backends so that the number of times XactSLRULock
-	 * needs to be acquired is reduced.
+	 * When there is contention on the SLRU bank lock we need, we try to group
+	 * multiple updates; a single leader process will perform transaction
+	 * status updates for multiple backends so that the number of times the
+	 * bank lock needs to be acquired is reduced.
 	 *
 	 * For this optimization to be safe, the XID and subxids in MyProc must be
 	 * the same as the ones for which we're setting the status.  Check that
@@ -310,17 +326,17 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 				nsubxids * sizeof(TransactionId)) == 0))
 	{
 		/*
-		 * If we can immediately acquire XactSLRULock, we update the status of
-		 * our own XID and release the lock.  If not, try use group XID
-		 * update.  If that doesn't work out, fall back to waiting for the
-		 * lock to perform an update for this transaction only.
+		 * If we can immediately acquire the lock, we update the status of our
+		 * own XID and release the lock.  If not, try a group XID update.  If
+		 * that doesn't work out, fall back to waiting for the lock to perform
+		 * an update for this transaction only.
 		 */
-		if (LWLockConditionalAcquire(XactSLRULock, LW_EXCLUSIVE))
+		if (LWLockConditionalAcquire(lock, LW_EXCLUSIVE))
 		{
 			/* Got the lock without waiting!  Do the update. */
 			TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 											   lsn, pageno);
-			LWLockRelease(XactSLRULock);
+			LWLockRelease(lock);
 			return;
 		}
 		else if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
@@ -333,10 +349,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	}
 
 	/* Group update not applicable, or couldn't accept this page number. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status,
 									   lsn, pageno);
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -355,7 +371,8 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 	Assert(status == TRANSACTION_STATUS_COMMITTED ||
 		   status == TRANSACTION_STATUS_ABORTED ||
 		   (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
-	Assert(LWLockHeldByMeInMode(XactSLRULock, LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl, pageno),
+								LW_EXCLUSIVE));
 
 	/*
 	 * If we're doing an async commit (ie, lsn is valid), then we must wait
@@ -406,14 +423,15 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
 }
 
 /*
- * When we cannot immediately acquire XactSLRULock in exclusive mode at
+ * Subroutine for TransactionIdSetPageStatus, q.v.
+ *
+ * When we cannot immediately acquire the SLRU bank lock in exclusive mode at
  * commit time, add ourselves to a list of processes that need their XIDs
  * status update.  The first process to add itself to the list will acquire
- * XactSLRULock in exclusive mode and set transaction status as required
- * on behalf of all group members.  This avoids a great deal of contention
- * around XactSLRULock when many processes are trying to commit at once,
- * since the lock need not be repeatedly handed off from one committing
- * process to the next.
+ * the lock in exclusive mode and set transaction status as required on behalf
+ * of all group members.  This avoids a great deal of contention when many
+ * processes are trying to commit at once, since the lock need not be
+ * repeatedly handed off from one committing process to the next.
  *
  * Returns true when transaction status has been updated in clog; returns
  * false if we decided against applying the optimization because the page
@@ -425,16 +443,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 {
 	volatile PROC_HDR *procglobal = ProcGlobal;
 	PGPROC	   *proc = MyProc;
-	int			pgprocno = MyProcNumber;
 	uint32		nextidx;
 	uint32		wakeidx;
+	int			prevpageno;
+	LWLock	   *prevlock = NULL;
 
 	/* We should definitely have an XID whose status needs to be updated. */
 	Assert(TransactionIdIsValid(xid));
 
 	/*
-	 * Add ourselves to the list of processes needing a group XID status
-	 * update.
+	 * Prepare to add ourselves to the list of processes needing a group XID
+	 * status update.
 	 */
 	proc->clogGroupMember = true;
 	proc->clogGroupMemberXid = xid;
@@ -442,6 +461,29 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	proc->clogGroupMemberPage = pageno;
 	proc->clogGroupMemberLsn = lsn;
 
+	/*
+	 * We put ourselves in the queue by writing MyProcNumber to
+	 * ProcGlobal->clogGroupFirst.  However, if there's already a process
+	 * listed there, we compare our pageno with that of that process; if it
+	 * differs, we cannot participate in the group, so we return for the caller to
+	 * update pg_xact in the normal way.
+	 *
+	 * If we're not the first process in the list, we must follow the leader.
+	 * We do this by storing the data we want updated in our PGPROC entry
+	 * where the leader can find it, then going to sleep.
+	 *
+	 * If no process is already in the list, we're the leader; our first step
+	 * is to lock the SLRU bank to which our page belongs, then we close out
+	 * the group by resetting the list pointer from ProcGlobal->clogGroupFirst
+	 * (this lets other processes set up other groups later); finally we do
+	 * the SLRU updates, release the SLRU bank lock, and wake up the sleeping
+	 * processes.
+	 *
+	 * If another group starts to update a page in a different SLRU bank, they
+	 * can proceed concurrently, since the bank lock they're going to use is
+	 * different from ours.  If another group starts to update a page in the
+	 * same bank as ours, they wait until we release the lock.
+	 */
 	nextidx = pg_atomic_read_u32(&procglobal->clogGroupFirst);
 
 	while (true)
@@ -453,10 +495,11 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		 * There is a race condition here, which is that after doing the below
 		 * check and before adding this proc's clog update to a group, the
 		 * group leader might have already finished the group update for this
-		 * page and becomes group leader of another group. This will lead to a
-		 * situation where a single group can have different clog page
-		 * updates.  This isn't likely and will still work, just maybe a bit
-		 * less efficiently.
+		 * page and becomes group leader of another group, updating a
+		 * different page.  This will lead to a situation where a single group
+		 * can have different clog page updates.  This isn't likely and will
+		 * still work, just less efficiently -- we handle this case by
+		 * switching to a different bank lock in the loop below.
 		 */
 		if (nextidx != INVALID_PGPROCNO &&
 			GetPGProcByNumber(nextidx)->clogGroupMemberPage != proc->clogGroupMemberPage)
@@ -474,7 +517,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 
 		if (pg_atomic_compare_exchange_u32(&procglobal->clogGroupFirst,
 										   &nextidx,
-										   (uint32) pgprocno))
+										   (uint32) MyProcNumber))
 			break;
 	}
 
@@ -508,13 +551,21 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 		return true;
 	}
 
-	/* We are the leader.  Acquire the lock on behalf of everyone. */
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	/*
+	 * By here, we know we're the leader process.  Acquire the SLRU bank lock
+	 * that corresponds to the page we originally wanted to modify.
+	 */
+	prevpageno = ProcGlobal->allProcs[MyProcNumber].clogGroupMemberPage;
+	prevlock = SimpleLruGetBankLock(XactCtl, prevpageno);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 
 	/*
 	 * Now that we've got the lock, clear the list of processes waiting for
 	 * group XID status update, saving a pointer to the head of the list.
-	 * Trying to pop elements one at a time could lead to an ABA problem.
+	 * (Trying to pop elements one at a time could lead to an ABA problem.)
+	 *
+	 * At this point, any processes trying to do this would create a separate
+	 * group.
 	 */
 	nextidx = pg_atomic_exchange_u32(&procglobal->clogGroupFirst,
 									 INVALID_PGPROCNO);
@@ -526,6 +577,31 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	while (nextidx != INVALID_PGPROCNO)
 	{
 		PGPROC	   *nextproc = &ProcGlobal->allProcs[nextidx];
+		int			thispageno = nextproc->clogGroupMemberPage;
+
+		/*
+		 * If the page to update belongs to a different bank than the previous
+		 * one, exchange bank lock to the new one.  This should be quite rare,
+		 * as described above.
+		 *
+		 * (We could try to optimize this by waking up the processes for which
+		 * we have already updated the status while we exchange the lock, but
+		 * the code doesn't do that at present.  I think it'd require
+		 * additional bookkeeping, making the common path slower in order to
+		 * improve an infrequent case.)
+		 */
+		if (thispageno != prevpageno)
+		{
+			LWLock	   *lock = SimpleLruGetBankLock(XactCtl, thispageno);
+
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+			}
+			prevlock = lock;
+			prevpageno = thispageno;
+		}
 
 		/*
 		 * Transactions with more than THRESHOLD_SUBTRANS_CLOG_OPT sub-XIDs
@@ -545,12 +621,17 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 	}
 
 	/* We're done with the lock now. */
-	LWLockRelease(XactSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 
 	/*
 	 * Now that we've released the lock, go back and wake everybody up.  We
 	 * don't do this under the lock so as to keep lock hold times to a
 	 * minimum.
+	 *
+	 * (Perhaps we could do this in two passes, the first setting
+	 * clogGroupNext to invalid while saving the semaphores to an array, then
+	 * a single write barrier, then another pass unlocking the semaphores.)
 	 */
 	while (wakeidx != INVALID_PGPROCNO)
 	{
@@ -574,7 +655,7 @@ TransactionGroupUpdateXidStatus(TransactionId xid, XidStatus status,
 /*
  * Sets the commit status of a single transaction.
  *
- * Must be called with XactSLRULock held
+ * Caller must hold the corresponding SLRU bank lock; it will still be held at exit.
  */
 static void
 TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
@@ -585,6 +666,11 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
 	char		byteval;
 	char		curval;
 
+	Assert(XactCtl->shared->page_number[slotno] == TransactionIdToPage(xid));
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(XactCtl,
+													 XactCtl->shared->page_number[slotno]),
+								LW_EXCLUSIVE));
+
 	byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
 	curval = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
 
@@ -666,7 +752,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 	lsnindex = GetLSNIndex(slotno, xid);
 	*lsn = XactCtl->shared->group_lsn[lsnindex];
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
 
 	return status;
 }
@@ -674,23 +760,18 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 /*
  * Number of shared CLOG buffers.
  *
- * On larger multi-processor systems, it is possible to have many CLOG page
- * requests in flight at one time which could lead to disk access for CLOG
- * page if the required page is not found in memory.  Testing revealed that we
- * can get the best performance by having 128 CLOG buffers, more than that it
- * doesn't improve performance.
- *
- * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
- * a good idea, because it would increase the minimum amount of shared memory
- * required to start, which could be a problem for people running very small
- * configurations.  The following formula seems to represent a reasonable
- * compromise: people with very low values for shared_buffers will get fewer
- * CLOG buffers as well, and everyone else will get 128.
+ * If asked to autotune, use 2MB for every 1GB of shared buffers, up to 8MB.
+ * Otherwise just cap the configured amount to be between 16 and the maximum
+ * allowed.
  */
-Size
+static int
 CLOGShmemBuffers(void)
 {
-	return Min(128, Max(4, NBuffers / 512));
+	/* auto-tune based on shared buffers */
+	if (transaction_buffers == 0)
+		return SimpleLruAutotuneBuffers(512, 1024);
+
+	return Min(Max(16, transaction_buffers), CLOG_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -705,13 +786,36 @@ CLOGShmemSize(void)
 void
 CLOGShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (transaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CLOGShmemBuffers());
+		SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (transaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("transaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(transaction_buffers != 0);
+
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "transaction", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  SYNC_HANDLER_CLOG, false);
+				  "pg_xact", LWTRANCHE_XACT_BUFFER,
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for transaction_buffers
+ */
+bool
+check_transaction_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("transaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial CLOG segment.  (The CLOG directory is assumed to
@@ -722,8 +826,9 @@ void
 BootStrapCLOG(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the commit log */
 	slotno = ZeroCLOGPage(0, false);
@@ -732,7 +837,7 @@ BootStrapCLOG(void)
 	SimpleLruWritePage(XactCtl, slotno);
 	Assert(!XactCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -781,8 +886,9 @@ TrimCLOG(void)
 {
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
+	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current clog page.  Under normal
@@ -814,7 +920,7 @@ TrimCLOG(void)
 		XactCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -846,6 +952,7 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -856,13 +963,14 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
+	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCLOGPage(pageno, true);
 
-	LWLockRelease(XactSLRULock);
+	LWLockRelease(lock);
 }
 
 
@@ -1000,16 +1108,18 @@ clog_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(XactCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCLOGPage(pageno, false);
 		SimpleLruWritePage(XactCtl, slotno);
 		Assert(!XactCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(XactSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == CLOG_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index d965db89c7..b398ae8b4c 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -33,6 +33,7 @@
 #include "pg_trace.h"
 #include "storage/shmem.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 #include "utils/timestamp.h"
 
@@ -225,10 +226,11 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 					 TransactionId *subxids, TimestampTz ts,
 					 RepOriginId nodeid, int64 pageno)
 {
+	LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 	int			slotno;
 	int			i;
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
 
@@ -238,13 +240,13 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
 
 	CommitTsCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
  * Sets the commit timestamp of a single transaction.
  *
- * Must be called with CommitTsSLRULock held
+ * Caller must hold the correct SLRU bank lock; it will still be held at exit
  */
 static void
 TransactionIdSetCommitTs(TransactionId xid, TimestampTz ts,
@@ -345,7 +347,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	if (nodeid)
 		*nodeid = entry.nodeid;
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(CommitTsCtl, pageno));
 	return *ts != 0;
 }
 
@@ -499,14 +501,18 @@ pg_xact_commit_timestamp_origin(PG_FUNCTION_ARGS)
 /*
  * Number of shared CommitTS buffers.
  *
- * We use a very similar logic as for the number of CLOG buffers (except we
- * scale up twice as fast with shared buffers, and the maximum is twice as
- * high); see comments in CLOGShmemBuffers.
+ * If asked to autotune, use 2MB for every 1GB of shared buffers, up to 8MB.
+ * Otherwise just cap the configured amount to be between 16 and the maximum
+ * allowed.
  */
-Size
+static int
 CommitTsShmemBuffers(void)
 {
-	return Min(256, Max(4, NBuffers / 256));
+	/* auto-tune based on shared buffers */
+	if (commit_timestamp_buffers == 0)
+		return SimpleLruAutotuneBuffers(512, 1024);
+
+	return Min(Max(16, commit_timestamp_buffers), SLRU_MAX_ALLOWED_BUFFERS);
 }
 
 /*
@@ -528,10 +534,24 @@ CommitTsShmemInit(void)
 {
 	bool		found;
 
+	/* If auto-tuning is requested, now is the time to do it */
+	if (commit_timestamp_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", CommitTsShmemBuffers());
+		SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (commit_timestamp_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("commit_timestamp_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(commit_timestamp_buffers != 0);
+
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "commit_timestamp", CommitTsShmemBuffers(), 0,
-				  CommitTsSLRULock, "pg_commit_ts",
-				  LWTRANCHE_COMMITTS_BUFFER,
+				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
+				  LWTRANCHE_COMMITTS_SLRU,
 				  SYNC_HANDLER_COMMIT_TS,
 				  false);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
@@ -553,6 +573,15 @@ CommitTsShmemInit(void)
 		Assert(found);
 }
 
+/*
+ * GUC check_hook for commit_timestamp_buffers
+ */
+bool
+check_commit_ts_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("commit_timestamp_buffers", newval);
+}
+
 /*
  * This function must be called ONCE on system install.
  *
@@ -715,13 +744,14 @@ ActivateCommitTs(void)
 	/* Create the current segment file, if necessary */
 	if (!SimpleLruDoesPhysicalPageExist(CommitTsCtl, pageno))
 	{
+		LWLock	   *lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
 		int			slotno;
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 
 	/* Change the activation status in shared memory. */
@@ -760,8 +790,6 @@ DeactivateCommitTs(void)
 	TransamVariables->oldestCommitTsXid = InvalidTransactionId;
 	TransamVariables->newestCommitTsXid = InvalidTransactionId;
 
-	LWLockRelease(CommitTsLock);
-
 	/*
 	 * Remove *all* files.  This is necessary so that there are no leftover
 	 * files; in the case where this feature is later enabled after running
@@ -769,10 +797,16 @@ DeactivateCommitTs(void)
 	 * (We can probably tolerate out-of-sequence files, as they are going to
 	 * be overwritten anyway when we wrap around, but it seems better to be
 	 * tidy.)
+	 *
+	 * Note that we do this with CommitTsLock acquired in exclusive mode. This
+	 * is very heavy-handed, but since this routine can only be called in the
+	 * replica and should happen very rarely, we don't worry too much about
+	 * it.  Note also that no process should be consulting this SLRU if we
+	 * have just deactivated it.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
 	(void) SlruScanDirectory(CommitTsCtl, SlruScanDirCbDeleteAll, NULL);
-	LWLockRelease(CommitTsSLRULock);
+
+	LWLockRelease(CommitTsLock);
 }
 
 /*
@@ -804,6 +838,7 @@ void
 ExtendCommitTs(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * Nothing to do if module not enabled.  Note we do an unlocked read of
@@ -824,12 +859,14 @@ ExtendCommitTs(TransactionId newestXact)
 
 	pageno = TransactionIdToCTsPage(newestXact);
 
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroCommitTsPage(pageno, !InRecovery);
 
-	LWLockRelease(CommitTsSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -983,16 +1020,18 @@ commit_ts_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(CommitTsCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroCommitTsPage(pageno, false);
 		SimpleLruWritePage(CommitTsCtl, slotno);
 		Assert(!CommitTsCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(CommitTsSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == COMMIT_TS_TRUNCATE)
 	{
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 64040d330e..9b81506145 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -88,6 +88,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -192,10 +193,10 @@ static SlruCtlData MultiXactMemberCtlData;
 
 /*
  * MultiXact state shared across all backends.  All this state is protected
- * by MultiXactGenLock.  (We also use MultiXactOffsetSLRULock and
- * MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
- * buffers.  For concurrency's sake, we avoid holding more than one of these
- * locks at a time.)
+ * by MultiXactGenLock.  (We also use SLRU bank's lock of MultiXactOffset and
+ * MultiXactMember to guard accesses to the two sets of SLRU buffers.  For
+ * concurrency's sake, we avoid holding more than one of these locks at a
+ * time.)
  */
 typedef struct MultiXactStateData
 {
@@ -870,12 +871,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	int			slotno;
 	MultiXactOffset *offptr;
 	int			i;
-
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
+
 	/*
 	 * Note: we pass the MultiXactId to SimpleLruReadPage as the "transaction"
 	 * to complain about if there's any I/O error.  This is kinda bogus, but
@@ -891,10 +895,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 
-	/* Exchange our lock */
-	LWLockRelease(MultiXactOffsetSLRULock);
-
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	/* Release MultiXactOffset SLRU lock. */
+	LWLockRelease(lock);
 
 	prev_pageno = -1;
 
@@ -916,6 +918,20 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * The MultiXactMember SLRU page has changed, so check whether this
+			 * new page falls into a different SLRU bank; if so, release the
+			 * old bank's lock and acquire the lock on the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock != NULL)
+					LWLockRelease(prevlock);
+
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -936,7 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock != NULL)
+		LWLockRelease(prevlock);
 }
 
 /*
@@ -1239,6 +1256,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
+	LWLock	   *lock;
+	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1342,11 +1361,22 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
+	/*
+	 * If this page falls under a different bank, release the old bank's lock
+	 * and acquire the lock of the new bank.
+	 */
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+	if (lock != prevlock)
+	{
+		if (prevlock != NULL)
+			LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+		prevlock = lock;
+	}
+
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
@@ -1379,7 +1409,21 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
+		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (prevlock != lock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+		}
 
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -1388,7 +1432,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(MultiXactOffsetSLRULock);
+			LWLockRelease(prevlock);
+			prevlock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1397,13 +1442,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(prevlock);
+	prevlock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
-	/* Now get the members themselves. */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	truelength = 0;
 	prev_pageno = -1;
 	for (i = 0; i < length; i++, offset++)
@@ -1419,6 +1462,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			/*
+			 * Since we're going to access a different SLRU page, if this page
+			 * falls under a different bank, release the old bank's lock and
+			 * acquire the lock of the new bank.
+			 */
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (lock != prevlock)
+			{
+				if (prevlock)
+					LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
 			prev_pageno = pageno;
 		}
@@ -1442,7 +1499,8 @@ retry:
 		truelength++;
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	if (prevlock)
+		LWLockRelease(prevlock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
@@ -1834,8 +1892,8 @@ MultiXactShmemSize(void)
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTOFFSET_BUFFERS, 0));
-	size = add_size(size, SimpleLruShmemSize(NUM_MULTIXACTMEMBER_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_offset_buffers, 0));
+	size = add_size(size, SimpleLruShmemSize(multixact_member_buffers, 0));
 
 	return size;
 }
@@ -1851,16 +1909,16 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-				  "multixact_offset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
-				  MultiXactOffsetSLRULock, "pg_multixact/offsets",
-				  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  "multixact_offset", multixact_offset_buffers, 0,
+				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
+				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
 				  SYNC_HANDLER_MULTIXACT_OFFSET,
 				  false);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
-				  "multixact_member", NUM_MULTIXACTMEMBER_BUFFERS, 0,
-				  MultiXactMemberSLRULock, "pg_multixact/members",
-				  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  "multixact_member", multixact_member_buffers, 0,
+				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
+				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
 				  SYNC_HANDLER_MULTIXACT_MEMBER,
 				  false);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
@@ -1887,6 +1945,24 @@ MultiXactShmemInit(void)
 	OldestVisibleMXactId = OldestMemberMXactId + MaxOldestSlot;
 }
 
+/*
+ * GUC check_hook for multixact_offset_buffers
+ */
+bool
+check_multixact_offset_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_offset_buffers", newval);
+}
+
+/*
+ * GUC check_hook for multixact_member_buffer
+ */
+bool
+check_multixact_member_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("multixact_member_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates the initial
  * MultiXact segments.  (The MultiXacts directories are assumed to have been
@@ -1896,8 +1972,10 @@ void
 BootStrapMultiXact(void)
 {
 	int			slotno;
+	LWLock	   *lock;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the offsets log */
 	slotno = ZeroMultiXactOffsetPage(0, false);
@@ -1906,9 +1984,10 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the members log */
 	slotno = ZeroMultiXactMemberPage(0, false);
@@ -1917,7 +1996,7 @@ BootStrapMultiXact(void)
 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
 	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(MultiXactMemberSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -1977,10 +2056,12 @@ static void
 MaybeExtendOffsetSlru(void)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
 	{
@@ -1995,7 +2076,7 @@ MaybeExtendOffsetSlru(void)
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2049,6 +2130,8 @@ TrimMultiXact(void)
 	oldestMXactDB = MultiXactState->oldestMultiXactDB;
 	LWLockRelease(MultiXactGenLock);
 
+	/* Clean up offsets state */
+
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
@@ -2056,9 +2139,6 @@ TrimMultiXact(void)
 	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
 						pageno);
 
-	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
 	 * TrimCLOG() for background.  Unlike CLOG, some WAL record covers every
@@ -2072,7 +2152,9 @@ TrimMultiXact(void)
 	{
 		int			slotno;
 		MultiXactOffset *offptr;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
 		offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 		offptr += entryno;
@@ -2080,10 +2162,9 @@ TrimMultiXact(void)
 		MemSet(offptr, 0, BLCKSZ - (entryno * sizeof(MultiXactOffset)));
 
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactOffsetSLRULock);
-
 	/*
 	 * And the same for members.
 	 *
@@ -2093,8 +2174,6 @@ TrimMultiXact(void)
 	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
 						pageno);
 
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
 	 * TrimCLOG() for motivation.
@@ -2105,7 +2184,9 @@ TrimMultiXact(void)
 		int			slotno;
 		TransactionId *xidptr;
 		int			memberoff;
+		LWLock	   *lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		memberoff = MXOffsetToMemberOffset(offset);
 		slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
 		xidptr = (TransactionId *)
@@ -2120,10 +2201,9 @@ TrimMultiXact(void)
 		 */
 
 		MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+		LWLockRelease(lock);
 	}
 
-	LWLockRelease(MultiXactMemberSLRULock);
-
 	/* signal that we're officially up */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->finishedStartup = true;
@@ -2411,6 +2491,7 @@ static void
 ExtendMultiXactOffset(MultiXactId multi)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first MultiXactId of a page.  But beware: just after
@@ -2421,13 +2502,14 @@ ExtendMultiXactOffset(MultiXactId multi)
 		return;
 
 	pageno = MultiXactIdToOffsetPage(multi);
+	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page and make an XLOG entry about it */
 	ZeroMultiXactOffsetPage(pageno, true);
 
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -2460,15 +2542,17 @@ ExtendMultiXactMember(MultiXactOffset offset, int nmembers)
 		if (flagsoff == 0 && flagsbit == 0)
 		{
 			int64		pageno;
+			LWLock	   *lock;
 
 			pageno = MXOffsetToMemberPage(offset);
+			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
 
-			LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
 
 			/* Zero the page and make an XLOG entry about it */
 			ZeroMultiXactMemberPage(pageno, true);
 
-			LWLockRelease(MultiXactMemberSLRULock);
+			LWLockRelease(lock);
 		}
 
 		/*
@@ -2766,7 +2850,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
-	LWLockRelease(MultiXactOffsetSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(MultiXactOffsetCtl, pageno));
 
 	*result = offset;
 	return true;
@@ -3248,31 +3332,35 @@ multixact_redo(XLogReaderState *record)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactOffsetPage(pageno, false);
 		SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 		Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactOffsetSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_ZERO_MEM_PAGE)
 	{
 		int64		pageno;
 		int			slotno;
+		LWLock	   *lock;
 
 		memcpy(&pageno, XLogRecGetData(record), sizeof(pageno));
 
-		LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
+		lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 
 		slotno = ZeroMultiXactMemberPage(pageno, false);
 		SimpleLruWritePage(MultiXactMemberCtl, slotno);
 		Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
-		LWLockRelease(MultiXactMemberSLRULock);
+		LWLockRelease(lock);
 	}
 	else if (info == XLOG_MULTIXACT_CREATE_ID)
 	{
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 93cefcd10d..f774d285b7 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -1,28 +1,38 @@
 /*-------------------------------------------------------------------------
  *
  * slru.c
- *		Simple LRU buffering for transaction status logfiles
+ *		Simple LRU buffering for wrap-around-able permanent metadata
  *
- * We use a simple least-recently-used scheme to manage a pool of page
- * buffers.  Under ordinary circumstances we expect that write
- * traffic will occur mostly to the latest page (and to the just-prior
- * page, soon after a page transition).  Read traffic will probably touch
- * a larger span of pages, but in any case a fairly small number of page
- * buffers should be sufficient.  So, we just search the buffers using plain
- * linear search; there's no need for a hashtable or anything fancy.
- * The management algorithm is straight LRU except that we will never swap
- * out the latest page (since we know it's going to be hit again eventually).
+ * This module is used to maintain various pieces of transaction status
+ * indexed by TransactionId (such as commit status, parent transaction ID,
+ * commit timestamp), as well as storage for multixacts, serializable
+ * isolation locks and NOTIFY traffic.  Extensions can define their own
+ * SLRUs, too.
  *
- * We use a control LWLock to protect the shared data structures, plus
- * per-buffer LWLocks that synchronize I/O for each buffer.  The control lock
- * must be held to examine or modify any shared state.  A process that is
- * reading in or writing out a page buffer does not hold the control lock,
- * only the per-buffer lock for the buffer it is working on.  One exception
- * is latest_page_number, which is read and written using atomic ops.
+ * Under ordinary circumstances we expect that write traffic will occur
+ * mostly to the latest page (and to the just-prior page, soon after a
+ * page transition).  Read traffic will probably touch a larger span of
+ * pages, but a relatively small number of buffers should be sufficient.
  *
- * "Holding the control lock" means exclusive lock in all cases except for
- * SimpleLruReadPage_ReadOnly(); see comments for SlruRecentlyUsed() for
- * the implications of that.
+ * We use a simple least-recently-used scheme to manage a pool of shared
+ * page buffers, split in banks by the lowest bits of the page number, and
+ * the management algorithm only processes the bank to which the desired
+ * page belongs, so a linear search is sufficient; there's no need for a
+ * hashtable or anything fancy.  The algorithm is straight LRU except that
+ * we will never swap out the latest page (since we know it's going to be
+ * hit again eventually).
+ *
+ * We use per-bank control LWLocks to protect the shared data structures,
+ * plus per-buffer LWLocks that synchronize I/O for each buffer.  The
+ * bank's control lock must be held to examine or modify any of the bank's
+ * shared state.  A process that is reading in or writing out a page
+ * buffer does not hold the control lock, only the per-buffer lock for the
+ * buffer it is working on.  One exception is latest_page_number, which is
+ * read and written using atomic ops.
+ *
+ * "Holding the bank control lock" means exclusive lock in all cases
+ * except for SimpleLruReadPage_ReadOnly(); see comments for
+ * SlruRecentlyUsed() for the implications of that.
  *
  * When initiating I/O on a buffer, we acquire the per-buffer lock exclusively
  * before releasing the control lock.  The per-buffer lock is released after
@@ -60,6 +70,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/guc_hooks.h"
 
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
@@ -106,6 +117,23 @@ typedef struct SlruWriteAllData
 
 typedef struct SlruWriteAllData *SlruWriteAll;
 
+
+/*
+ * Bank size for the slot array.  Pages are assigned a bank according to their
+ * page number, with each bank being this size.  We want a power of 2 so that
+ * we can determine the bank number for a page with just bit shifting; we also
+ * want to keep the bank size small so that LRU victim search is fast.  16
+ * buffers per bank seems a good number.
+ */
+#define SLRU_BANK_BITSHIFT		4
+#define SLRU_BANK_SIZE			(1 << SLRU_BANK_BITSHIFT)
+
+/*
+ * Macro to get the bank number to which the slot belongs.
+ */
+#define SlotGetBankNumber(slotno)	((slotno) >> SLRU_BANK_BITSHIFT)
+
+
 /*
  * Populate a file tag describing a segment file.  We only use the segment
  * number, since we can derive everything else we need by having separate
@@ -118,34 +146,6 @@ typedef struct SlruWriteAllData *SlruWriteAll;
 	(a).segno = (xx_segno) \
 )
 
-/*
- * Macro to mark a buffer slot "most recently used".  Note multiple evaluation
- * of arguments!
- *
- * The reason for the if-test is that there are often many consecutive
- * accesses to the same page (particularly the latest page).  By suppressing
- * useless increments of cur_lru_count, we reduce the probability that old
- * pages' counts will "wrap around" and make them appear recently used.
- *
- * We allow this code to be executed concurrently by multiple processes within
- * SimpleLruReadPage_ReadOnly().  As long as int reads and writes are atomic,
- * this should not cause any completely-bogus values to enter the computation.
- * However, it is possible for either cur_lru_count or individual
- * page_lru_count entries to be "reset" to lower values than they should have,
- * in case a process is delayed while it executes this macro.  With care in
- * SlruSelectLRUPage(), this does little harm, and in any case the absolute
- * worst possible consequence is a nonoptimal choice of page to evict.  The
- * gain from allowing concurrent reads of SLRU pages seems worth it.
- */
-#define SlruRecentlyUsed(shared, slotno)	\
-	do { \
-		int		new_lru_count = (shared)->cur_lru_count; \
-		if (new_lru_count != (shared)->page_lru_count[slotno]) { \
-			(shared)->cur_lru_count = ++new_lru_count; \
-			(shared)->page_lru_count[slotno] = new_lru_count; \
-		} \
-	} while (0)
-
 /* Saved info for SlruReportIOError */
 typedef enum
 {
@@ -173,6 +173,7 @@ static int	SlruSelectLRUPage(SlruCtl ctl, int64 pageno);
 static bool SlruScanDirCbDeleteCutoff(SlruCtl ctl, char *filename,
 									  int64 segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
+static inline void SlruRecentlyUsed(SlruShared shared, int slotno);
 
 
 /*
@@ -182,8 +183,12 @@ static void SlruInternalDeleteSegment(SlruCtl ctl, int64 segno);
 Size
 SimpleLruShmemSize(int nslots, int nlsns)
 {
+	int			nbanks = nslots / SLRU_BANK_SIZE;
 	Size		sz;
 
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
+	Assert(nslots % SLRU_BANK_SIZE == 0);
+
 	/* we assume nslots isn't so large as to risk overflow */
 	sz = MAXALIGN(sizeof(SlruSharedData));
 	sz += MAXALIGN(nslots * sizeof(char *));	/* page_buffer[] */
@@ -192,6 +197,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	sz += MAXALIGN(nslots * sizeof(int64)); /* page_number[] */
 	sz += MAXALIGN(nslots * sizeof(int));	/* page_lru_count[] */
 	sz += MAXALIGN(nslots * sizeof(LWLockPadded));	/* buffer_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(LWLockPadded));	/* bank_locks[] */
+	sz += MAXALIGN(nbanks * sizeof(int));	/* bank_cur_lru_count[] */
 
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
@@ -199,6 +206,21 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	return BUFFERALIGN(sz) + BLCKSZ * nslots;
 }
 
+/*
+ * Determine the number of SLRU buffers to use.
+ *
+ * We divide shared_buffers by the given divisor, round the result down to a
+ * multiple of SLRU_BANK_SIZE, clamp it to at least SLRU_BANK_SIZE, and cap
+ * it at the given maximum (itself rounded down to a multiple of
+ * SLRU_BANK_SIZE).
+ */
+int
+SimpleLruAutotuneBuffers(int divisor, int max)
+{
+	return Min(max - (max % SLRU_BANK_SIZE),
+			   Max(SLRU_BANK_SIZE,
+				   NBuffers / divisor - (NBuffers / divisor) % SLRU_BANK_SIZE));
+}
+
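
(Editorial illustration, not part of the patch.)  A standalone check of the
formula above, assuming the default 8kB block size and the divisor/max values
(512 and 1024) that the SUBTRANS autotuning later in the patch passes in; the
Min/Max macros are local stand-ins for the usual PostgreSQL ones:

#include <assert.h>

#define SLRU_BANK_SIZE	16
#define Min(x, y)		((x) < (y) ? (x) : (y))
#define Max(x, y)		((x) > (y) ? (x) : (y))

/* Mirror of SimpleLruAutotuneBuffers(), with NBuffers passed explicitly. */
static int
autotune(int nbuffers, int divisor, int max)
{
	return Min(max - (max % SLRU_BANK_SIZE),
			   Max(SLRU_BANK_SIZE,
				   nbuffers / divisor - (nbuffers / divisor) % SLRU_BANK_SIZE));
}

int
main(void)
{
	assert(autotune(16384, 512, 1024) == 32);	/* shared_buffers = 128MB */
	assert(autotune(131072, 512, 1024) == 256); /* shared_buffers = 1GB */
	assert(autotune(1048576, 512, 1024) == 1024);	/* shared_buffers = 8GB */
	return 0;
}
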
 /*
  * Initialize, or attach to, a simple LRU cache in shared memory.
  *
@@ -208,16 +230,20 @@ SimpleLruShmemSize(int nslots, int nlsns)
  * nlsns: number of LSN groups per page (set to zero if not relevant).
  * ctllock: LWLock to use to control access to the shared control structure.
  * subdir: PGDATA-relative subdirectory that will contain the files.
- * tranche_id: LWLock tranche ID to use for the SLRU's per-buffer LWLocks.
+ * buffer_tranche_id: tranche ID to use for the SLRU's per-buffer LWLocks.
+ * bank_tranche_id: tranche ID to use for the bank LWLocks.
  * sync_handler: which set of functions to use to handle sync requests
  */
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir, int tranche_id,
+			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
 			  SyncRequestHandler sync_handler, bool long_segment_names)
 {
 	SlruShared	shared;
 	bool		found;
+	int			nbanks = nslots / SLRU_BANK_SIZE;
+
+	Assert(nslots <= SLRU_MAX_ALLOWED_BUFFERS);
 
 	shared = (SlruShared) ShmemInitStruct(name,
 										  SimpleLruShmemSize(nslots, nlsns),
@@ -233,12 +259,9 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 
 		memset(shared, 0, sizeof(SlruSharedData));
 
-		shared->ControlLock = ctllock;
-
 		shared->num_slots = nslots;
 		shared->lsn_groups_per_page = nlsns;
 
-		shared->cur_lru_count = 0;
 		pg_atomic_init_u64(&shared->latest_page_number, 0);
 
 		shared->slru_stats_idx = pgstat_get_slru_index(name);
@@ -259,6 +282,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		/* Initialize LWLocks */
 		shared->buffer_locks = (LWLockPadded *) (ptr + offset);
 		offset += MAXALIGN(nslots * sizeof(LWLockPadded));
+		shared->bank_locks = (LWLockPadded *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(LWLockPadded));
+		shared->bank_cur_lru_count = (int *) (ptr + offset);
+		offset += MAXALIGN(nbanks * sizeof(int));
 
 		if (nlsns > 0)
 		{
@@ -270,7 +297,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 		for (int slotno = 0; slotno < nslots; slotno++)
 		{
 			LWLockInitialize(&shared->buffer_locks[slotno].lock,
-							 tranche_id);
+							 buffer_tranche_id);
 
 			shared->page_buffer[slotno] = ptr;
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
@@ -279,11 +306,21 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			ptr += BLCKSZ;
 		}
 
+		/* Initialize the slot banks. */
+		for (int bankno = 0; bankno < nbanks; bankno++)
+		{
+			LWLockInitialize(&shared->bank_locks[bankno].lock, bank_tranche_id);
+			shared->bank_cur_lru_count[bankno] = 0;
+		}
+
 		/* Should fit to estimated shmem size */
 		Assert(ptr - (char *) shared <= SimpleLruShmemSize(nslots, nlsns));
 	}
 	else
+	{
 		Assert(found);
+		Assert(shared->num_slots == nslots);
+	}
 
 	/*
 	 * Initialize the unshared control struct, including directory path. We
@@ -292,16 +329,33 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
+	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
+/*
+ * Helper function for GUC check_hook to check whether slru buffers are in
+ * multiples of SLRU_BANK_SIZE.
+ */
+bool
+check_slru_buffers(const char *name, int *newval)
+{
+	/* Valid values are multiples of SLRU_BANK_SIZE */
+	if (*newval % SLRU_BANK_SIZE == 0)
+		return true;
+
+	GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
+						SLRU_BANK_SIZE);
+	return false;
+}
+
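
(Editorial note.)  As a concrete example of this check hook: setting
notify_buffers = 17 at server start would be rejected with the errdetail
above, because 17 is not a multiple of the 16-buffer bank size, while
notify_buffers = 32 would be accepted.
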
 /*
  * Initialize (or reinitialize) a page to zeroes.
  *
  * The page is not actually written, just set up in shared memory.
  * The slot number of the new page is returned.
  *
- * Control lock must be held at entry, and will be held at exit.
+ * Bank lock must be held at entry, and will be held at exit.
  */
 int
 SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
@@ -309,6 +363,8 @@ SimpleLruZeroPage(SlruCtl ctl, int64 pageno)
 	SlruShared	shared = ctl->shared;
 	int			slotno;
 
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE));
+
 	/* Find a suitable buffer slot for the page */
 	slotno = SlruSelectLRUPage(ctl, pageno);
 	Assert(shared->page_status[slotno] == SLRU_PAGE_EMPTY ||
@@ -369,18 +425,21 @@ SimpleLruZeroLSNs(SlruCtl ctl, int slotno)
  * guarantee that new I/O hasn't been started before we return, though.
  * In fact the slot might not even contain the same page anymore.)
  *
- * Control lock must be held at entry, and will be held at exit.
+ * Bank lock must be held at entry, and will be held at exit.
  */
 static void
 SimpleLruWaitIO(SlruCtl ctl, int slotno)
 {
 	SlruShared	shared = ctl->shared;
+	int			bankno = SlotGetBankNumber(slotno);
+
+	Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
 
 	/* See notes at top of file */
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_SHARED);
 	LWLockRelease(&shared->buffer_locks[slotno].lock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	/*
 	 * If the slot is still in an io-in-progress state, then either someone
@@ -423,7 +482,7 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
  * Return value is the shared-buffer slot number now holding the page.
  * The buffer's LRU access info is updated.
  *
- * Control lock must be held at entry, and will be held at exit.
+ * The correct bank lock must be held at entry, and will be held at exit.
  */
 int
 SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -431,18 +490,21 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 {
 	SlruShared	shared = ctl->shared;
 
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE));
+
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
+		int			bankno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
 		slotno = SlruSelectLRUPage(ctl, pageno);
 
 		/* Did we find the page in memory? */
-		if (shared->page_number[slotno] == pageno &&
-			shared->page_status[slotno] != SLRU_PAGE_EMPTY)
+		if (shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
+			shared->page_number[slotno] == pageno)
 		{
 			/*
 			 * If page is still being read in, we must wait for I/O.  Likewise
@@ -477,9 +539,10 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
+		bankno = SlotGetBankNumber(slotno);
 
-		/* Release control lock while doing I/O */
-		LWLockRelease(shared->ControlLock);
+		/* Release bank lock while doing I/O */
+		LWLockRelease(&shared->bank_locks[bankno].lock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -487,8 +550,8 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		/* Set the LSNs for this newly read-in page to zero */
 		SimpleLruZeroLSNs(ctl, slotno);
 
-		/* Re-acquire control lock and update page state */
-		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+		/* Re-acquire bank control lock and update page state */
+		LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -522,22 +585,25 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
  * Return value is the shared-buffer slot number now holding the page.
  * The buffer's LRU access info is updated.
  *
- * Control lock must NOT be held at entry, but will be held at exit.
+ * Bank control lock must NOT be held at entry, but will be held at exit.
  * It is unspecified whether the lock will be shared or exclusive.
  */
 int
 SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
+	int			bankno = pageno & ctl->bank_mask;
+	int			bankstart = bankno * SLRU_BANK_SIZE;
+	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(shared->ControlLock, LW_SHARED);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_SHARED);
 
 	/* See if page is already in a buffer */
-	for (int slotno = 0; slotno < shared->num_slots; slotno++)
+	for (int slotno = bankstart; slotno < bankend; slotno++)
 	{
-		if (shared->page_number[slotno] == pageno &&
-			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
+		if (shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
+			shared->page_number[slotno] == pageno &&
 			shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
 		{
 			/* See comments for SlruRecentlyUsed macro */
@@ -551,8 +617,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(shared->ControlLock);
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockRelease(&shared->bank_locks[bankno].lock);
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -566,15 +632,19 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
  * the write).  However, we *do* attempt a fresh write even if the page
  * is already being written; this is for checkpoints.
  *
- * Control lock must be held at entry, and will be held at exit.
+ * Bank lock must be held at entry, and will be held at exit.
  */
 static void
 SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 {
 	SlruShared	shared = ctl->shared;
 	int64		pageno = shared->page_number[slotno];
+	int			bankno = SlotGetBankNumber(slotno);
 	bool		ok;
 
+	Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE));
+
 	/* If a write is in progress, wait for it to finish */
 	while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
 		   shared->page_number[slotno] == pageno)
@@ -601,8 +671,8 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 	/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 	LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
 
-	/* Release control lock while doing I/O */
-	LWLockRelease(shared->ControlLock);
+	/* Release bank lock while doing I/O */
+	LWLockRelease(&shared->bank_locks[bankno].lock);
 
 	/* Do the write */
 	ok = SlruPhysicalWritePage(ctl, pageno, slotno, fdata);
@@ -614,8 +684,8 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 			CloseTransientFile(fdata->fd[i]);
 	}
 
-	/* Re-acquire control lock and update page state */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	/* Re-acquire bank lock and update page state */
+	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
 
 	Assert(shared->page_number[slotno] == pageno &&
 		   shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS);
@@ -644,6 +714,8 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata)
 void
 SimpleLruWritePage(SlruCtl ctl, int slotno)
 {
+	Assert(ctl->shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+
 	SlruInternalWritePage(ctl, slotno, NULL);
 }
 
@@ -1028,17 +1100,53 @@ SlruReportIOError(SlruCtl ctl, int64 pageno, TransactionId xid)
 }
 
 /*
- * Select the slot to re-use when we need a free slot.
+ * Mark a buffer slot "most recently used".
+ */
+static inline void
+SlruRecentlyUsed(SlruShared shared, int slotno)
+{
+	int			bankno = SlotGetBankNumber(slotno);
+	int			new_lru_count = shared->bank_cur_lru_count[bankno];
+
+	Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
+
+	/*
+	 * The reason for the if-test is that there are often many consecutive
+	 * accesses to the same page (particularly the latest page).  By
+	 * suppressing useless increments of bank_cur_lru_count, we reduce the
+	 * probability that old pages' counts will "wrap around" and make them
+	 * appear recently used.
+	 *
+	 * We allow this code to be executed concurrently by multiple processes
+	 * within SimpleLruReadPage_ReadOnly().  As long as int reads and writes
+	 * are atomic, this should not cause any completely-bogus values to enter
+	 * the computation.  However, it is possible for either bank_cur_lru_count
+	 * or individual page_lru_count entries to be "reset" to lower values than
+	 * they should have, in case a process is delayed while it executes this
+	 * function.  With care in SlruSelectLRUPage(), this does little harm, and
+	 * in any case the absolute worst possible consequence is a nonoptimal
+	 * choice of page to evict.  The gain from allowing concurrent reads of
+	 * SLRU pages seems worth it.
+	 */
+	if (new_lru_count != shared->page_lru_count[slotno])
+	{
+		shared->bank_cur_lru_count[bankno] = ++new_lru_count;
+		shared->page_lru_count[slotno] = new_lru_count;
+	}
+}
+
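
(Editorial illustration, not part of the patch.)  A minimal sketch of the
victim-selection arithmetic that SlruSelectLRUPage() performs with these
counters, using made-up counts for one bank with three occupied slots:

#include <assert.h>

int
main(void)
{
	int			bank_cur_lru_count = 100;	/* hypothetical bank counter */
	int			page_lru_count[] = {95, 99, 100};	/* per-slot counts */
	int			bestslot = 0;

	for (int slotno = 1; slotno < 3; slotno++)
	{
		if (bank_cur_lru_count - page_lru_count[slotno] >
			bank_cur_lru_count - page_lru_count[bestslot])
			bestslot = slotno;
	}

	/* the slot touched longest ago (count 95, delta 5) is the victim */
	assert(bestslot == 0);
	return 0;
}
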
+/*
+ * Select the slot to re-use when we need a free slot for the given page.
  *
- * The target page number is passed because we need to consider the
- * possibility that some other process reads in the target page while
- * we are doing I/O to free a slot.  Hence, check or recheck to see if
- * any slot already holds the target page, and return that slot if so.
- * Thus, the returned slot is *either* a slot already holding the pageno
- * (could be any state except EMPTY), *or* a freeable slot (state EMPTY
- * or CLEAN).
+ * The target page number is passed not only because we need to know the
+ * correct bank to use, but also because we need to consider the possibility
+ * that some other process reads in the target page while we are doing I/O to
+ * free a slot.  Hence, check or recheck to see if any slot already holds the
+ * target page, and return that slot if so.  Thus, the returned slot is
+ * *either* a slot already holding the pageno (could be any state except
+ * EMPTY), *or* a freeable slot (state EMPTY or CLEAN).
  *
- * Control lock must be held at entry, and will be held at exit.
+ * The correct bank lock must be held at entry, and will be held at exit.
  */
 static int
 SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
@@ -1055,12 +1163,17 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
+		int			bankno = pageno & ctl->bank_mask;
+		int			bankstart = bankno * SLRU_BANK_SIZE;
+		int			bankend = bankstart + SLRU_BANK_SIZE;
+
+		Assert(LWLockHeldByMe(&shared->bank_locks[bankno].lock));
 
 		/* See if page already has a buffer assigned */
 		for (int slotno = 0; slotno < shared->num_slots; slotno++)
 		{
-			if (shared->page_number[slotno] == pageno &&
-				shared->page_status[slotno] != SLRU_PAGE_EMPTY)
+			if (shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
+				shared->page_number[slotno] == pageno)
 				return slotno;
 		}
 
@@ -1091,14 +1204,15 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		 * That gets us back on the path to having good data when there are
 		 * multiple pages with the same lru_count.
 		 */
-		cur_count = (shared->cur_lru_count)++;
-		for (int slotno = 0; slotno < shared->num_slots; slotno++)
+		cur_count = (shared->bank_cur_lru_count[bankno])++;
+		for (int slotno = bankstart; slotno < bankend; slotno++)
 		{
 			int			this_delta;
 			int64		this_page_number;
 
 			if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 				return slotno;
+
 			this_delta = cur_count - shared->page_lru_count[slotno];
 			if (this_delta < 0)
 			{
@@ -1193,6 +1307,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	SlruShared	shared = ctl->shared;
 	SlruWriteAllData fdata;
 	int64		pageno = 0;
+	int			prevbank = SlotGetBankNumber(0);
 	bool		ok;
 
 	/* update the stats counter of flushes */
@@ -1203,10 +1318,27 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 	 */
 	fdata.num_files = 0;
 
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 
 	for (int slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbank = SlotGetBankNumber(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank
+		 * lock, release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
+
+		/* Do nothing if slot is unused */
+		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
+			continue;
+
 		SlruInternalWritePage(ctl, slotno, &fdata);
 
 		/*
@@ -1220,7 +1352,7 @@ SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied)
 				!shared->page_dirty[slotno]));
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 
 	/*
 	 * Now close any files that were open
@@ -1259,6 +1391,7 @@ void
 SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 {
 	SlruShared	shared = ctl->shared;
+	int			prevbank;
 
 	/* update the stats counter of truncates */
 	pgstat_count_slru_truncate(shared->slru_stats_idx);
@@ -1269,8 +1402,6 @@ SimpleLruTruncate(SlruCtl ctl, int64 cutoffPage)
 	 * or just after a checkpoint, any dirty pages should have been flushed
 	 * already ... we're just being extra careful here.)
 	 */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
-
 restart:
 
 	/*
@@ -1282,15 +1413,29 @@ restart:
 	if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
 						  cutoffPage))
 	{
-		LWLockRelease(shared->ControlLock);
 		ereport(LOG,
 				(errmsg("could not truncate directory \"%s\": apparent wraparound",
 						ctl->Dir)));
 		return;
 	}
 
+	prevbank = SlotGetBankNumber(0);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 	for (int slotno = 0; slotno < shared->num_slots; slotno++)
 	{
+		int			curbank = SlotGetBankNumber(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank
+		 * lock, release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
+
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
@@ -1320,10 +1465,12 @@ restart:
 			SlruInternalWritePage(ctl, slotno, NULL);
 		else
 			SimpleLruWaitIO(ctl, slotno);
+
+		LWLockRelease(&shared->bank_locks[prevbank].lock);
 		goto restart;
 	}
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 
 	/* Now we can remove the old segment(s) */
 	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
@@ -1362,19 +1509,33 @@ void
 SlruDeleteSegment(SlruCtl ctl, int64 segno)
 {
 	SlruShared	shared = ctl->shared;
+	int			prevbank = SlotGetBankNumber(0);
 	bool		did_write;
 
 	/* Clean out any possibly existing references to the segment. */
-	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
+	LWLockAcquire(&shared->bank_locks[prevbank].lock, LW_EXCLUSIVE);
 restart:
 	did_write = false;
 	for (int slotno = 0; slotno < shared->num_slots; slotno++)
 	{
-		int			pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+		int			pagesegno;
+		int			curbank = SlotGetBankNumber(slotno);
+
+		/*
+		 * If the current bank lock is not the same as the previous bank
+		 * lock, release the previous lock and acquire the new one.
+		 */
+		if (curbank != prevbank)
+		{
+			LWLockRelease(&shared->bank_locks[prevbank].lock);
+			LWLockAcquire(&shared->bank_locks[curbank].lock, LW_EXCLUSIVE);
+			prevbank = curbank;
+		}
 
 		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
 			continue;
 
+		pagesegno = shared->page_number[slotno] / SLRU_PAGES_PER_SEGMENT;
+
 		/* not the segment we're looking for */
 		if (pagesegno != segno)
 			continue;
@@ -1405,7 +1566,7 @@ restart:
 
 	SlruInternalDeleteSegment(ctl, segno);
 
-	LWLockRelease(shared->ControlLock);
+	LWLockRelease(&shared->bank_locks[prevbank].lock);
 }
 
 /*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6aa47af43e..c7ae7dbc4f 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -31,7 +31,9 @@
 #include "access/slru.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/guc_hooks.h"
 #include "utils/snapmgr.h"
 
 
@@ -85,12 +87,14 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 	int64		pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
 	int			slotno;
+	LWLock	   *lock;
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
 	Assert(TransactionIdFollows(xid, parent));
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
@@ -108,7 +112,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
 		SubTransCtl->shared->page_dirty[slotno] = true;
 	}
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -138,7 +142,7 @@ SubTransGetParent(TransactionId xid)
 
 	parent = *ptr;
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SubTransCtl, pageno));
 
 	return parent;
 }
@@ -186,6 +190,22 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	return previousXid;
 }
 
+/*
+ * Number of shared SUBTRANS buffers.
+ *
+ * If asked to autotune, use 2MB for every 1GB of shared buffers, up to 8MB.
+ * Otherwise just cap the configured amount to be between 16 and the maximum
+ * allowed.
+ */
+static int
+SUBTRANSShmemBuffers(void)
+{
+	/* auto-tune based on shared buffers */
+	if (subtransaction_buffers == 0)
+		return SimpleLruAutotuneBuffers(512, 1024);
+
+	return Min(Max(16, subtransaction_buffers), SLRU_MAX_ALLOWED_BUFFERS);
+}
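
(Editorial note.)  The comment's sizing corresponds to the arguments passed
above: with the default 8kB block size, 2MB of SUBTRANS buffers per 1GB of
shared_buffers means dividing the number of shared buffers by 512, and the
8MB ceiling is 1024 SLRU buffers -- hence SimpleLruAutotuneBuffers(512, 1024).
For example, shared_buffers = 1GB (131072 buffers) autotunes to 256 SUBTRANS
buffers; 4GB or more hits the 1024-buffer cap.
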
 
 /*
  * Initialization of shared memory for SUBTRANS
@@ -193,20 +213,42 @@ SubTransGetTopmostTransaction(TransactionId xid)
 Size
 SUBTRANSShmemSize(void)
 {
-	return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
+	return SimpleLruShmemSize(SUBTRANSShmemBuffers(), 0);
 }
 
 void
 SUBTRANSShmemInit(void)
 {
+	/* If auto-tuning is requested, now is the time to do it */
+	if (subtransaction_buffers == 0)
+	{
+		char		buf[32];
+
+		snprintf(buf, sizeof(buf), "%d", SUBTRANSShmemBuffers());
+		SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+						PGC_S_DYNAMIC_DEFAULT);
+		if (subtransaction_buffers == 0)	/* failed to apply it? */
+			SetConfigOption("subtransaction_buffers", buf, PGC_POSTMASTER,
+							PGC_S_OVERRIDE);
+	}
+	Assert(subtransaction_buffers != 0);
+
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
-	SimpleLruInit(SubTransCtl, "subtransaction", NUM_SUBTRANS_BUFFERS, 0,
-				  SubtransSLRULock, "pg_subtrans",
-				  LWTRANCHE_SUBTRANS_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+	SimpleLruInit(SubTransCtl, "subtransaction", SUBTRANSShmemBuffers(), 0,
+				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
+/*
+ * GUC check_hook for subtransaction_buffers
+ */
+bool
+check_subtrans_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("subtransaction_buffers", newval);
+}
+
 /*
  * This func must be called ONCE on system install.  It creates
  * the initial SUBTRANS segment.  (The SUBTRANS directory is assumed to
@@ -221,8 +263,9 @@ void
 BootStrapSUBTRANS(void)
 {
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(SubTransCtl, 0);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Create and zero the first page of the subtrans log */
 	slotno = ZeroSUBTRANSPage(0);
@@ -231,7 +274,7 @@ BootStrapSUBTRANS(void)
 	SimpleLruWritePage(SubTransCtl, slotno);
 	Assert(!SubTransCtl->shared->page_dirty[slotno]);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 /*
@@ -261,6 +304,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
+	LWLock	   *prevlock;
+	LWLock	   *lock;
 
 	/*
 	 * Since we don't expect pg_subtrans to be valid across crashes, we
@@ -268,23 +313,47 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	 * Whenever we advance into a new page, ExtendSUBTRANS will likewise zero
 	 * the new page without regard to whatever was previously on disk.
 	 */
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
-
 	startPage = TransactionIdToPage(oldestActiveXID);
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
+	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
 	while (startPage != endPage)
 	{
+		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+		/*
+		 * If the page we are about to zero is in a different bank, release
+		 * the lock on the old bank and acquire the lock on the new bank.
+		 */
+		if (prevlock != lock)
+		{
+			LWLockRelease(prevlock);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			prevlock = lock;
+		}
+
 		(void) ZeroSUBTRANSPage(startPage);
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
-	(void) ZeroSUBTRANSPage(startPage);
 
-	LWLockRelease(SubtransSLRULock);
+	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
+
+	/*
+	 * If the final page is in a different bank, release the lock on the old
+	 * bank and acquire the lock on the new bank.
+	 */
+	if (prevlock != lock)
+	{
+		LWLockRelease(prevlock);
+		LWLockAcquire(lock, LW_EXCLUSIVE);
+	}
+	(void) ZeroSUBTRANSPage(startPage);
+	LWLockRelease(lock);
 }
 
 /*
@@ -318,6 +387,7 @@ void
 ExtendSUBTRANS(TransactionId newestXact)
 {
 	int64		pageno;
+	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -329,12 +399,13 @@ ExtendSUBTRANS(TransactionId newestXact)
 
 	pageno = TransactionIdToPage(newestXact);
 
-	LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
+	lock = SimpleLruGetBankLock(SubTransCtl, pageno);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	/* Zero the page */
 	ZeroSUBTRANSPage(pageno);
 
-	LWLockRelease(SubtransSLRULock);
+	LWLockRelease(lock);
 }
 
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 490c84dc19..23444f2a80 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.)  The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance.  The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -234,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large.  While that would save
  * work in some places, it would add cost in others.  In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -266,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * The SLRU buffer pool is divided into banks, and the per-bank SLRU lock is
+ * used as the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly the SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends).  We rely on this to make
@@ -492,7 +494,7 @@ AsyncShmemSize(void)
 	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
 	size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-	size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
 
 	return size;
 }
@@ -541,8 +543,8 @@ AsyncShmemInit(void)
 	 * names are used in order to avoid wraparound.
 	 */
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-	SimpleLruInit(NotifyCtl, "notify", NUM_NOTIFY_BUFFERS, 0,
-				  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+	SimpleLruInit(NotifyCtl, "notify", notify_buffers, 0,
+				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
 				  SYNC_HANDLER_NONE, true);
 
 	if (!found)
@@ -1356,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * the page-specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1366,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	int64		pageno;
 	int			offset;
 	int			slotno;
-
-	/* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-	LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+	LWLock	   *prevlock;
 
 	/*
 	 * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1389,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	 * page should be initialized already, so just fetch it.
 	 */
 	pageno = QUEUE_POS_PAGE(queue_head);
+	prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+	/* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+	LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
 	if (QUEUE_POS_IS_ZERO(queue_head))
 		slotno = SimpleLruZeroPage(NotifyCtl, pageno);
 	else
@@ -1434,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
 		/* Advance queue_head appropriately, and detect if page is full */
 		if (asyncQueueAdvance(&(queue_head), qe.length))
 		{
+			LWLock	   *lock;
+
+			pageno = QUEUE_POS_PAGE(queue_head);
+			lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+			if (lock != prevlock)
+			{
+				LWLockRelease(prevlock);
+				LWLockAcquire(lock, LW_EXCLUSIVE);
+				prevlock = lock;
+			}
+
 			/*
 			 * Page is full, so we're done here, but first fill the next page
 			 * with zeroes.  The reason to do this is to ensure that slru.c's
@@ -1460,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 	/* Success, so update the global QUEUE_HEAD */
 	QUEUE_HEAD = queue_head;
 
-	LWLockRelease(NotifySLRULock);
+	LWLockRelease(prevlock);
 
 	return nextNotify;
 }
@@ -1931,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
 
 			/*
 			 * We copy the data from SLRU into a local buffer, so as to avoid
-			 * holding the NotifySLRULock while we are examining the entries
-			 * and possibly transmitting them to our frontend.  Copy only the
-			 * part of the page we will actually inspect.
+			 * holding the SLRU lock while we are examining the entries and
+			 * possibly transmitting them to our frontend.  Copy only the part
+			 * of the page we will actually inspect.
 			 */
 			slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
 												InvalidTransactionId);
@@ -1953,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
 				   NotifyCtl->shared->page_buffer[slotno] + curoffset,
 				   copysize);
 			/* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-			LWLockRelease(NotifySLRULock);
+			LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
 			/*
 			 * Process messages up to the stop position, end of page, or an
@@ -1994,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
  *
  * The current page must have been fetched into page_buffer from shared
  * memory.  (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
  * uncommitted transaction, or reach the end of the page.
@@ -2147,7 +2163,7 @@ asyncQueueAdvanceTail(void)
 	if (asyncQueuePagePrecedes(oldtailpage, boundary))
 	{
 		/*
-		 * SimpleLruTruncate() will ask for NotifySLRULock but will also
+		 * SimpleLruTruncate() will ask for an SLRU bank lock but will also
 		 * release the lock again.
 		 */
 		SimpleLruTruncate(NotifyCtl, newtailpage);
@@ -2378,3 +2394,12 @@ ClearPendingActionsAndNotifies(void)
 	pendingActions = NULL;
 	pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("notify_buffers", newval);
+}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 997857679e..d405c61b21 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -163,6 +163,13 @@ static const char *const BuiltinTrancheNames[] = {
 	[LWTRANCHE_LAUNCHER_HASH] = "LogicalRepLauncherHash",
 	[LWTRANCHE_DSM_REGISTRY_DSA] = "DSMRegistryDSA",
 	[LWTRANCHE_DSM_REGISTRY_HASH] = "DSMRegistryHash",
+	[LWTRANCHE_COMMITTS_SLRU] = "CommitTSSLRU",
+	[LWTRANCHE_MULTIXACTOFFSET_SLRU] = "MultixactOffsetSLRU",
+	[LWTRANCHE_MULTIXACTMEMBER_SLRU] = "MultixactMemberSLRU",
+	[LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU",
+	[LWTRANCHE_SERIAL_SLRU] = "SerialSLRU",
+	[LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
+	[LWTRANCHE_XACT_SLRU] = "XactSLRU",
 };
 
 StaticAssertDecl(lengthof(BuiltinTrancheNames) ==
@@ -776,7 +783,7 @@ GetLWLockIdentifier(uint32 classId, uint16 eventId)
  * in mode.
  *
  * This function will not block waiting for a lock to become free - that's the
- * callers job.
+ * caller's job.
  *
  * Returns true if the lock isn't free and we need to wait.
  */
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 3d59d3646e..284d168f77 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -16,11 +16,11 @@ WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
 # 10 was CheckpointLock
-XactSLRULock						11
-SubtransSLRULock					12
+# 11 was XactSLRULock
+# 12 was SubtransSLRULock
 MultiXactGenLock					13
-MultiXactOffsetSLRULock				14
-MultiXactMemberSLRULock				15
+# 14 was MultiXactOffsetSLRULock
+# 15 was MultiXactMemberSLRULock
 RelCacheInitLock					16
 CheckpointerCommLock				17
 TwoPhaseStateLock					18
@@ -31,19 +31,19 @@ AutovacuumLock						22
 AutovacuumScheduleLock				23
 SyncScanLock						24
 RelationMappingLock					25
-NotifySLRULock						26
+#26 was NotifySLRULock
 NotifyQueueLock						27
 SerializableXactHashLock			28
 SerializableFinishedListLock		29
 SerializablePredicateListLock		30
-SerialSLRULock						31
+# 31 was SerialSLRULock
 SyncRepLock							32
 BackgroundWorkerLock				33
 DynamicSharedMemoryControlLock		34
 AutoFileLock						35
 ReplicationSlotAllocationLock		36
 ReplicationSlotControlLock			37
-CommitTsSLRULock					38
+#38 was CommitTsSLRULock
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 09e11680fc..61786b2e3d 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -213,6 +213,7 @@
 #include "storage/predicate_internals.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "utils/guc_hooks.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -813,9 +814,9 @@ SerialInit(void)
 	 */
 	SerialSlruCtl->PagePrecedes = SerialPagePrecedesLogically;
 	SimpleLruInit(SerialSlruCtl, "serializable",
-				  NUM_SERIAL_BUFFERS, 0, SerialSLRULock, "pg_serial",
-				  LWTRANCHE_SERIAL_BUFFER, SYNC_HANDLER_NONE,
-				  false);
+				  serializable_buffers, 0, "pg_serial",
+				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
+				  SYNC_HANDLER_NONE, false);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
@@ -841,6 +842,15 @@ SerialInit(void)
 	}
 }
 
+/*
+ * GUC check_hook for serializable_buffers
+ */
+bool
+check_serial_buffers(int *newval, void **extra, GucSource source)
+{
+	return check_slru_buffers("serializable_buffers", newval);
+}
+
 /*
  * Record a committed read write serializable xid and the minimum
  * commitSeqNo of any transactions to which this xid had a rw-conflict out.
@@ -854,15 +864,17 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	int			slotno;
 	int64		firstZeroPage;
 	bool		isNewPage;
+	LWLock	   *lock;
 
 	Assert(TransactionIdIsValid(xid));
 
 	targetPage = SerialPage(xid);
+	lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
 
 	/*
-	 * In this routine, we must hold both SerialControlLock and SerialSLRULock
-	 * simultaneously while making the SLRU data catch up with the new state
-	 * that we determine.
+	 * In this routine, we must hold both SerialControlLock and the SLRU bank
+	 * lock simultaneously while making the SLRU data catch up with the new
+	 * state that we determine.
 	 */
 	LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
 
@@ -898,7 +910,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	if (isNewPage)
 		serialControl->headPage = targetPage;
 
-	LWLockAcquire(SerialSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	if (isNewPage)
 	{
@@ -916,7 +928,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
 
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(lock);
 	LWLockRelease(SerialControlLock);
 }
 
@@ -950,13 +962,13 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
 		return 0;
 
 	/*
-	 * The following function must be called without holding SerialSLRULock,
+	 * The following function must be called without holding SLRU bank lock,
 	 * but will return with that lock held, which must then be released.
 	 */
 	slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
 										SerialPage(xid), xid);
 	val = SerialValue(slotno, xid);
-	LWLockRelease(SerialSLRULock);
+	LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
 	return val;
 }
 
@@ -1367,7 +1379,7 @@ PredicateLockShmemSize(void)
 
 	/* Shared memory structures for SLRU tracking of old committed xids. */
 	size = add_size(size, sizeof(SerialControlData));
-	size = add_size(size, SimpleLruShmemSize(NUM_SERIAL_BUFFERS, 0));
+	size = add_size(size, SimpleLruShmemSize(serializable_buffers, 0));
 
 	return size;
 }
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 4fffb46625..ec2f31f82a 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -295,11 +295,7 @@ SInvalWrite	"Waiting to add a message to the shared catalog invalidation queue."
 WALBufMapping	"Waiting to replace a page in WAL buffers."
 WALWrite	"Waiting for WAL buffers to be written to disk."
 ControlFile	"Waiting to read or update the <filename>pg_control</filename> file or create a new WAL file."
-XactSLRU	"Waiting to access the transaction status SLRU cache."
-SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
 MultiXactGen	"Waiting to read or update shared multixact state."
-MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
-MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
 RelCacheInit	"Waiting to read or update a <filename>pg_internal.init</filename> relation cache initialization file."
 CheckpointerComm	"Waiting to manage fsync requests."
 TwoPhaseState	"Waiting to read or update the state of prepared transactions."
@@ -310,19 +306,16 @@ Autovacuum	"Waiting to read or update the current state of autovacuum workers."
 AutovacuumSchedule	"Waiting to ensure that a table selected for autovacuum still needs vacuuming."
 SyncScan	"Waiting to select the starting location of a synchronized table scan."
 RelationMapping	"Waiting to read or update a <filename>pg_filenode.map</filename> file (used to track the filenode assignments of certain system catalogs)."
-NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
 NotifyQueue	"Waiting to read or update <command>NOTIFY</command> messages."
 SerializableXactHash	"Waiting to read or update information about serializable transactions."
 SerializableFinishedList	"Waiting to access the list of finished serializable transactions."
 SerializablePredicateList	"Waiting to access the list of predicate locks held by serializable transactions."
-SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
 SyncRep	"Waiting to read or update information about the state of synchronous replication."
 BackgroundWorker	"Waiting to read or update background worker state."
 DynamicSharedMemoryControl	"Waiting to read or update dynamic shared memory allocation information."
 AutoFile	"Waiting to update the <filename>postgresql.auto.conf</filename> file."
 ReplicationSlotAllocation	"Waiting to allocate or free a replication slot."
 ReplicationSlotControl	"Waiting to read or update replication slot state."
-CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
 CommitTs	"Waiting to read or update the last value set for a transaction commit timestamp."
 ReplicationOrigin	"Waiting to create, drop or use a replication origin."
 MultiXactTruncation	"Waiting to read or truncate multixact information."
@@ -375,6 +368,14 @@ LogicalRepLauncherDSA	"Waiting to access logical replication launcher's dynamic
 LogicalRepLauncherHash	"Waiting to access logical replication launcher's shared hash table."
 DSMRegistryDSA	"Waiting to access dynamic shared memory registry's dynamic shared memory allocator."
 DSMRegistryHash	"Waiting to access dynamic shared memory registry's shared hash table."
+CommitTsSLRU	"Waiting to access the commit timestamp SLRU cache."
+MultiXactOffsetSLRU	"Waiting to access the multixact offset SLRU cache."
+MultiXactMemberSLRU	"Waiting to access the multixact member SLRU cache."
+NotifySLRU	"Waiting to access the <command>NOTIFY</command> message SLRU cache."
+SerialSLRU	"Waiting to access the serializable transaction conflict SLRU cache."
+SubtransSLRU	"Waiting to access the sub-transaction SLRU cache."
+XactSLRU	"Waiting to access the transaction status SLRU cache."
 
 #
 # Wait Events - Lock
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index f024b1a849..5eaee88d96 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -157,3 +157,12 @@ int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+/* configurable SLRU buffer sizes */
+int			commit_timestamp_buffers = 0;
+int			multixact_member_buffers = 32;
+int			multixact_offset_buffers = 16;
+int			notify_buffers = 16;
+int			serializable_buffers = 32;
+int			subtransaction_buffers = 0;
+int			transaction_buffers = 0;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 527a2b2734..7e60695296 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -28,6 +28,7 @@
 
 #include "access/commit_ts.h"
 #include "access/gin.h"
+#include "access/slru.h"
 #include "access/toast_compression.h"
 #include "access/twophase.h"
 #include "access/xlog_internal.h"
@@ -2330,6 +2331,83 @@ struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"commit_timestamp_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&commit_timestamp_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_commit_ts_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_member_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact member cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_member_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_member_buffers, NULL, NULL
+	},
+
+	{
+		{"multixact_offset_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the MultiXact offset cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&multixact_offset_buffers,
+		16, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_multixact_offset_buffers, NULL, NULL
+	},
+
+	{
+		{"notify_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the LISTEN/NOTIFY message cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&notify_buffers,
+		16, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_notify_buffers, NULL, NULL
+	},
+
+	{
+		{"serializable_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the serializable transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&serializable_buffers,
+		32, 16, SLRU_MAX_ALLOWED_BUFFERS,
+		check_serial_buffers, NULL, NULL
+	},
+
+	{
+		{"subtransaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the sub-transaction cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&subtransaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_subtrans_buffers, NULL, NULL
+	},
+
+	{
+		{"transaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Sets the size of the dedicated buffer pool used for the transaction status cache."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&transaction_buffers,
+		0, 0, SLRU_MAX_ALLOWED_BUFFERS,
+		check_transaction_buffers, NULL, NULL
+	},
+
 	{
 		{"temp_buffers", PGC_USERSET, RESOURCES_MEM,
 			gettext_noop("Sets the maximum number of temporary buffers used by each session."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c97f9a25f0..edcc0282b2 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -50,6 +50,15 @@
 #external_pid_file = ''			# write an extra PID file
 					# (change requires restart)
 
+# - SLRU Buffers (change requires restart) -
+
+#commit_timestamp_buffers = 0			# memory for pg_commit_ts (0 = auto)
+#multixact_offset_buffers = 16			# memory for pg_multixact/offsets
+#multixact_member_buffers = 32			# memory for pg_multixact/members
+#notify_buffers = 16					# memory for pg_notify
+#serializable_buffers = 32				# memory for pg_serial
+#subtransaction_buffers = 0			# memory for pg_subtrans (0 = auto)
+#transaction_buffers = 0				# memory for pg_xact (0 = auto)
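
(Editorial note.)  These settings take effect only at server start and are
expressed in blocks (8kB with the default BLCKSZ); accepted values must be
multiples of 16, the SLRU bank size.  For example, a hypothetical workload
that thrashes pg_subtrans could set subtransaction_buffers = 256 (2MB)
explicitly instead of relying on the auto-tuned value.
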
 
 #------------------------------------------------------------------------------
 # CONNECTIONS AND AUTHENTICATION
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index becc365cb0..8e62917e49 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -40,7 +40,6 @@ extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
 									   TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
 extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
 
-extern Size CLOGShmemBuffers(void);
 extern Size CLOGShmemSize(void);
 extern void CLOGShmemInit(void);
 extern void BootStrapCLOG(void);
diff --git a/src/include/access/commit_ts.h b/src/include/access/commit_ts.h
index 9c6f3a35ca..82d3aa8627 100644
--- a/src/include/access/commit_ts.h
+++ b/src/include/access/commit_ts.h
@@ -27,7 +27,6 @@ extern bool TransactionIdGetCommitTsData(TransactionId xid,
 extern TransactionId GetLatestCommitTsData(TimestampTz *ts,
 										   RepOriginId *nodeid);
 
-extern Size CommitTsShmemBuffers(void);
 extern Size CommitTsShmemSize(void);
 extern void CommitTsShmemInit(void);
 extern void BootStrapCommitTs(void);
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 233f67dbcc..7ffd256c74 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -29,10 +29,6 @@
 
 #define MaxMultiXactOffset	((MultiXactOffset) 0xFFFFFFFF)
 
-/* Number of SLRU buffers to use for multixact */
-#define NUM_MULTIXACTOFFSET_BUFFERS		8
-#define NUM_MULTIXACTMEMBER_BUFFERS		16
-
 /*
  * Possible multixact lock modes ("status").  The first four modes are for
  * tuple locks (FOR KEY SHARE, FOR SHARE, FOR NO KEY UPDATE, FOR UPDATE); the
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 2109488654..8a8d191873 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -17,6 +17,11 @@
 #include "storage/lwlock.h"
 #include "storage/sync.h"
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers must not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
 
 /*
  * Define SLRU segment size.  A page is the same BLCKSZ as is used everywhere
@@ -55,8 +60,6 @@ typedef enum
  */
 typedef struct SlruSharedData
 {
-	LWLock	   *ControlLock;
-
 	/* Number of buffers managed by this SLRU structure */
 	int			num_slots;
 
@@ -69,30 +72,41 @@ typedef struct SlruSharedData
 	bool	   *page_dirty;
 	int64	   *page_number;
 	int		   *page_lru_count;
+
+	/* The buffer_locks array protects I/O on each buffer slot */
 	LWLockPadded *buffer_locks;
 
+	/* Locks to protect in-memory buffer slot access within each SLRU bank. */
+	LWLockPadded *bank_locks;
+
+	/*----------
+	 * A bank-wise LRU counter is maintained because the victim buffer search
+	 * is confined to a single bank.  Per-bank counters also avoid the
+	 * cache-line invalidation that a single shared counter would suffer,
+	 * since the counter is updated on every page access.
+	 *
+	 * We mark a page "most recently used" by setting
+	 *		page_lru_count[slotno] = ++bank_cur_lru_count[bankno];
+	 * The oldest page in the bank is therefore the one with the highest value
+	 * of
+	 *		bank_cur_lru_count[bankno] - page_lru_count[slotno]
+	 * The counts will eventually wrap around, but this calculation still
+	 * works as long as no page's age exceeds INT_MAX counts.
+	 *----------
+	 */
+	int		   *bank_cur_lru_count;
+
 	/*
 	 * Optional array of WAL flush LSNs associated with entries in the SLRU
 	 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
-	 * for pg_xact, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
-	 * has lsn_groups_per_page entries per buffer slot, each containing the
+	 * for pg_xact, false for everything else).  group_lsn[] has
+	 * lsn_groups_per_page entries per buffer slot, each containing the
 	 * highest LSN known for a contiguous group of SLRU entries on that slot's
 	 * page.
 	 */
 	XLogRecPtr *group_lsn;
 	int			lsn_groups_per_page;
 
-	/*----------
-	 * We mark a page "most recently used" by setting
-	 *		page_lru_count[slotno] = ++cur_lru_count;
-	 * The oldest page is therefore the one with the highest value of
-	 *		cur_lru_count - page_lru_count[slotno]
-	 * The counts will eventually wrap around, but this calculation still
-	 * works as long as no page's age exceeds INT_MAX counts.
-	 *----------
-	 */
-	int			cur_lru_count;
-
 	/*
 	 * latest_page_number is the page number of the current end of the log;
 	 * this is not critical data, since we use it only to avoid swapping out
@@ -114,6 +128,19 @@ typedef struct SlruCtlData
 {
 	SlruShared	shared;
 
+	/*
+	 * Bitmask to determine bank number from page number.
+	 */
+	bits16		bank_mask;
+
+	/*
+	 * If true, use long segment filenames formed from lower 48 bits of the
+	 * segment number, e.g. pg_xact/000000001234. Otherwise, use short
+	 * filenames formed from lower 16 bits of the segment number e.g.
+	 * pg_xact/1234.
+	 */
+	bool		long_segment_names;
+
 	/*
 	 * Which sync handler function to use when handing sync requests over to
 	 * the checkpointer.  SYNC_HANDLER_NONE to disable fsync (eg pg_notify).
@@ -132,28 +159,36 @@ typedef struct SlruCtlData
 	 */
 	bool		(*PagePrecedes) (int64, int64);
 
-	/*
-	 * If true, use long segment filenames formed from lower 48 bits of the
-	 * segment number, e.g. pg_xact/000000001234. Otherwise, use short
-	 * filenames formed from lower 16 bits of the segment number e.g.
-	 * pg_xact/1234.
-	 */
-	bool		long_segment_names;
-
 	/*
 	 * Dir is set during SimpleLruInit and does not change thereafter. Since
 	 * it's always the same, it doesn't need to be in shared memory.
 	 */
 	char		Dir[64];
 } SlruCtlData;
 
 typedef SlruCtlData *SlruCtl;
 
+/*
+ * Get the SLRU bank lock for the given SlruCtl and page number.
+ *
+ * This lock must be held to access the SLRU buffer slots in the respective
+ * bank.
+ */
+static inline LWLock *
+SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
+{
+	int			bankno;
+
+	bankno = pageno & ctl->bank_mask;
+	return &(ctl->shared->bank_locks[bankno].lock);
+}
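
(Editorial illustration, not part of the patch.)  A sketch of the calling
convention this helper implies, modeled on the SubTransSetParent() changes
earlier in the patch; MySlruCtl, pageno, slotno and xid are placeholders, and
the snippet is meant to be read in the context of backend code rather than
compiled standalone:

	LWLock	   *banklock = SimpleLruGetBankLock(MySlruCtl, pageno);

	LWLockAcquire(banklock, LW_EXCLUSIVE);

	/* read the page; the returned slot belongs to pageno's bank */
	slotno = SimpleLruReadPage(MySlruCtl, pageno, true, xid);

	/* ... examine or modify MySlruCtl->shared->page_buffer[slotno] ... */
	MySlruCtl->shared->page_dirty[slotno] = true;

	LWLockRelease(banklock);
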
 
 extern Size SimpleLruShmemSize(int nslots, int nlsns);
+extern int	SimpleLruAutotuneBuffers(int divisor, int max);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-						  LWLock *ctllock, const char *subdir, int tranche_id,
-						  SyncRequestHandler sync_handler,
+						  const char *subdir, int buffer_tranche_id,
+						  int bank_tranche_id, SyncRequestHandler sync_handler,
 						  bool long_segment_names);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
@@ -182,5 +217,6 @@ extern bool SlruScanDirCbReportPresence(SlruCtl ctl, char *filename,
 										int64 segpage, void *data);
 extern bool SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage,
 								   void *data);
+extern bool check_slru_buffers(const char *name, int *newval);
 
 #endif							/* SLRU_H */
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index b0d2ad57e5..e2213cf3fd 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -11,9 +11,6 @@
 #ifndef SUBTRANS_H
 #define SUBTRANS_H
 
-/* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
-
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 80b8583421..78daa25fa0 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -15,11 +15,6 @@
 
 #include <signal.h>
 
-/*
- * The number of SLRU page buffers we use for the notification queue.
- */
-#define NUM_NOTIFY_BUFFERS	8
-
 extern PGDLLIMPORT bool Trace_notify;
 extern PGDLLIMPORT int max_notify_queue_pages;
 extern PGDLLIMPORT volatile sig_atomic_t notifyInterruptPending;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 612fb5f42e..756d144c32 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -179,6 +179,14 @@ extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
 
+extern PGDLLIMPORT int commit_timestamp_buffers;
+extern PGDLLIMPORT int multixact_member_buffers;
+extern PGDLLIMPORT int multixact_offset_buffers;
+extern PGDLLIMPORT int notify_buffers;
+extern PGDLLIMPORT int serializable_buffers;
+extern PGDLLIMPORT int subtransaction_buffers;
+extern PGDLLIMPORT int transaction_buffers;
+
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
 extern PGDLLIMPORT TimestampTz MyStartTimestamp;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 50a65e046d..10bea8c595 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -209,6 +209,13 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_LAUNCHER_HASH,
 	LWTRANCHE_DSM_REGISTRY_DSA,
 	LWTRANCHE_DSM_REGISTRY_HASH,
+	LWTRANCHE_COMMITTS_SLRU,
+	LWTRANCHE_MULTIXACTMEMBER_SLRU,
+	LWTRANCHE_MULTIXACTOFFSET_SLRU,
+	LWTRANCHE_NOTIFY_SLRU,
+	LWTRANCHE_SERIAL_SLRU,
+	LWTRANCHE_SUBTRANS_SLRU,
+	LWTRANCHE_XACT_SLRU,
 	LWTRANCHE_FIRST_USER_DEFINED,
 }			BuiltinTrancheIds;
 
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index a7edd38fa9..14ee9b94a2 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -26,10 +26,6 @@ extern PGDLLIMPORT int max_predicate_locks_per_xact;
 extern PGDLLIMPORT int max_predicate_locks_per_relation;
 extern PGDLLIMPORT int max_predicate_locks_per_page;
 
-
-/* Number of SLRU buffers to use for Serial SLRU */
-#define NUM_SERIAL_BUFFERS		16
-
 /*
  * A handle used for sharing SERIALIZABLEXACT objects between the participants
  * in a parallel query.
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 339c490300..c8a7aa9a11 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -46,6 +46,8 @@ extern bool check_client_connection_check_interval(int *newval, void **extra,
 extern bool check_client_encoding(char **newval, void **extra, GucSource source);
 extern void assign_client_encoding(const char *newval, void *extra);
 extern bool check_cluster_name(char **newval, void **extra, GucSource source);
+extern bool check_commit_ts_buffers(int *newval, void **extra,
+									GucSource source);
 extern const char *show_data_directory_mode(void);
 extern bool check_datestyle(char **newval, void **extra, GucSource source);
 extern void assign_datestyle(const char *newval, void *extra);
@@ -91,6 +93,11 @@ extern bool check_max_worker_processes(int *newval, void **extra,
 									   GucSource source);
 extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
 extern void assign_max_stack_depth(int newval, void *extra);
+extern bool check_multixact_member_buffers(int *newval, void **extra,
+										   GucSource source);
+extern bool check_multixact_offset_buffers(int *newval, void **extra,
+										   GucSource source);
+extern bool check_notify_buffers(int *newval, void **extra, GucSource source);
 extern bool check_primary_slot_name(char **newval, void **extra,
 									GucSource source);
 extern bool check_random_seed(double *newval, void **extra, GucSource source);
@@ -122,12 +129,15 @@ extern void assign_role(const char *newval, void *extra);
 extern const char *show_role(void);
 extern bool check_search_path(char **newval, void **extra, GucSource source);
 extern void assign_search_path(const char *newval, void *extra);
+extern bool check_serial_buffers(int *newval, void **extra, GucSource source);
 extern bool check_session_authorization(char **newval, void **extra, GucSource source);
 extern void assign_session_authorization(const char *newval, void *extra);
 extern void assign_session_replication_role(int newval, void *extra);
 extern void assign_stats_fetch_consistency(int newval, void *extra);
 extern bool check_ssl(bool *newval, void **extra, GucSource source);
 extern bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
+extern bool check_subtrans_buffers(int *newval, void **extra,
+								   GucSource source);
 extern bool check_synchronous_standby_names(char **newval, void **extra,
 											GucSource source);
 extern void assign_synchronous_standby_names(const char *newval, void *extra);
@@ -152,6 +162,7 @@ extern const char *show_timezone(void);
 extern bool check_timezone_abbreviations(char **newval, void **extra,
 										 GucSource source);
 extern void assign_timezone_abbreviations(const char *newval, void *extra);
+extern bool check_transaction_buffers(int *newval, void **extra, GucSource source);
 extern bool check_transaction_deferrable(bool *newval, void **extra, GucSource source);
 extern bool check_transaction_isolation(int *newval, void **extra, GucSource source);
 extern bool check_transaction_read_only(bool *newval, void **extra, GucSource source);
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index 4b31f331ca..068a21f125 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -40,10 +40,6 @@ PG_FUNCTION_INFO_V1(test_slru_delete_all);
 /* Number of SLRU page slots */
 #define NUM_TEST_BUFFERS		16
 
-/* SLRU control lock */
-LWLock		TestSLRULock;
-#define TestSLRULock (&TestSLRULock)
-
 static SlruCtlData TestSlruCtlData;
 #define TestSlruCtl			(&TestSlruCtlData)
 
@@ -63,9 +59,9 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
-
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruZeroPage(TestSlruCtl, pageno);
 
 	/* these should match */
@@ -80,7 +76,7 @@ test_slru_page_write(PG_FUNCTION_ARGS)
 			BLCKSZ - 1);
 
 	SimpleLruWritePage(TestSlruCtl, slotno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_VOID();
 }
@@ -99,13 +95,14 @@ test_slru_page_read(PG_FUNCTION_ARGS)
 	bool		write_ok = PG_GETARG_BOOL(1);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	slotno = SimpleLruReadPage(TestSlruCtl, pageno,
 							   write_ok, InvalidTransactionId);
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -116,14 +113,15 @@ test_slru_page_readonly(PG_FUNCTION_ARGS)
 	int64		pageno = PG_GETARG_INT64(0);
 	char	   *data = NULL;
 	int			slotno;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
 	/* find page in buffers, reading it if necessary */
 	slotno = SimpleLruReadPage_ReadOnly(TestSlruCtl,
 										pageno,
 										InvalidTransactionId);
-	Assert(LWLockHeldByMe(TestSLRULock));
+	Assert(LWLockHeldByMe(lock));
 	data = (char *) TestSlruCtl->shared->page_buffer[slotno];
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_TEXT_P(cstring_to_text(data));
 }
@@ -133,10 +131,11 @@ test_slru_page_exists(PG_FUNCTION_ARGS)
 {
 	int64		pageno = PG_GETARG_INT64(0);
 	bool		found;
+	LWLock	   *lock = SimpleLruGetBankLock(TestSlruCtl, pageno);
 
-	LWLockAcquire(TestSLRULock, LW_EXCLUSIVE);
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 	found = SimpleLruDoesPhysicalPageExist(TestSlruCtl, pageno);
-	LWLockRelease(TestSLRULock);
+	LWLockRelease(lock);
 
 	PG_RETURN_BOOL(found);
 }
@@ -221,6 +220,7 @@ test_slru_shmem_startup(void)
 	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
+	int			test_buffer_tranche_id;
 
 	if (prev_shmem_startup_hook)
 		prev_shmem_startup_hook();
@@ -234,12 +234,15 @@ test_slru_shmem_startup(void)
 	/* initialize the SLRU facility */
 	test_tranche_id = LWLockNewTrancheId();
 	LWLockRegisterTranche(test_tranche_id, "test_slru_tranche");
-	LWLockInitialize(TestSLRULock, test_tranche_id);
+
+	test_buffer_tranche_id = LWLockNewTrancheId();
+	LWLockRegisterTranche(test_buffer_tranche_id, "test_buffer_tranche");
 
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
-				  NUM_TEST_BUFFERS, 0, TestSLRULock, slru_dir_name,
-				  test_tranche_id, SYNC_HANDLER_NONE, long_segment_names);
+				  NUM_TEST_BUFFERS, 0, slru_dir_name,
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
+				  long_segment_names);
 }
 
 void
-- 
2.39.2
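
For readers skimming the header changes above, here is a minimal usage
sketch (not part of the patch; the function name and the page
manipulation are made up purely for illustration, and the usual slru.h /
lwlock.h context is assumed) of how a caller would use the bank-wise
locking API instead of the old centralized control lock:

/*
 * Illustrative only: take the bank lock that covers "pageno" before
 * touching any buffer slot of that bank, and release it when done.
 */
static void
slru_bank_usage_example(SlruCtl ctl, int64 pageno)
{
	LWLock	   *banklock = SimpleLruGetBankLock(ctl, pageno);
	int			slotno;

	LWLockAcquire(banklock, LW_EXCLUSIVE);
	slotno = SimpleLruReadPage(ctl, pageno, true, InvalidTransactionId);
	/* ... read or modify ctl->shared->page_buffer[slotno] here ... */
	ctl->shared->page_dirty[slotno] = true;
	LWLockRelease(banklock);
}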

#119Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Alvaro Herrera (#118)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 27 Feb 2024, at 22:33, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

<v21-0001-Rename-SLRU-elements-in-pg_stat_slru.patch><v21-0002-Make-SLRU-buffer-sizes-configurable.patch>

These patches look amazing!

Best regards, Andrey Borodin.

#120Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#118)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-27, Alvaro Herrera wrote:

Here's the complete set, with these two names using the singular.

BTW, one thing I had not noticed is that before this patch the minimum
shmem size is lower than the lowest you can go with the new code.

This means Postgres may no longer start under extremely tight memory
restrictions (and of course it uses more memory even when idle or with
small databases). I wonder to what extent we should make an effort to
relax that. For small, largely inactive servers, this is just memory we
use for no good reason. However, anything we do here will impact
performance on the high end, because as Andrey says this will add
calculations and jumps where there are none today.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"We’ve narrowed the problem down to the customer’s pants being in a situation
of vigorous combustion" (Robert Haas, Postgres expert extraordinaire)

#121Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#120)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Tue, Feb 27, 2024 at 11:41 PM Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

On 2024-Feb-27, Alvaro Herrera wrote:

Here's the complete set, with these two names using the singular.

BTW one thing I had not noticed is that before this patch we have
minimum shmem size that's lower than the lowest you can go with the new
code.

This means Postgres may no longer start when extremely tight memory
restrictions (and of course use more memory even when idle or with small
databases). I wonder to what extent should we make an effort to relax
that. For small, largely inactive servers, this is just memory we use
for no good reason. However, anything we do here will impact
performance on the high end, because as Andrey says this will add
calculations and jumps where there are none today.

I was just comparing the minimum memory required for SLRU when the system
is minimally configured; correct me if I am wrong.

SLRU                        unpatched   patched
commit_timestamp_buffers         4         16
subtransaction_buffers          32         16
transaction_buffers              4         16
multixact_offset_buffers         8         16
multixact_member_buffers        16         16
notify_buffers                   8         16
serializable_buffers            16         16
------------------------------------------------
total buffers                   88        112

So that is < 200 kB of extra memory on a minimally configured system;
IMHO this should not matter.
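(For reference, assuming the default 8 kB BLCKSZ: 112 - 88 = 24 extra
buffers, and 24 x 8 kB = 192 kB.)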

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#122Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Alvaro Herrera (#118)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

At Tue, 27 Feb 2024 18:33:18 +0100, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

Here's the complete set, with these two names using the singular.

The commit corresponding to the second patch added several GUC descriptions:

Sets the size of the dedicated buffer pool used for the commit timestamp cache.

Some of them (commit_timestamp_buffers, transaction_buffers, and
subtransaction_buffers) use 0 to mean auto-tuning based on the
shared_buffers size. I think it's worth adding an extra_desc such as "0
to automatically determine this value based on the shared buffer
size".

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#123Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Kyotaro Horiguchi (#122)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-29, Kyotaro Horiguchi wrote:

At Tue, 27 Feb 2024 18:33:18 +0100, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

Here's the complete set, with these two names using the singular.

The commit by the second patch added several GUC descriptions:

Sets the size of the dedicated buffer pool used for the commit timestamp cache.

Some of them, commit_timestamp_buffers, transaction_buffers,
subtransaction_buffers use 0 to mean auto-tuning based on
shared-buffer size. I think it's worth adding an extra_desc such as "0
to automatically determine this value based on the shared buffer
size".

How about this?

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"

Attachments:

0001-extra_desc.patchtext/x-diff; charset=utf-8Download
From d0d7216eb4e2e2e9e71aa849cf90c218bbe2b164 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu, 29 Feb 2024 11:45:31 +0100
Subject: [PATCH] extra_desc

---
 src/backend/utils/misc/guc_tables.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 93ded31ed9..543a87c659 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2287,7 +2287,7 @@ struct config_int ConfigureNamesInt[] =
 	{
 		{"commit_timestamp_buffers", PGC_POSTMASTER, RESOURCES_MEM,
 			gettext_noop("Sets the size of the dedicated buffer pool used for the commit timestamp cache."),
-			NULL,
+			gettext_noop("Specify 0 to have this value determined as a fraction of shared_buffers."),
 			GUC_UNIT_BLOCKS
 		},
 		&commit_timestamp_buffers,
@@ -2342,7 +2342,7 @@ struct config_int ConfigureNamesInt[] =
 	{
 		{"subtransaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
 			gettext_noop("Sets the size of the dedicated buffer pool used for the sub-transaction cache."),
-			NULL,
+			gettext_noop("Specify 0 to have this value determined as a fraction of shared_buffers."),
 			GUC_UNIT_BLOCKS
 		},
 		&subtransaction_buffers,
@@ -2353,7 +2353,7 @@ struct config_int ConfigureNamesInt[] =
 	{
 		{"transaction_buffers", PGC_POSTMASTER, RESOURCES_MEM,
 			gettext_noop("Sets the size of the dedicated buffer pool used for the transaction status cache."),
-			NULL,
+			gettext_noop("Specify 0 to have this value determined as a fraction of shared_buffers."),
 			GUC_UNIT_BLOCKS
 		},
 		&transaction_buffers,
@@ -2868,7 +2868,7 @@ struct config_int ConfigureNamesInt[] =
 	{
 		{"wal_buffers", PGC_POSTMASTER, WAL_SETTINGS,
 			gettext_noop("Sets the number of disk-page buffers in shared memory for WAL."),
-			NULL,
+			gettext_noop("Specify -1 to have this value determined as a fraction of shared_buffers."),
 			GUC_UNIT_XBLOCKS
 		},
 		&XLOGbuffers,
-- 
2.39.2

#124Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#123)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Feb-29, Alvaro Herrera wrote:

On 2024-Feb-29, Kyotaro Horiguchi wrote:

Some of them, commit_timestamp_buffers, transaction_buffers,
subtransaction_buffers use 0 to mean auto-tuning based on
shared-buffer size. I think it's worth adding an extra_desc such as "0
to automatically determine this value based on the shared buffer
size".

How about this?

Pushed that way, but we can discuss further wording improvements/changes
if someone wants to propose any.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)

#125Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#124)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Pushed that way, but we can discuss further wording improvements/changes
if someone wants to propose any.

I just noticed that drongo is complaining about two lines added
by commit 53c2a97a9:

drongo | 2024-03-04 14:34:52 | ../pgsql/src/backend/access/transam/slru.c(436): warning C4047: '!=': 'SlruPageStatus *' differs in levels of indirection from 'int'
drongo | 2024-03-04 14:34:52 | ../pgsql/src/backend/access/transam/slru.c(717): warning C4047: '!=': 'SlruPageStatus *' differs in levels of indirection from 'int'

These lines are

Assert(&shared->page_status[slotno] != SLRU_PAGE_EMPTY);

Assert(&ctl->shared->page_status[slotno] != SLRU_PAGE_EMPTY);

These are comparing the address of something with an enum value,
which surely cannot be sane. Is the "&" operator incorrect?

It looks like SLRU_PAGE_EMPTY has (by chance, or deliberately)
the numeric value of zero, so I guess the majority of our BF
animals are understanding this as "address != NULL". But that
doesn't look like a useful test to be making.
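
Presumably the intended checks drop the stray "&" and compare the status
value itself, i.e. something like:

Assert(shared->page_status[slotno] != SLRU_PAGE_EMPTY);
Assert(ctl->shared->page_status[slotno] != SLRU_PAGE_EMPTY);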

regards, tom lane

#126Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#125)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

I wrote:

It looks like SLRU_PAGE_EMPTY has (by chance, or deliberately)
the numeric value of zero, so I guess the majority of our BF
animals are understanding this as "address != NULL". But that
doesn't look like a useful test to be making.

In hopes of noticing whether there are other similar thinkos,
I permuted the order of the SlruPageStatus enum values, and
now I get the expected warnings from gcc:

In file included from ../../../../src/include/postgres.h:45,
from slru.c:59:
slru.c: In function ‘SimpleLruWaitIO’:
slru.c:436:38: warning: comparison between pointer and integer
Assert(&shared->page_status[slotno] != SLRU_PAGE_EMPTY);
^~
../../../../src/include/c.h:862:9: note: in definition of macro ‘Assert’
if (!(condition)) \
^~~~~~~~~
slru.c: In function ‘SimpleLruWritePage’:
slru.c:717:43: warning: comparison between pointer and integer
Assert(&ctl->shared->page_status[slotno] != SLRU_PAGE_EMPTY);
^~
../../../../src/include/c.h:862:9: note: in definition of macro ‘Assert’
if (!(condition)) \
^~~~~~~~~

So it looks like it's just these two places.

regards, tom lane

#127Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#126)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Mar-04, Tom Lane wrote:

In hopes of noticing whether there are other similar thinkos,
I permuted the order of the SlruPageStatus enum values, and
now I get the expected warnings from gcc:

Thanks for checking! I pushed the fixes.

Maybe we should assign a nonzero value (= 1) to the first element of
enums, to avoid this kind of mistake.
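
A minimal sketch of that idea, using the existing SlruPageStatus members
(only the "= 1" would be new), might look like:

typedef enum
{
	SLRU_PAGE_EMPTY = 1,		/* buffer is not in use */
	SLRU_PAGE_READ_IN_PROGRESS, /* page is being read in */
	SLRU_PAGE_VALID,			/* page is valid and not being written */
	SLRU_PAGE_WRITE_IN_PROGRESS /* page is being written out */
} SlruPageStatus;

With SLRU_PAGE_EMPTY no longer equal to zero, a pointer-vs-enum
comparison like the one above could no longer silently pass as an
"address != NULL" test; compilers would diagnose the pointer-to-integer
comparison instead.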

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#128Alexander Lakhin
exclusion@gmail.com
In reply to: Alvaro Herrera (#118)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Hello Alvaro,

27.02.2024 20:33, Alvaro Herrera wrote:

Here's the complete set, with these two names using the singular.

I've managed to trigger an assert added by 53c2a97a9.
Please try the following script against a server compiled with
-DTEST_SUMMARIZE_SERIAL (initially I observed this failure without the
define; it just simplifies reproducing...):
# initdb & start ...

createdb test
echo "
SELECT pg_current_xact_id() AS tx
\gset

SELECT format('CREATE TABLE t%s(i int)', g)
  FROM generate_series(1, 1022 - :tx) g
\gexec

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT pg_current_xact_id();
SELECT pg_sleep(5);
" | psql test &

echo "
SELECT pg_sleep(1);
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT 1 INTO a;
COMMIT;

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT 2 INTO b;
" | psql test

It fails for me with the following stack trace:
TRAP: failed Assert("LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE)"), File: "slru.c", Line: 366,
PID: 21711
ExceptionalCondition at assert.c:52:13
SimpleLruZeroPage at slru.c:369:11
SerialAdd at predicate.c:921:20
SummarizeOldestCommittedSxact at predicate.c:1521:2
GetSerializableTransactionSnapshotInt at predicate.c:1787:16
GetSerializableTransactionSnapshot at predicate.c:1691:1
GetTransactionSnapshot at snapmgr.c:264:21
exec_simple_query at postgres.c:1162:4
...

Best regards,
Alexander

#129Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alexander Lakhin (#128)
1 attachment(s)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Hello,

On 2024-Apr-03, Alexander Lakhin wrote:

I've managed to trigger an assert added by 53c2a97a9.
Please try the following script against a server compiled with
-DTEST_SUMMARIZE_SERIAL (initially I observed this failure without the
define, it just simplifies reproducing...):

Ah yes, absolutely, we're failing to trade to the correct SLRU bank lock
there. This rewrite of that small piece should fix it. Thanks for
reporting this.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)

Attachments:

0001-Fix-zeroing-of-pg_serial-page-without-SLRU-bank-lock.patchtext/x-diff; charset=utf-8Download
From 44c39cf4bf258fb0b65aa1acc5f84e5d7f729eb1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 3 Apr 2024 16:00:24 +0200
Subject: [PATCH] Fix zeroing of pg_serial page without SLRU bank lock

---
 src/backend/storage/lmgr/predicate.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 3f378c0099..d5bbfbd4c6 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -137,7 +137,7 @@
  *	SerialControlLock
  *		- Protects SerialControlData members
  *
- *	SerialSLRULock
+ *	SLRU per-bank locks
  *		- Protects SerialSlruCtl
  *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
@@ -908,20 +908,25 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
 	if (isNewPage)
 		serialControl->headPage = targetPage;
 
-	LWLockAcquire(lock, LW_EXCLUSIVE);
-
 	if (isNewPage)
 	{
-		/* Initialize intervening pages. */
-		while (firstZeroPage != targetPage)
+		/* Initialize intervening pages; might involve trading locks */
+		for (;;)
 		{
-			(void) SimpleLruZeroPage(SerialSlruCtl, firstZeroPage);
+			lock = SimpleLruGetBankLock(SerialSlruCtl, firstZeroPage);
+			LWLockAcquire(lock, LW_EXCLUSIVE);
+			slotno = SimpleLruZeroPage(SerialSlruCtl, firstZeroPage);
+			if (firstZeroPage == targetPage)
+				break;
 			firstZeroPage = SerialNextPage(firstZeroPage);
+			LWLockRelease(lock);
 		}
-		slotno = SimpleLruZeroPage(SerialSlruCtl, targetPage);
 	}
 	else
+	{
+		LWLockAcquire(lock, LW_EXCLUSIVE);
 		slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid);
+	}
 
 	SerialValue(slotno, xid) = minConflictCommitSeqNo;
 	SerialSlruCtl->shared->page_dirty[slotno] = true;
-- 
2.39.2

#130Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#129)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On Wed, Apr 3, 2024 at 7:40 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Hello,

On 2024-Apr-03, Alexander Lakhin wrote:

I've managed to trigger an assert added by 53c2a97a9.
Please try the following script against a server compiled with
-DTEST_SUMMARIZE_SERIAL (initially I observed this failure without the
define, it just simplifies reproducing...):

Ah yes, absolutely, we're missing to trade the correct SLRU bank lock
there. This rewrite of that small piece should fix it. Thanks for
reporting this.

Yeah, we missed acquiring the bank lock w.r.t. intervening pages,
thanks for reporting. Your fix looks correct to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#131Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#130)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

On 2024-Apr-03, Dilip Kumar wrote:

Yeah, we missed acquiring the bank lock w.r.t. intervening pages,
thanks for reporting. Your fix looks correct to me.

Thanks for the quick review! And thanks to Alexander for the report.
Pushed the fix.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"