pgsql: Improve performance of subsystems on top of SLRU

Started by Alvaro Herreraalmost 2 years ago6 messages
#1Alvaro Herrera
alvherre@alvh.no-ip.org

Improve performance of subsystems on top of SLRU

More precisely, what we do here is make the SLRU cache sizes
configurable with new GUCs, so that sites with high concurrency and big
ranges of transactions in flight (resp. multixacts/subtransactions) can
benefit from bigger caches. In order for this to work with good
performance, two additional changes are made:

1. the cache is divided in "banks" (to borrow terminology from CPU
caches), and algorithms such as eviction buffer search only affect
one specific bank. This forestalls the problem that linear searching
for a specific buffer across the whole cache takes too long: we only
have to search the specific bank, whose size is small. This work is
authored by Andrey Borodin.

2. Change the locking regime for the SLRU banks, so that each bank uses
a separate LWLock. This allows for increased scalability. This work
is authored by Dilip Kumar. (A part of this was previously committed as
d172b717c6f4.)

Special care is taken so that the algorithms that can potentially
traverse more than one bank release one bank's lock before acquiring the
next. This should happen rarely, but particularly clog.c's group commit
feature needed code adjustment to cope with this. I (Álvaro) also added
lots of comments to make sure the design is sound.

The new GUCs match the names introduced by bcdfa5f2e2f2 in the
pg_stat_slru view.

The default values for these parameters are similar to the previous
sizes of each SLRU. commit_ts, clog and subtrans accept value 0, which
means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB
of shared_buffers), with a cap of 8MB. (A new slru.c function
SimpleLruAutotuneBuffers() was added to support this.) The cap was
previously 1MB for clog, so for sites with more than 512MB of shared
memory the total memory used increases, which is likely a good tradeoff.
However, other SLRUs (notably multixact ones) retain smaller sizes and
don't support a configured value of 0. These values based on
shared_buffers may need to be revisited, but that's an easy change.

There was some resistance to adding these new GUCs: it would be better
to adjust to memory pressure automatically somehow, for example by
stealing memory from shared_buffers (where the caches can grow and
shrink naturally). However, doing that seems to be a much larger
project and one which has made virtually no progress in several years,
and because this is such a pain point for so many users, here we take
the pragmatic approach.

Author: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova,
Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra,
Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev).
Discussion: /messages/by-id/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru
Discussion: /messages/by-id/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/53c2a97a92665be6bd7d70bd62ae6158fe4db96e

Modified Files
--------------
doc/src/sgml/config.sgml | 139 +++++++++
doc/src/sgml/monitoring.sgml | 9 +-
src/backend/access/transam/clog.c | 243 +++++++++++-----
src/backend/access/transam/commit_ts.c | 88 ++++--
src/backend/access/transam/multixact.c | 190 +++++++++----
src/backend/access/transam/slru.c | 357 +++++++++++++++++-------
src/backend/access/transam/subtrans.c | 110 ++++++--
src/backend/commands/async.c | 61 ++--
src/backend/storage/lmgr/lwlock.c | 9 +-
src/backend/storage/lmgr/lwlocknames.txt | 14 +-
src/backend/storage/lmgr/predicate.c | 34 ++-
src/backend/utils/activity/wait_event_names.txt | 15 +-
src/backend/utils/init/globals.c | 9 +
src/backend/utils/misc/guc_tables.c | 78 ++++++
src/backend/utils/misc/postgresql.conf.sample | 9 +
src/include/access/clog.h | 1 -
src/include/access/commit_ts.h | 1 -
src/include/access/multixact.h | 4 -
src/include/access/slru.h | 86 ++++--
src/include/access/subtrans.h | 3 -
src/include/commands/async.h | 5 -
src/include/miscadmin.h | 8 +
src/include/storage/lwlock.h | 7 +
src/include/storage/predicate.h | 4 -
src/include/utils/guc_hooks.h | 11 +
src/test/modules/test_slru/test_slru.c | 35 +--
26 files changed, 1177 insertions(+), 353 deletions(-)

#2Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#1)
1 attachment(s)
Re: pgsql: Improve performance of subsystems on top of SLRU

On 2024-Feb-28, Alvaro Herrera wrote:

Improve performance of subsystems on top of SLRU

Coverity had the following complaint about this commit:

________________________________________________________________________________________________________
*** CID NNNNNNN: Control flow issues (DEADCODE)
/srv/coverity/git/pgsql-git/postgresql/src/backend/access/transam/multixact.c: 1375 in GetMultiXactIdMembers()
1369 * and acquire the lock of the new bank.
1370 */
1371 lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
1372 if (lock != prevlock)
1373 {
1374 if (prevlock != NULL)

CID 1592913: Control flow issues (DEADCODE)
Execution cannot reach this statement: "LWLockRelease(prevlock);".

1375 LWLockRelease(prevlock);
1376 LWLockAcquire(lock, LW_EXCLUSIVE);
1377 prevlock = lock;
1378 }
1379
1380 slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);

And I think it's correct that this is somewhat bogus, or at least
confusing: the only way to have control back here on line 1371 after
having executed once is via the "goto retry" line below; and there we
release "prevlock" and set it to NULL beforehand, so it's impossible for
prevlock to be NULL. Looking closer I think this code is all confused,
so I suggest to rework it as shown in the attached patch.

I'll have a look at the other places where we use this "prevlock" coding
pattern tomorrow.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

Attachments:

0001-rework-locking-code-in-GetMultiXactIdMembers.patchtext/x-diff; charset=utf-8Download
From a82fa380c7ff5a468ca920f4c06e626946bc8ecf Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Sun, 3 Mar 2024 15:20:36 +0100
Subject: [PATCH] rework locking code in GetMultiXactIdMembers

Per Coverity
---
 src/backend/access/transam/multixact.c | 52 ++++++++++----------------
 1 file changed, 20 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9b81506145..ec446949a9 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1250,14 +1250,12 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactOffset offset;
 	int			length;
 	int			truelength;
-	int			i;
 	MultiXactId oldestMXact;
 	MultiXactId nextMXact;
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
 	LWLock	   *lock;
-	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1364,18 +1362,9 @@ retry:
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
-	/*
-	 * If this page falls under a different bank, release the old bank's lock
-	 * and acquire the lock of the new bank.
-	 */
+	/* Acquire the bank lock for the page we need. */
 	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
-	if (lock != prevlock)
-	{
-		if (prevlock != NULL)
-			LWLockRelease(prevlock);
-		LWLockAcquire(lock, LW_EXCLUSIVE);
-		prevlock = lock;
-	}
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
@@ -1410,17 +1399,19 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			LWLock	   *newlock;
+
 			/*
 			 * Since we're going to access a different SLRU page, if this page
 			 * falls under a different bank, release the old bank's lock and
 			 * acquire the lock of the new bank.
 			 */
-			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
-			if (prevlock != lock)
+			newlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (newlock != lock)
 			{
-				LWLockRelease(prevlock);
-				LWLockAcquire(lock, LW_EXCLUSIVE);
-				prevlock = lock;
+				LWLockRelease(lock);
+				LWLockAcquire(newlock, LW_EXCLUSIVE);
+				lock = newlock;
 			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
 		}
@@ -1432,8 +1423,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(prevlock);
-			prevlock = NULL;
+			LWLockRelease(lock);
+			lock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1442,14 +1433,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(prevlock);
-	prevlock = NULL;
-
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
 	truelength = 0;
 	prev_pageno = -1;
-	for (i = 0; i < length; i++, offset++)
+	for (int i = 0; i < length; i++, offset++)
 	{
 		TransactionId *xactptr;
 		uint32	   *flagsptr;
@@ -1462,18 +1450,19 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			LWLock	   *newlock;
+
 			/*
 			 * Since we're going to access a different SLRU page, if this page
 			 * falls under a different bank, release the old bank's lock and
 			 * acquire the lock of the new bank.
 			 */
-			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
-			if (lock != prevlock)
+			newlock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (newlock != lock)
 			{
-				if (prevlock)
-					LWLockRelease(prevlock);
-				LWLockAcquire(lock, LW_EXCLUSIVE);
-				prevlock = lock;
+				LWLockRelease(lock);
+				LWLockAcquire(newlock, LW_EXCLUSIVE);
+				lock = newlock;
 			}
 
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
@@ -1499,8 +1488,7 @@ retry:
 		truelength++;
 	}
 
-	if (prevlock)
-		LWLockRelease(prevlock);
+	LWLockRelease(lock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
-- 
2.39.2

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: pgsql: Improve performance of subsystems on top of SLRU

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

And I think it's correct that this is somewhat bogus, or at least
confusing: the only way to have control back here on line 1371 after
having executed once is via the "goto retry" line below; and there we
release "prevlock" and set it to NULL beforehand, so it's impossible for
prevlock to be NULL. Looking closer I think this code is all confused,
so I suggest to rework it as shown in the attached patch.

This is certainly simpler, but I notice that it holds the current
LWLock across the line

ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));

where the old code did not. Could the palloc take long enough that
holding the lock is bad?

Also, with this coding the "lock = NULL;" assignment just before
"goto retry" is a dead store. Not sure if Coverity or other static
analyzers would whine about that.

regards, tom lane

#4Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#2)
Re: pgsql: Improve performance of subsystems on top of SLRU

On Mon, Mar 4, 2024 at 1:56 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Feb-28, Alvaro Herrera wrote:

Improve performance of subsystems on top of SLRU

Coverity had the following complaint about this commit:

________________________________________________________________________________________________________
*** CID NNNNNNN: Control flow issues (DEADCODE)
/srv/coverity/git/pgsql-git/postgresql/src/backend/access/transam/multixact.c: 1375 in GetMultiXactIdMembers()
1369 * and acquire the lock of the new bank.
1370 */
1371 lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
1372 if (lock != prevlock)
1373 {
1374 if (prevlock != NULL)

CID 1592913: Control flow issues (DEADCODE)
Execution cannot reach this statement: "LWLockRelease(prevlock);".

1375 LWLockRelease(prevlock);
1376 LWLockAcquire(lock, LW_EXCLUSIVE);
1377 prevlock = lock;
1378 }
1379
1380 slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);

And I think it's correct that this is somewhat bogus, or at least
confusing: the only way to have control back here on line 1371 after
having executed once is via the "goto retry" line below; and there we
release "prevlock" and set it to NULL beforehand, so it's impossible for
prevlock to be NULL. Looking closer I think this code is all confused,
so I suggest to rework it as shown in the attached patch.

I'll have a look at the other places where we use this "prevlock" coding
pattern tomorrow.

+ /* Acquire the bank lock for the page we need. */
  lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
- if (lock != prevlock)
- {
- if (prevlock != NULL)
- LWLockRelease(prevlock);
- LWLockAcquire(lock, LW_EXCLUSIVE);
- prevlock = lock;
- }
+ LWLockAcquire(lock, LW_EXCLUSIVE);

This part is definitely an improvement.

I am not sure about the other changes, I mean that makes the code much
simpler but now we are not releasing the 'MultiXactOffsetCtl' related
bank lock, and later in the following loop, we are comparing that lock
against 'MultiXactMemberCtl' related bank lock. This makes code
simpler because now in the loop we are sure that we are always holding
the lock but I do not like comparing the bank locks for 2 different
SLRUs, although there is no problem as there would not be a common
lock address, anyway, I do not have any strong objection to what you
have done here.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#5Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#4)
3 attachment(s)
Re: pgsql: Improve performance of subsystems on top of SLRU

On 2024-Mar-03, Tom Lane wrote:

This is certainly simpler, but I notice that it holds the current
LWLock across the line

ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));

where the old code did not. Could the palloc take long enough that
holding the lock is bad?

Hmm, I guess most of the time it shouldn't be much of a problem (if the
length is small so we can palloc without malloc'ing); but it could be in
the worst case. But the fix is simple: just release the lock before.
There's no correctness argument for holding it all the way down. I was
just confused about how the original code worked.

Also, with this coding the "lock = NULL;" assignment just before
"goto retry" is a dead store. Not sure if Coverity or other static
analyzers would whine about that.

Oh, right. I removed that.

On 2024-Mar-04, Dilip Kumar wrote:

I am not sure about the other changes, I mean that makes the code much
simpler but now we are not releasing the 'MultiXactOffsetCtl' related
bank lock, and later in the following loop, we are comparing that lock
against 'MultiXactMemberCtl' related bank lock. This makes code
simpler because now in the loop we are sure that we are always holding
the lock but I do not like comparing the bank locks for 2 different
SLRUs, although there is no problem as there would not be a common
lock address,

True. This can be addressed in the same way Tom's first comment is:
just release the lock before entering the second loop, and setting lock
to NULL. This brings the code to a similar state than before, except
that the additional LWLock * variables are in a tighter scope. That's
in 0001.

Now, I had a look at the other users of slru.c and noticed in subtrans.c
that StartupSUBTRANS we have some duplicate code that I think could be
rewritten by making the "while" block test the condition at the end
instead of at the start; changed that in 0002. I'll leave this one for
later, because I want to add some test code for it -- right now it's
pretty much test-uncovered code.

I also looked at slru.c for uses of shared->bank_locks and noticed a
couple that could be made simpler. That's 0003 here.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"If it is not right, do not do it.
If it is not true, do not say it." (Marcus Aurelius, Meditations)

Attachments:

v2-0001-rework-locking-code-in-GetMultiXactIdMembers.patchtext/x-diff; charset=utf-8Download
From a6be7cac5de83a36e056147f3e81bea2eb1096bd Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Sun, 3 Mar 2024 15:20:36 +0100
Subject: [PATCH v2 1/3] rework locking code in GetMultiXactIdMembers

Per Coverity
---
 src/backend/access/transam/multixact.c | 53 +++++++++++---------------
 1 file changed, 22 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index cd476b94fa..83b578dced 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1247,14 +1247,12 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactOffset offset;
 	int			length;
 	int			truelength;
-	int			i;
 	MultiXactId oldestMXact;
 	MultiXactId nextMXact;
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
 	LWLock	   *lock;
-	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1361,18 +1359,9 @@ retry:
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
-	/*
-	 * If this page falls under a different bank, release the old bank's lock
-	 * and acquire the lock of the new bank.
-	 */
+	/* Acquire the bank lock for the page we need. */
 	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
-	if (lock != prevlock)
-	{
-		if (prevlock != NULL)
-			LWLockRelease(prevlock);
-		LWLockAcquire(lock, LW_EXCLUSIVE);
-		prevlock = lock;
-	}
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
@@ -1407,17 +1396,19 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			LWLock	   *newlock;
+
 			/*
 			 * Since we're going to access a different SLRU page, if this page
 			 * falls under a different bank, release the old bank's lock and
 			 * acquire the lock of the new bank.
 			 */
-			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
-			if (prevlock != lock)
+			newlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (newlock != lock)
 			{
-				LWLockRelease(prevlock);
-				LWLockAcquire(lock, LW_EXCLUSIVE);
-				prevlock = lock;
+				LWLockRelease(lock);
+				LWLockAcquire(newlock, LW_EXCLUSIVE);
+				lock = newlock;
 			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
 		}
@@ -1429,8 +1420,7 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(prevlock);
-			prevlock = NULL;
+			LWLockRelease(lock);
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1439,14 +1429,14 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(prevlock);
-	prevlock = NULL;
+	LWLockRelease(lock);
+	lock = NULL;
 
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
 	truelength = 0;
 	prev_pageno = -1;
-	for (i = 0; i < length; i++, offset++)
+	for (int i = 0; i < length; i++, offset++)
 	{
 		TransactionId *xactptr;
 		uint32	   *flagsptr;
@@ -1459,18 +1449,20 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			LWLock	   *newlock;
+
 			/*
 			 * Since we're going to access a different SLRU page, if this page
 			 * falls under a different bank, release the old bank's lock and
 			 * acquire the lock of the new bank.
 			 */
-			lock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
-			if (lock != prevlock)
+			newlock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+			if (newlock != lock)
 			{
-				if (prevlock)
-					LWLockRelease(prevlock);
-				LWLockAcquire(lock, LW_EXCLUSIVE);
-				prevlock = lock;
+				if (lock)
+					LWLockRelease(lock);
+				LWLockAcquire(newlock, LW_EXCLUSIVE);
+				lock = newlock;
 			}
 
 			slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
@@ -1496,8 +1488,7 @@ retry:
 		truelength++;
 	}
 
-	if (prevlock)
-		LWLockRelease(prevlock);
+	LWLockRelease(lock);
 
 	/* A multixid with zero members should not happen */
 	Assert(truelength > 0);
-- 
2.39.2

v2-0002-Rework-redundant-loop-in-subtrans.c.patchtext/x-diff; charset=utf-8Download
From dfb077ba15399565967d2da4f1d17199bbf142d2 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Mon, 4 Mar 2024 11:49:01 +0100
Subject: [PATCH v2 2/3] Rework redundant loop in subtrans.c

---
 src/backend/access/transam/subtrans.c | 28 +++++----------------------
 1 file changed, 5 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index dc9566fb51..1455cdf54a 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -311,7 +311,7 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
-	LWLock	   *prevlock;
+	LWLock	   *prevlock = NULL;
 	LWLock	   *lock;
 
 	/*
@@ -324,19 +324,13 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
-	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
-	LWLockAcquire(prevlock, LW_EXCLUSIVE);
-	while (startPage != endPage)
+	do
 	{
 		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
-
-		/*
-		 * Check if we need to acquire the lock on the new bank then release
-		 * the lock on the old bank and acquire on the new bank.
-		 */
 		if (prevlock != lock)
 		{
-			LWLockRelease(prevlock);
+			if (prevlock)
+				LWLockRelease(prevlock);
 			LWLockAcquire(lock, LW_EXCLUSIVE);
 			prevlock = lock;
 		}
@@ -346,20 +340,8 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
-	}
+	} while (startPage != endPage);
 
-	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
-
-	/*
-	 * Check if we need to acquire the lock on the new bank then release the
-	 * lock on the old bank and acquire on the new bank.
-	 */
-	if (prevlock != lock)
-	{
-		LWLockRelease(prevlock);
-		LWLockAcquire(lock, LW_EXCLUSIVE);
-	}
-	(void) ZeroSUBTRANSPage(startPage);
 	LWLockRelease(lock);
 }
 
-- 
2.39.2

v2-0003-Simplify-coding-in-slru.c.patchtext/x-diff; charset=utf-8Download
From c3a05aae2d8acca3f216abea61a97e6a9326e699 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Mon, 4 Mar 2024 11:48:43 +0100
Subject: [PATCH v2 3/3] Simplify coding in slru.c

---
 src/backend/access/transam/slru.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index f774d285b7..6895266bf9 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -489,14 +489,14 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 				  TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
+	LWLock	   *banklock = SimpleLruGetBankLock(ctl, pageno);
 
-	Assert(LWLockHeldByMeInMode(SimpleLruGetBankLock(ctl, pageno), LW_EXCLUSIVE));
+	Assert(LWLockHeldByMeInMode(banklock, LW_EXCLUSIVE));
 
 	/* Outer loop handles restart if we must wait for someone else's I/O */
 	for (;;)
 	{
 		int			slotno;
-		int			bankno;
 		bool		ok;
 
 		/* See if page already is in memory; if not, pick victim slot */
@@ -539,10 +539,9 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 
 		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
 		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);
-		bankno = SlotGetBankNumber(slotno);
 
 		/* Release bank lock while doing I/O */
-		LWLockRelease(&shared->bank_locks[bankno].lock);
+		LWLockRelease(banklock);
 
 		/* Do the read */
 		ok = SlruPhysicalReadPage(ctl, pageno, slotno);
@@ -551,7 +550,7 @@ SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 		SimpleLruZeroLSNs(ctl, slotno);
 
 		/* Re-acquire bank control lock and update page state */
-		LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
+		LWLockAcquire(banklock, LW_EXCLUSIVE);
 
 		Assert(shared->page_number[slotno] == pageno &&
 			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
@@ -592,12 +591,13 @@ int
 SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
+	LWLock	   *banklock = SimpleLruGetBankLock(ctl, pageno);
 	int			bankno = pageno & ctl->bank_mask;
 	int			bankstart = bankno * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
 
 	/* Try to find the page while holding only shared lock */
-	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_SHARED);
+	LWLockAcquire(banklock, LW_SHARED);
 
 	/* See if page is already in a buffer */
 	for (int slotno = bankstart; slotno < bankend; slotno++)
@@ -617,8 +617,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 	}
 
 	/* No luck, so switch to normal exclusive lock and do regular read */
-	LWLockRelease(&shared->bank_locks[bankno].lock);
-	LWLockAcquire(&shared->bank_locks[bankno].lock, LW_EXCLUSIVE);
+	LWLockRelease(banklock);
+	LWLockAcquire(banklock, LW_EXCLUSIVE);
 
 	return SimpleLruReadPage(ctl, pageno, true, xid);
 }
@@ -1167,7 +1167,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
-		Assert(LWLockHeldByMe(&shared->bank_locks[bankno].lock));
+		Assert(LWLockHeldByMe(SimpleLruGetBankLock(ctl, pageno)));
 
 		/* See if page already has a buffer assigned */
 		for (int slotno = 0; slotno < shared->num_slots; slotno++)
-- 
2.39.2

#6Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#5)
1 attachment(s)
Re: pgsql: Improve performance of subsystems on top of SLRU

FWIW there's a stupid bug in 0002, which is fixed here. I'm writing a
simple test for it.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Now I have my system running, not a byte was off the shelf;
It rarely breaks and when it does I fix the code myself.
It's stable, clean and elegant, and lightning fast as well,
And it doesn't cost a nickel, so Bill Gates can go to hell."

Attachments:

v3-0001-Rework-redundant-loop-in-subtrans.c.patchtext/x-diff; charset=utf-8Download
From ecf613092a3cbc4c5112d30af7eea55b668decec Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Mon, 4 Mar 2024 11:49:01 +0100
Subject: [PATCH v3] Rework redundant loop in subtrans.c

---
 src/backend/access/transam/subtrans.c | 29 +++++++--------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index dc9566fb51..50bb1d8cfc 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -311,7 +311,7 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
-	LWLock	   *prevlock;
+	LWLock	   *prevlock = NULL;
 	LWLock	   *lock;
 
 	/*
@@ -324,42 +324,27 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
-	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
-	LWLockAcquire(prevlock, LW_EXCLUSIVE);
-	while (startPage != endPage)
+	for (;;)
 	{
 		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
-
-		/*
-		 * Check if we need to acquire the lock on the new bank then release
-		 * the lock on the old bank and acquire on the new bank.
-		 */
 		if (prevlock != lock)
 		{
-			LWLockRelease(prevlock);
+			if (prevlock)
+				LWLockRelease(prevlock);
 			LWLockAcquire(lock, LW_EXCLUSIVE);
 			prevlock = lock;
 		}
 
 		(void) ZeroSUBTRANSPage(startPage);
+		if (startPage == endPage)
+			break;
+
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
 
-	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
-
-	/*
-	 * Check if we need to acquire the lock on the new bank then release the
-	 * lock on the old bank and acquire on the new bank.
-	 */
-	if (prevlock != lock)
-	{
-		LWLockRelease(prevlock);
-		LWLockAcquire(lock, LW_EXCLUSIVE);
-	}
-	(void) ZeroSUBTRANSPage(startPage);
 	LWLockRelease(lock);
 }
 
-- 
2.39.2