Fix bank selection logic in SLRU

Started by Yura Sokolovabout 1 year ago16 messages
#1Yura Sokolov
y.sokolov@postgrespro.ru
1 attachment(s)

Good day, hackers.

Due to long discussion about SLRU size configuration [1]/messages/by-id/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com and code
evolution, non-serious bug were introduced:

- intermediate versions assumed cache size is always power of 2,
- therefore to determine bank simple binary-and with mask were used:

bankno = pageno & ctl->bank_mask;

- final merged version allows arbitrary cache size for every cache
type, but code for bankno were not fixed.

It is not critical bug, since it doesn't hurt correctness just
performance. In worst case only one bank will be used.

I attach the patch, that changes SlruCtlData->bank_mask to ->nbanks,
and changes calculation to modulo operation.

bankno = pageno % ctl->nbanks;

Probably, instead of modulo operation could be used multiplication:

bankno = ((uint64) murmurhash32(pageno) * ctl->nbanks) >> 32;

But I didn't bother to measure does it pay for or not.

[1]: /messages/by-id/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
/messages/by-id/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com

Regards,
Yura Sokolov aka funny-falcon

Attachments:

v1-0001-Fix-SLRU-bank-selection.patchtext/x-patch; charset=UTF-8; name=v1-0001-Fix-SLRU-bank-selection.patchDownload
From 4b438e67a79d77bf2caf3e5a0386bb7700d329ba Mon Sep 17 00:00:00 2001
From: Yura Sokolov <y.sokolov@postgrespro.ru>
Date: Tue, 10 Dec 2024 15:38:02 +0300
Subject: [PATCH v1] Fix SLRU bank selection.

Due to code evolution, nbanks is not power of two any more. But still
binary & were used to obtain bankno.
---
 src/backend/access/transam/slru.c | 6 +++---
 src/include/access/slru.h         | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index e7f73bf4275..afedb5c039f 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -343,7 +343,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
-	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
+	ctl->nbanks = nbanks;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -606,7 +606,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	LWLock	   *banklock = SimpleLruGetBankLock(ctl, pageno);
-	int			bankno = pageno & ctl->bank_mask;
+	int			bankno = pageno % ctl->nbanks;
 	int			bankstart = bankno * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
 
@@ -1177,7 +1177,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
-		int			bankno = pageno & ctl->bank_mask;
+		int			bankno = pageno % ctl->nbanks;
 		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 97e612cd100..fabea220290 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -129,9 +129,9 @@ typedef struct SlruCtlData
 	SlruShared	shared;
 
 	/*
-	 * Bitmask to determine bank number from page number.
+	 * Number of banks to determine bank number from page number.
 	 */
-	bits16		bank_mask;
+	bits16		nbanks;
 
 	/*
 	 * If true, use long segment file names.  Otherwise, use short file names.
@@ -179,7 +179,7 @@ SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
 {
 	int			bankno;
 
-	bankno = pageno & ctl->bank_mask;
+	bankno = pageno % ctl->nbanks;
 	return &(ctl->shared->bank_locks[bankno].lock);
 }
 
-- 
2.43.0

#2Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Yura Sokolov (#1)
Re: Fix bank selection logic in SLRU

On 10 Dec 2024, at 15:39, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

It is not critical bug, since it doesn't hurt correctness just performance. In worst case only one bank will be used.

Ugh... yeah. IMO the problem is that we do not have protection that rejects values that are not power of 2.
If other values given system operates as if there are 2^(popcount(n)-1) banks. So if we just round down value to nearest power of 2 - we will help incorrectly configured systems to use proper amount of memory and keep performance of properly configured systems.

IMO doing modulo is not necessary. And hash function is pure waste of CPU cycles.

Best regards, Andrey Borodin.

#3Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andrey M. Borodin (#2)
Re: Fix bank selection logic in SLRU

On Tue, Dec 10, 2024 at 6:32 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 10 Dec 2024, at 15:39, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

It is not critical bug, since it doesn't hurt correctness just performance. In worst case only one bank will be used.

Ugh... yeah. IMO the problem is that we do not have protection that rejects values that are not power of 2.
If other values given system operates as if there are 2^(popcount(n)-1) banks. So if we just round down value to nearest power of 2 - we will help incorrectly configured systems to use proper amount of memory and keep performance of properly configured systems.

IMO doing modulo is not necessary. And hash function is pure waste of CPU cycles.

IIUC, we do check that it should be in multiple of bank size (i.e.)
which is multiple of 2, right? Am I missing something?

/*
* Helper function for GUC check_hook to check whether slru buffers are in
* multiples of SLRU_BANK_SIZE.
*/
bool
check_slru_buffers(const char *name, int *newval)
{
/* Valid values are multiples of SLRU_BANK_SIZE */
if (*newval % SLRU_BANK_SIZE == 0)
return true;

GUC_check_errdetail("\"%s\" must be a multiple of %d", name,
SLRU_BANK_SIZE);
return false;
}

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#4Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Dilip Kumar (#3)
Re: Fix bank selection logic in SLRU

On 10 Dec 2024, at 16:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:

IIUC, we do check that it should be in multiple of bank size (i.e.)
which is multiple of 2, right? Am I missing something?

Bank selection code assumes that number of buffers is power of 2.
If the number of buffers is not power of 2 - only subset of buffers will be used. In worst case, e.g. 65 buffers, everything will be buffered only in bank 64.

Best regards, Andrey Borodin.

#5Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andrey M. Borodin (#4)
Re: Fix bank selection logic in SLRU

On Tue, 10 Dec 2024 at 7:24 PM, Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:

On 10 Dec 2024, at 16:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:

IIUC, we do check that it should be in multiple of bank size (i.e.)
which is multiple of 2, right? Am I missing something?

Bank selection code assumes that number of buffers is power of 2.
If the number of buffers is not power of 2 - only subset of buffers will
be used. In worst case, e.g. 65 buffers, everything will be buffered only
in bank 64.

But why that would be the case? the acceptable values for GUC to configure
the slru buffers are in multiple of 16(bank size) we have that check to
check the GUC values.


Dilip

Show quoted text
#6Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#5)
Re: Fix bank selection logic in SLRU

On Tue, Dec 10, 2024 at 8:58 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Bank selection code assumes that number of buffers is power of 2.
If the number of buffers is not power of 2 - only subset of buffers will be used. In worst case, e.g. 65 buffers, everything will be buffered only in bank 64.

But why that would be the case? the acceptable values for GUC to configure the slru buffers are in multiple of 16(bank size) we have that check to check the GUC values.

"Must be a multiple of 16" and "must be a power of 2" are different
criteria. For example, 48 is a multiple of 16 but it is not a power of
2. If the code assumes that we have an actual power of 2, the check
you quoted in your previous email is insufficient.

--
Robert Haas
EDB: http://www.enterprisedb.com

#7Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#6)
Re: Fix bank selection logic in SLRU

On Tue, 10 Dec 2024 at 7:30 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 10, 2024 at 8:58 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Bank selection code assumes that number of buffers is power of 2.
If the number of buffers is not power of 2 - only subset of buffers

will be used. In worst case, e.g. 65 buffers, everything will be buffered
only in bank 64.

But why that would be the case? the acceptable values for GUC to

configure the slru buffers are in multiple of 16(bank size) we have that
check to check the GUC values.

"Must be a multiple of 16" and "must be a power of 2" are different
criteria. For example, 48 is a multiple of 16 but it is not a power of
2. If the code assumes that we have an actual power of 2, the check
you quoted in your previous email is insufficient.

Yeah I see it’s an issue. Thanks for clarifying.


Dilip

Show quoted text
#8Dilip Kumar
dilipbalaut@gmail.com
In reply to: Andrey M. Borodin (#2)
Re: Fix bank selection logic in SLRU

On Tue, 10 Dec 2024 at 6:32 PM, Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:

On 10 Dec 2024, at 15:39, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

It is not critical bug, since it doesn't hurt correctness just

performance. In worst case only one bank will be used.

Ugh... yeah. IMO the problem is that we do not have protection that
rejects values that are not power of 2.
If other values given system operates as if there are 2^(popcount(n)-1)
banks. So if we just round down value to nearest power of 2 - we will help
incorrectly configured systems to use proper amount of memory and keep
performance of properly configured systems.

+1

IMO doing modulo is not necessary. And hash function is pure waste of CPU
cycles.

I agree


Dilip

Show quoted text
#9Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Dilip Kumar (#5)
Re: Fix bank selection logic in SLRU

On 10 Dec 2024, at 16:58, Dilip Kumar <dilipbalaut@gmail.com> wrote:

slru buffers are in multiple of 16(bank size)

Yes, my example with 64 buffers is incorrect.
The worst case scenario is when user configures 80, 144, 528 or 1040 buffers, but only two banks (32 buffers) will be used.

Best regards, Andrey Borodin.

#10Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Dilip Kumar (#8)
Re: Fix bank selection logic in SLRU

10.12.2024 17:07, Dilip Kumar wrote:

On Tue, 10 Dec 2024 at 6:32 PM, Andrey M. Borodin <x4mmm@yandex-team.ru
<mailto:x4mmm@yandex-team.ru>> wrote:

On 10 Dec 2024, at 15:39, Yura Sokolov <y.sokolov@postgrespro.ru

<mailto:y.sokolov@postgrespro.ru>> wrote:

It is not critical bug, since it doesn't hurt correctness just

performance. In worst case only one bank will be used.

Ugh... yeah. IMO the problem is that we do not have protection that
rejects values that are not power of 2.
If other values given system operates as if there are
2^(popcount(n)-1) banks. So if we just round down value to nearest
power of 2 - we will help incorrectly configured systems to use
proper amount of memory and keep performance of properly configured
systems.

+1

IMO doing modulo is not necessary. And hash function is pure waste
of CPU cycles.

I agree

I did some measurement "divide-modulo" vs "modulo using multiplication by
reciprocal" vs "simple binary and" using simple C program [0]https://gist.github.com/funny-falcon/173923b4fea7ffdf9e02595a0f99aa74.
(Note: loop is made to be dependent on previous iteration result so no
parallel computation happens).

Results on Ryzen 7 5825U:

$ time ./div 100000000 15 3 # binary and
real 0m0,943s
$ time ./div 100000000 15 1 # multiply by reciprocal
real 0m3,123s
$ time ./div 100000000 15 0 # just plain `%`
real 0m4,540s

It means:
- `&` takes 0.69ns
- `mult-rec` takes 2.94ns
- `%` takes 3.24ns.

I believe, compared to further memory accesses it could be count as
negligible.

(Certainly, it could be worse on some older processors. But I still doubt
it will be measurably worse on top of all other things SLRU does.)

[0]: https://gist.github.com/funny-falcon/173923b4fea7ffdf9e02595a0f99aa74

Regards,
Yura Sokolov aka funny-falcon

#11Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Yura Sokolov (#10)
Re: Fix bank selection logic in SLRU

On 19 Dec 2024, at 15:01, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

- `&` takes 0.69ns
- `mult-rec` takes 2.94ns
- `%` takes 3.24ns.

Thanks, Yura, for benchmarks and off-list conversation.
I’ve reproduced similar numbers on my Apple M2.
I agree that additional 3-4ns are negligible in case of SLRU access.

+ bits16 nbanks;

Perhaps, it’s not bits anymore. Also, is 64K banks ought enough for everybody?

Best regards, Andrey Borodin.

#12Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Andrey Borodin (#11)
Re: Fix bank selection logic in SLRU

19.12.2024 13:10, Andrey Borodin пишет:

On 19 Dec 2024, at 15:01, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

- `&` takes 0.69ns
- `mult-rec` takes 2.94ns
- `%` takes 3.24ns.

Thanks, Yura, for benchmarks and off-list conversation.
I’ve reproduced similar numbers on my Apple M2.
I agree that additional 3-4ns are negligible in case of SLRU access.

+ bits16 nbanks;

Perhaps, it’s not bits anymore. Also, is 64K banks ought enough for everybody?

Best regards, Andrey Borodin.

There are artificial limit currently:

#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
#define SLRU_BANK_BITSHIFT 4
#define SLRU_BANK_SIZE (1 << SLRU_BANK_BITSHIFT)

So, there's no more than 8192 banks at the moment.

But I believe, some customers will want to have more than 1GB of SLRU in
the future. (They do already actually)

----
Yura

#13Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Yura Sokolov (#12)
Re: Fix bank selection logic in SLRU

On 19 Dec 2024, at 15:37, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

So, there's no more than 8192 banks at the moment.

OK, but still current type indicates bitwise usage, while struct member is used as a number.

Best regards, Andrey Borodin.

#14Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Andrey M. Borodin (#13)
1 attachment(s)
Re: Fix bank selection logic in SLRU

19.12.2024 15:10, Andrey M. Borodin wrote:

On 19 Dec 2024, at 15:37, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

So, there's no more than 8192 banks at the moment.

OK, but still current type indicates bitwise usage, while struct member is used as a number.

Ok, I agree. Here's version with type change bits16 -> uint16.

Attachments:

v2-0001-Fix-SLRU-bank-selection.patchtext/x-patch; charset=UTF-8; name=v2-0001-Fix-SLRU-bank-selection.patchDownload
From 4a16d64fe6eba042bfca7ff0b0d73ec8b749e6e0 Mon Sep 17 00:00:00 2001
From: Yura Sokolov <y.sokolov@postgrespro.ru>
Date: Tue, 10 Dec 2024 15:38:02 +0300
Subject: [PATCH v2] Fix SLRU bank selection.

Due to code evolution, nbanks is not power of two any more. But still
binary & were used to obtain bankno.
---
 src/backend/access/transam/slru.c | 6 +++---
 src/include/access/slru.h         | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index e7f73bf4275..afedb5c039f 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -343,7 +343,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
 	ctl->long_segment_names = long_segment_names;
-	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
+	ctl->nbanks = nbanks;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
 
@@ -606,7 +606,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int64 pageno, TransactionId xid)
 {
 	SlruShared	shared = ctl->shared;
 	LWLock	   *banklock = SimpleLruGetBankLock(ctl, pageno);
-	int			bankno = pageno & ctl->bank_mask;
+	int			bankno = pageno % ctl->nbanks;
 	int			bankstart = bankno * SLRU_BANK_SIZE;
 	int			bankend = bankstart + SLRU_BANK_SIZE;
 
@@ -1177,7 +1177,7 @@ SlruSelectLRUPage(SlruCtl ctl, int64 pageno)
 		int			bestinvalidslot = 0;	/* keep compiler quiet */
 		int			best_invalid_delta = -1;
 		int64		best_invalid_page_number = 0;	/* keep compiler quiet */
-		int			bankno = pageno & ctl->bank_mask;
+		int			bankno = pageno % ctl->nbanks;
 		int			bankstart = bankno * SLRU_BANK_SIZE;
 		int			bankend = bankstart + SLRU_BANK_SIZE;
 
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 97e612cd100..648e202ff54 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -129,9 +129,9 @@ typedef struct SlruCtlData
 	SlruShared	shared;
 
 	/*
-	 * Bitmask to determine bank number from page number.
+	 * Number of banks to determine bank number from page number.
 	 */
-	bits16		bank_mask;
+	uint16		nbanks;
 
 	/*
 	 * If true, use long segment file names.  Otherwise, use short file names.
@@ -179,7 +179,7 @@ SimpleLruGetBankLock(SlruCtl ctl, int64 pageno)
 {
 	int			bankno;
 
-	bankno = pageno & ctl->bank_mask;
+	bankno = pageno % ctl->nbanks;
 	return &(ctl->shared->bank_locks[bankno].lock);
 }
 
-- 
2.43.0

#15Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Yura Sokolov (#14)
Re: Fix bank selection logic in SLRU

On 19 Dec 2024, at 20:48, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

Here's version with type change bits16 -> uint16

Thanks! This version looks good to me. I’ll mark the CF entry as RfC.

Best regards, Andrey Borodin.

#16Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andrey Borodin (#15)
Re: Fix bank selection logic in SLRU

On 2024-Dec-27, Andrey Borodin wrote:

On 19 Dec 2024, at 20:48, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:

Here's version with type change bits16 -> uint16

Thanks! This version looks good to me. I’ll mark the CF entry as RfC.

Thank you, I have pushed this.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Once again, thank you and all of the developers for your hard work on
PostgreSQL. This is by far the most pleasant management experience of
any database I've worked on." (Dan Harris)
http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php