MAX_BACKENDS size (comment accuracy)

Started by Jacob Brazeal12 months ago19 messages
#1Jacob Brazeal
jacob.brazeal@gmail.com

Hello all,

In lwlocks.c, we have the following comment, related to LWLock state:

*/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine.
*/#define LW_SHARED_MASK ((uint32) ((1 << 24)-1))*

However, MAX_BACKENDS is set to 2^18-1. Here is the comment in postmaster.h:

*/* * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width
reserved * for buffer references in buf_internals.h. This limitation could
be lifted * by using a 64bit state; but it's unlikely to be worthwhile as
2^18-1 * backends exceed currently realistic configurations. Even if that
limitation * were removed, we still could not a) exceed 2^23-1 because
inval.c stores * the ProcNumber as a 3-byte signed integer, b) INT_MAX/4
because some places * compute 4*MaxBackends without any overflow check.
This is rechecked in the * relevant GUC check hooks and in
RegisterBackgroundWorker(). */#define MAX_BACKENDS 0x3FFFF*

2^23-1 is noted as an additional upper limit, but I wonder if it'd be
correct to update the comment in lwlocks.c to

*/* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */*

I'm not sure if this could lead to us actually saving some bits in the
lwlock state, or if we could do anything useful with them anyway.

Jacob

#2Jacob Brazeal
jacob.brazeal@gmail.com
In reply to: Jacob Brazeal (#1)
1 attachment(s)
Re: MAX_BACKENDS size (comment accuracy)

Thinking a bit further about this, the purpose of the LW_SHARED_MASK
section of the state is to count the number of lock-sharers. Thus, we only
care about the actual number of backends (up to 2^18-1) here and not the
size of the ProcNumber data type. So I do think the comment should read
2^18-1 and not 2^23-1. Here is a patch to that effect.

On Sat, Jan 25, 2025 at 3:21 PM Jacob Brazeal <jacob.brazeal@gmail.com>
wrote:

Show quoted text

Hello all,

In lwlocks.c, we have the following comment, related to LWLock state:

*/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine.
*/#define LW_SHARED_MASK ((uint32) ((1 << 24)-1))*

However, MAX_BACKENDS is set to 2^18-1. Here is the comment in
postmaster.h:

*/* * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width
reserved * for buffer references in buf_internals.h. This limitation could
be lifted * by using a 64bit state; but it's unlikely to be worthwhile as
2^18-1 * backends exceed currently realistic configurations. Even if that
limitation * were removed, we still could not a) exceed 2^23-1 because
inval.c stores * the ProcNumber as a 3-byte signed integer, b) INT_MAX/4
because some places * compute 4*MaxBackends without any overflow check.
This is rechecked in the * relevant GUC check hooks and in
RegisterBackgroundWorker(). */#define MAX_BACKENDS 0x3FFFF*

2^23-1 is noted as an additional upper limit, but I wonder if it'd be
correct to update the comment in lwlocks.c to

*/* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */*

I'm not sure if this could lead to us actually saving some bits in the
lwlock state, or if we could do anything useful with them anyway.

Jacob

Attachments:

v0-backend-size.patchapplication/octet-stream; name=v0-backend-size.patchDownload
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 2f558ffea1..d3a2619072 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -99,7 +99,7 @@
 #define LW_VAL_SHARED				1
 
 #define LW_LOCK_MASK				((uint32) ((1 << 25)-1))
-/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
+/* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */
 #define LW_SHARED_MASK				((uint32) ((1 << 24)-1))
 
 StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,
#3Jacob Brazeal
jacob.brazeal@gmail.com
In reply to: Jacob Brazeal (#2)
Re: MAX_BACKENDS size (comment accuracy)

While we are on the topic of comments from lwlock.c, there is one other one
that confused me, in LWLockWaitListLock:

* /* and then spin without atomic operations until lock is released */ {
SpinDelayStatus delayStatus; init_local_spin_delay(&delayStatus); while
(old_state & LW_FLAG_LOCKED) { perform_spin_delay(&delayStatus); old_state
= pg_atomic_read_u32(&lock->state); }#ifdef LWLOCK_STATS delays +=
delayStatus.delays;#endif finish_spin_delay(&delayStatus); }*

It seems that we *are* using an atomic operation in the loop (though, no
compare-and-set, etc.) I might be mis-reading the intent of the comment,
but I'm curious if there's a way to reword it, too.

On Sat, Jan 25, 2025 at 4:06 PM Jacob Brazeal <jacob.brazeal@gmail.com>
wrote:

Show quoted text

Thinking a bit further about this, the purpose of the LW_SHARED_MASK
section of the state is to count the number of lock-sharers. Thus, we only
care about the actual number of backends (up to 2^18-1) here and not the
size of the ProcNumber data type. So I do think the comment should read
2^18-1 and not 2^23-1. Here is a patch to that effect.

On Sat, Jan 25, 2025 at 3:21 PM Jacob Brazeal <jacob.brazeal@gmail.com>
wrote:

Hello all,

In lwlocks.c, we have the following comment, related to LWLock state:

*/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine.
*/#define LW_SHARED_MASK ((uint32) ((1 << 24)-1))*

However, MAX_BACKENDS is set to 2^18-1. Here is the comment in
postmaster.h:

*/* * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width
reserved * for buffer references in buf_internals.h. This limitation could
be lifted * by using a 64bit state; but it's unlikely to be worthwhile as
2^18-1 * backends exceed currently realistic configurations. Even if that
limitation * were removed, we still could not a) exceed 2^23-1 because
inval.c stores * the ProcNumber as a 3-byte signed integer, b) INT_MAX/4
because some places * compute 4*MaxBackends without any overflow check.
This is rechecked in the * relevant GUC check hooks and in
RegisterBackgroundWorker(). */#define MAX_BACKENDS 0x3FFFF*

2^23-1 is noted as an additional upper limit, but I wonder if it'd be
correct to update the comment in lwlocks.c to

*/* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine.
*/*

I'm not sure if this could lead to us actually saving some bits in the
lwlock state, or if we could do anything useful with them anyway.

Jacob

#4Andres Freund
andres@anarazel.de
In reply to: Jacob Brazeal (#2)
Re: MAX_BACKENDS size (comment accuracy)

Hi,

On 2025-01-25 16:06:29 -0800, Jacob Brazeal wrote:

In lwlocks.c, we have the following comment, related to LWLock state:

*/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine.
*/#define LW_SHARED_MASK ((uint32) ((1 << 24)-1))*

However, MAX_BACKENDS is set to 2^18-1. Here is the comment in
postmaster.h:

*/* * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width
reserved * for buffer references in buf_internals.h. This limitation could
be lifted * by using a 64bit state; but it's unlikely to be worthwhile as
2^18-1 * backends exceed currently realistic configurations. Even if that
limitation * were removed, we still could not a) exceed 2^23-1 because
inval.c stores * the ProcNumber as a 3-byte signed integer, b) INT_MAX/4
because some places * compute 4*MaxBackends without any overflow check.
This is rechecked in the * relevant GUC check hooks and in
RegisterBackgroundWorker(). */#define MAX_BACKENDS 0x3FFFF*

2^23-1 is noted as an additional upper limit, but I wonder if it'd be
correct to update the comment in lwlocks.c to

Thinking a bit further about this, the purpose of the LW_SHARED_MASK
section of the state is to count the number of lock-sharers. Thus, we only
care about the actual number of backends (up to 2^18-1) here and not the
size of the ProcNumber data type. So I do think the comment should read
2^18-1 and not 2^23-1. Here is a patch to that effect.

At the time that comment was written MAX_BACKENDS was 2^23-1. Alexander and I
lowered MAX_BACKENDS in 48354581a49c to 2^18-1 and apparently didn't think
about the comment in lwlock.h.

commit 48354581a49c30f5757c203415aa8412d85b0f70
Author: Andres Freund <andres@anarazel.de>
Date: 2016-04-10 20:12:32 -0700

Allow Pin/UnpinBuffer to operate in a lockfree manner.
...

To allow atomic operations to be used, cram BufferDesc's flags,
usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
that allows to manipulate them together using 32bit compare-and-swap
operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
be lifted by using a 64bit field, but it's not a realistic configuration
atm).

*/* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */*

I'm not sure if this could lead to us actually saving some bits in the
lwlock state, or if we could do anything useful with them anyway.

It's not quite enough bits to be useful I think. If we could free 16 bits (or
we defined LWLock.tranche to be narrower), we could store the tranche as part
of the atomic, and save 4 bytes (2 bytes of alignment padding, 2 bytes for the
tranche). But unless we reduce the size of the tranche field a decent bit
there's not enough space.

I'd really like to reduce the size of struct LWLock, but I think that'll need
a bit more changes. I think we should use a single 64bit atomic and store the
list of waiters inside the atomic.

flags: 3
exclusively locked: 1 bit
share locks: 18 bits
head of wait list: 18 bits (proc number offset)
tail of wait list: 18 bits (proc number offset)
= 58

That doesn't leave enough space for the tranche unfortunately. But if we
reduced MAX_BACKENDS to 2^16-1 and reduced the size of the tranche to to 12
bits, it'd work!

I continue to believe that MAX_BACKENDS of 2^16-1 would be sufficient - we're
far from that being a realistic limit. Halfing the size of LWLock and laying
the ground work for making the wait-list lock-free imo would be very well
worth the reduction in an unrealistic limit...

Anyway, that's really just a periodic rant / project suggestion...

diff --git a/src/backend/storage/lmgr/lwlock.c
b/src/backend/storage/lmgr/lwlock.c index 2f558ffea1..d3a2619072 100644 ---
a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c
@@ -99,7 +99,7 @@ #define LW_VAL_SHARED 1
#define LW_LOCK_MASK				((uint32) ((1 << 25)-1))
-/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
+/* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */
#define LW_SHARED_MASK				((uint32) ((1 << 24)-1))

StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,

I'm inclined to think that we should adjust the various constants at the same
time as the comment? It's imo somewhat weird to have bits LW_SHARED_MASK that
can never be set...

I wonder if we should instead define LW_VAL_EXCLUSIVE and LW_SHARED_MASK using
MAX_BACKENDS and have a static assertion ensuring it doesn't overlap with flag
bits? That way we don't have two sets of defines to keep in sync.

Greetings,

Andres Freund

#5Andres Freund
andres@anarazel.de
In reply to: Jacob Brazeal (#3)
Re: MAX_BACKENDS size (comment accuracy)

Hi,

On 2025-01-25 23:35:51 -0800, Jacob Brazeal wrote:

While we are on the topic of comments from lwlock.c, there is one other one
that confused me, in LWLockWaitListLock:

* /* and then spin without atomic operations until lock is released */ {
SpinDelayStatus delayStatus; init_local_spin_delay(&delayStatus); while
(old_state & LW_FLAG_LOCKED) { perform_spin_delay(&delayStatus); old_state
= pg_atomic_read_u32(&lock->state); }#ifdef LWLOCK_STATS delays +=
delayStatus.delays;#endif finish_spin_delay(&delayStatus); }*

It seems that we *are* using an atomic operation in the loop (though, no
compare-and-set, etc.) I might be mis-reading the intent of the comment,
but I'm curious if there's a way to reword it, too.

It's not really an atomic operation. It's just reading an atomic variable
(which just guarantees that the compiler isn't eliding the read and that the
read isn't torn). Personally I don't think there's a need to rephrase the
comment, but I probably wrote it, so take that with a grain of salt.

Greetings,

Andres Freund

PS: FYI, this list values properly quoting messages instead of replying ontop
of the entire quoted messages.

#6Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#4)
3 attachment(s)
Re: MAX_BACKENDS size (comment accuracy)

Hi,

On 2025-01-26 13:10:35 -0500, Andres Freund wrote:

On 2025-01-25 16:06:29 -0800, Jacob Brazeal wrote:

diff --git a/src/backend/storage/lmgr/lwlock.c
b/src/backend/storage/lmgr/lwlock.c index 2f558ffea1..d3a2619072 100644 ---
a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c
@@ -99,7 +99,7 @@ #define LW_VAL_SHARED 1
#define LW_LOCK_MASK				((uint32) ((1 << 25)-1))
-/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
+/* Must be greater than MAX_BACKENDS - which is 2^18-1, so we're fine. */
#define LW_SHARED_MASK				((uint32) ((1 << 24)-1))

StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,

I'm inclined to think that we should adjust the various constants at the same
time as the comment? It's imo somewhat weird to have bits LW_SHARED_MASK that
can never be set...

I wonder if we should instead define LW_VAL_EXCLUSIVE and LW_SHARED_MASK using
MAX_BACKENDS and have a static assertion ensuring it doesn't overlap with flag
bits? That way we don't have two sets of defines to keep in sync.

Started to write a patch to do what I described. Increased MAX_BACKENDS to
check if the assertions work. They did - but I noticed that we do *not* have
such assertions to check the buf_internals.h assumptions aren't violated.

Adding them would currently require including postmaster.h into
buf_internals.h, which seems somewhat gross. I wonder if we should move the
define to pg_config_manual.h? Seems at least somewhat more appropriate than
postmaster.h? Only allows to remove two postmaster.h includes though, due to
all things like ClientAuthInProgress (listed as a GUC, but isn't) and GUCs
declared in postmaster.h. So maybe it's not worth it.

Attached is a draft patch series.

Thoughts?

Andres Freund

Attachments:

v1-0001-Move-MAX_BACKENDS-to-pg_config_manual.h.patchtext/x-diff; charset=us-asciiDownload
From b4cf37ce1d32bc251d77bfa2976324946354202c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 26 Jan 2025 14:10:58 -0500
Subject: [PATCH v1 1/3] Move MAX_BACKENDS to pg_config_manual.h

MAX_BACKENDS influences many things besides postmaster. I just noticed that we
don't have static assertions ensuring BUF_REFCOUNT_MASK is big enough for
MAX_BACKENDS, adding them would require including postmaster.h in
buf_internals.h which doesn't seem right.

Discussion: https://postgr.es/m/wptizm4qt6yikgm2pt52xzyv6ycmqiutloyvypvmagn7xvqkce@d4xuv3mylpg4
---
 src/include/pg_config_manual.h      | 12 ++++++++++++
 src/include/postmaster/postmaster.h | 12 ------------
 src/backend/storage/lmgr/lwlock.c   |  1 -
 src/backend/utils/adt/xid8funcs.c   |  1 -
 4 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 449e50bd78c..9874302946e 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -42,6 +42,18 @@
  */
 #define FUNC_MAX_ARGS		100
 
+/*
+ * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
+ * for buffer references in buf_internals.h.  This limitation could be lifted
+ * by using a 64bit state; but it's unlikely to be worthwhile as 2^18-1
+ * backends exceed currently realistic configurations. Even if that limitation
+ * were removed, we still could not a) exceed 2^23-1 because inval.c stores
+ * the ProcNumber as a 3-byte signed integer, b) INT_MAX/4 because some places
+ * compute 4*MaxBackends without any overflow check.  This is rechecked in the
+ * relevant GUC check hooks and in RegisterBackgroundWorker().
+ */
+#define MAX_BACKENDS	0x3FFFF
+
 /*
  * When creating a product derived from PostgreSQL with changes that cause
  * incompatibilities for loadable modules, it is recommended to change this
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 188a06e2379..e12af62a370 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -126,18 +126,6 @@ extern PMChild *AllocDeadEndChild(void);
 extern bool ReleasePostmasterChildSlot(PMChild *pmchild);
 extern PMChild *FindPostmasterChildByPid(int pid);
 
-/*
- * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
- * for buffer references in buf_internals.h.  This limitation could be lifted
- * by using a 64bit state; but it's unlikely to be worthwhile as 2^18-1
- * backends exceed currently realistic configurations. Even if that limitation
- * were removed, we still could not a) exceed 2^23-1 because inval.c stores
- * the ProcNumber as a 3-byte signed integer, b) INT_MAX/4 because some places
- * compute 4*MaxBackends without any overflow check.  This is rechecked in the
- * relevant GUC check hooks and in RegisterBackgroundWorker().
- */
-#define MAX_BACKENDS	0x3FFFF
-
 /*
  * These values correspond to the special must-be-first options for dispatching
  * to various subprograms.  parse_dispatch_option() can be used to convert an
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 2f558ffea14..328b1bab659 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -80,7 +80,6 @@
 #include "pg_trace.h"
 #include "pgstat.h"
 #include "port/pg_bitutils.h"
-#include "postmaster/postmaster.h"
 #include "storage/proc.h"
 #include "storage/proclist.h"
 #include "storage/spin.h"
diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c
index 20b28b2528c..d505e3fcf57 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -32,7 +32,6 @@
 #include "lib/qunique.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
-#include "postmaster/postmaster.h"
 #include "storage/lwlock.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
-- 
2.48.1.76.g4e746b1a31.dirty

v1-0002-bufmgr-Assert-that-MAX_BACKENDS-is-compatible-wit.patchtext/x-diff; charset=us-asciiDownload
From 02fbbbf691ab28f267f5426bb4f3ba7c8cd950dc Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 26 Jan 2025 14:13:57 -0500
Subject: [PATCH v1 2/3] bufmgr: Assert that MAX_BACKENDS is compatible with
 BufferDesc.state

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/include/storage/buf_internals.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 1a65342177d..d93493f746b 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -39,8 +39,9 @@
  *
  * The definition of buffer state components is below.
  */
+#define BUF_REFCOUNT_BITS 18
 #define BUF_REFCOUNT_ONE 1
-#define BUF_REFCOUNT_MASK ((1U << 18) - 1)
+#define BUF_REFCOUNT_MASK ((1U << BUF_REFCOUNT_BITS) - 1)
 #define BUF_USAGECOUNT_MASK 0x003C0000U
 #define BUF_USAGECOUNT_ONE (1U << 18)
 #define BUF_USAGECOUNT_SHIFT 18
@@ -77,6 +78,9 @@
  */
 #define BM_MAX_USAGE_COUNT	5
 
+StaticAssertDecl(MAX_BACKENDS <= ((1 << BUF_REFCOUNT_BITS) - 1),
+				 "MAX_BACKENDS is too big for BUF_REFCOUNT_BITS");
+
 /*
  * Buffer tag identifies which disk block the buffer contains.
  *
-- 
2.48.1.76.g4e746b1a31.dirty

v1-0003-WIP-Base-LWLock-limits-directly-on-MAX_BACKENDS.patchtext/x-diff; charset=us-asciiDownload
From 80e4d720dddef6d354e8d6fa523b08c29f9fcc75 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 26 Jan 2025 14:15:31 -0500
Subject: [PATCH v1 3/3] WIP: Base LWLock limits directly on MAX_BACKENDS

Jacob reported that comments for LW_SHARED_MASK referenced a MAX_BACKENDS
limit of 2^23-1, but that MAX_BACKENDS is actually limited to 2^18-1. The
limit was lowered in 48354581a49c, but the comment in lwlock.c wasn't updated.

Instead of just fixing the comment, it seems better to directly base the
lwlock defines on MAX_BACKENDS and add static assertions to ensure that there
is enough space. That way there's no comment that can go out of sync in the
future.

As part of that change I noticed that for some reason the high bit wasn't used
for flags, which seems somewhat odd. Redefine the flag values to start at the
highest bit.

Reported-by: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaBO_s3LfALq=b+HcBHFSOEGiApVjrRacCe4VP9m7CJsNQ@mail.gmail.com
---
 src/include/pg_config_manual.h    |  4 +++-
 src/backend/storage/lmgr/lwlock.c | 26 ++++++++++++++++++--------
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 9874302946e..55db22eb095 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -51,8 +51,10 @@
  * the ProcNumber as a 3-byte signed integer, b) INT_MAX/4 because some places
  * compute 4*MaxBackends without any overflow check.  This is rechecked in the
  * relevant GUC check hooks and in RegisterBackgroundWorker().
+ *
+ * Various places currently assume that MAX_BACKENDS + 1 is a power of two.
  */
-#define MAX_BACKENDS	0x3FFFF
+#define MAX_BACKENDS	0x3FFFFU
 
 /*
  * When creating a product derived from PostgreSQL with changes that cause
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 328b1bab659..eea51ec601c 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -90,18 +90,28 @@
 #endif
 
 
-#define LW_FLAG_HAS_WAITERS			((uint32) 1 << 30)
-#define LW_FLAG_RELEASE_OK			((uint32) 1 << 29)
-#define LW_FLAG_LOCKED				((uint32) 1 << 28)
+#define LW_FLAG_HAS_WAITERS			((uint32) 1 << 31)
+#define LW_FLAG_RELEASE_OK			((uint32) 1 << 30)
+#define LW_FLAG_LOCKED				((uint32) 1 << 29)
+#define LW_FLAG_BITS				3
+#define LW_FLAG_MASK				(((1<<LW_FLAG_BITS)-1)<<(32-LW_FLAG_BITS))
 
-#define LW_VAL_EXCLUSIVE			((uint32) 1 << 24)
+/* assumed MAX_BACKENDS is a (power of 2) - 1, checked below */
+#define LW_VAL_EXCLUSIVE			(MAX_BACKENDS + 1)
 #define LW_VAL_SHARED				1
 
-#define LW_LOCK_MASK				((uint32) ((1 << 25)-1))
-/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
-#define LW_SHARED_MASK				((uint32) ((1 << 24)-1))
+/* already (power of 2)-1, i.e. suitable for a mask */
+#define LW_SHARED_MASK				MAX_BACKENDS
+#define LW_LOCK_MASK				(MAX_BACKENDS | LW_VAL_EXCLUSIVE)
 
-StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,
+
+StaticAssertDecl(((MAX_BACKENDS + 1) & MAX_BACKENDS) == 0,
+				 "MAX_BACKENDS + 1 needs to be a power of 2");
+
+StaticAssertDecl((MAX_BACKENDS & LW_FLAG_MASK) == 0,
+				 "MAX_BACKENDS and LW_FLAG_MASK overlap");
+
+StaticAssertDecl(MAX_BACKENDS < LW_VAL_EXCLUSIVE,
 				 "MAX_BACKENDS too big for lwlock.c");
 
 /*
-- 
2.48.1.76.g4e746b1a31.dirty

#7Jacob Brazeal
jacob.brazeal@gmail.com
In reply to: Jacob Brazeal (#1)
Re: MAX_BACKENDS size (comment accuracy)

I find I didn't send the previous reply to the mailing list, so I'll copy
it here.
---
The patch series looks good. It looks like this currently leaves 10 bits of
unused space (bits 20 - 29) in the state.

StaticAssertDecl((MAX_BACKENDS & LW_FLAG_MASK) == 0,
"MAX_BACKENDS and LW_FLAG_MASK overlap");

Should this check that MAX_BACKENDS & LW_LOCK_MASK == 0? To also ensure the
LW_VAL_EXCLUSIVE bit does not overlap.

I continue to believe that MAX_BACKENDS of 2^16-1 would be sufficient -

we're

far from that being a realistic limit. Halfing the size of LWLock and

laying

the ground work for making the wait-list lock-free imo would be very well
worth the reduction in an unrealistic limit...

Neat. The current implementation of queuing does seem pretty heavy, and I'd
have time to work on a lock-free version. It seems like the waitlist state
itself could be managed similarly to LWLockAttemptLock, with an atomic
compare-and-set. I'm not quite sure how to manage the full proclist queue,
since only the head and tail would actually be part of the LWLock; would we
need to do something like copy the whole list, add our process to the
copied queue, and then swap out the reference to the new list in the LWLock?

PS: FYI, this list values properly quoting messages instead of replying

on top

of the entire quoted messages.

Oops, thank you for the heads up. Hopefully this reply is formatted
correctly, I'm still getting the hang of things.

Regards,
Jacob

On Sun, Jan 26, 2025 at 12:41 PM Jacob Brazeal <jacob.brazeal@gmail.com>
wrote:

Show quoted text

The patch series looks good. It looks like this currently leaves 10 bits
of unused space (bits 20 - 29) in the state.

StaticAssertDecl((MAX_BACKENDS & LW_FLAG_MASK) == 0,
"MAX_BACKENDS and LW_FLAG_MASK overlap");

Should this check that MAX_BACKENDS & LW_LOCK_MASK == 0? To also ensure
the LW_VAL_EXCLUSIVE bit does not overlap.

I continue to believe that MAX_BACKENDS of 2^16-1 would be sufficient -

we're

far from that being a realistic limit. Halfing the size of LWLock and

laying

the ground work for making the wait-list lock-free imo would be very well
worth the reduction in an unrealistic limit...

Neat. The current implementation of queuing does seem pretty heavy, and
I'd have time to work on a lock-free version. It seems like the waitlist
state itself could be managed similarly to LWLockAttemptLock, with an
atomic compare-and-set. I'm not quite sure how to manage the full proclist
queue, since only the head and tail would actually be part of the LWLock;
would we need to do something like copy the whole list, add our process to
the copied queue, and then swap out the reference to the new list in the
LWLock?

PS: FYI, this list values properly quoting messages instead of replying

on top

of the entire quoted messages.

Oops, thank you for the heads up. Hopefully this reply is formatted
correctly, I'm still getting the hang of things.

Regards,
Jacob

#8Jacob Brazeal
jacob.brazeal@gmail.com
In reply to: Jacob Brazeal (#7)
Re: MAX_BACKENDS size (comment accuracy)

Looking at v1-0003-WIP-Base-LWLock-limits-directly-on-MAX_BACKENDS.patch,
I'm curious about the following assert;

#define LW_VAL_EXCLUSIVE (MAX_BACKENDS + 1)

...

StaticAssertDecl(MAX_BACKENDS < LW_VAL_EXCLUSIVE,

"MAX_BACKENDS too big for lwlock.c");

Since LW_VAL_EXCLUSIVE is already defined as MAX_BACKENDS + 1, is this
basically just checking for an integer overflow?

#9Jacob Brazeal
jacob.brazeal@gmail.com
In reply to: Jacob Brazeal (#8)
Re: MAX_BACKENDS size (comment accuracy)

I had a typo earlier: I should have said:

StaticAssertDecl((MAX_BACKENDS & LW_FLAG_MASK) == 0,
"MAX_BACKENDS and LW_FLAG_MASK overlap");

Should this check that LW_LOCK_MASK & LW_FLAG_MASK == 0? To also ensure
the LW_VAL_EXCLUSIVE bit does not overlap.

#10Jacob Brazeal
jacob.brazeal@gmail.com
In reply to: Jacob Brazeal (#9)
Re: MAX_BACKENDS size (comment accuracy)

Halfing the size of LWLock and laying
the ground work for making the wait-list lock-free imo would be very well
worth the reduction in an unrealistic limit...

BTW, I spent a week or two researching the lock-free queue idea,
specifically looking at what it might look like to have a Michael-Scott
type lock-free queue adapted to the waitlist. In the end, it seems like a
fairly challenging project, and probably beyond my powers, but I'm very
curious if you had some ideas around it that I'm missing. Here is a summary
of my digging:

The proclist has several advantages. We can always store a waitlist entry
for a given process in the same spot in shared memory, so there's no memory
allocation or cleanup to worry about. It's a doubly-linked list and hence
easy to remove items from the middle of the list (given a lock is acquired).

A Michael-Scott style lock-free queue would require giving up these
advantages:
* When you pop a node from a queue, it is repurposed into a temporary
"dummy" node at the head of the queue. This means that if we want to add a
process to a new queue, we can't reuse the memory from its previous queue
node, because it might be in use as a dummy node. So we can't store
waitlist entries in PGPROCs anymore; we'd probably want some kind of
freelist. This also means that we need to eventually free nodes that have
been popped, and the next time they're used, it might be by a different
process.
* The queue only supports popping from the head. So we can't directly
support the current logic, which can skip some nodes in the waitlist to
wakeup others. Two possibles workarounds I see: 1) split the waitlist into
3 different queues, one for each type (shared, exclusive, wait for val) -
or have a lock-free "waitlist for the waitlist" that accumulates new
entries and then we take out a lock when we actually want to release nodes.

Finally, we'd need to set up some mechanism for safely freeing queue nodes
when we're done with them, like hazard pointers or maybe epoch-based
reclamation, which looks like a slightly better fit to me for our case, but
both mechanisms are fairly tricky and I think we'd have to seriously
consider finding some existing implementation that's well-tested and
compatibly licensed instead of writing our own. (Or, maybe writing and
testing our own version is the actual heart of this project.)

So, as I say, in the end this feels beyond my powers, but it was very
interesting to look into and I'm curious if you had some notes lying around
on it.

Regards,
Jacob

#11Andres Freund
andres@anarazel.de
In reply to: Jacob Brazeal (#10)
Re: MAX_BACKENDS size (comment accuracy)

Hi,

On 2025-02-09 12:41:58 -0800, Jacob Brazeal wrote:

Halfing the size of LWLock and laying
the ground work for making the wait-list lock-free imo would be very well
worth the reduction in an unrealistic limit...

BTW, I spent a week or two researching the lock-free queue idea,
specifically looking at what it might look like to have a Michael-Scott
type lock-free queue adapted to the waitlist. In the end, it seems like a
fairly challenging project, and probably beyond my powers, but I'm very
curious if you had some ideas around it that I'm missing. Here is a summary
of my digging:

The proclist has several advantages. We can always store a waitlist entry
for a given process in the same spot in shared memory, so there's no memory
allocation or cleanup to worry about. It's a doubly-linked list and hence
easy to remove items from the middle of the list (given a lock is acquired).

A Michael-Scott style lock-free queue would require giving up these
advantages:
* When you pop a node from a queue, it is repurposed into a temporary
"dummy" node at the head of the queue. This means that if we want to add a
process to a new queue, we can't reuse the memory from its previous queue
node, because it might be in use as a dummy node. So we can't store
waitlist entries in PGPROCs anymore; we'd probably want some kind of
freelist. This also means that we need to eventually free nodes that have
been popped, and the next time they're used, it might be by a different
process.
* The queue only supports popping from the head. So we can't directly
support the current logic, which can skip some nodes in the waitlist to
wakeup others. Two possibles workarounds I see: 1) split the waitlist into
3 different queues, one for each type (shared, exclusive, wait for val) -
or have a lock-free "waitlist for the waitlist" that accumulates new
entries and then we take out a lock when we actually want to release nodes.

Finally, we'd need to set up some mechanism for safely freeing queue nodes
when we're done with them, like hazard pointers or maybe epoch-based
reclamation, which looks like a slightly better fit to me for our case, but
both mechanisms are fairly tricky and I think we'd have to seriously
consider finding some existing implementation that's well-tested and
compatibly licensed instead of writing our own. (Or, maybe writing and
testing our own version is the actual heart of this project.)

Yea, I think a general lock free doubly linked list would be hard.

So, as I say, in the end this feels beyond my powers, but it was very
interesting to look into and I'm curious if you had some notes lying around
on it.

As I was suggesting upthread, I was thinking that we would continue to use
something like proclist, except that we'd "inline" the head and tail pointers
into a, now 64bit, state variable.

Generic lock free queues are decidedly nontrivial to write using
compare-exchange etc as a primitive, primarily due to ABA issues. Addressing
that in turn will often require some safe memory reclamation mechanism. But I
think we actually have a considerably simpler case here than a general
lock-free doubly linked list.

E.g.:

1) While waiting for an lwlock, a process will not do anything else (there is
is the corner case of enqueueing while the lock is concurrently released
however)

2) Memory lifetime is not a concern from a need-to-free POV (it could be a
concern from an ABA POV)

3) The set of potential list member is strictly limited and a fairly small
number (2^18)

This means we can represent prev/next in list members as a single 64bit
value that that can be CASed without requiring 128bit CAS (which scales
very badly)

4) The lock list does not have to be completely lock & wait free, it's
sufficient for common cases to be lock free.

We e.g. can do things like have a small ABA avoidance counter and fall back
to something like the existing spinlock whenver it overflows.

Another aspect that could be interesting is that we effectively have two
different wait queues. One for shared lockers and one for exclusive lockers. I
wonder if we could benefit from representing them as two singly-linked list,
instead of one doubly-linked list. There's also LW_WAIT_UNTIL_FREE, but
that's basically an exclusive lock and I think we'll get rid of it before
long.

One annoying thing is that we afaict can't rely on pointers to lwlocks to be
stable across processes, that makes some schemes harder.

I started to write up a scheme using the above restrictions, but it was taking
a bit, and I really gotta focus on stuff that can make it into the next
release...

Greetings,

Andres Freund

#12Andres Freund
andres@anarazel.de
In reply to: Jacob Brazeal (#7)
4 attachment(s)
Re: MAX_BACKENDS size (comment accuracy)

Hi,

On 2025-01-26 12:55:15 -0800, Jacob Brazeal wrote:

The patch series looks good.

Thanks for reviewing! And sorry for not getting back to you about your review
comments earlier.

StaticAssertDecl((MAX_BACKENDS & LW_FLAG_MASK) == 0,
"MAX_BACKENDS and LW_FLAG_MASK overlap");

Should this check that MAX_BACKENDS & LW_LOCK_MASK == 0? To also ensure the
LW_VAL_EXCLUSIVE bit does not overlap.

Yes, I think you're right.

On 2025-01-26 12:57:50 -0800, Jacob Brazeal wrote:

Looking at v1-0003-WIP-Base-LWLock-limits-directly-on-MAX_BACKENDS.patch,
I'm curious about the following assert;

#define LW_VAL_EXCLUSIVE (MAX_BACKENDS + 1)

...

StaticAssertDecl(MAX_BACKENDS < LW_VAL_EXCLUSIVE,

"MAX_BACKENDS too big for lwlock.c");

Since LW_VAL_EXCLUSIVE is already defined as MAX_BACKENDS + 1, is this
basically just checking for an integer overflow?

You're right, it's redundant. I think I added that in an earlier version and
then didn't remove it.

Updated patches attached.

I didn't really like moving MAX_BACKENDS to pg_config_manual.h, there are too
many constraints on it. I moved it to procnumber.h. That's maybe not perfect,
but seems a bit better? I also introduced MAX_BACKENDS_BITS, so that it's
easier to check against when asserting restrictions on bit space.

I also added a patch from a different thread, to remove a bunch of the magic
constants in buf_internals.h, as it felt awkward to assert the refcount bits
alone were compatible with MAX_BACKENDS, when a 'standalone' 18 was used in a
bunch of other places. As Heikki had already reviewed that quite a while ago,
I'll push that soon.

I chose to not directly base BUF_REFCOUNT_BITS directly on MAX_BACKEND_BITS,
as it's ok to lower MAX_BACKEND_BITS without adjusting BUF_REFCOUNT_BITS. But
I went back and forth on that one.

Greetings,

Andres Freund

Attachments:

v2-0001-Move-MAX_BACKENDS-to-procnumber.h.patchtext/x-diff; charset=us-asciiDownload
From 2fd801aad661d8c8de816f10c014edd7705dd3f0 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 26 Jan 2025 14:10:58 -0500
Subject: [PATCH v2 1/4] Move MAX_BACKENDS to procnumber.h

MAX_BACKENDS influences many things besides postmaster. I e.g. noticed that we
don't have static assertions ensuring BUF_REFCOUNT_MASK is big enough for
MAX_BACKENDS, adding them would require including postmaster.h in
buf_internals.h which doesn't seem right.

While at that, add MAX_BACKENDS_BITS, as that's useful in various places for
static assertions (to be added in subsequent commits).

Discussion: https://postgr.es/m/wptizm4qt6yikgm2pt52xzyv6ycmqiutloyvypvmagn7xvqkce@d4xuv3mylpg4
---
 src/include/postmaster/postmaster.h | 12 ------------
 src/include/storage/procnumber.h    | 13 +++++++++++++
 src/backend/storage/lmgr/lwlock.c   |  2 +-
 src/backend/utils/adt/xid8funcs.c   |  2 +-
 src/backend/utils/init/postinit.c   |  1 +
 src/backend/utils/misc/guc_tables.c |  1 +
 6 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index d8a9f14b3b8..b6a3f275a1b 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -126,18 +126,6 @@ extern PMChild *AllocDeadEndChild(void);
 extern bool ReleasePostmasterChildSlot(PMChild *pmchild);
 extern PMChild *FindPostmasterChildByPid(int pid);
 
-/*
- * Note: MAX_BACKENDS is limited to 2^18-1 because that's the width reserved
- * for buffer references in buf_internals.h.  This limitation could be lifted
- * by using a 64bit state; but it's unlikely to be worthwhile as 2^18-1
- * backends exceed currently realistic configurations. Even if that limitation
- * were removed, we still could not a) exceed 2^23-1 because inval.c stores
- * the ProcNumber as a 3-byte signed integer, b) INT_MAX/4 because some places
- * compute 4*MaxBackends without any overflow check.  This is rechecked in the
- * relevant GUC check hooks and in RegisterBackgroundWorker().
- */
-#define MAX_BACKENDS	0x3FFFF
-
 /*
  * These values correspond to the special must-be-first options for dispatching
  * to various subprograms.  parse_dispatch_option() can be used to convert an
diff --git a/src/include/storage/procnumber.h b/src/include/storage/procnumber.h
index 7cf981ab673..fdbecc55dbe 100644
--- a/src/include/storage/procnumber.h
+++ b/src/include/storage/procnumber.h
@@ -25,6 +25,19 @@ typedef int ProcNumber;
 
 #define INVALID_PROC_NUMBER		(-1)
 
+/*
+ * Note: MAX_BACKENDS_BITS is 18 as that is the space available for buffer
+ * refcounts in buf_internals.h.  This limitation could be lifted by using a
+ * 64bit state; but it's unlikely to be worthwhile as 2^18-1 backends exceed
+ * currently realistic configurations. Even if that limitation were removed,
+ * we still could not a) exceed 2^23-1 because inval.c stores the ProcNumber
+ * as a 3-byte signed integer, b) INT_MAX/4 because some places compute
+ * 4*MaxBackends without any overflow check.  This is rechecked in the
+ * relevant GUC check hooks and in RegisterBackgroundWorker().
+ */
+#define MAX_BACKENDS_BITS		18
+#define MAX_BACKENDS			((1 << MAX_BACKENDS_BITS)-1)
+
 /*
  * Proc number of this backend (same as GetNumberFromPGProc(MyProc))
  */
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index f1e74f184f1..d08144e9c22 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -80,9 +80,9 @@
 #include "pg_trace.h"
 #include "pgstat.h"
 #include "port/pg_bitutils.h"
-#include "postmaster/postmaster.h"
 #include "storage/proc.h"
 #include "storage/proclist.h"
+#include "storage/procnumber.h"
 #include "storage/spin.h"
 #include "utils/memutils.h"
 
diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c
index 20b28b2528c..88d798fbf4b 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -32,9 +32,9 @@
 #include "lib/qunique.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
-#include "postmaster/postmaster.h"
 #include "storage/lwlock.h"
 #include "storage/procarray.h"
+#include "storage/procnumber.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 01bb6a410cb..318600d6d02 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -49,6 +49,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
+#include "storage/procnumber.h"
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/smgr.h"
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 03a6dd49154..ad25cbb39c5 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -77,6 +77,7 @@
 #include "storage/large_object.h"
 #include "storage/pg_shmem.h"
 #include "storage/predicate.h"
+#include "storage/procnumber.h"
 #include "storage/standby.h"
 #include "tcop/backend_startup.h"
 #include "tcop/tcopprot.h"
-- 
2.48.1.76.g4e746b1a31.dirty

v2-0002-Base-LWLock-limits-directly-on-MAX_BACKENDS.patchtext/x-diff; charset=us-asciiDownload
From 78253592e0ee2e5f36d62ea8789271bb7a28fea7 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 21 Feb 2025 12:59:56 -0500
Subject: [PATCH v2 2/4] Base LWLock limits directly on MAX_BACKENDS

Jacob reported that comments for LW_SHARED_MASK referenced a MAX_BACKENDS
limit of 2^23-1, but that MAX_BACKENDS is actually limited to 2^18-1. The
limit was lowered in 48354581a49c, but the comment in lwlock.c wasn't updated.

Instead of just fixing the comment, it seems better to directly base the
lwlock defines on MAX_BACKENDS and add static assertions to ensure that there
is enough space. That way there's no comment that can go out of sync in the
future.

As part of that change I noticed that for some reason the high bit wasn't used
for flags, which seems somewhat odd. Redefine the flag values to start at the
highest bit.

Reported-by: Jacob Brazeal <jacob.brazeal@gmail.com>
Reviewed-by: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaBO_s3LfALq=b+HcBHFSOEGiApVjrRacCe4VP9m7CJsNQ@mail.gmail.com
---
 src/backend/storage/lmgr/lwlock.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index d08144e9c22..7c7bf3f300c 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -91,19 +91,29 @@
 #endif
 
 
-#define LW_FLAG_HAS_WAITERS			((uint32) 1 << 30)
-#define LW_FLAG_RELEASE_OK			((uint32) 1 << 29)
-#define LW_FLAG_LOCKED				((uint32) 1 << 28)
+#define LW_FLAG_HAS_WAITERS			((uint32) 1 << 31)
+#define LW_FLAG_RELEASE_OK			((uint32) 1 << 30)
+#define LW_FLAG_LOCKED				((uint32) 1 << 29)
+#define LW_FLAG_BITS				3
+#define LW_FLAG_MASK				(((1<<LW_FLAG_BITS)-1)<<(32-LW_FLAG_BITS))
 
-#define LW_VAL_EXCLUSIVE			((uint32) 1 << 24)
+/* assumes MAX_BACKENDS is a (power of 2) - 1, checked below */
+#define LW_VAL_EXCLUSIVE			(MAX_BACKENDS + 1)
 #define LW_VAL_SHARED				1
 
-#define LW_LOCK_MASK				((uint32) ((1 << 25)-1))
-/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
-#define LW_SHARED_MASK				((uint32) ((1 << 24)-1))
+/* already (power of 2)-1, i.e. suitable for a mask */
+#define LW_SHARED_MASK				MAX_BACKENDS
+#define LW_LOCK_MASK				(MAX_BACKENDS | LW_VAL_EXCLUSIVE)
 
-StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,
-				 "MAX_BACKENDS too big for lwlock.c");
+
+StaticAssertDecl(((MAX_BACKENDS + 1) & MAX_BACKENDS) == 0,
+				 "MAX_BACKENDS + 1 needs to be a power of 2");
+
+StaticAssertDecl((MAX_BACKENDS & LW_FLAG_MASK) == 0,
+				 "MAX_BACKENDS and LW_FLAG_MASK overlap");
+
+StaticAssertDecl((LW_VAL_EXCLUSIVE & LW_FLAG_MASK) == 0,
+				 "LW_VAL_EXCLUSIVE and LW_FLAG_MASK overlap");
 
 /*
  * There are three sorts of LWLock "tranches":
-- 
2.48.1.76.g4e746b1a31.dirty

v2-0003-bufmgr-Make-it-easier-to-change-number-of-buffer-.patchtext/x-diff; charset=us-asciiDownload
From 716e11a592fd7df6a7e263d9b6237ff96d98c4db Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 21 Feb 2025 13:08:11 -0500
Subject: [PATCH v2 3/4] bufmgr: Make it easier to change number of buffer
 state bits

In an upcoming commit I'd like to change the number of bits for the usage
count (the current max is 5, fitting in three bits, but we reserve four
bits). Until now that required adjusting a bunch of magic constants, now the
constants are defined based on the number of bits reserved.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6
---
 src/include/storage/buf_internals.h | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 1a65342177d..d830d5c9841 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -39,12 +39,19 @@
  *
  * The definition of buffer state components is below.
  */
+#define BUF_REFCOUNT_BITS 18
+#define BUF_USAGECOUNT_BITS 4
+#define BUF_FLAG_BITS 10
+
+StaticAssertDecl(BUF_REFCOUNT_BITS + BUF_USAGECOUNT_BITS + BUF_FLAG_BITS == 32,
+				 "parts of buffer state space need to equal 32");
+
 #define BUF_REFCOUNT_ONE 1
-#define BUF_REFCOUNT_MASK ((1U << 18) - 1)
-#define BUF_USAGECOUNT_MASK 0x003C0000U
-#define BUF_USAGECOUNT_ONE (1U << 18)
-#define BUF_USAGECOUNT_SHIFT 18
-#define BUF_FLAG_MASK 0xFFC00000U
+#define BUF_REFCOUNT_MASK ((1U << BUF_REFCOUNT_BITS) - 1)
+#define BUF_USAGECOUNT_MASK (((1U << BUF_USAGECOUNT_BITS) - 1) << (BUF_REFCOUNT_BITS))
+#define BUF_USAGECOUNT_ONE (1U << BUF_REFCOUNT_BITS)
+#define BUF_USAGECOUNT_SHIFT BUF_REFCOUNT_BITS
+#define BUF_FLAG_MASK (((1U << BUF_FLAG_BITS) - 1) << (BUF_REFCOUNT_BITS + BUF_USAGECOUNT_BITS))
 
 /* Get refcount and usagecount from buffer state */
 #define BUF_STATE_GET_REFCOUNT(state) ((state) & BUF_REFCOUNT_MASK)
@@ -77,6 +84,9 @@
  */
 #define BM_MAX_USAGE_COUNT	5
 
+StaticAssertDecl(BM_MAX_USAGE_COUNT < (1 << BUF_USAGECOUNT_BITS),
+				 "BM_MAX_USAGE_COUNT doesn't fit in BUF_USAGECOUNT_BITS bits");
+
 /*
  * Buffer tag identifies which disk block the buffer contains.
  *
-- 
2.48.1.76.g4e746b1a31.dirty

v2-0004-bufmgr-Assert-that-MAX_BACKENDS-is-compatible-wit.patchtext/x-diff; charset=us-asciiDownload
From e01431cfa6559536c58aab1df386e9246618d95c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 21 Feb 2025 13:26:57 -0500
Subject: [PATCH v2 4/4] bufmgr: Assert that MAX_BACKENDS is compatible with
 BufferDesc.state

So far the dependency was documented in the comment above MAX_BACKENDS, but
not checked.

Discussion: https://postgr.es/m/CA+COZaBO_s3LfALq=b+HcBHFSOEGiApVjrRacCe4VP9m7CJsNQ@mail.gmail.com
---
 src/include/storage/buf_internals.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index d830d5c9841..8b32fb108b0 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -21,6 +21,7 @@
 #include "storage/bufmgr.h"
 #include "storage/condition_variable.h"
 #include "storage/lwlock.h"
+#include "storage/procnumber.h"
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
@@ -86,6 +87,8 @@ StaticAssertDecl(BUF_REFCOUNT_BITS + BUF_USAGECOUNT_BITS + BUF_FLAG_BITS == 32,
 
 StaticAssertDecl(BM_MAX_USAGE_COUNT < (1 << BUF_USAGECOUNT_BITS),
 				 "BM_MAX_USAGE_COUNT doesn't fit in BUF_USAGECOUNT_BITS bits");
+StaticAssertDecl(MAX_BACKENDS_BITS <= BUF_REFCOUNT_BITS,
+				 "MAX_BACKENDS_BITS needs to be <= BUF_REFCOUNT_BITS");
 
 /*
  * Buffer tag identifies which disk block the buffer contains.
-- 
2.48.1.76.g4e746b1a31.dirty

#13Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#12)
Re: MAX_BACKENDS size (comment accuracy)
On Sat, Feb 22, 2025 at 7:27 AM Andres Freund <andres@anarazel.de> wrote:
+#define MAX_BACKENDS_BITS        18
+#define MAX_BACKENDS            ((1 << MAX_BACKENDS_BITS)-1)

+1 for working forwards from the bits. But why not call it
PROC_NUMBER_BITS? After 024c5211 and 024c5211^, the ID for backends
is a ProcNumber, and that is the thing that must fit in 18 bits. I
like that choice of header too, it's small and limited to definitions
relating to the type and concept of a ProcNumber, which seems like the
right place for this.

Hmm. Why shouldn't the highest usable ProcNumber (that you might call
PROC_NUMBER_MAX, like INT_MAX et all) be (1 << PROC_NUMBER_BITS) - 1,
and wouldn't that imply that MAX_BACKENDS should actually be 1 <<
PROC_NUMBER_BITS and that any valid ProcNumber (a 0-based index)
should be *less than* MAX_BACKENDS (a count)? In other words, doesn't
the current coding arbitrarily prevent the use of one more backend,
the one that has the highest ProcNumber representable in 18 bits? If
I'm right about that I think it is perhaps related to the use of 0 as
an invalid/unset BackendId before the ProcNumber-only redesign.
INVALID_PROC_NUMBER is -1, ie it doesn't eat one of the possible
values in the 18-bit space reserved in various tight corners, since 0
is a valid ProcNumber, unless I'm missing something?

+ * currently realistic configurations. Even if that limitation were removed,
+ * we still could not a) exceed 2^23-1 because inval.c stores the ProcNumber
+ * as a 3-byte signed integer, b) INT_MAX/4 because some places compute
+ * 4*MaxBackends without any overflow check.  This is rechecked in the

Should those two constraints have their own assertions?

#14Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#13)
Re: MAX_BACKENDS size (comment accuracy)

Hi,

On 2025-02-22 18:54:08 +1300, Thomas Munro wrote:

On Sat, Feb 22, 2025 at 7:27 AM Andres Freund <andres@anarazel.de> wrote:
+#define MAX_BACKENDS_BITS        18
+#define MAX_BACKENDS            ((1 << MAX_BACKENDS_BITS)-1)

+1 for working forwards from the bits. But why not call it
PROC_NUMBER_BITS?

Partially just because it was named MAX_BACKENDS historically. But also
because it seems like it could be misleading - ProcNumber has more bits than
PROC_NUMBER_BITS would indicate.

Hmm. Why shouldn't the highest usable ProcNumber (that you might call
PROC_NUMBER_MAX, like INT_MAX et all)

FWIW, I don't think we should name it _MAX, precisely because of INT_MAX
etc. INT_MAX indicate the actual range of the type, which isn't what we're
dealing with here.

be (1 << PROC_NUMBER_BITS) - 1, and wouldn't that imply that MAX_BACKENDS
should actually be 1 << PROC_NUMBER_BITS and that any valid ProcNumber (a
0-based index) should be *less than* MAX_BACKENDS (a count)?

I don't *think* so, but I'm good at off-by-one-ing:

In other words, doesn't the current coding arbitrarily prevent the use of
one more backend, the one that has the highest ProcNumber representable in
18 bits? If I'm right about that I think it is perhaps related to the use
of 0 as an invalid/unset BackendId before the ProcNumber-only redesign.

We do count the number of lwlock share lockers and the number of buffer
refcounts within those bits. And obviously 0 lockers/refcounts has to be
valid. So I think the limit is correct?

+ * currently realistic configurations. Even if that limitation were removed,
+ * we still could not a) exceed 2^23-1 because inval.c stores the ProcNumber
+ * as a 3-byte signed integer, b) INT_MAX/4 because some places compute
+ * 4*MaxBackends without any overflow check.  This is rechecked in the

Should those two constraints have their own assertions?

Probably wouldn't hurt, even though I think it's unlikely to matter anytime
soon.

I didn't yet have enough coffe, but isn't the inval.c limit 2^24-1 rather than
2^23-1?

Greetings,

Andres Freund

#15Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#14)
Re: MAX_BACKENDS size (comment accuracy)

On Sun, Feb 23, 2025 at 4:16 AM Andres Freund <andres@anarazel.de> wrote:

We do count the number of lwlock share lockers and the number of buffer
refcounts within those bits. And obviously 0 lockers/refcounts has to be
valid. So I think the limit is correct?

Ah, right. That makes perfect sense. The 18 bits need to be able to
hold a count, not just an index, and I confused myself about that from
the moment I thought about the name PROC_NUMBER_BITS, which I retract.

I didn't yet have enough coffe, but isn't the inval.c limit 2^24-1 rather than
2^23-1?

Yeah, it has 24 bits of space, but curiously backend_hi is signed, so
(msg->sm.backend_hi << 16) would be sign-extended, so it wouldn't actually
work if you used all 24 bits... which is obviously not a real
problem...

#16Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#15)
Re: MAX_BACKENDS size (comment accuracy)

Hi,

On 2025-02-23 10:39:36 +1300, Thomas Munro wrote:

On Sun, Feb 23, 2025 at 4:16 AM Andres Freund <andres@anarazel.de> wrote:

We do count the number of lwlock share lockers and the number of buffer
refcounts within those bits. And obviously 0 lockers/refcounts has to be
valid. So I think the limit is correct?

Ah, right. That makes perfect sense. The 18 bits need to be able to
hold a count, not just an index, and I confused myself about that from
the moment I thought about the name PROC_NUMBER_BITS, which I retract.

Cool. I now pushed them, including static asserts in inval.c and deadlock.

Thanks for the reviews and the complaint leading to these changes.

I didn't yet have enough coffe, but isn't the inval.c limit 2^24-1 rather than
2^23-1?

Yeah, it has 24 bits of space, but curiously backend_hi is signed, so
(msg->sm.backend_hi << 16) would be sign-extended, so it wouldn't actually
work if you used all 24 bits... which is obviously not a real
problem...

Heh, that's odd. I left it like that, didn't seem worth changing given that
it's so far from the real limit...

Greetings,

Andres Freund

#17Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#16)
1 attachment(s)
Re: MAX_BACKENDS size (comment accuracy)

The recent commits for this drew my attention to the comment for
MAX_BACKENDS. Specifically, it says we check the value in
RegisterBackgroundWorker() (which appears to have been untrue since we
added max_worker_processes) and relevant GUC check hooks (which I removed
last year in commit 0b1fe14). I wrote a short patch to fix this, which I
intend to commit soon unless there is feedback.

--
nathan

Attachments:

v1-0001-Fix-comment-for-MAX_BACKENDS.patchtext/plain; charset=us-asciiDownload
From 5038a5256942999bd544879d25312338686dddf2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Mon, 24 Feb 2025 13:40:30 -0600
Subject: [PATCH v1 1/1] Fix comment for MAX_BACKENDS.

This comment mentions that we check the configured number of
backends does not exceed MAX_BACKENDS in RegisterBackgroundWorker()
and relevant GUC check hooks, neither of which is true anymore.  To
fix, adjust this comment to state that we do the check in
InitializeMaxBackends().

Oversights in commits 6bc8ef0b7f and 0b1fe1413e.
---
 src/include/storage/procnumber.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/include/storage/procnumber.h b/src/include/storage/procnumber.h
index 75c2c7a17c0..2ddaaf0c646 100644
--- a/src/include/storage/procnumber.h
+++ b/src/include/storage/procnumber.h
@@ -32,8 +32,8 @@ typedef int ProcNumber;
  * currently realistic configurations. Even if that limitation were removed,
  * we still could not a) exceed 2^23-1 because inval.c stores the ProcNumber
  * as a 3-byte signed integer, b) INT_MAX/4 because some places compute
- * 4*MaxBackends without any overflow check.  This is rechecked in the
- * relevant GUC check hooks and in RegisterBackgroundWorker().
+ * 4*MaxBackends without any overflow check.  We check that the configured
+ * number of backends does not exceed MAX_BACKENDS in InitializeMaxBackends().
  */
 #define MAX_BACKENDS_BITS		18
 #define MAX_BACKENDS			((1U << MAX_BACKENDS_BITS)-1)
-- 
2.39.5 (Apple Git-154)

#18Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#17)
Re: MAX_BACKENDS size (comment accuracy)

Hi,

On 2025-02-24 13:52:51 -0600, Nathan Bossart wrote:

The recent commits for this drew my attention to the comment for
MAX_BACKENDS. Specifically, it says we check the value in
RegisterBackgroundWorker() (which appears to have been untrue since we
added max_worker_processes) and relevant GUC check hooks (which I removed
last year in commit 0b1fe14). I wrote a short patch to fix this, which I
intend to commit soon unless there is feedback.

Makes sense.

Greetings,

Andres Freund

#19Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#18)
Re: MAX_BACKENDS size (comment accuracy)

On Mon, Feb 24, 2025 at 03:38:24PM -0500, Andres Freund wrote:

Makes sense.

Committed, thanks.

--
nathan