Proposal: Add a callback data parameter to GetNamedDSMSegment
Hello hackers,
While developing an extension and trying to write some generic code
around DSM areas, I noticed a limitation: although GetNamedDSMSegment
accepts a callback function, there is no way to pass additional
arguments to that callback.
For example, the documentation for creating LWLocks after startup [1]https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-ADDIN-LWLOCKS-AFTER-STARTUP
suggests creating locks in this callback. That works fine as long as
the callback only needs to create a hardcoded lock. But if the lock
name is a parameter to the function calling GetNamedDSMSegment, and
not fixed, I do not see a clean way to pass it through to the callback
(short of relying on global variables, for example).
As a proper solution for this, and possibly for other similar issues,
I propose adding a generic callback argument to GetNamedDSMSegment
that can be forwarded to the callback function.
What do you think about this?
[1]: https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-ADDIN-LWLOCKS-AFTER-STARTUP
Attachments:
0001-Add-a-callback-data-parameter-to-GetNamedDSMSegment.patchapplication/octet-stream; name=0001-Add-a-callback-data-parameter-to-GetNamedDSMSegment.patchDownload+25-17
Hi,
Can you provide more details on the use-case?
For example, the documentation for creating LWLocks after startup [1]
suggests creating locks in this callback. That works fine as long as
the callback only needs to create a hardcoded lock.
The callback is called on the first invocation of GetNamedDSMSegment for
a particular segment name. Subsequent calls just attach an existing segment.
But if the lock name is a parameter to the function calling GetNamedDSMSegment, and
not fixed, I do not see a clean way to pass it through to the callback
Keep in mind that the tranche name shows up in wait events, so you
will end up with different wait event names.
Also,
commit 38b602b capped the number of lwlock tranches to 256, so you
may hit this limit if you are creating many lwlocks.
#define MAX_NAMED_TRANCHES 256
--
Sami Imseih
Amazon Web services (AWS)
On Wed, Dec 03, 2025 at 12:47:46PM -0600, Sami Imseih wrote:
Can you provide more details on the use-case?
I think the main use-case is creating multiple DSM segments in the registry
that use the same initialization callback. I ran into this when I was
working on GetNamedDSA() and GetNamedDSHash(). In early versions of the
patch, the new functions used GetNamedDSMSegment() to allocate the space
for all the DSA/dshash information [0]/messages/by-id/attachment/177621/v8-0001-simplify-creating-hash-table-in-dsm-registry.patch. Since initializing those segments
required the user-provided name string, I ended up taking a lock after
calling GetNamedDSMSegment() and doing most of the initialization there.
My gut feeling is that this is an obscure enough use-case that this
workaround is probably sufficient, but I am interested to hear more...
[0]: /messages/by-id/attachment/177621/v8-0001-simplify-creating-hash-table-in-dsm-registry.patch
--
nathan
My gut feeling is that this is an obscure enough use-case that this
workaround is probably sufficient, but I am interested to hear more...
There are probably other good reasons to allow a generic argument
to the init callback. Besides the lwlock name, I can see someone
wanting to pass some other initialization info that may vary
depending on extension GUCs, etc.
--
Sami
On Wed, Dec 03, 2025 at 02:27:29PM -0600, Sami Imseih wrote:
There are probably other good reasons to allow a generic argument
to the init callback. Besides the lwlock name, I can see someone
wanting to pass some other initialization info that may vary
depending on extension GUCs, etc.
The value of a GUC could be obtained in the callback via the global
variable or GetConfigOption(), right?
--
nathan
On Wed, Dec 03, 2025 at 02:27:29PM -0600, Sami Imseih wrote:
There are probably other good reasons to allow a generic argument
to the init callback. Besides the lwlock name, I can see someone
wanting to pass some other initialization info that may vary
depending on extension GUCs, etc.The value of a GUC could be obtained in the callback via the global
variable or GetConfigOption(), right?
Yes, that's true. It will be hard to find other good use-cases that
can't be solved with a global variable, but we can also say this is
a low-cost change, so why not just do it.
--
Sami
On Wed, Dec 03, 2025 at 02:59:16PM -0600, Sami Imseih wrote:
Yes, that's true. It will be hard to find other good use-cases that
can't be solved with a global variable, but we can also say this is
a low-cost change, so why not just do it.
Well, for one, it requires all existing extensions that use
GetNamedDSMSegment() to be updated. That might not be too terrible, but in
any case, I think we need a stronger reason than the simplicity of the
implementation to do something.
--
nathan
I ran into this when I was
working on GetNamedDSA() and GetNamedDSHash()
Thanks for mentioning these, I didn't notice them when I rebased on
the master branch. One of my use cases was this, I implemented
something similar to GetNamedDSHash - it's a generic wrapper for
dshash in C++, and I ran into the same issue.
Besides the lwlock name, I can see someone
wanting to pass some other initialization info that may vary
depending on extension GUCs, etc.
Yes, that was my thought too. GUCs/globals can be accessed directly,
but there's still the reusing the same function part. and also
calculated local variables.
My other use case is using GetNamedDSMSegment without DSHash, with a
flexible array, similar to:
struct Foo {
LWLock lock;
size_t size;
Bar data[];
};
* To create a few of these, I have to provide a lock name to the
callback, that's the "reusing the same callback" part again
* And then there's the question of initialization. Either I leave it
to the caller after returning from GetNamedDSHash using the lock, or
somehow I have to tell the initialization callback the array size -
even if I can calculate the size based on a GUC, I wouldn't want to
trust that instead of the actual information from the caller.
Well, for one, it requires all existing extensions that use
GetNamedDSMSegment() to be updated. That might not be too terrible, but in
any case, I think we need a stronger reason than the simplicity of the
implementation to do something.
I did some searching for usage before my initial email, and I only
found 3 extensions that use this method, which seemed like a small
number. Also, the LWLockInitialize change already requires ifdefs in
this function for PG19.
But maybe the second use case alone is too specific to justify this
change, and there are workarounds to it.
struct Foo {
LWLock lock;
size_t size;
Bar data[];
};* To create a few of these, I have to provide a lock name to the
callback, that's the "reusing the same callback" part again
* And then there's the question of initialization. Either I leave it
to the caller after returning from GetNamedDSHash using the lock,
"caller after returning from GetNamedDSHash" <- do you mean
GetNamedDSMSegment ?
or somehow I have to tell the initialization callback the array size -
even if I can calculate the size based on a GUC,
```
typedef struct Bar {
int f1;
int f2;
} Bar;
typedef struct Foo {
LWLock lock;
size_t size;
Bar data[];
} Foo;
foo_state = GetNamedDSMSegment("Foo",
offsetof(Foo, data) + BAR_ARRAY_SIZE * sizeof(int),
foo_init_state,
&found);
```
wouldn't the above be sufficient to create a DSM segment containing
a flexible array?
--
Sami Imseih
Amazon Web Services (AWS)
"caller after returning from GetNamedDSHash" <- do you mean
GetNamedDSMSegment ?
Yes, that was a typo.
wouldn't the above be sufficient to create a DSM segment containing
a flexible array?
Yes, it creates it, but can I initialize it properly in
foo_init_state? How can I set the size member to the proper array
size, and how can I zero-initialize the array with the correct length
in it? What I can do currently is:
1. create the lwlock and set size to 0 in foo_init_state
2. take the lwlock after GetNamedDSMSegment returns
3. if size is 0 set it properly and zero-initialize the array
That's why I said that there is a workaround, but it would be nicer if
I could do it properly in the init callback, by passing the array size
as a parameter to it.
On Thu, Dec 04, 2025 at 05:40:21PM +0000, Zsolt Parragi wrote:
wouldn't the above be sufficient to create a DSM segment containing
a flexible array?Yes, it creates it, but can I initialize it properly in
foo_init_state? How can I set the size member to the proper array
size, and how can I zero-initialize the array with the correct length
in it? What I can do currently is:1. create the lwlock and set size to 0 in foo_init_state
2. take the lwlock after GetNamedDSMSegment returns
3. if size is 0 set it properly and zero-initialize the arrayThat's why I said that there is a workaround, but it would be nicer if
I could do it properly in the init callback, by passing the array size
as a parameter to it.
So, it seems like we've established at least 2 potential use-cases for this
argument (tranche name and flexible arrays). And the amount of extension
breakage doesn't seem too bad either. IMHO it'd be reasonable to proceed
with this change.
--
nathan
wouldn't the above be sufficient to create a DSM segment containing
a flexible array?Yes, it creates it, but can I initialize it properly in
foo_init_state? How can I set the size member to the proper array
size, and how can I zero-initialize the array with the correct length
in it? What I can do currently is:1. create the lwlock and set size to 0 in foo_init_state
2. take the lwlock after GetNamedDSMSegment returns
3. if size is 0 set it properly and zero-initialize the arrayThat's why I said that there is a workaround, but it would be nicer if
I could do it properly in the init callback, by passing the array size
as a parameter to it.So, it seems like we've established at least 2 potential use-cases for this
argument (tranche name and flexible arrays). And the amount of extension
breakage doesn't seem too bad either. IMHO it'd be reasonable to proceed
with this change.
I agree.
The workaround used above looks unsafe, because a backend could attach
to a partially initialized segment. Providing more flexibility in the
init callback
is a clear improvement.
The patch itself is straightforward. One thing that stands out is this:
```
static void
-init_tranche(void *ptr)
+init_tranche(void *ptr, void *arg)
{
int *tranche_id = (int *) ptr;
- *tranche_id = LWLockNewTrancheId("test_dsa");
+ *tranche_id = LWLockNewTrancheId(arg);
}
/* Test basic DSA functionality */
@@ -39,7 +39,7 @@ test_dsa_basic(PG_FUNCTION_ARGS)
dsa_pointer p[100];
tranche_id = GetNamedDSMSegment("test_dsa", sizeof(int),
-
init_tranche, &found);
+
init_tranche, "test_dsa", &found);
```
In almost all cases, the segment name is also used as the tranche name.
Currently, we set this name twice; in the GetNamedDSMSegment and inside
the init callback. The proposed patch does not improve this either.
I suggest passing the segment_name explicitly to the callback, and
reserving the extra argument for more complex data. If we are
changing the API, this seems like the right time to do so. So,
I think we should do something like:
```
static void
init_tranche(void *ptr, const char *segment_name, void *arg)
{
int *tranche_id = (int *) ptr;
*tranche_id = LWLockNewTrancheId(segment_name);
}
```
This works well for the first use-case identified. Instead of hard-
coding the tranche name in the callback, the name can be
retrieved as the segment name set in GetNamedDSMSegment.
The caller could still pass this name via the extra callback args, but
it's better to separate things a bit here, and reserve the extra callback
arguments for more complex data.
What do you think?
--
Sami Imseih
Amazon Web Services (AWS)
This works well for the first use-case identified. Instead of hard-
coding the tranche name in the callback, the name can be
retrieved as the segment name set in GetNamedDSMSegment.The caller could still pass this name via the extra callback args, but
it's better to separate things a bit here, and reserve the extra callback
arguments for more complex data.What do you think?
As I did not hear back, I went ahead and prepared a patch with the above.
I went back-forth on if it makes sense to provide the name as an
extra argument and decided it provides more flexibility. For example
I can use the same init callback and arguments for different segments.
Also, the name provides a guarantee of the name of the segment that
this callback is initializing.
Overall, I felt it was a better approach.
I updated the code comments and documentation.
Also, in test_dsa.c, i updated the name of the segments to
reflect the name of the functions creating the segments, like
below:
```
@@ -38,8 +38,8 @@ test_dsa_basic(PG_FUNCTION_ARGS)
dsa_area *a;
dsa_pointer p[100];
- tranche_id = GetNamedDSMSegment("test_dsa", sizeof(int),
-
init_tranche, &found);
+ tranche_id = GetNamedDSMSegment("test_dsa_basic", sizeof(int),
+
init_tranche, NULL, &found);
@@ -79,8 +79,8 @@ test_dsa_resowners(PG_FUNCTION_ARGS)
ResourceOwner oldowner;
ResourceOwner childowner;
- tranche_id = GetNamedDSMSegment("test_dsa", sizeof(int),
-
init_tranche, &found);
+ tranche_id = GetNamedDSMSegment("test_dsa_resowners", sizeof(int),
+
init_tranche, NULL, &found);
```
This is good for showing the same init callback being re-used
for different named segment initizations.
I also updated the commit message.
--
Sami Imseih
Amazon Web Services (AWS)
Attachments:
v2-0001-Pass-name-and-callback-data-to-GetNamedDSMSegment.patchapplication/octet-stream; name=v2-0001-Pass-name-and-callback-data-to-GetNamedDSMSegment.patchDownload+32-22
Thanks for the new patch.
On Thu, Dec 11, 2025 at 04:12:02PM -0600, Sami Imseih wrote:
I went back-forth on if it makes sense to provide the name as an
extra argument and decided it provides more flexibility. For example
I can use the same init callback and arguments for different segments.
If the initialization callback function needed the name, it could be
provided via the "void *" callback argument, right? I'm not following why
we need to provide it separately.
--
nathan
As I did not hear back, I went ahead and prepared a patch with the above.
If the question was for me sorry for not replying, I assumed it was
meant for Nathan.
Personally I'm not sure if we need this as even requiring a name isn't
a common use case, but I'm also fine with this version.
The only additional thing I would do is to add some kind of test that
verifies that we indeed forward the pointer to the callback in
test_dsa, for example:
+const char* forwardedData = "ForwardedData";
+
static void
-init_tranche(void *ptr)
+init_tranche(void *ptr, const char *name, void *arg)
{
int *tranche_id = (int *) ptr;
- *tranche_id = LWLockNewTrancheId("test_dsa");
+ if (arg != forwardedData) {
+ elog(ERROR, "Didn't receive expected arg pointer");
+ }
+
+ *tranche_id = LWLockNewTrancheId(name);
}
As otherwise we no longer test the value of the pointer in any of the
tests with the additional name parameter.
On a slightly related topic, I mentioned earlier that I found 3
extensions using GetNamedDSMSegment. One of them,
pg_track_optimizer[1]https://github.com/danolivo/pg_track_optimizer/ could use the new GetNamedDSHash function,
except that it doesn't provide a callback similar to what we have in
GetNamedDSMSegment, and it relies on initializing the hash in the
callback, before it returns. That's not possible with the new
function.
What do you think, would it make sense to also include callbacks for
GetNamedDSHash and GetNamedDSA?
[1]: https://github.com/danolivo/pg_track_optimizer/
Show quoted text
On Thu, Dec 11, 2025 at 10:12 PM Sami Imseih <samimseih@gmail.com> wrote:
This works well for the first use-case identified. Instead of hard-
coding the tranche name in the callback, the name can be
retrieved as the segment name set in GetNamedDSMSegment.The caller could still pass this name via the extra callback args, but
it's better to separate things a bit here, and reserve the extra callback
arguments for more complex data.What do you think?
As I did not hear back, I went ahead and prepared a patch with the above.
I went back-forth on if it makes sense to provide the name as an
extra argument and decided it provides more flexibility. For example
I can use the same init callback and arguments for different segments.Also, the name provides a guarantee of the name of the segment that
this callback is initializing.Overall, I felt it was a better approach.
I updated the code comments and documentation.
Also, in test_dsa.c, i updated the name of the segments to
reflect the name of the functions creating the segments, like
below:
```
@@ -38,8 +38,8 @@ test_dsa_basic(PG_FUNCTION_ARGS)
dsa_area *a;
dsa_pointer p[100];- tranche_id = GetNamedDSMSegment("test_dsa", sizeof(int), - init_tranche, &found); + tranche_id = GetNamedDSMSegment("test_dsa_basic", sizeof(int), + init_tranche, NULL, &found);@@ -79,8 +79,8 @@ test_dsa_resowners(PG_FUNCTION_ARGS)
ResourceOwner oldowner;
ResourceOwner childowner;- tranche_id = GetNamedDSMSegment("test_dsa", sizeof(int), - init_tranche, &found); + tranche_id = GetNamedDSMSegment("test_dsa_resowners", sizeof(int), + init_tranche, NULL, &found);```
This is good for showing the same init callback being re-used
for different named segment initizations.I also updated the commit message.
--
Sami Imseih
Amazon Web Services (AWS)
Attachments:
v3-0001-Pass-name-and-callback-data-to-GetNamedDSMSegment-in.patchapplication/octet-stream; name=v3-0001-Pass-name-and-callback-data-to-GetNamedDSMSegment-in.patchDownload+38-22
I went back-forth on if it makes sense to provide the name as an
extra argument and decided it provides more flexibility. For example
I can use the same init callback and arguments for different segments.If the initialization callback function needed the name, it could be
provided via the "void *" callback argument, right? I'm not following why
we need to provide it separately.
While it's true it can be passed as extra data, it is less error-prone
as we guarantee the real name of the segment is made available to
the callback. Also a caller to GetNamedDSMSegment does not need to
pass the name twice, as the name and as extra data. The most common
case I would think is using the segment name as the tranche name when
initializing a lwlock.
--
Sami Imseih
Amazon Web Services (AWS)
On a slightly related topic, I mentioned earlier that I found 3
extensions using GetNamedDSMSegment. One of them,
pg_track_optimizer[1] could use the new GetNamedDSHash function,
except that it doesn't provide a callback similar to what we have in
c, and it relies on initializing the hash in the
callback, before it returns. That's not possible with the new
function.What do you think, would it make sense to also include callbacks for
GetNamedDSHash and GetNamedDSA?
GetNamedDSA and GetNamedDSHash do not have a need for a callback,
because there isn't custom initialization logic that can be applied there.
--
Sami Imseih
Amazon Web Services (AWS)
+const char* forwardedData = "ForwardedData"; + static void -init_tranche(void *ptr) +init_tranche(void *ptr, const char *name, void *arg) { int *tranche_id = (int *) ptr;- *tranche_id = LWLockNewTrancheId("test_dsa"); + if (arg != forwardedData) { + elog(ERROR, "Didn't receive expected arg pointer"); + } + + *tranche_id = LWLockNewTrancheId(name); }As otherwise we no longer test the value of the pointer in any of the
tests with the additional name parameter.
Good call adding the tests. I do get compilation errors though with
v3, due to passing a const to a void pointer. I think this test could be
made better by checking that the arg data matches some data that
we expect.
```
-init_tranche(void *ptr)
+init_tranche(void *ptr, const char *name, void *arg)
{
int *tranche_id = (int *) ptr;
- *tranche_id = LWLockNewTrancheId("test_dsa");
+ /* Verify arg if given */
+ if (arg && strcmp(name, (char *) arg) != 0)
+ elog(ERROR, "didn't receive expected arg data");
+
+ *tranche_id = LWLockNewTrancheId(name);
}
```
--
Sami Imseih
Amazon Web Services (AWS)
Attachments:
v4-0001-Pass-name-and-callback-data-to-GetNamedDSMSegment.patchapplication/octet-stream; name=v4-0001-Pass-name-and-callback-data-to-GetNamedDSMSegment.patchDownload+36-22
GetNamedDSA and GetNamedDSHash do not have a need for a callback,
because there isn't custom initialization logic that can be applied there.
The use case for GetNamedDSHash is:
1. I want to create a dshash
2. I want to make sure that I can preload some initial data into it
before any backend is able to access it
My trivial example for it would be persistent statistics: when I want
to collect some information, save it to disk before shutdown, and on
the next startup, I want to load the previous state before continuing
collecting. pg_track_optimizer seems to do this. There are also
definitely other reasons.
* If I do it with GetNamedDSMSegment, it is doable inside the init
function: the init function is called when GetNamedDSMSegment still
holds a lock
* If I do it with traditional shared memory manually it is again
doable, as I control how I call dshash_create. If the code doesn't
already hold a lock, I take a lock when I call it
* But how can I do it with GetNamedDSHash? Should I take a different
lock manually every time before I call GetNamedDSHash? That means I
also have to store that different lock somewhere, I have to call
GetNamedDSMSegment to get storage for that, and at that point it is
easier to call dshash_create there in the initialization function, and
skip GetNamedDSHash entirely.
The answer might be this, that GetNamedDSHash is only for the simple
use cases, and for anything requiring more control, we have to fall
back to using GetNamedDSMSegment. I was just wondering if it would be
useful to add this functionality to GetNamedDSHash.
I think this test could be
made better by checking that the arg data matches some data that
we expect.
I verified the pointer address because it shouldn't change inside
GetNamedDSHash, but as long as the data is the same it doesn't really
matter, this version also looks good.
On Thu, Dec 11, 2025 at 05:17:30PM -0600, Sami Imseih wrote:
If the initialization callback function needed the name, it could be
provided via the "void *" callback argument, right? I'm not following why
we need to provide it separately.While it's true it can be passed as extra data, it is less error-prone
as we guarantee the real name of the segment is made available to
the callback. Also a caller to GetNamedDSMSegment does not need to
pass the name twice, as the name and as extra data. The most common
case I would think is using the segment name as the tranche name when
initializing a lwlock.
But... they can just pass that in the "void *" argument. I'm pretty firmly
-1 for adding more than the one callback argument here.
--
nathan