dshash_find_or_insert vs. OOM

Started by Robert Haas2 months ago16 messageshackers

robertmhaas@gmail.com

2 months ago

Hi,

For most memory allocation primitives, we offer a "no OOM" version.
dshash_find_or_insert is an exception. Here's a small patch to fix
that. I have some interest in slipping this into v19 even though I am
submitting it quite late, because it would be useful for
pg_stash_advice[1]As yet uncommitted. See pg_plan_advice thread.. Let me know what you think about that.

--
Robert Haas
EDB: http://www.enterprisedb.com

[1]: As yet uncommitted. See pg_plan_advice thread.

Chao Li

li.evan.chao@gmail.com

2 months ago

In reply to: Robert Haas (#1)

Re: dshash_find_or_insert vs. OOM

On Mar 18, 2026, at 07:34, Robert Haas <robertmhaas@gmail.com> wrote:

Hi,

For most memory allocation primitives, we offer a "no OOM" version.
dshash_find_or_insert is an exception. Here's a small patch to fix
that. I have some interest in slipping this into v19 even though I am
submitting it quite late, because it would be useful for
pg_stash_advice[1]. Let me know what you think about that.

--
Robert Haas
EDB: http://www.enterprisedb.com

[1] As yet uncommitted. See pg_plan_advice thread.
<v1-0001-dshash-Make-it-possible-to-suppress-out-of-memory.patch>

I think adding a DSHASH_INSERT_NO_OOM flag is the right choice. As you mentioned, we already have other _NO_OOM flags, so this feels consistent with existing patterns.

I don’t see any correctness issue with the patch. I just have a couple of minor nits.

1
```
@@ -477,14 +485,22 @@ restart:
 			 * reacquire all the locks in the right order to avoid deadlocks.
 			 */
 			LWLockRelease(PARTITION_LOCK(hash_table, partition_index));
-			resize(hash_table, hash_table->size_log2 + 1);
+			if (!resize(hash_table, hash_table->size_log2 + 1, flags))
+				return NULL;

goto restart;
}

 		/* Finally we can try to insert the new item. */
 		item = insert_into_bucket(hash_table, key,
-								  &BUCKET_FOR_HASH(hash_table, hash));
+								  &BUCKET_FOR_HASH(hash_table, hash),
+								  flags);
+		if (item == NULL)
+		{
+			Assert((flags & DSHASH_INSERT_NO_OOM) != 0);
+			LWLockRelease(PARTITION_LOCK(hash_table, partition_index));
+			return NULL;
+		}
```

When OOM happens, Assert((flags & DSHASH_INSERT_NO_OOM) != 0); makes sense. But for resize(), the assert is inside resize(), while for insert_into_bucket(), the assert is in the caller. That feels a bit inconsistent to me, and I think it hurts readability a little. A reader might wonder why there is no corresponding assert after resize() unless they go read the function body.

I think either style is fine, but using the same style in both places would be better.

2
```
 	/* Allocate the space for the new table. */
-	new_buckets_shared =
-		dsa_allocate_extended(hash_table->area,
-							  sizeof(dsa_pointer) * new_size,
-							  DSA_ALLOC_HUGE | DSA_ALLOC_ZERO);
-	new_buckets = dsa_get_address(hash_table->area, new_buckets_shared);
+	{
+		int			dsa_flags = DSA_ALLOC_HUGE | DSA_ALLOC_ZERO;
+
+		if (flags & DSHASH_INSERT_NO_OOM)
+			dsa_flags |= DSA_ALLOC_NO_OOM;
+		new_buckets_shared =
+			dsa_allocate_extended(hash_table->area,
+								  sizeof(dsa_pointer) * new_size,
+								  dsa_flags);
+	}
```

Making this a nested block does have the benefit of keeping dsa_flags close to where it is used. But from my impression, this style is still fairly uncommon in the codebase. I worry it may implicitly signal to other hackers that this is an acceptable pattern. So unless we intentionally want to encourage that style, I would lean toward avoiding it here.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Sami Imseih

samimseih@gmail.com

2 months ago

In reply to: Robert Haas (#1)

Re: dshash_find_or_insert vs. OOM

Hi,

For most memory allocation primitives, we offer a "no OOM" version.
dshash_find_or_insert is an exception. Here's a small patch to fix
that. I have some interest in slipping this into v19 even though I am
submitting it quite late, because it would be useful for
pg_stash_advice[1]. Let me know what you think about that.

Thanks for the patch!

I agree with this idea, as it could act as a good triggering point for
a caller to perform an eviction of the dshash if they run out of space,
and avoid a hard failure.

Passing this as a flag seems OK with me. Not sure what other
flags could be added in the future, but the flexibility is a good
thing, IMO. I was debating if this should just be a dsh_params
option, but maybe for the same dshash a caller may want OOM
in some code path, and retry in some other. maybe?

+                                                                 &BUCKET_FOR_HASH(hash_table, hash),
+                                                                 flags);
+               if (item == NULL)
+               {
+                       Assert((flags & DSHASH_INSERT_NO_OOM) != 0);
+                       LWLockRelease(PARTITION_LOCK(hash_table, partition_index));
+                       return NULL;
+               }
```
When OOM happens, Assert((flags & DSHASH_INSERT_NO_OOM) != 0); makes sense. But for resize(), the assert is inside resize(), while for insert_into_bucket(), the assert is in the caller.
That feels a bit inconsistent to me, and I think it hurts readability a little. A reader might wonder why there is no corresponding assert after resize() unless they go read the function body.

I think either style is fine, but using the same style in both places would be better.

I agree with this. The assert should be

if (!resize(hash_table, hash_table->size_log2 + 1, flags))
{
Assert((flags & DSHASH_INSERT_NO_OOM) != 0);
return NULL;
}

instead of inside resize().

I did some testing by triggering an OOM, a no-OOM, and an OOM with
eviction afterwards, as a sanity check. All looks good.
I've attached the tests I created with other basic testing, as dshash
lacks direct testing in general, if you're inclined to add them.

--
Sami Imseih
Amazon Web Services (AWS)

Robert Haas

robertmhaas@gmail.com

2 months ago

In reply to: Chao Li (#2)

Re: dshash_find_or_insert vs. OOM

On Tue, Mar 17, 2026 at 9:34 PM Chao Li <li.evan.chao@gmail.com> wrote:

When OOM happens, Assert((flags & DSHASH_INSERT_NO_OOM) != 0); makes sense. But for resize(), the assert is inside resize(), while for insert_into_bucket(), the assert is in the caller. That feels a bit inconsistent to me, and I think it hurts readability a little. A reader might wonder why there is no corresponding assert after resize() unless they go read the function body.

Adjusted.

Making this a nested block does have the benefit of keeping dsa_flags close to where it is used. But from my impression, this style is still fairly uncommon in the codebase. I worry it may implicitly signal to other hackers that this is an acceptable pattern. So unless we intentionally want to encourage that style, I would lean toward avoiding it here.

Yeah, that was dumb. Fixed.

Thanks for the review; here's v2.

--
Robert Haas
EDB: http://www.enterprisedb.com

Sami Imseih

samimseih@gmail.com

2 months ago

In reply to: Robert Haas (#4)

Re: dshash_find_or_insert vs. OOM

Thanks for the review; here's v2.

LGTM

--
Sami Imseih
Amazon Web Services (AWS)

Robert Haas

robertmhaas@gmail.com

2 months ago

In reply to: Sami Imseih (#3)

Re: dshash_find_or_insert vs. OOM

On Wed, Mar 18, 2026 at 11:21 AM Sami Imseih <samimseih@gmail.com> wrote:

Passing this as a flag seems OK with me. Not sure what other
flags could be added in the future, but the flexibility is a good
thing, IMO. I was debating if this should just be a dsh_params
option, but maybe for the same dshash a caller may want OOM
in some code path, and retry in some other. maybe?

Yeah, I think this is a property of the individual call to
dshash_find_or_insert(), rather than a property of the table. It does
seem likely that most callers will want the same behavior in all
cases, but it would be very sad if we embedded that idea into the
structure of the code and then somebody has a case where it's not what
they want, and I see no real reason why that couldn't happen. Also,
doing it that way wouldn't really follow existing precedents
(pg_malloc_extended, palloc_extended, MemoryContextAllocExtended,
dsa_allocate_extended).

I did some testing by triggering an OOM, a no-OOM, and an OOM with
eviction afterwards, as a sanity check. All looks good.
I've attached the tests I created with other basic testing, as dshash
lacks direct testing in general, if you're inclined to add them.

I tried running a coverage report for this. It's pretty good. It
doesn't cover these things:

- dshash_create: OOM while allocating the bucket array.
- dshash_find_or_insert_extended: OOM while inserting an item
(apparently, at least here, it instead hits OOM while resizing)
- dshash_delete_key: the case where the entry not found
- dshash_dump: not called at all
- resize: the case where we exit early due to a concurrent resize
- delete_item_from_bucket: the case where there is more than one item
in the bucket

I think that adding a call to dshash_dump() somewhere would probably
make sense, and I'd suggest also trying to delete a nonexistent key.
The rest don't seem worth worrying about.

On the code itself:

+    /* Verify all entries via find. */
+    for (int i = 0; i < count; i++)
+    {
+        entry = dshash_find(ht, &i, false);
+        if (entry == NULL)
+            elog(ERROR, "key %d not found", i);
+        dshash_release_lock(ht, entry);
+    }

You could verify that entry->key == i. The current code wouldn't
notice if the hash table returned a garbage pointer. You could
possibly also include some kind of a payload in each record and verify
that, e.g. set entry->value =
some_homomorphism_over_0_to_9(entry->key).

+ dsa_set_size_limit(area, dsa_minimum_size());

This is an abuse of the documented purpose of dsa_set_size_limit().
Seems better to just pick a constant here.

+    /* Insert until OOM — without NO_OOM flag, this should raise ERROR. */
+    for (;;)

I think it would be safer to code this as a fixed iteration count. For
example, if you choose the size limit so that we should run out of
memory after 10 entries, you could terminate this loop after 1000
iterations. That way, if something goes wrong, we're more likely to
see "expected out-of-memory, but completed all %d iterations" in the
log rather than a core dump or whatever.

I+    {
+        TestDshashEntry *entry;
+
+        entry = dshash_find_or_insert(ht, &key, &found);
+        dshash_release_lock(ht, entry);
+        key++;
+    }

In other places, you check for entry == NULL, but not here.

+    {
+        TestDshashEntry *entry;
+
+        entry = dshash_find_or_insert_extended(ht, &key, &found,
+                                               DSHASH_INSERT_NO_OOM);
+        if (entry == NULL)
+            elog(ERROR, "insert after eviction failed");
+        dshash_release_lock(ht, entry);
+    }

I just got in trouble for letting a bare block sneak into my code, so
now it's my turn to complain.

I suggest git config user.email in whatever directory you use to
generate patches like this, just to make life a bit easier for anyone
who might eventually be committing one of them.

--
Robert Haas
EDB: http://www.enterprisedb.com

Chao Li

li.evan.chao@gmail.com

2 months ago

In reply to: Robert Haas (#4)

Re: dshash_find_or_insert vs. OOM

On Mar 19, 2026, at 01:46, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Mar 17, 2026 at 9:34 PM Chao Li <li.evan.chao@gmail.com> wrote:

When OOM happens, Assert((flags & DSHASH_INSERT_NO_OOM) != 0); makes sense. But for resize(), the assert is inside resize(), while for insert_into_bucket(), the assert is in the caller. That feels a bit inconsistent to me, and I think it hurts readability a little. A reader might wonder why there is no corresponding assert after resize() unless they go read the function body.

Adjusted.

Making this a nested block does have the benefit of keeping dsa_flags close to where it is used. But from my impression, this style is still fairly uncommon in the codebase. I worry it may implicitly signal to other hackers that this is an acceptable pattern. So unless we intentionally want to encourage that style, I would lean toward avoiding it here.

Yeah, that was dumb. Fixed.

Thanks for the review; here's v2.

--
Robert Haas
EDB: http://www.enterprisedb.com
<v2-0001-dshash-Make-it-possible-to-suppress-out-of-memory.patch>

Thanks for updating the patch. V2 LGTM.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Robert Haas

robertmhaas@gmail.com

2 months ago

In reply to: Chao Li (#7)

Re: dshash_find_or_insert vs. OOM

On Wed, Mar 18, 2026 at 8:09 PM Chao Li <li.evan.chao@gmail.com> wrote:

Thanks for updating the patch. V2 LGTM.

Committed, thanks to both of you.

--
Robert Haas
EDB: http://www.enterprisedb.com

Sami Imseih

samimseih@gmail.com

2 months ago

In reply to: Robert Haas (#6)

Re: dshash_find_or_insert vs. OOM

Thanks for the review!

I did some testing by triggering an OOM, a no-OOM, and an OOM with
eviction afterwards, as a sanity check. All looks good.
I've attached the tests I created with other basic testing, as dshash
lacks direct testing in general, if you're inclined to add them.

I tried running a coverage report for this. It's pretty good. It
doesn't cover these things:

- dshash_create: OOM while allocating the bucket array.
- dshash_find_or_insert_extended: OOM while inserting an item
(apparently, at least here, it instead hits OOM while resizing)
- dshash_delete_key: the case where the entry not found
- dshash_dump: not called at all
- resize: the case where we exit early due to a concurrent resize
- delete_item_from_bucket: the case where there is more than one item
in the bucket

These are good to add. I also included delete_entry and dshash_dump
for more coverage. delete_item_from_bucket will require us to hash
the item to the same bucket, and I'm not sure it's worth the hassle.
resize() will occur in the OOM test.

I think that adding a call to dshash_dump() somewhere would probably

Done

make sense, and I'd suggest also trying to delete a nonexistent key.

Done

On the code itself:
+    /* Verify all entries via find. */
+    for (int i = 0; i < count; i++)
+    {
+        entry = dshash_find(ht, &i, false);
+        if (entry == NULL)
+            elog(ERROR, "key %d not found", i);
+        dshash_release_lock(ht, entry);
+    }
You could verify that entry->key == i. The current code wouldn't
notice if the hash table returned a garbage pointer. You could
possibly also include some kind of a payload in each record and verify
that, e.g. set entry->value =
some_homomorphism_over_0_to_9(entry->key).

I added key verification as you suggested.

+ dsa_set_size_limit(area, dsa_minimum_size());

This is an abuse of the documented purpose of dsa_set_size_limit().
Seems better to just pick a constant here.

I changed this to use a different limit.

+    /* Insert until OOM — without NO_OOM flag, this should raise ERROR. */
+    for (;;)
I think it would be safer to code this as a fixed iteration count. For
example, if you choose the size limit so that we should run out of
memory after 10 entries, you could terminate this loop after 1000
iterations. That way, if something goes wrong, we're more likely to
see "expected out-of-memory, but completed all %d iterations" in the
log rather than a core dump or whatever.

Done. I also removed the OOM retry test and just kept a simple
test. Adding entries after a delete is now happening in the basic test.

I+    {
+        TestDshashEntry *entry;
+
+        entry = dshash_find_or_insert(ht, &key, &found);
+        dshash_release_lock(ht, entry);
+        key++;
+    }

In other places, you check for entry == NULL, but not here.

Fixed.

I just got in trouble for letting a bare block sneak into my code, so
now it's my turn to complain.

ugggh. fixed.

I suggest git config user.email in whatever directory you use to
generate patches like this, just to make life a bit easier for anyone
who might eventually be committing one of them.

Sorry. I usually do, but I started using a new machine :(

--
Sami Imseih
Amazon Web Services (AWS)

#10

Sami Imseih

samimseih@gmail.com

2 months ago

In reply to: Sami Imseih (#9)

Re: dshash_find_or_insert vs. OOM

I just realized v2 removed the test for dshash_find_or_insert_extended
with the NO_OOM flag. Corrected in v3.

--
Sami Imseih
Amazon Web Services (AWS)

#11

Robert Haas

robertmhaas@gmail.com

2 months ago

In reply to: Sami Imseih (#10)

Re: dshash_find_or_insert vs. OOM

On Thu, Mar 19, 2026 at 8:26 PM Sami Imseih <samimseih@gmail.com> wrote:

I just realized v2 removed the test for dshash_find_or_insert_extended
with the NO_OOM flag. Corrected in v3.

You need to get rid of the bare blocks. If you happen to be using an
AI to help write your code, I definitely suggest telling it to
remember not to ever do that.

This verifies that dshash_dump() doesn't crash, but not that it
produces useful output. Whether that's worth the infrastructure, I
don't know.

But more generally, I wonder what people think about how much value
this kind of thing has. With just the core regression test suite, we
get coverage of 76.5% of the lines in dshash.c. Adding the isolation
test suite and the test_dsm_registry test suite brings coverage up to
81.9% (334 of 408). The things that are not covered are: OOM cases,
dshash_destroy(), dshash_dump(), concurrency case in resizing.
delete_key_from_bucket() loop iterating more than once,
delete_item_from_bucket() returning false. With the v3 patch, running
just the test_dshash suite, coverage rises to 93.4% of lines (381 of
408). The omissions are: some OOM cases, dshash_delete_bucket()
successfully deletes something, concurrency case in resizing, most of
delete_key_from_bucket(), delete_item_from_bucket() iterating more
than once or returning false. Combining your patch with the test suite
previously mentioned, we get up to 97.1% coverage by lines (396 of
408).

So the patch definitely has some value: it adds coverage of 60+ lines
of code. On the other hand, it still feels to me like we'd be far more
likely to notice new dshash bugs based on its existing usages than
because of this. If I were modifying dshash, I wouldn't run this test
suite first; I'd run the regression tests first. And the patch does
have a cost: it's 334 lines of code that will have to be maintained
going forward. I don't know if that's worth it. I feel like we're a
lot more likely to break dshash behavior in some complicated
concurrency scenario than we are to break it in a test like this. And,
if somebody modifies dshash_destory() or dshash_dump(), they just need
to test the code at least once before committing to get essentially
the same benefit we'd get from this, which you would hope people would
do anyway. None of this is to say that this patch is a horrible idea,
but I'm also not entirely convinced that it is an excellent idea, so I
would like to hear some other opinions.

Of course, there's also the fact that this patch is quite similar to
some other test_* modules that we already have, like test_dsa or
test_bitmapset. So maybe those are good and this is also good. But I'm
not sure. The good news is, if we do want this, it probably doesn't
need much more work to be committable.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

#12

Sami Imseih

samimseih@gmail.com

2 months ago

In reply to: Robert Haas (#11)

Re: dshash_find_or_insert vs. OOM

You need to get rid of the bare blocks.

done in v4

This verifies that dshash_dump() doesn't crash, but not that it
produces useful output. Whether that's worth the infrastructure, I
don't know.

I don't think dshash_dump() will be deterministic, as keys may hash
to different buckets on different platforms, perhaps. Which is why
I did not think it was a good idea to compare the output from the logs.
Testing that it does not crash seemed worthwhile to increase
the coverage.

408). The omissions are: some OOM cases, dshash_delete_bucket()
successfully deletes something, concurrency case in resizing, most of
delete_key_from_bucket(), delete_item_from_bucket() iterating more

Yes, we can probably address some of these things with more complexity.

than once or returning false. Combining your patch with the test suite
previously mentioned, we get up to 97.1% coverage by lines (396 of
408).

This is solid coverage

So the patch definitely has some value: it adds coverage of 60+ lines
of code. On the other hand, it still feels to me like we'd be far more
likely to notice new dshash bugs based on its existing usages than
because of this. If I were modifying dshash, I wouldn't run this test
suite first; I'd run the regression tests first.

Sure, but once a bug is fixed, having direct test coverage fo this bug
in a dedicated test suite is a good thing.

The NO_OOM flag that started this discussion would not fit neatly in other
test suites, AFAICT.

--
Sami

#13

Andres Freund

andres@anarazel.de

2 months ago

In reply to: Robert Haas (#11)

Re: dshash_find_or_insert vs. OOM

Hi,

On 2026-03-25 13:55:02 -0400, Robert Haas wrote:

So the patch definitely has some value: it adds coverage of 60+ lines
of code. On the other hand, it still feels to me like we'd be far more
likely to notice new dshash bugs based on its existing usages than
because of this. If I were modifying dshash, I wouldn't run this test
suite first; I'd run the regression tests first. And the patch does
have a cost: it's 334 lines of code that will have to be maintained
going forward. I don't know if that's worth it. I feel like we're a
lot more likely to break dshash behavior in some complicated
concurrency scenario than we are to break it in a test like this. And,
if somebody modifies dshash_destory() or dshash_dump(), they just need
to test the code at least once before committing to get essentially
the same benefit we'd get from this, which you would hope people would
do anyway. None of this is to say that this patch is a horrible idea,
but I'm also not entirely convinced that it is an excellent idea, so I
would like to hear some other opinions.

Of course, there's also the fact that this patch is quite similar to
some other test_* modules that we already have, like test_dsa or
test_bitmapset. So maybe those are good and this is also good. But I'm
not sure. The good news is, if we do want this, it probably doesn't
need much more work to be committable.

I think tests like this do have value and I'd definitely run them first while
hacking on code related to dshash, rather than relying on the regression tests
or such. E.g. having test_aio was invaluable to being able to get AIO into a
stable state. When hacking on something with complicated edge cases I'd just
add a test for it, making development faster as well as ensuring the
complicated case continues to work into the future.

However, creating its own test module for small parts of the codebase doesn't
quite make sense to me. A pretty decent chunk of the test is just boilerplate
to add a new module, and every new test module requires its own cluster, which
adds a fair bit of runtime overhead, particularly on windows. I think
test_dsa, test_dsm_registry, test_dshash should just be one combined test
module, they're testing quite closely related code.

Greetings,

Andres Freund

#14

Michael Paquier

michael@paquier.xyz

2 months ago

In reply to: Andres Freund (#13)

Re: dshash_find_or_insert vs. OOM

On Thu, Mar 26, 2026 at 04:26:33PM -0400, Andres Freund wrote:

I think tests like this do have value and I'd definitely run them first while
hacking on code related to dshash, rather than relying on the regression tests
or such. E.g. having test_aio was invaluable to being able to get AIO into a
stable state. When hacking on something with complicated edge cases I'd just
add a test for it, making development faster as well as ensuring the
complicated case continues to work into the future.

These test modules have a lot of value because they are cheap to run
and are very usually able to reproduce edge cases that no other place
of the code tree would be able to reach in a predictible way. Cheap,
fast and reliable is good. On top of that they can serve as code
template. Bonus points.

However, creating its own test module for small parts of the codebase doesn't
quite make sense to me. A pretty decent chunk of the test is just boilerplate
to add a new module, and every new test module requires its own cluster, which
adds a fair bit of runtime overhead, particularly on windows. I think
test_dsa, test_dsm_registry, test_dshash should just be one combined test
module, they're testing quite closely related code.

Yeah, perhaps grouping all the DSA things into a single module would
be OK, with a parallel schedule that would speed up things. It
depends on the complexity and the size of the module to me.

Saying that, I think that the shape of the proposed test_dshash is
wrong: it proposes one SQL function that does a bunch of
dshash-related operations in a single function call, in a random
manner. We have a shared memory state that can survive across SQL
calls, making it a set of thinner SQL function that wrap directly
dshash calls able to manipulate the table would feel much more natural
to me. And it would be easier to design edge cases in the SQL
scripts themselves.
--
Michael

#15

Sami Imseih

samimseih@gmail.com

about 2 months ago

In reply to: Michael Paquier (#14)

Re: dshash_find_or_insert vs. OOM

On Thu, Mar 26, 2026 at 04:26:33PM -0400, Andres Freund wrote:

I think tests like this do have value and I'd definitely run them first while
hacking on code related to dshash, rather than relying on the regression tests
or such. E.g. having test_aio was invaluable to being able to get AIO into a
stable state. When hacking on something with complicated edge cases I'd just
add a test for it, making development faster as well as ensuring the
complicated case continues to work into the future.

These test modules have a lot of value because they are cheap to run
and are very usually able to reproduce edge cases that no other place
of the code tree would be able to reach in a predictible way. Cheap,
fast and reliable is good. On top of that they can serve as code
template. Bonus points.

However, creating its own test module for small parts of the codebase doesn't
quite make sense to me. A pretty decent chunk of the test is just boilerplate
to add a new module, and every new test module requires its own cluster, which
adds a fair bit of runtime overhead, particularly on windows. I think
test_dsa, test_dsm_registry, test_dshash should just be one combined test
module, they're testing quite closely related code.

Yeah, perhaps grouping all the DSA things into a single module would
be OK, with a parallel schedule that would speed up things. It
depends on the complexity and the size of the module to me.

Saying that, I think that the shape of the proposed test_dshash is
wrong: it proposes one SQL function that does a bunch of
dshash-related operations in a single function call, in a random
manner. We have a shared memory state that can survive across SQL
calls, making it a set of thinner SQL function that wrap directly
dshash calls able to manipulate the table would feel much more natural
to me. And it would be easier to design edge cases in the SQL
scripts themselves.

My apologies for the late response here. I spent some time looking at
this yesterday and came to the conclusion that we can add dshash tests
to test_dsm_registry, which already allocates a dshash via
GetNamedDSHash(). However, I also realized that the API has a gap: callers
cannot set a size limit on the dshash. I need this for the test, but
more importantly it's a limitation of the API itself. So I plan to
target v20 for the tests, as it's likely too late for v19. To start, I've
submitted a patch for allowing callers to set a size limit on a
GetNamedDSHash()-allocated dshash [1]https://commitfest.postgresql.org/patch/6655/.

[1]: https://commitfest.postgresql.org/patch/6655/

--
Sami

#16

Michael Paquier

michael@paquier.xyz

about 2 months ago

In reply to: Sami Imseih (#15)

Re: dshash_find_or_insert vs. OOM

On Tue, Apr 07, 2026 at 12:00:14PM -0500, Sami Imseih wrote:

My apologies for the late response here. I spent some time looking at
this yesterday and came to the conclusion that we can add dshash tests
to test_dsm_registry, which already allocates a dshash via
GetNamedDSHash(). However, I also realized that the API has a gap: callers
cannot set a size limit on the dshash. I need this for the test, but
more importantly it's a limitation of the API itself.

As in dsa_set_size_limit() cannot be set for a dshash currently.
Indeed that could be independently useful for some use cases.

So I plan to
target v20 for the tests, as it's likely too late for v19. To start, I've
submitted a patch for allowing callers to set a size limit on a
GetNamedDSHash()-allocated dshash [1].

[1] https://commitfest.postgresql.org/patch/6655/

That's too late for v19, at least here. Thanks for pursuing this
work.
--
Michael

dshash_find_or_insert vs. OOM

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: