Small optimization with expanding dynamic hash table

Started by cca55076 months ago9 messages
#1cca5507
cca5507@qq.com
1 attachment(s)

Hi, hackers

If I understand correctly, we only need to check the specific bit to determine whether a hash element needs relocation:

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 1ad155d446e..32fbae71995 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1626,7 +1626,7 @@ expand_table(HTAB *hashp)
                 currElement = nextElement)
        {
                nextElement = currElement->link;
-               if ((long) calc_bucket(hctl, currElement->hashvalue) == old_bucket)
+               if (!(currElement->hashvalue & (hctl->low_mask + 1)))
                {
                        *oldlink = currElement;
                        oldlink = &currElement->link;

--
Regards,
ChangAo Chen

Attachments:

v1-0001-Small-optimization-with-expanding-dynamic-hash-ta.patchapplication/octet-stream; charset=ISO-8859-1; name=v1-0001-Small-optimization-with-expanding-dynamic-hash-ta.patchDownload
From a64ad2e73869ce6d1369a584e634f6c9e4af6b7c Mon Sep 17 00:00:00 2001
From: ChangAo Chen <cca5507@qq.com>
Date: Mon, 7 Jul 2025 13:05:34 +0800
Subject: [PATCH v1] Small optimization with expanding dynamic hash table

---
 src/backend/utils/hash/dynahash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 1ad155d446e..32fbae71995 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1626,7 +1626,7 @@ expand_table(HTAB *hashp)
 		 currElement = nextElement)
 	{
 		nextElement = currElement->link;
-		if ((long) calc_bucket(hctl, currElement->hashvalue) == old_bucket)
+		if (!(currElement->hashvalue & (hctl->low_mask + 1)))
 		{
 			*oldlink = currElement;
 			oldlink = &currElement->link;
-- 
2.34.1

#2Rahila Syed
rahilasyed90@gmail.com
In reply to: cca5507 (#1)
Re: Small optimization with expanding dynamic hash table

Hi,

If I understand correctly, we only need to check the specific bit
to determine whether a hash element needs relocation:

diff --git a/src/backend/utils/hash/dynahash.c
b/src/backend/utils/hash/dynahash.c
index 1ad155d446e..32fbae71995 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1626,7 +1626,7 @@ expand_table(HTAB *hashp)
currElement = nextElement)
{
nextElement = currElement->link;
-               if ((long) calc_bucket(hctl, currElement->hashvalue) ==
old_bucket)
+               if (!(currElement->hashvalue & (hctl->low_mask + 1)))
{
*oldlink = currElement;
oldlink = &currElement->link;

Will this still work if new_bucket is not equal to hctl->low_mask + 1?

The above situation seems possible because we increment new_bucket every
time expand_table is called,
but we only update low_mask when new_bucket exceeds high_mask.

This change has successfully passed the tests on Github CI. According to
[1]: LCOV - PostgreSQL 19devel - src/backend/utils/hash/dynahash.c <https://coverage.postgresql.org/src/backend/utils/hash/dynahash.c.gcov.html&gt;
but I'm not certain if the specific case mentioned above is included in the
tests.

[1]: LCOV - PostgreSQL 19devel - src/backend/utils/hash/dynahash.c <https://coverage.postgresql.org/src/backend/utils/hash/dynahash.c.gcov.html&gt;
<https://coverage.postgresql.org/src/backend/utils/hash/dynahash.c.gcov.html&gt;

Thank you,
Rahila Syed

#3cca5507
cca5507@qq.com
In reply to: Rahila Syed (#2)
Re: Small optimization with expanding dynamic hash table

&gt;&nbsp;Will this still work if new_bucket is not equal to hctl-&gt;low_mask + 1?Yes, for example:

low_mask: 0x011, high_mask: 0x111, old_bucket: 0x010, new_bucket: 0x110

The old_bucket's hash value like 0x***010 or 0x***110, the later is in the old_bucket is because we didn't have new_bucket before, so only hash value like 0x***110 needs&nbsp;relocation: hashvalue &amp; (low_mask + 1) != 0

--
Regards,
ChangAo Chen

#4Rahila Syed
rahilasyed90@gmail.com
In reply to: cca5507 (#3)
Re: Small optimization with expanding dynamic hash table

Hi,

Yes, for example:

low_mask: 0x011, high_mask: 0x111, old_bucket: 0x010, new_bucket: 0x110

The old_bucket's hash value like 0x***010 or 0x***110, the later is in the
old_bucket is because we didn't have new_bucket before, so only hash value
like 0x***110 needs relocation: hashvalue & (low_mask + 1) != 0

Thanks for explaining, that clarifies things for me.
It may be worthwhile to check if this change has led to any performance
improvements.

Thank you,
Rahila syed

#5Rahila Syed
rahilasyed90@gmail.com
In reply to: Rahila Syed (#4)
Re: Small optimization with expanding dynamic hash table

Hi,

Yes, for example:

low_mask: 0x011, high_mask: 0x111, old_bucket: 0x010, new_bucket: 0x110

The old_bucket's hash value like 0x***010 or 0x***110, the later is in
the old_bucket is because we didn't have new_bucket before, so only hash
value like 0x***110 needs relocation: hashvalue & (low_mask + 1) != 0

Thanks for explaining, that clarifies things for me.
It may be worthwhile to check if this change has led to any performance
improvements.

One thing to note is that in this scenario, there is no safeguard if the
hashvalue is 0x111 and new_bucket is 0x110.
This means max_bucket is 0x110, but with your change, even 0x111 would meet
the condition for relocation
to the new bucket, which would be incorrect.
Although it’s unlikely that 0x111 would be passed into this check, if it is
passed, there is currently no check to
prevent it from being relocated to the new bucket. In the current code,
however, we do have such a check in
place in calc_bucket, specifically: if (bucket > hctl->max_bucket)

Thank you,
Rahila Syed

#6cca5507
cca5507@qq.com
In reply to: Rahila Syed (#5)
Re: Small optimization with expanding dynamic hash table

&gt;&nbsp;One thing to note is that in this scenario, there is no safeguard if the&nbsp;hashvalue is 0x111 and new_bucket is 0x110.

But the hash table is already&nbsp;corrupted if the hashvalue 0x111 in old_bucket 0x010, all hashvalue in old_bucket should have: hashvalue &amp; low_mask == old_bucket's no.

--
Regards,
ChangAo Chen

#7cca5507
cca5507@qq.com
In reply to: cca5507 (#6)
1 attachment(s)
Re: Small optimization with expanding dynamic hash table

Hi,

The v2 patch maybe more clear:

We can calc bucket just by hashvalue &amp; high_mask when expanding table because the if condition in calc_bucket() must be false.

--
Regards,
ChangAo Chen

Attachments:

v2-0001-Small-optimization-with-expanding-dynamic-hash-ta.patchapplication/octet-stream; charset=ISO-8859-1; name=v2-0001-Small-optimization-with-expanding-dynamic-hash-ta.patchDownload
From ca100c8b9d98a90bf82d0b12d418e703ccc4208e Mon Sep 17 00:00:00 2001
From: ChangAo Chen <cca5507@qq.com>
Date: Thu, 10 Jul 2025 10:32:16 +0800
Subject: [PATCH v2] Small optimization with expanding dynamic hash table

---
 src/backend/utils/hash/dynahash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 1ad155d446e..475da6add1c 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1626,7 +1626,7 @@ expand_table(HTAB *hashp)
 		 currElement = nextElement)
 	{
 		nextElement = currElement->link;
-		if ((long) calc_bucket(hctl, currElement->hashvalue) == old_bucket)
+		if ((long) (currElement->hashvalue & hctl->high_mask) == old_bucket)
 		{
 			*oldlink = currElement;
 			oldlink = &currElement->link;
-- 
2.34.1

#8wenhui qiu
qiuwenhuifx@gmail.com
In reply to: cca5507 (#7)
Re: Small optimization with expanding dynamic hash table

Hi

The v2 patch maybe more clear:
We can calc bucket just by hashvalue & high_mask when expanding table

because the if condition in calc_bucket() must be false.
I think you may add a comment to this path so that code reviewers can
clearly see your optimization.

Thanks

On Thu, Jul 10, 2025 at 10:46 AM cca5507 <cca5507@qq.com> wrote:

Show quoted text

Hi,

The v2 patch maybe more clear:

We can calc bucket just by hashvalue & high_mask when expanding table
because the if condition in calc_bucket() must be false.

--
Regards,
ChangAo Chen

#9cca5507
cca5507@qq.com
In reply to: wenhui qiu (#8)
1 attachment(s)
Re: Small optimization with expanding dynamic hash table

Hi,

The v3 patch attached. (Add some comment in commit msg)

--
Regards,
ChangAo Chen

Attachments:

v3-0001-Small-optimization-with-expanding-dynamic-hash-ta.patchapplication/octet-stream; charset=ISO-8859-1; name=v3-0001-Small-optimization-with-expanding-dynamic-hash-ta.patchDownload
From f79b8beafae2b0fa462f27d02d735313b15fb351 Mon Sep 17 00:00:00 2001
From: ChangAo Chen <cca5507@qq.com>
Date: Fri, 11 Jul 2025 19:58:44 +0800
Subject: [PATCH v3] Small optimization with expanding dynamic hash table

When expanding table, all hashvalues in the old bucket must have
'hashvalue & high_mask' equal to the old bucket's no or the new
bucket's no. We can calc bucket just by 'hashvalue & high_mask'
because the if condition in calc_bucket() must be false.
(old bucket's no < new bucket's no = max bucket's no)
---
 src/backend/utils/hash/dynahash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 1ad155d446e..475da6add1c 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1626,7 +1626,7 @@ expand_table(HTAB *hashp)
 		 currElement = nextElement)
 	{
 		nextElement = currElement->link;
-		if ((long) calc_bucket(hctl, currElement->hashvalue) == old_bucket)
+		if ((long) (currElement->hashvalue & hctl->high_mask) == old_bucket)
 		{
 			*oldlink = currElement;
 			oldlink = &currElement->link;
-- 
2.34.1