Compilation issues for HASH_STATISTICS and HASH_DEBUG options

Started by Hayato Kuroda (Fujitsu)5 months ago30 messages
#1Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
1 attachment(s)

Dear hackers,

I found that postgres could not be built if a complier option HASH_STATISTICS is
set [1]``` dynahash.c: In function ‘hash_update_hash_key’: dynahash.c:1178:9: error: ‘hctl’ undeclared (first use in this function) 1178 | hctl->accesses++; | ^~~~ ```. Also, I found HASH_DEBUG option caused warnings [2]``` dynahash.c: In function ‘init_htab’: dynahash.c:779:68: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 12 has type ‘uint32’ {aka ‘unsigned int’} [-Wformat=] 779 | fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n", | ~~^ | | | long int | %d ...... 784 | "MAX BUCKET ", hctl->max_bucket, | ~~~~~~~~~~~~~~~~ | | | dynahash.c:779:86: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 18 has type ‘long int’ [-Wformat=] 779 | fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n", | ~^ | | | unsigned in | %lx ...... 787 | "NSEGS ", hctl->nsegs); | ~~~~~~~~~~~ | | | long int dynahash.c:779:90: warning: format ‘%s’ expects a matching ‘char *’ argument [-Wformat=] 779 | fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n", | ~^ | | | char *. Usage of the are
mentioned at the code comments in dynahash.c.

I'm not sure whether we would keep supporting them because no one may not have
used anymore now.

Anyway, I tried to fix the error/warnings. Please see attached.

[1]: ``` dynahash.c: In function ‘hash_update_hash_key’: dynahash.c:1178:9: error: ‘hctl’ undeclared (first use in this function) 1178 | hctl->accesses++; | ^~~~ ```
```
dynahash.c: In function ‘hash_update_hash_key’:
dynahash.c:1178:9: error: ‘hctl’ undeclared (first use in this function)
1178 | hctl->accesses++;
| ^~~~
```

[2]: ``` dynahash.c: In function ‘init_htab’: dynahash.c:779:68: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 12 has type ‘uint32’ {aka ‘unsigned int’} [-Wformat=] 779 | fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n", | ~~^ | | | long int | %d ...... 784 | "MAX BUCKET ", hctl->max_bucket, | ~~~~~~~~~~~~~~~~ | | | dynahash.c:779:86: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 18 has type ‘long int’ [-Wformat=] 779 | fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n", | ~^ | | | unsigned in | %lx ...... 787 | "NSEGS ", hctl->nsegs); | ~~~~~~~~~~~ | | | long int dynahash.c:779:90: warning: format ‘%s’ expects a matching ‘char *’ argument [-Wformat=] 779 | fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n", | ~^ | | | char *
```
dynahash.c: In function ‘init_htab’:
dynahash.c:779:68: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 12 has type ‘uint32’ {aka ‘unsigned int’} [-Wformat=]
779 | fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n",
| ~~^
| |
| long int
| %d
......
784 | "MAX BUCKET ", hctl->max_bucket,
| ~~~~~~~~~~~~~~~~
| |
|
dynahash.c:779:86: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 18 has type ‘long int’ [-Wformat=]
779 | fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n",
| ~^
| |
| unsigned in
| %lx
......
787 | "NSEGS ", hctl->nsegs);
| ~~~~~~~~~~~
| |
| long int
dynahash.c:779:90: warning: format ‘%s’ expects a matching ‘char *’ argument [-Wformat=]
779 | fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n",
| ~^
| |
| char *

dynahash.c:779:93: warning: format ‘%ld’ expects a matching ‘long int’ argument [-Wformat=]
779 | fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n",
| ~~^
| |
| long int
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

0001-Fix-broken-code-for-HASH_DEBUG-and-HASH_STATISTICS.patchapplication/octet-stream; name=0001-Fix-broken-code-for-HASH_DEBUG-and-HASH_STATISTICS.patchDownload
From 35ccd7bc669805e828bfd4affa411e6733fcdb0a Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Thu, 14 Aug 2025 20:13:18 +0900
Subject: [PATCH] Fix broken code for HASH_DEBUG and HASH_STATISTICS

---
 src/backend/utils/hash/dynahash.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 81da03629f0..5d540bbae84 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -776,7 +776,7 @@ init_htab(HTAB *hashp, long nelem)
 	hctl->nelem_alloc = choose_nelem_alloc(hctl->entrysize);
 
 #ifdef HASH_DEBUG
-	fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n",
+	fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%u\n%s%u\n%s%x\n%s%ld\n",
 			"TABLE POINTER   ", hashp,
 			"DIRECTORY SIZE  ", hctl->dsize,
 			"SEGMENT SIZE    ", hctl->ssize,
@@ -1174,6 +1174,8 @@ hash_update_hash_key(HTAB *hashp,
 	HashCompareFunc match;
 
 #ifdef HASH_STATISTICS
+	HASHHDR    *hctl = hashp->hctl;
+
 	hash_accesses++;
 	hctl->accesses++;
 #endif
-- 
2.47.1

#2David Rowley
dgrowleyml@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#1)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Thu, 14 Aug 2025 at 23:48, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

I found that postgres could not be built if a complier option HASH_STATISTICS is
set [1]. Also, I found HASH_DEBUG option caused warnings [2]. Usage of the are
mentioned at the code comments in dynahash.c.

[1]:
```
dynahash.c: In function ‘hash_update_hash_key’:
dynahash.c:1178:9: error: ‘hctl’ undeclared (first use in this function)
1178 | hctl->accesses++;
| ^~~~
```

Looks to be new to PG17 introduced in cc5ef90ed.

[2]:
```
dynahash.c: In function ‘init_htab’:
dynahash.c:779:68: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 12 has type ‘uint32’ {aka ‘unsigned int’} [-Wformat=]
779 | fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n",

This dates back to PG14 from be0a66666. Looks like the author
miscounted the parameter in the format string and deleted starting 1
parameter too soon.

I'll address these and backpatch once the freeze is lifted from the
backbranches. That should be quite soon.

Thanks

David

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hayato Kuroda (Fujitsu) (#1)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> writes:

I found that postgres could not be built if a complier option HASH_STATISTICS is
set [1]. Also, I found HASH_DEBUG option caused warnings [2]. Usage of the are
mentioned at the code comments in dynahash.c.

I'm not sure whether we would keep supporting them because no one may not have
used anymore now.

Ugh. I did a bit of "git bisect" work and discovered that:

1. HASH_DEBUG got broken at

commit 44ca4022f3f9297bab5cbffdd97973dbba1879ed
Author: Robert Haas <rhaas@postgresql.org>
Date: Wed Mar 23 10:56:23 2016 -0400

Partition the freelist for shared dynahash tables.

and was fixed again at

commit 9d4e56699957b261390c50dcda7f947470017484
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed Aug 2 12:16:50 2017 -0400

Remove broken and useless entry-count printing in HASH_DEBUG code.

and then broke again at

commit be0a6666656ec3f68eb7d8e7abab5139fcd47012
Author: Thomas Munro <tmunro@postgresql.org>
Date: Sat Sep 19 11:28:34 2020 +1200

Remove large fill factor support from dynahash.c.

2. HASH_STATISTICS got broken at

commit cc5ef90edd809eaf85e11a0ee251229bbf7ce798
Author: Michael Paquier <michael@paquier.xyz>
Date: Fri Mar 15 07:57:17 2024 +0900

Refactor initial hash lookup in dynahash.c

Based on this, I think we should remove the HASH_DEBUG support.
It's been broken for six of the last nine years and only one
person ever noticed. Moreover, if you were trying to find a
problem in dynahash, you'd probably want different debug logging
than what's here.

There might be a case for removing HASH_STATISTICS too, but
it seems weaker. I do recall having made use of that code
once years ago ...

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#2)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

David Rowley <dgrowleyml@gmail.com> writes:

I'll address these and backpatch once the freeze is lifted from the
backbranches. That should be quite soon.

The freeze was over when we tagged the releases, a day and a half
ago now.

As I just responded to Hayato-san, I think there's a good case
for removing the HASH_DEBUG code rather than fixing it (again).

regards, tom lane

#5David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#4)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Fri, 15 Aug 2025 at 02:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:

As I just responded to Hayato-san, I think there's a good case
for removing the HASH_DEBUG code rather than fixing it (again).

No objections here. It feels like it must not get used very often if
it's taken almost 5 years for someone to notice the warning.

The reason I thought to backpatch the fix was that I was less sure
about removing it in backbranches and thought that should just be
fixed and any consideration to remove it would be in master only.

David

#6Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Tom Lane (#3)
2 attachment(s)
RE: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

Dear Tom,

Thanks for the analysis.

Based on this, I think we should remove the HASH_DEBUG support.
It's been broken for six of the last nine years and only one
person ever noticed. Moreover, if you were trying to find a
problem in dynahash, you'd probably want different debug logging
than what's here.

I didn't realize that it has been broken for a long time. Agreed to remove
the HASH_DEBUG option.

There might be a case for removing HASH_STATISTICS too, but
it seems weaker. I do recall having made use of that code
once years ago ...

I have never used both options, but one point is that hash_stats() has been
exported and extensions have called it. Is there a possibility that hidden
developer is using?

Anyway, I have no strong opinion. Attached patch does 1) remove HASH_DEBUG and
2) fix broken code with HASH_STATISTICS.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v2-0001-Remove-HASH_DEBUG.patchapplication/octet-stream; name=v2-0001-Remove-HASH_DEBUG.patchDownload
From 2959128aa48471c2a200a45b6e39c82c20ca49cc Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 15 Aug 2025 11:59:56 +0900
Subject: [PATCH v2 1/2] Remove HASH_DEBUG

---
 src/backend/utils/hash/dynahash.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 81da03629f0..9eca38efd8b 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -80,9 +80,7 @@
  * are not implemented; otherwise functionality is identical.
  *
  * Compilation controls:
- * HASH_DEBUG controls some informative traces, mainly for debugging.
- * HASH_STATISTICS causes HashAccesses and HashCollisions to be maintained;
- * when combined with HASH_DEBUG, these are displayed by hdestroy().
+ * HASH_STATISTICS causes HashAccesses and HashCollisions to be maintained.
  *
  * Problems & fixes to ejp@ausmelb.oz. WARNING: relies on pre-processor
  * concatenation property, in probably unnecessary code 'optimization'.
@@ -775,17 +773,6 @@ init_htab(HTAB *hashp, long nelem)
 	/* Choose number of entries to allocate at a time */
 	hctl->nelem_alloc = choose_nelem_alloc(hctl->entrysize);
 
-#ifdef HASH_DEBUG
-	fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%ld\n%s%u\n%s%x\n%s%x\n%s%ld\n",
-			"TABLE POINTER   ", hashp,
-			"DIRECTORY SIZE  ", hctl->dsize,
-			"SEGMENT SIZE    ", hctl->ssize,
-			"SEGMENT SHIFT   ", hctl->sshift,
-			"MAX BUCKET      ", hctl->max_bucket,
-			"HIGH MASK       ", hctl->high_mask,
-			"LOW  MASK       ", hctl->low_mask,
-			"NSEGS           ", hctl->nsegs);
-#endif
 	return true;
 }
 
-- 
2.47.1

v2-0002-Fix-broken-code-for-HASH_STATISTICS.patchapplication/octet-stream; name=v2-0002-Fix-broken-code-for-HASH_STATISTICS.patchDownload
From a2efd0208c665467026f1a2a8b3a7bb642842cc8 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Thu, 14 Aug 2025 20:13:18 +0900
Subject: [PATCH v2 2/2] Fix broken code for HASH_STATISTICS

---
 src/backend/utils/hash/dynahash.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 9eca38efd8b..f5d287372a9 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1161,6 +1161,8 @@ hash_update_hash_key(HTAB *hashp,
 	HashCompareFunc match;
 
 #ifdef HASH_STATISTICS
+	HASHHDR    *hctl = hashp->hctl;
+
 	hash_accesses++;
 	hctl->accesses++;
 #endif
-- 
2.47.1

#7Michael Paquier
michael@paquier.xyz
In reply to: Hayato Kuroda (Fujitsu) (#6)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Fri, Aug 15, 2025 at 03:22:18AM +0000, Hayato Kuroda (Fujitsu) wrote:

Based on this, I think we should remove the HASH_DEBUG support.
It's been broken for six of the last nine years and only one
person ever noticed. Moreover, if you were trying to find a
problem in dynahash, you'd probably want different debug logging
than what's here.

I didn't realize that it has been broken for a long time. Agreed to remove
the HASH_DEBUG option.

One use case where I've been recently relying on dynahash is the
pgstats shmem code. While I've not used these symbols, I'm pretty
sure that they could prove useful now that I know about them. Andres,
an opinion perhaps?

I have never used both options, but one point is that hash_stats() has been
exported and extensions have called it. Is there a possibility that hidden
developer is using?

Anyway, I have no strong opinion. Attached patch does 1) remove HASH_DEBUG and
2) fix broken code with HASH_STATISTICS.

It looks like I'm responsible for breaking one of them, at least,
sorry for that. As far as I can see, the fixes are not complicated,
so I'd rather fix and backpatch things rather than removing both as
both combined can be quite powerful when debugging or improving this
code. David, were you planning to do so? I'd vote for keeping these
working.
--
Michael

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hayato Kuroda (Fujitsu) (#6)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> writes:

There might be a case for removing HASH_STATISTICS too, but
it seems weaker. I do recall having made use of that code
once years ago ...

I have never used both options, but one point is that hash_stats() has been
exported and extensions have called it. Is there a possibility that hidden
developer is using?

Seems unlikely. They couldn't count on it doing anything at all
in a standard build. Moreover, even if HASH_STATISTICS is defined,
what it will do is print to stderr which is hardly a production-
friendly behavior. So on the whole I'd bet lunch that no one
depends on hash_stats().

In any case, we'd be talking about a change in master only,
not something we'd back-patch. So there would be time for
people to complain if anyone really cares.

Anyway, I have no strong opinion. Attached patch does 1) remove HASH_DEBUG and
2) fix broken code with HASH_STATISTICS.

Whatever we do with this, let's do it in two separate patches,
just in case someone argues for reverting one or the other.
The issues don't seem to me to be fundamentally connected.

regards, tom lane

#9David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#7)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Fri, 15 Aug 2025 at 15:42, Michael Paquier <michael@paquier.xyz> wrote:

It looks like I'm responsible for breaking one of them, at least,
sorry for that. As far as I can see, the fixes are not complicated,
so I'd rather fix and backpatch things rather than removing both as
both combined can be quite powerful when debugging or improving this
code. David, were you planning to do so? I'd vote for keeping these
working.

Yes, I'm about to push the HASH_STATISTICS one and backpatch to v17.

I think we should fix and backpatch the HASH_DEBUG one. We can have a
separate debate on removing it in master only. It's hard to imagine
anyone objecting to fixing it, so let's just do that for the time
being.

FWIW, I've no personal need to keep HASH_DEBUG, but if someone does,
then let's keep it. Maybe we can make it use elog(DEBUG<N>) rather
than fprintf and at least build with it in some BF member so we notice
sooner if someone breaks it again. (I've not checked if there's a good
reason why we can't use elog(). Perhaps something dynahash related
happens to early in backend startup...)

David

#10Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#9)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Fri, Aug 15, 2025 at 04:48:10PM +1200, David Rowley wrote:

Yes, I'm about to push the HASH_STATISTICS one and backpatch to v17.

Thanks.

I think we should fix and backpatch the HASH_DEBUG one. We can have a
separate debate on removing it in master only. It's hard to imagine
anyone objecting to fixing it, so let's just do that for the time
being.

Sounds good to me.
--
Michael

#11Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: David Rowley (#9)
RE: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

Dear David,

FWIW, I've no personal need to keep HASH_DEBUG, but if someone does,
then let's keep it. Maybe we can make it use elog(DEBUG<N>) rather
than fprintf and at least build with it in some BF member so we notice
sooner if someone breaks it again. (I've not checked if there's a good
reason why we can't use elog(). Perhaps something dynahash related
happens to early in backend startup...)

Just in case: actually, even if the HASH_DEBUG part is fixed on PG17, it could
not pass some tests. One example is initdb test [1]``` [13:55:38.836](0.001s) not ok 29 - options --locale-provider=icu --locale=und --lc-*=C: no stderr [13:55:38.836](0.000s) [13:55:38.836](0.000s) # Failed test 'options --locale-provider=icu --locale=und --lc-*=C: no stderr' # at t/001_initdb.pl line 131. [13:55:38.837](0.001s) # got: 'init_htab: # TABLE POINTER 0x180ee70 # DIRECTORY SIZE 256 # SEGMENT SIZE 256 # SEGMENT SHIFT 8 # MAX BUCKET 3 # HIGH MASK 7 # LOW MASK 3 # NSEGS 1 ... ```.
ISTM, command_like() assumed that there are no outputs in stderr but this option
does. This meant no BF animals cannot set this option as-is.
After I changed them to elog(DEBUG1) (and set debug1 as default) I ran tests
under src/test, and they could pass.

[1]: ``` [13:55:38.836](0.001s) not ok 29 - options --locale-provider=icu --locale=und --lc-*=C: no stderr [13:55:38.836](0.000s) [13:55:38.836](0.000s) # Failed test 'options --locale-provider=icu --locale=und --lc-*=C: no stderr' # at t/001_initdb.pl line 131. [13:55:38.837](0.001s) # got: 'init_htab: # TABLE POINTER 0x180ee70 # DIRECTORY SIZE 256 # SEGMENT SIZE 256 # SEGMENT SHIFT 8 # MAX BUCKET 3 # HIGH MASK 7 # LOW MASK 3 # NSEGS 1 ... ```
```
[13:55:38.836](0.001s) not ok 29 - options --locale-provider=icu --locale=und --lc-*=C: no stderr
[13:55:38.836](0.000s)
[13:55:38.836](0.000s) # Failed test 'options --locale-provider=icu --locale=und --lc-*=C: no stderr'
# at t/001_initdb.pl line 131.
[13:55:38.837](0.001s) # got: 'init_htab:
# TABLE POINTER 0x180ee70
# DIRECTORY SIZE 256
# SEGMENT SIZE 256
# SEGMENT SHIFT 8
# MAX BUCKET 3
# HIGH MASK 7
# LOW MASK 3
# NSEGS 1
...
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED

#12David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#10)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Fri, 15 Aug 2025 at 17:25, Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Aug 15, 2025 at 04:48:10PM +1200, David Rowley wrote:

Yes, I'm about to push the HASH_STATISTICS one and backpatch to v17.

Thanks.

Done

I think we should fix and backpatch the HASH_DEBUG one. We can have a
separate debate on removing it in master only. It's hard to imagine
anyone objecting to fixing it, so let's just do that for the time
being.

Sounds good to me.

Also done. (I didn't use the proposed patch to fix that as the
proposed format wasn't correct. HIGH MASK was being changed to use %u
instead of %x, which I thought was unintended.)

David

#13David Rowley
dgrowleyml@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#11)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Fri, 15 Aug 2025 at 17:46, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Just in case: actually, even if the HASH_DEBUG part is fixed on PG17, it could
not pass some tests. One example is initdb test [1].
ISTM, command_like() assumed that there are no outputs in stderr but this option
does. This meant no BF animals cannot set this option as-is.
After I changed them to elog(DEBUG1) (and set debug1 as default) I ran tests
under src/test, and they could pass.

Yeah, I noticed the tests wouldn't pass with it enabled. I think now
that the format bug is fixed in the back branches, we can decide what
we're going to do in master.

I think if we're going to keep HASH_DEBUG, then we should try and do
something to help ensure it does not get broken again for an excessive
period of time. I'm not against switching to elog(DEBUG), but then I
don't have any idea why it was outputting to stderr. If someone wants
to switch their BF animal to define HASH_DEBUG then maybe we can
commit the elog(DEBUG) change to master only and see if anyone
complains. (I suspect nobody will).

I think the votes to keep it should outweigh the votes to get rid of
it, as someone voting to keep it probably has some idea that it'll be
useful to them, and keeping it shouldn't really hinder the people who
don't want it. I think Michael has voted to keep it.

Michael, any thoughts on switching from stderr to elog(DEBUG)? And
also about defining HASH_DEBUG on one of your BF members?

David

#14Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#13)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Fri, Aug 15, 2025 at 06:37:44PM +1200, David Rowley wrote:

I think the votes to keep it should outweigh the votes to get rid of
it, as someone voting to keep it probably has some idea that it'll be
useful to them, and keeping it shouldn't really hinder the people who
don't want it. I think Michael has voted to keep it.

Michael, any thoughts on switching from stderr to elog(DEBUG)?

You mean to do that for both flags, right? That would be OK for me.

And also about defining HASH_DEBUG on one of your BF members?

Sure, I'll do that for both HASH_DEBUG and HASH_STATISTICS if we
decide to keep both and that a fix is applied. We've proven to be bad
at breaking the two of them.
--
Michael

#15David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#14)
1 attachment(s)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Fri, 15 Aug 2025 at 19:09, Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Aug 15, 2025 at 06:37:44PM +1200, David Rowley wrote:

Michael, any thoughts on switching from stderr to elog(DEBUG)?

You mean to do that for both flags, right? That would be OK for me.

Yeah. Just to aid discussion, let's say the attached patch.

I don't really have an idea on which debug level these should appear
on, however. I picked DEBUG4 for no other reason than DEBUG5 being
quite verbose with AIO related messages.

David

Attachments:

hash_debug.patchapplication/octet-stream; name=hash_debug.patchDownload
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index d0843551007..91cd5dfbf94 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -776,15 +776,15 @@ init_htab(HTAB *hashp, long nelem)
 	hctl->nelem_alloc = choose_nelem_alloc(hctl->entrysize);
 
 #ifdef HASH_DEBUG
-	fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%u\n%s%x\n%s%x\n%s%ld\n",
-			"TABLE POINTER   ", hashp,
-			"DIRECTORY SIZE  ", hctl->dsize,
-			"SEGMENT SIZE    ", hctl->ssize,
-			"SEGMENT SHIFT   ", hctl->sshift,
-			"MAX BUCKET      ", hctl->max_bucket,
-			"HIGH MASK       ", hctl->high_mask,
-			"LOW  MASK       ", hctl->low_mask,
-			"NSEGS           ", hctl->nsegs);
+	elog(DEBUG4, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%u\n%s%x\n%s%x\n%s%ld\n",
+		 "TABLE POINTER   ", hashp,
+		 "DIRECTORY SIZE  ", hctl->dsize,
+		 "SEGMENT SIZE    ", hctl->ssize,
+		 "SEGMENT SHIFT   ", hctl->sshift,
+		 "MAX BUCKET      ", hctl->max_bucket,
+		 "HIGH MASK       ", hctl->high_mask,
+		 "LOW  MASK       ", hctl->low_mask,
+		 "NSEGS           ", hctl->nsegs);
 #endif
 	return true;
 }
@@ -901,16 +901,14 @@ void
 hash_stats(const char *where, HTAB *hashp)
 {
 #ifdef HASH_STATISTICS
-	fprintf(stderr, "%s: this HTAB -- accesses %ld collisions %ld\n",
-			where, hashp->hctl->accesses, hashp->hctl->collisions);
-
-	fprintf(stderr, "hash_stats: entries %ld keysize %ld maxp %u segmentcount %ld\n",
-			hash_get_num_entries(hashp), (long) hashp->hctl->keysize,
-			hashp->hctl->max_bucket, hashp->hctl->nsegs);
-	fprintf(stderr, "%s: total accesses %ld total collisions %ld\n",
-			where, hash_accesses, hash_collisions);
-	fprintf(stderr, "hash_stats: total expansions %ld\n",
-			hash_expansions);
+	elog(DEBUG4, "%s: this HTAB -- accesses %ld collisions %ld\n", where,
+		 hashp->hctl->accesses, hashp->hctl->collisions);
+	elog(DEBUG4, "hash_stats: entries %ld keysize %ld maxp %u segmentcount %ld\n",
+		 hash_get_num_entries(hashp), (long) hashp->hctl->keysize,
+		 hashp->hctl->max_bucket, hashp->hctl->nsegs);
+	elog(DEBUG4, "%s: total accesses %ld total collisions %ld\n", where,
+		 hash_accesses, hash_collisions);
+	elog(DEBUG4, "hash_stats: total expansions %ld\n", hash_expansions);
 #endif
 }
 
#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#15)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

David Rowley <dgrowleyml@gmail.com> writes:

Yeah. Just to aid discussion, let's say the attached patch.

I think if we want to claim that these bits represent actually
usable functionality, we need to do more than s/fprintf/elog/.
Some points:

* Neither printout identifies which hashtable it's talking about
in any usable fashion, which is silly when we could print
hashp->tabname. HASH_DEBUG prints the pointer to the table,
which is certainly useless unless you've got gdb attached to
the session, and probably useless even then without the tabname.

* The output formats are randomly inconsistent with each other,
and don't look much like other outputs either. I'm particularly
vexed by the fact that you could not usefully grep the log for
this data. I think we should switch them to a single log line so
that grepping for a hashtable name could produce something useful.

* As it stands, HASH_DEBUG prints only at hashtable creation and
HASH_STATISTICS prints only at table destruction. Is that really
enough? I'm thinking in particular of shared and session-lifespan
hashtables, which won't ever receive a hash_destroy call.

* One fairly believable use-case for hash_stats() is to be called
manually from a debugger. To make that a little easier, I think
we should fix it to not crash if "where" is NULL.

I don't really have an idea on which debug level these should appear
on, however. I picked DEBUG4 for no other reason than DEBUG5 being
quite verbose with AIO related messages.

No opinion here either.

regards, tom lane

#17David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#16)
1 attachment(s)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Sat, 16 Aug 2025 at 03:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

* Neither printout identifies which hashtable it's talking about
in any usable fashion, which is silly when we could print
hashp->tabname. HASH_DEBUG prints the pointer to the table,
which is certainly useless unless you've got gdb attached to
the session, and probably useless even then without the tabname.

* The output formats are randomly inconsistent with each other,
and don't look much like other outputs either. I'm particularly
vexed by the fact that you could not usefully grep the log for
this data. I think we should switch them to a single log line so
that grepping for a hashtable name could produce something useful.

* As it stands, HASH_DEBUG prints only at hashtable creation and
HASH_STATISTICS prints only at table destruction. Is that really
enough? I'm thinking in particular of shared and session-lifespan
hashtables, which won't ever receive a hash_destroy call.

* One fairly believable use-case for hash_stats() is to be called
manually from a debugger. To make that a little easier, I think
we should fix it to not crash if "where" is NULL.

I've attached another patch with all these things fixed up.

For the single line format, I made this use a similar format to how we
display various properties in text based EXPLAIN.

I noticed that the existing code has a set of global variables that
keeps track of accesses, collisions and expansions for *all* tables in
the backend. This didn't fit well with the single line elog. I kinda
though the global information was a bit strange, so I just got rid of
it. There was no "expansions" field to track the number of expansions
for a single table, so I added one. I made all these uint64. I felt
long (which can be 32-bit on some platforms) was too small for
tracking the number of hash table accesses. Likely uint64 is too wide
to track the expansions. That likely could be made smaller, but since
these are not enabled by default, they're not taking up any struct
space in normal builds.

This is from using the debugger:
2025-08-17 00:04:52.206 NZST [962296] DEBUG: hash_stats: Caller:
(unknown) Table Name: "Relcache by OID" Accesses: 194 Collisions:
23 Expansions: 0 Entries: 139 Key Size: 4 Max Bucket: 511 Segment
Count: 2

and some samples from the log:
2025-08-17 00:05:13.241 NZST [962325] DEBUG: init_htab: Table Name:
"Relcache by OID" Directory Size: 256 Segment Size: 256 Segment
Shift: 8 Max Bucket: 511 High Mask: 3ff Low Mask: 1ff Number
Segments: 2
2025-08-17 00:05:13.241 NZST [962325] DEBUG: init_htab: Table Name:
"Portal hash" Directory Size: 256 Segment Size: 256 Segment Shift:
8 Max Bucket: 15 High Mask: 1f Low Mask: f Number Segments: 1
2025-08-17 00:05:13.241 NZST [962325] DEBUG: init_htab: Table Name:
"smgr relation table" Directory Size: 256 Segment Size: 256 Segment
Shift: 8 Max Bucket: 511 High Mask: 3ff Low Mask: 1ff Number
Segments: 2
2025-08-17 00:05:13.243 NZST [962325] DEBUG: init_htab: Table Name:
"Operator class cache" Directory Size: 256 Segment Size: 256
Segment Shift: 8 Max Bucket: 63 High Mask: 7f Low Mask: 3f Number
Segments: 1

David

Attachments:

hash_debug_v2.patchapplication/octet-stream; name=hash_debug_v2.patchDownload
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index d0843551007..a10415feae0 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -205,8 +205,9 @@ struct HASHHDR
 	 * Count statistics here.  NB: stats code doesn't bother with mutex, so
 	 * counts could be corrupted a bit in a partitioned table.
 	 */
-	long		accesses;
-	long		collisions;
+	uint64		accesses;
+	uint64		collisions;
+	uint64		expansions;
 #endif
 };
 
@@ -266,12 +267,6 @@ struct HTAB
  */
 #define MOD(x,y)			   ((x) & ((y)-1))
 
-#ifdef HASH_STATISTICS
-static long hash_accesses,
-			hash_collisions,
-			hash_expansions;
-#endif
-
 /*
  * Private function prototypes
  */
@@ -661,7 +656,7 @@ hdefault(HTAB *hashp)
 	hctl->isfixed = false;		/* can be enlarged */
 
 #ifdef HASH_STATISTICS
-	hctl->accesses = hctl->collisions = 0;
+	hctl->accesses = hctl->collisions = hctl->expansions = 0;
 #endif
 }
 
@@ -776,15 +771,9 @@ init_htab(HTAB *hashp, long nelem)
 	hctl->nelem_alloc = choose_nelem_alloc(hctl->entrysize);
 
 #ifdef HASH_DEBUG
-	fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%u\n%s%x\n%s%x\n%s%ld\n",
-			"TABLE POINTER   ", hashp,
-			"DIRECTORY SIZE  ", hctl->dsize,
-			"SEGMENT SIZE    ", hctl->ssize,
-			"SEGMENT SHIFT   ", hctl->sshift,
-			"MAX BUCKET      ", hctl->max_bucket,
-			"HIGH MASK       ", hctl->high_mask,
-			"LOW  MASK       ", hctl->low_mask,
-			"NSEGS           ", hctl->nsegs);
+	elog(DEBUG4, "init_htab: Table Name: \"%s\"  Directory Size: %ld  Segment Size: %ld  Segment Shift: %d  Max Bucket: %u  High Mask: %x  Low Mask: %x  Number Segments: %ld",
+		 hashp->tabname, hctl->dsize, hctl->ssize, hctl->sshift, hctl->max_bucket,
+		 hctl->high_mask, hctl->low_mask, hctl->nsegs);
 #endif
 	return true;
 }
@@ -888,7 +877,7 @@ hash_destroy(HTAB *hashp)
 		/* so this hashtable must have its own context */
 		Assert(hashp->hcxt != NULL);
 
-		hash_stats("destroy", hashp);
+		hash_stats(__func__, hashp);
 
 		/*
 		 * Free everything by destroying the hash table's memory context.
@@ -898,19 +887,16 @@ hash_destroy(HTAB *hashp)
 }
 
 void
-hash_stats(const char *where, HTAB *hashp)
+hash_stats(const char *caller, HTAB *hashp)
 {
 #ifdef HASH_STATISTICS
-	fprintf(stderr, "%s: this HTAB -- accesses %ld collisions %ld\n",
-			where, hashp->hctl->accesses, hashp->hctl->collisions);
-
-	fprintf(stderr, "hash_stats: entries %ld keysize %ld maxp %u segmentcount %ld\n",
-			hash_get_num_entries(hashp), (long) hashp->hctl->keysize,
-			hashp->hctl->max_bucket, hashp->hctl->nsegs);
-	fprintf(stderr, "%s: total accesses %ld total collisions %ld\n",
-			where, hash_accesses, hash_collisions);
-	fprintf(stderr, "hash_stats: total expansions %ld\n",
-			hash_expansions);
+	HASHHDR    *hctl = hashp->hctl;
+
+	elog(DEBUG4,
+		 "hash_stats:  Caller: %s  Table Name: \"%s\"  Accesses: " UINT64_FORMAT "  Collisions: " UINT64_FORMAT "  Expansions: " UINT64_FORMAT "  Entries: %ld  Key Size: %zu  Max Bucket: %u  Segment Count: %ld",
+		 caller != NULL ? caller : "(unknown)", hashp->tabname, hctl->accesses,
+		 hctl->collisions, hctl->expansions, hash_get_num_entries(hashp),
+		 hctl->keysize, hctl->max_bucket, hctl->nsegs);
 #endif
 }
 
@@ -996,7 +982,6 @@ hash_search_with_hash_value(HTAB *hashp,
 	HashCompareFunc match;
 
 #ifdef HASH_STATISTICS
-	hash_accesses++;
 	hctl->accesses++;
 #endif
 
@@ -1040,7 +1025,6 @@ hash_search_with_hash_value(HTAB *hashp,
 		prevBucketPtr = &(currBucket->link);
 		currBucket = *prevBucketPtr;
 #ifdef HASH_STATISTICS
-		hash_collisions++;
 		hctl->collisions++;
 #endif
 	}
@@ -1176,7 +1160,6 @@ hash_update_hash_key(HTAB *hashp,
 #ifdef HASH_STATISTICS
 	HASHHDR    *hctl = hashp->hctl;
 
-	hash_accesses++;
 	hctl->accesses++;
 #endif
 
@@ -1230,7 +1213,6 @@ hash_update_hash_key(HTAB *hashp,
 		prevBucketPtr = &(currBucket->link);
 		currBucket = *prevBucketPtr;
 #ifdef HASH_STATISTICS
-		hash_collisions++;
 		hctl->collisions++;
 #endif
 	}
@@ -1586,7 +1568,7 @@ expand_table(HTAB *hashp)
 	Assert(!IS_PARTITIONED(hctl));
 
 #ifdef HASH_STATISTICS
-	hash_expansions++;
+	hctl->expansions++;
 #endif
 
 	new_bucket = hctl->max_bucket + 1;
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index 932cc4f34d9..80deb8e543e 100644
--- a/src/include/utils/hsearch.h
+++ b/src/include/utils/hsearch.h
@@ -132,7 +132,7 @@ typedef struct
 extern HTAB *hash_create(const char *tabname, long nelem,
 						 const HASHCTL *info, int flags);
 extern void hash_destroy(HTAB *hashp);
-extern void hash_stats(const char *where, HTAB *hashp);
+extern void hash_stats(const char *caller, HTAB *hashp);
 extern void *hash_search(HTAB *hashp, const void *keyPtr, HASHACTION action,
 						 bool *foundPtr);
 extern uint32 get_hash_value(HTAB *hashp, const void *keyPtr);
#18Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#17)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Sun, Aug 17, 2025 at 12:24:29AM +1200, David Rowley wrote:

I noticed that the existing code has a set of global variables that
keeps track of accesses, collisions and expansions for *all* tables in
the backend. This didn't fit well with the single line elog. I kinda
though the global information was a bit strange, so I just got rid of
it. There was no "expansions" field to track the number of expansions
for a single table, so I added one. I made all these uint64. I felt
long (which can be 32-bit on some platforms) was too small for
tracking the number of hash table accesses. Likely uint64 is too wide
to track the expansions. That likely could be made smaller, but since
these are not enabled by default, they're not taking up any struct
space in normal builds.

-#ifdef HASH_STATISTICS
-static long hash_accesses,
- hash_collisions,
- hash_expansions;
-#endif

These global counters are as old as d31084e9d111. Removing them
should not be a problem.

Using DEBUG4 looks sensible here for this purpose.

Side thing.. I'm wondering what prevents us from wiping out entirely
the use of long in this file. long is 8 bytes everywhere, except on
WIN32 where it's 4 bytes (as you say), which is a bad practice as we
have been bitten by overflows because of this dependency in the patch.
Not related to this patch, still seems worth cleaning up while looking
at this code. There are a few more things like HASHCTL, of course..
--
Michael

#19Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#18)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Sun, Aug 17, 2025 at 03:54:16PM +0900, Michael Paquier wrote:

Side thing.. I'm wondering what prevents us from wiping out entirely
the use of long in this file. long is 8 bytes everywhere, except on
WIN32 where it's 4 bytes (as you say), which is a bad practice as we
have been bitten by overflows because of this dependency in the patch.

... "this dependency in the past", not "patch".
--
Michael

#20David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#18)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Sun, 17 Aug 2025 at 18:54, Michael Paquier <michael@paquier.xyz> wrote:

-#ifdef HASH_STATISTICS
-static long hash_accesses,
- hash_collisions,
- hash_expansions;
-#endif

These global counters are as old as d31084e9d111. Removing them
should not be a problem.

Ancient!

Side thing.. I'm wondering what prevents us from wiping out entirely
the use of long in this file. long is 8 bytes everywhere, except on
WIN32 where it's 4 bytes (as you say), which is a bad practice as we
have been bitten by overflows because of this dependency in the patch.
Not related to this patch, still seems worth cleaning up while looking
at this code. There are a few more things like HASHCTL, of course..

No objections here. I think we're generally chipping away at that
problem anyway. At least, I did some of that recently, and I recall
Tom adjusted things to allow windows to have > 2GB work_mem, which was
a restriction imposed by 32-bit longs.

One last thing, in order to inform people of breakages sooner than a
post-commit report from the buildfarm, I wondered is if we should do:

-#ifdef HASH_DEBUG
+#if defined(HASH_DEBUG) || defined(USE_ASSERT_CHECKING)

The HASH_DEBUG does not add any extra fields, so the overhead only
amounts to the elog(DEBUG4) line. HASH_STATISTICS adds extra fields
and counter incrementing, so I don't propose the same treatment for
that.

David

#21Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#20)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Mon, Aug 18, 2025 at 12:56:02PM +1200, David Rowley wrote:

One last thing, in order to inform people of breakages sooner than a
post-commit report from the buildfarm, I wondered is if we should do:

-#ifdef HASH_DEBUG
+#if defined(HASH_DEBUG) || defined(USE_ASSERT_CHECKING)

The HASH_DEBUG does not add any extra fields, so the overhead only
amounts to the elog(DEBUG4) line. HASH_STATISTICS adds extra fields
and counter incrementing, so I don't propose the same treatment for
that.

If we do that, I guess that we could just remove HASH_DEBUG, keeping
only HASH_STATISTICS.
--
Michael

#22David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#21)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Mon, 18 Aug 2025 at 13:10, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Aug 18, 2025 at 12:56:02PM +1200, David Rowley wrote:

-#ifdef HASH_DEBUG
+#if defined(HASH_DEBUG) || defined(USE_ASSERT_CHECKING)

The HASH_DEBUG does not add any extra fields, so the overhead only
amounts to the elog(DEBUG4) line. HASH_STATISTICS adds extra fields
and counter incrementing, so I don't propose the same treatment for
that.

If we do that, I guess that we could just remove HASH_DEBUG, keeping
only HASH_STATISTICS.

I wondered about that and thought that there might be an above zero
chance that someone would want HASH_DEBUG without USE_ASSERT_CHECKING.
I don't really know if that person exists. It certainly isn't me.

David

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#22)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

David Rowley <dgrowleyml@gmail.com> writes:

On Mon, 18 Aug 2025 at 13:10, Michael Paquier <michael@paquier.xyz> wrote:

If we do that, I guess that we could just remove HASH_DEBUG, keeping
only HASH_STATISTICS.

I wondered about that and thought that there might be an above zero
chance that someone would want HASH_DEBUG without USE_ASSERT_CHECKING.
I don't really know if that person exists. It certainly isn't me.

Yeah, it's really quite unclear what the existing HASH_DEBUG printout
is good for. At least in our usage, it doesn't tell you anything
you can't discover from static code analysis. I'm +1 for just
dropping it altogether.

regards, tom lane

#24David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#23)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Mon, 18 Aug 2025 at 13:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I wondered about that and thought that there might be an above zero
chance that someone would want HASH_DEBUG without USE_ASSERT_CHECKING.
I don't really know if that person exists. It certainly isn't me.

Yeah, it's really quite unclear what the existing HASH_DEBUG printout
is good for. At least in our usage, it doesn't tell you anything
you can't discover from static code analysis. I'm +1 for just
dropping it altogether.

I'm starting to lean more towards that myself. I had mostly just been
motivated to finding a way to prevent it from existing in a broken
state again.

HASH_STATISTICS I can imagine is more useful as that information isn't
otherwise recorded anywhere.

David

#25Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#24)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Mon, Aug 18, 2025 at 02:19:06PM +1200, David Rowley wrote:

On Mon, 18 Aug 2025 at 13:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, it's really quite unclear what the existing HASH_DEBUG printout
is good for. At least in our usage, it doesn't tell you anything
you can't discover from static code analysis. I'm +1 for just
dropping it altogether.

I'm starting to lean more towards that myself. I had mostly just been
motivated to finding a way to prevent it from existing in a broken
state again.

+1.

HASH_STATISTICS I can imagine is more useful as that information isn't
otherwise recorded anywhere.

By the way, once we have reached a conclusion here, I'll go update one
of my animals to use what's remaining of the flags, so as this is
captured in the future.
--
Michael

#26David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#25)
2 attachment(s)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Mon, 18 Aug 2025 at 17:06, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Aug 18, 2025 at 02:19:06PM +1200, David Rowley wrote:

On Mon, 18 Aug 2025 at 13:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, it's really quite unclear what the existing HASH_DEBUG printout
is good for. At least in our usage, it doesn't tell you anything
you can't discover from static code analysis. I'm +1 for just
dropping it altogether.

I'm starting to lean more towards that myself. I had mostly just been
motivated to finding a way to prevent it from existing in a broken
state again.

+1.

Ok. I've attached 2 patches. 0001 adjusted the HASH_STATISTICS output
to use DEBUG4 and 0002 removes HASH_DEBUG.

I've purposefully left references to HASH_DEBUG in the "Original
comments" section near the top of dynahash.c. That comment also
references function names that no longer exist.

HASH_STATISTICS I can imagine is more useful as that information isn't
otherwise recorded anywhere.

By the way, once we have reached a conclusion here, I'll go update one
of my animals to use what's remaining of the flags, so as this is
captured in the future.

Sounds good. Thanks.

David

Attachments:

v2-0001-Use-elog-DEBUG4-for-dynahash.c-statistics-output.patchapplication/octet-stream; name=v2-0001-Use-elog-DEBUG4-for-dynahash.c-statistics-output.patchDownload
From 16875c7c32a7e20cddf1e18c918be418f8a88046 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 18 Aug 2025 12:32:02 +1200
Subject: [PATCH v2 1/2] Use elog(DEBUG4) for dynahash.c statistics output

Previously this was being output to stderr.  This commit adjusts things
to use elog(DEBUG4).  Here we also adjust the format of the message to
add the hash table name and also put it on a single line.  This should
make grepping the logs for this information easier.

Also get rid of the global hash table statistics.  This seems very dated
and didn't fit very well with trying to put all the statistics for a
specific hash table on a single log line.

The main aim here is to allow it so we can have at least one buildfarm
member build with HASH_STATISTICS to prevent future changes from breaking
things in that area.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAApHDvoccvJ9CG5zx+i-EyCzJbcL5K=CzqrnL_YN59qaL5hiaw@mail.gmail.com
---
 src/backend/utils/hash/dynahash.c | 40 +++++++++++--------------------
 src/include/utils/hsearch.h       |  2 +-
 2 files changed, 15 insertions(+), 27 deletions(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index d0843551007..2b22c65e541 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -205,8 +205,9 @@ struct HASHHDR
 	 * Count statistics here.  NB: stats code doesn't bother with mutex, so
 	 * counts could be corrupted a bit in a partitioned table.
 	 */
-	long		accesses;
-	long		collisions;
+	uint64		accesses;
+	uint64		collisions;
+	uint64		expansions;
 #endif
 };
 
@@ -266,12 +267,6 @@ struct HTAB
  */
 #define MOD(x,y)			   ((x) & ((y)-1))
 
-#ifdef HASH_STATISTICS
-static long hash_accesses,
-			hash_collisions,
-			hash_expansions;
-#endif
-
 /*
  * Private function prototypes
  */
@@ -661,7 +656,7 @@ hdefault(HTAB *hashp)
 	hctl->isfixed = false;		/* can be enlarged */
 
 #ifdef HASH_STATISTICS
-	hctl->accesses = hctl->collisions = 0;
+	hctl->accesses = hctl->collisions = hctl->expansions = 0;
 #endif
 }
 
@@ -888,7 +883,7 @@ hash_destroy(HTAB *hashp)
 		/* so this hashtable must have its own context */
 		Assert(hashp->hcxt != NULL);
 
-		hash_stats("destroy", hashp);
+		hash_stats(__func__, hashp);
 
 		/*
 		 * Free everything by destroying the hash table's memory context.
@@ -898,19 +893,16 @@ hash_destroy(HTAB *hashp)
 }
 
 void
-hash_stats(const char *where, HTAB *hashp)
+hash_stats(const char *caller, HTAB *hashp)
 {
 #ifdef HASH_STATISTICS
-	fprintf(stderr, "%s: this HTAB -- accesses %ld collisions %ld\n",
-			where, hashp->hctl->accesses, hashp->hctl->collisions);
-
-	fprintf(stderr, "hash_stats: entries %ld keysize %ld maxp %u segmentcount %ld\n",
-			hash_get_num_entries(hashp), (long) hashp->hctl->keysize,
-			hashp->hctl->max_bucket, hashp->hctl->nsegs);
-	fprintf(stderr, "%s: total accesses %ld total collisions %ld\n",
-			where, hash_accesses, hash_collisions);
-	fprintf(stderr, "hash_stats: total expansions %ld\n",
-			hash_expansions);
+	HASHHDR    *hctl = hashp->hctl;
+
+	elog(DEBUG4,
+		 "hash_stats:  Caller: %s  Table Name: \"%s\"  Accesses: " UINT64_FORMAT "  Collisions: " UINT64_FORMAT "  Expansions: " UINT64_FORMAT "  Entries: %ld  Key Size: %zu  Max Bucket: %u  Segment Count: %ld",
+		 caller != NULL ? caller : "(unknown)", hashp->tabname, hctl->accesses,
+		 hctl->collisions, hctl->expansions, hash_get_num_entries(hashp),
+		 hctl->keysize, hctl->max_bucket, hctl->nsegs);
 #endif
 }
 
@@ -996,7 +988,6 @@ hash_search_with_hash_value(HTAB *hashp,
 	HashCompareFunc match;
 
 #ifdef HASH_STATISTICS
-	hash_accesses++;
 	hctl->accesses++;
 #endif
 
@@ -1040,7 +1031,6 @@ hash_search_with_hash_value(HTAB *hashp,
 		prevBucketPtr = &(currBucket->link);
 		currBucket = *prevBucketPtr;
 #ifdef HASH_STATISTICS
-		hash_collisions++;
 		hctl->collisions++;
 #endif
 	}
@@ -1176,7 +1166,6 @@ hash_update_hash_key(HTAB *hashp,
 #ifdef HASH_STATISTICS
 	HASHHDR    *hctl = hashp->hctl;
 
-	hash_accesses++;
 	hctl->accesses++;
 #endif
 
@@ -1230,7 +1219,6 @@ hash_update_hash_key(HTAB *hashp,
 		prevBucketPtr = &(currBucket->link);
 		currBucket = *prevBucketPtr;
 #ifdef HASH_STATISTICS
-		hash_collisions++;
 		hctl->collisions++;
 #endif
 	}
@@ -1586,7 +1574,7 @@ expand_table(HTAB *hashp)
 	Assert(!IS_PARTITIONED(hctl));
 
 #ifdef HASH_STATISTICS
-	hash_expansions++;
+	hctl->expansions++;
 #endif
 
 	new_bucket = hctl->max_bucket + 1;
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index 932cc4f34d9..80deb8e543e 100644
--- a/src/include/utils/hsearch.h
+++ b/src/include/utils/hsearch.h
@@ -132,7 +132,7 @@ typedef struct
 extern HTAB *hash_create(const char *tabname, long nelem,
 						 const HASHCTL *info, int flags);
 extern void hash_destroy(HTAB *hashp);
-extern void hash_stats(const char *where, HTAB *hashp);
+extern void hash_stats(const char *caller, HTAB *hashp);
 extern void *hash_search(HTAB *hashp, const void *keyPtr, HASHACTION action,
 						 bool *foundPtr);
 extern uint32 get_hash_value(HTAB *hashp, const void *keyPtr);
-- 
2.43.0

v2-0002-Remove-HASH_DEBUG-output-from-dynahash.c.patchapplication/octet-stream; name=v2-0002-Remove-HASH_DEBUG-output-from-dynahash.c.patchDownload
From 1f054b0e117df18222a518b4feeb9253b0a91b5d Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 18 Aug 2025 17:37:10 +1200
Subject: [PATCH v2 2/2] Remove HASH_DEBUG output from dynahash.c

This existed in a semi broken stated from be0a66666 until 296cba276.
Recent discussion has questioned the value of having this at all as it
only outputs static information from various of the hash table's
properties when the hash table is first created.

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvoccvJ9CG5zx+i-EyCzJbcL5K=CzqrnL_YN59qaL5hiaw@mail.gmail.com
---
 src/backend/utils/hash/dynahash.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 2b22c65e541..f48fcf66def 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -770,17 +770,6 @@ init_htab(HTAB *hashp, long nelem)
 	/* Choose number of entries to allocate at a time */
 	hctl->nelem_alloc = choose_nelem_alloc(hctl->entrysize);
 
-#ifdef HASH_DEBUG
-	fprintf(stderr, "init_htab:\n%s%p\n%s%ld\n%s%ld\n%s%d\n%s%u\n%s%x\n%s%x\n%s%ld\n",
-			"TABLE POINTER   ", hashp,
-			"DIRECTORY SIZE  ", hctl->dsize,
-			"SEGMENT SIZE    ", hctl->ssize,
-			"SEGMENT SHIFT   ", hctl->sshift,
-			"MAX BUCKET      ", hctl->max_bucket,
-			"HIGH MASK       ", hctl->high_mask,
-			"LOW  MASK       ", hctl->low_mask,
-			"NSEGS           ", hctl->nsegs);
-#endif
 	return true;
 }
 
-- 
2.43.0

#27Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#26)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Mon, Aug 18, 2025 at 05:55:33PM +1200, David Rowley wrote:

I've purposefully left references to HASH_DEBUG in the "Original
comments" section near the top of dynahash.c. That comment also
references function names that no longer exist.

Ah, right, the hcreate() and hdestroy() business. LGTM.
--
Michael

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#26)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

David Rowley <dgrowleyml@gmail.com> writes:

I've purposefully left references to HASH_DEBUG in the "Original
comments" section near the top of dynahash.c. That comment also
references function names that no longer exist.

Hm, there is actually no part of that comment para that is accurate
anymore, so I cannot imagine a reason for leaving it alone. People
who want to do archaeology can consult the git history. How about
reducing the para to

* HASH_STATISTICS causes some usage statistics to be maintained,
* which can be logged by calling hash_stats().

LGTM otherwise.

regards, tom lane

#29David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#28)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Tue, 19 Aug 2025 at 01:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I've purposefully left references to HASH_DEBUG in the "Original
comments" section near the top of dynahash.c. That comment also
references function names that no longer exist.

Hm, there is actually no part of that comment para that is accurate
anymore, so I cannot imagine a reason for leaving it alone. People
who want to do archaeology can consult the git history. How about
reducing the para to

* HASH_STATISTICS causes some usage statistics to be maintained,
* which can be logged by calling hash_stats().

Ok. I pushed it like that. Thanks for looking.

David

#30Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#29)
Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options

On Tue, Aug 19, 2025 at 11:21:06AM +1200, David Rowley wrote:

Ok. I pushed it like that. Thanks for looking.

Added a -DHASH_STATISTICS to batta, that runs only the master branch.
--
Michael