Division in dynahash.c due to HASH_FFACTOR
Greetings hackers,
I have mixed feelings about whether this is a welcome contribution, as the potential gain is relatively small in my tests, but I would still like to point out that the HASH_FFACTOR functionality in dynahash.c could be removed or optimized: the default fill factor is always 1, and there is not a single place inside the PostgreSQL repository that uses a custom fill factor other than DEF_FFACTOR=1. Because the functionality is present, there is a division for every buffer access [BufTableLookup()] and for every smgropen() call (every call to hash_search() is affected, provided the table is not ShmemInitHash/HASH_PARTITION). This division is especially visible via perf in the single-process StartupXLOG WAL recovery on a standby under heavy-duty 100% CPU conditions, where the top symbol is inside hash_search():
0x0000000000888751 <+449>: idiv r8
0x0000000000888754 <+452>: cmp rax,QWORD PTR [r15+0x338] <<-- perf annotate shows 30-40% here, even on default -O2, probably due to CPU pipelining of the idiv above
I've made a PoC test to skip that division assuming ffactor would be gone:
if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
- hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+ hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&
For a 3.7GB stream of WAL I'm getting a consistent improvement of ~4% (yes, I know it's small; that's why I'm having mixed feelings):
gcc -O3: 104->100s
gcc -O2: 108->104s
pgbench -S -c 16 -j 4 -T 30 -M prepared: stays more or less the same (-s 100), so no positive impact there
After removing HASH_FFACTOR, PostgreSQL still compiles... Would removing it break some external API/extensions? I saw several ways the "idiv" could be optimized, e.g. see https://github.com/ridiculousfish/libdivide. Or maybe there is some other idea for exposing the bottlenecks of BufTableLookup()? I also saw the codepath PinBuffer() -> GetPrivateRefCountEntry() -> dynahash, which could be called pretty often. I have no idea what kind of pgbench stress test could be used to demonstrate the gain (or lack of it).
-Jakub Wartak.
On Fri, Sep 04, 2020 at 07:01:41AM +0000, Jakub Wartak wrote:
Greetins hackers,
I have mixed feelings if this welcome contribution as the potential
gain is relatively small in my tests, but still I would like to point
out that HASH_FFACTOR functionality from dynahash.c could be removed or
optimized (default fill factor is always 1, there's not a single place
that uses custom custom fill factor other than DEF_FFACTOR=1 inside
PostgreSQL repository). Because the functionality is present there
seems to be division for every buffer access [BufTableLookup()] / or
every smgropen() call (everything call to hash_search() is affected,
provided it's not ShmemInitHash/HASH_PARTITION). This division is
especially visible via perf on single process StartupXLOG WAL recovery
process on standby in heavy duty 100% CPU conditions , as the top1 is
inside hash_search:
0x0000000000888751 <+449>: idiv r8
0x0000000000888754 <+452>: cmp rax,QWORD PTR [r15+0x338] <<-- in perf annotate shows as 30-40%, even on default -O2, probably CPU pipelining for idiv above
I've made a PoC test to skip that division assuming ffactor would be gone:
if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
- hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+ hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&
For a stream of WAL 3.7GB I'm getting consistent improvement of ~4%, (yes I know it's small, that's why I'm having mixed feelings):
gcc -O3: 104->100s
gcc -O2: 108->104s
pgbench -S -c 16 -j 4 -T 30 -M prepared: stays more or less the same (-s 100), so no positive impact there
Hmm, so it's the division that makes the difference? In that case,
wouldn't it make sense to just compute a threshold every time the hash
table is resized, and then do just the comparison. That is, compute
nentries_threshold = (long) (hctl->max_bucket + 1) * hctl->ffactor;
and then do just
hctl->freeList[0].nentries >= nentries_threshold
Of course, this assumes the threshold is calculated only rarely, maybe
that's not the case.
Also, what if you lower the fill factor, e.g. to 0.8? Does that improve
performance or not? I can't find any discussion about this in the
archives, but the dynamic hashing paper [1] seems to suggest it does
make a difference (see the tables at the end, but I haven't re-read the
paper recently so maybe it's irrelevant). Anyway, it'd be foolish to
remove the ffactor parameter to save on division only to lose the
ability to lower the factor and save more than that ...
As for the 4% - it's not much, but it's also not negligible. I'm sure
we've done micro-optimizations for smaller gains. The big question is
whether this is statistically significant, or whether it's just due to
random effects. It could easily be caused by changes in layout of the
binary, for example - that can happen quite easily. See [2] and [3].
The talk actually mentions a tool meant to eliminate this bias by
randomization, but I've never tried using it on PostgreSQL so I'm not
sure how compatible it is :-(
[1]: https://www.csd.uoc.gr/~hy460/pdf/Dynamic%20Hash%20Tables.pdf
[2]: https://www.cis.upenn.edu/~cis501/previous/fall2016/papers/producing-wrong-data.pdf
[3]: https://www.youtube.com/watch?v=r-TLSBdHe1A
After removing HASH_FFACTOR PostgreSQL still compiles... Would
removing it break some external API/extensions ? I saw several
optimization for the "idiv" where it could be optimized e.g. see
https://github.com/ridiculousfish/libdivide Or maybe there is some
other idea to expose bottlenecks of BufTableLookup() ? I also saw
codepath PinBuffer()->GetPrivateRefCountEntry() -> dynahash that could
be called pretty often I have no idea what kind of pgbench stresstest
could be used to demonstrate the gain (or lack of it).
-Jakub Wartak.
I don't think it would break a lot of stuff, but I'm kinda dubious
it's a measurable improvement for common workloads ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Sep-04, Jakub Wartak wrote:
After removing HASH_FFACTOR PostgreSQL still compiles... Would
removing it break some external API/extensions ?
FWIW, HASH_FFACTOR has *never* been used in Postgres core code.
/messages/by-id/20160418180711.55ac82c0@fujitsu already reported
that this flag was unused a few years ago. And a search through
codesearch.debian.net shows that no software packaged by Debian uses
that flag either.
*If* any third-party extension is using HASH_FFACTOR, it can easily be
unbroken by removing the flag or #defining it to zero; the removal would
only be a problem if anyone showed that there is a performance loss by
using the default fillfactor for some dynahash table instead of their
custom value.
I think if we know that there's a 4% performance increase by removing
the division that comes with an ability to use a custom fillfactor that
nobody uses or has ever used, it makes sense to remove that ability.
... however, if we're really tense about improving some hash table's
performance, it might be worth trying to use simplehash.h instead of
dynahash.c for it.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Sep 5, 2020 at 2:34 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Sep-04, Jakub Wartak wrote:
After removing HASH_FFACTOR PostgreSQL still compiles... Would
removing it break some external API/extensions ?
FWIW, HASH_FFACTOR has *never* been used in Postgres core code.
/messages/by-id/20160418180711.55ac82c0@fujitsu already reported
that this flag was unused a few years ago. And a search through
codesearch.debian.net shows that no software packaged by Debian uses
that flag either.
*If* any third-party extension is using HASH_FFACTOR, it can easily be
unbroken by removing the flag or #defining it to zero; the removal would
only be a problem if anyone showed that there is a performance loss by
using the default fillfactor for some dynahash table instead of their
custom value.
I think if we know that there's a 4% performance increase by removing
the division that comes with an ability to use a custom fillfactor that
nobody uses or has ever used, it makes sense to remove that ability.
I think Tomas is right, and the correct fix for this would be to
compute it up front every time the hash table size changes, but since
apparently no one ever used it, +1 for just removing it and leaving it
hard-coded.
For what it's worth, for dshash.c I just hard coded it to double the
hash table size whenever the number of elements in a partition
exceeded 75%. Popular hash tables in standard libraries seem to use
either .75 or 1.0 as a default or only setting. There's probably room
for discussion about numbers in those ranges, but I think the concept
of customisable load factors with a range much wider than that may be
more relevant for disk-based hash tables (like our hash indexes),
which have very different economics. I think the name HASH_FFACTOR
may be a clue that this was possibly interference from disk-based hash
tables from the same Berkeley people: rather than "load factor", the
more usual term (?) for nentries/nbuckets in memory-based hash tables
as a way to model number of expected collisions, they called it "fill
factor", which I guess is because in disk-based designs you actually
want a certain rate of collisions, because you're working not with
elements in an array and wanting to avoid collisions while not wasting
space, but rather with fixed sized buckets that you want to fill up,
but not overflow. </archeological-speculation-mode>
... however, if we're really tense about improving some hash table's
performance, it might be worth trying to use simplehash.h instead of
dynahash.c for it.
Sure, but removing the dynahash IDIV also seems like a slam dunk
removal of a useless expensive feature.
Hi Thomas, Tomas :), Alvaro, hackers,
After removing HASH_FFACTOR PostgreSQL still compiles... Would
removing it break some external API/extensions ?
FWIW, HASH_FFACTOR has *never* been used in Postgres core code.
[..]
I think if we know that there's a 4% performance increase by removing
the division that comes with an ability to use a custom fillfactor that
nobody uses or has ever used, it makes sense to remove that ability.
Thomas wrote:
I think Tomas is right, and the correct fix for this would be to
compute it up front every time the hash table size changes, but since
apparently no one ever used it, +1 for just removing it and leaving it
hard-coded.
[..]
For what it's worth, for dshash.c I just hard coded it to double the
hash table size whenever the number of elements in a partition
exceeded 75%. Popular hash tables in standard libraries seem to use
either .75 or 1.0 as a default or only setting. There's probably room
for discussion about numbers in those ranges, (..)
Tomas also asked "what if you lower the fill factor, e.g. to 0.8? Does that improve performance or not?" and shared some interesting links about avoiding benchmarking bias. I couldn't find any simpler way to measure its real impact than checking DB crash recovery (if anyone has an artificial workload generator that stresses PinBuffer()->hash_search(), please let me know).
So I've been testing WAL crash recovery using Thomas's approach [1]: always the same workload to replay, on the same machine, in the same environment: the same 13GB of WAL to recover, an 8GB database, 24GB shared buffers on top of 128GB RAM, and NVMe storage with fsync off to make dynahash the primary bottleneck (BufTableLookup() + smgropen()), with an append-only workload, gcc -O3. In the results below, the first time is the WAL recovery timing and the second is the checkpoint before opening the DB. There were four measurements per scenario; the first one is cut off to avoid noise:
1. DEF_FFACTOR=1 idiv on
103.971, 3.719
103.937, 3.683
104.934, 3.691
2. DEF_FFACTOR=1 idiv off
100.565, 3.71
101.595, 3.807
100.794, 3.691
3. DEF_FFACTOR=1.2 idiv on // some proof that nobody uses it
1599552729.041 postmaster 94510 FATAL: failed to initialize hash table "SERIALIZABLEXID hash"
4. DEF_FFACTOR=0.8 idiv on // another proof? The reason for the crash below is that ffactor is a long and as such cannot hold fractional values (0.8 truncates to 0); if it were a float...
Program terminated with signal 8, Arithmetic exception.
#0 0x00000000009a4119 in init_htab (nelem=4, hashp=0x17cd238) at dynahash.c:677
677 nbuckets = next_pow2_int((nelem - 1) / hctl->ffactor + 1);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.180.amzn1.x86_64
#0 0x00000000009a4119 in init_htab (nelem=4, hashp=0x17cd238) at dynahash.c:677
#1 hash_create (tabname=tabname@entry=0xb77f9a "Timezones", nelem=nelem@entry=4, info=info@entry=0x7ffd3055cf30, flags=flags@entry=16) at dynahash.c:529
#2 0x00000000009ecddf in init_timezone_hashtable () at pgtz.c:211
#3 pg_tzset (name=name@entry=0xb49fd5 "GMT") at pgtz.c:248
#4 0x00000000009ecfee in pg_timezone_initialize () at pgtz.c:372
5. DEF_FFACTOR=1 idiv on // just in case reproducer to validate measurement #1
104.333, 3.804
104.313, 3.772
103.951, 3.687
6. DEF_FFACTOR=2 idiv on
105.406, 3.783
105.33, 3.721
105.403, 3.777
7. DEF_FFACTOR=4 idiv on
105.803, 3.722
105.73, 3.728
106.104, 3.756
Some notes:
- SMgrRelationHash is initialized at 400 from the start, while the DB here is a small 8GB one with only a couple of tables -> no expansion needed in the above test.
- the local backend-private overflow hash table for buffers, PrivateRefCountHash, is initialized at 100 and may grow during the run
- googling for DEF_FFACTOR I've found only one entry with a value <> 1, from 1993 (see [2]; what's interesting is the same DEF_SEGSIZE value after all those years)
Alvaro wrote:
... however, if we're really tense about improving some hash table's
performance, it might be worth trying to use simplehash.h instead of
dynahash.c for it.
Thomas wrote:
Sure, but removing the dynahash IDIV also seems like a slam dunk removal of a useless expensive feature.
I agree with both; I just thought it might be an interesting finding, as this idiv might (?) be present in other common paths like ReadBuffer*() / PinBuffer() (per some recent discussions, maybe on NUMA boxes), not just WAL recovery, and it seems relatively easy to improve.
-J.
[1]: https://github.com/macdice/redo-bench
[2]: https://fuhrwerks.com/csrg/info/93c40a660b6cdf74
On Tue, Sep 8, 2020 at 11:17 PM Jakub Wartak <Jakub.Wartak@tomtom.com> wrote:
I agree with both, I just thought it might be interesting finding as this idiv might be (?) present in other common paths like ReadBuffer*() / PinBuffer() (some recent discussions, maybe on NUMA boxes), not just WAL recovery as it seems relatively easy to improve.
I wrote a draft commit message for Jakub's proposed change (attached),
and did a little bit of testing, but I haven't seen a case where it
wins yet; I need to work on something else now but thought I'd share
this much anyway. One observation is that the rule in init_htab had
better agree with the expansion rule in hash_search_with_hash_value,
otherwise you can size your hash table perfectly for n elements and
then it still has to expand when you insert the nth element, which is
why I changed >= to >. Does this make sense? Oh man, I don't like
the mash-up of int, long, uint32, Size dynahash uses in various places
and that is brought into relief by that comparison...
Attachments:
0001-Remove-custom-fill-factor-support-from-dynahash.c.patch (text/x-patch)
From e5d890bc1178861dc1a5b1f430289a5ee2b4a2fa Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 10 Sep 2020 12:27:25 +1200
Subject: [PATCH] Remove custom fill factor support from dynahash.c.
Since ancient times we have had support for a fill factor (maximum load
factor) to be set for a dynahash hash table, but:
1. It had to be an integer value >= 1, whereas for in memory hash
tables interesting load factor targets are probably somewhere near the
0.75-1.0 range.
2. It was implemented in a way that performed an expensive division
operation that regularly showed up in profiles.
3. We are not aware of anyone ever having used a non-default value.
Therefore, remove support, making fill factor 1 the implicit value.
Author: Jakub Wartak <Jakub.Wartak@tomtom.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/VI1PR0701MB696044FC35013A96FECC7AC8F62D0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
---
src/backend/utils/hash/dynahash.c | 15 ++++-----------
src/include/utils/hsearch.h | 2 --
2 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index f4fbccdd7e..4aed0114fb 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -191,7 +191,6 @@ struct HASHHDR
Size keysize; /* hash key length in bytes */
Size entrysize; /* total user element size in bytes */
long num_partitions; /* # partitions (must be power of 2), or 0 */
- long ffactor; /* target fill factor */
long max_dsize; /* 'dsize' limit if directory is fixed size */
long ssize; /* segment size --- must be power of 2 */
int sshift; /* segment shift = log2(ssize) */
@@ -497,8 +496,6 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
/* ssize had better be a power of 2 */
Assert(hctl->ssize == (1L << hctl->sshift));
}
- if (flags & HASH_FFACTOR)
- hctl->ffactor = info->ffactor;
/*
* SHM hash tables have fixed directory size passed by the caller.
@@ -603,8 +600,6 @@ hdefault(HTAB *hashp)
hctl->num_partitions = 0; /* not partitioned */
- hctl->ffactor = DEF_FFACTOR;
-
/* table has no fixed maximum size */
hctl->max_dsize = NO_MAX_DSIZE;
@@ -670,11 +665,10 @@ init_htab(HTAB *hashp, long nelem)
SpinLockInit(&(hctl->freeList[i].mutex));
/*
- * Divide number of elements by the fill factor to determine a desired
- * number of buckets. Allocate space for the next greater power of two
- * number of buckets
+ * Allocate space for the next greater power of two number of buckets,
+ * assuming a desired maximum load factor of 1.
*/
- nbuckets = next_pow2_int((nelem - 1) / hctl->ffactor + 1);
+ nbuckets = next_pow2_int(nelem);
/*
* In a partitioned table, nbuckets must be at least equal to
@@ -733,7 +727,6 @@ init_htab(HTAB *hashp, long nelem)
"DIRECTORY SIZE ", hctl->dsize,
"SEGMENT SIZE ", hctl->ssize,
"SEGMENT SHIFT ", hctl->sshift,
- "FILL FACTOR ", hctl->ffactor,
"MAX BUCKET ", hctl->max_bucket,
"HIGH MASK ", hctl->high_mask,
"LOW MASK ", hctl->low_mask,
@@ -975,7 +968,7 @@ hash_search_with_hash_value(HTAB *hashp,
* order of these tests is to try to check cheaper conditions first.
*/
if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
- hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+ hctl->freeList[0].nentries > (long) (hctl->max_bucket + 1) &&
!has_seq_scans(hashp))
(void) expand_table(hashp);
}
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index f1deb9beab..bebf89b3c4 100644
--- a/src/include/utils/hsearch.h
+++ b/src/include/utils/hsearch.h
@@ -68,7 +68,6 @@ typedef struct HASHCTL
long ssize; /* segment size */
long dsize; /* (initial) directory size */
long max_dsize; /* limit to dsize if dir size is limited */
- long ffactor; /* fill factor */
Size keysize; /* hash key length in bytes */
Size entrysize; /* total user element size in bytes */
HashValueFunc hash; /* hash function */
@@ -83,7 +82,6 @@ typedef struct HASHCTL
#define HASH_PARTITION 0x0001 /* Hashtable is used w/partitioned locking */
#define HASH_SEGMENT 0x0002 /* Set segment size */
#define HASH_DIRSIZE 0x0004 /* Set directory size (initial and max) */
-#define HASH_FFACTOR 0x0008 /* Set fill factor */
#define HASH_ELEM 0x0010 /* Set keysize and entrysize */
#define HASH_BLOBS 0x0020 /* Select support functions for binary keys */
#define HASH_FUNCTION 0x0040 /* Set user defined hash function */
--
2.20.1
On Thu, 10 Sep 2020 at 14:55, Thomas Munro <thomas.munro@gmail.com> wrote:
I wrote a draft commit message for Jakub's proposed change (attached),
and did a little bit of testing, but I haven't seen a case where it
wins yet; I need to work on something else now but thought I'd share
this much anyway. One observation is that the rule in init_htab had
better agree with the expansion rule in hash_search_with_hash_value,
otherwise you can size your hash table perfectly for n elements and
then it still has to expand when you insert the nth element, which is
why I changed >= to >. Does this make sense? Oh man, I don't like
the mash-up of int, long, uint32, Size dynahash uses in various places
and that are brought into relief by that comparison...
I just did some benchmarking with this patch using the same recovery
benchmark that I used in [1] and also the two patches that I posted in
[2]. Additionally, I added a PANIC at the end of recovery so that I
could repeat the recovery over and over again with the same WAL.
Without 0001-Remove-custom-fill-factor-support-from-dynahash.c.patch,
the time in seconds reported for recovery was as follows:
Run 1 65.2
Run 2 65.41
Run 3 65.1
Run 4 64.86
Run 5 62.29
Run 6 67.06
Run 7 63.35
Run 8 63.1
Run 9 62.8
Run 10 62.15
Avg 64.132
Med 64.105
After patching I got:
Run 1 60.69
Run 2 63.13
Run 3 63.24
Run 4 62.13
Run 5 63.81
Run 6 60.27
Run 7 62.85
Run 8 63.45
Run 9 59.6
Run 10 63.16
Avg 62.233
Med 62.99
I was quite happy to see 59.6 seconds in there on run 9 as I'd been
trying hard to get that benchmark to go below 1 minute on my machine.
perf appears to indicate that less time was spent in
hash_search_with_hash_value()
Unpatched:
26.96% postgres postgres [.] PageRepairFragmentation
12.07% postgres libc-2.31.so [.] __memmove_avx_unaligned_erms
10.13% postgres postgres [.] hash_search_with_hash_value
6.70% postgres postgres [.] XLogReadBufferExtended
Patched:
27.70% postgres postgres [.] PageRepairFragmentation
13.50% postgres libc-2.31.so [.] __memmove_avx_unaligned_erms
9.63% postgres postgres [.] hash_search_with_hash_value
6.44% postgres postgres [.] XLogReadBufferExtended
I looked over the patch and the only thing I saw was that we might
also want to remove the following line:
#define DEF_FFACTOR 1 /* default fill factor */
Also, a couple of things I noticed when looking at performance
profiles in detail.
1) The most expensive user of hash_search_with_hash_value() comes from
BufTableLookup() which passes HASH_FIND as the HASHACTION.
2) The code that was doing the divide in the unpatched version only
triggered when HASHACTION was HASH_ENTER or HASH_ENTER_NULL.
The 2nd most costly call to hash_search_with_hash_value() came in via
hash_search() via smgropen(). That does use HASH_ENTER, which could
have triggered the divide code. The main caller of smgropen() was
XLogReadBufferExtended().
So, it looks unlikely that any gains we are seeing are from improved
buffer lookups. It's more likely they're coming from more optimal
XLogReadBufferExtended().
David
[1]: /messages/by-id/CAApHDvoKwqAzhiuxEt8jSquPJKDpH8DNUZDFUSX9P7DXrJdc3Q@mail.gmail.com
[2]: /messages/by-id/CAApHDvo9n-nOb3b4PYFx+w9mqd2SSUHm_oAs039eZnZLqFGcbQ@mail.gmail.com
On Mon, Sep 14, 2020 at 11:35 PM David Rowley <dgrowleyml@gmail.com> wrote:
I just did some benchmarking with this patch using the same recovery
benchmark that I used in [1] and also the two patches that I posted in
[2]. Additionally, I added a PANIC at the end of recovery so that I
could repeat the recovery over and over again with the same WAL.[data]
N Min Max Median Avg Stddev
x 10 62.15 67.06 64.86 64.132 1.6188528
+ 10 59.6 63.81 63.13 62.233 1.4983031
Difference at 95.0% confidence
-1.899 +/- 1.46553
-2.96108% +/- 2.28517%
(Student's t, pooled s = 1.55974)
Thanks! Hmm, small but apparently significant and in line with
Jakub's report, and I suppose the effect will be greater with other
nearby recovery performance patches applied that halve the times.
Annoyingly, I can't reproduce this speedup on my local i9-9900; maybe
it requires a different CPU...
I looked over the patch and the only thing I saw was that we might
also want to remove the following line:
#define DEF_FFACTOR 1 /* default fill factor */
Right, thanks. Fixed in the attached.
The 2nd most costly call to hash_search_with_hash_value() came in via
hash_search() via smgropen(). That does use HASH_ENTER, which could
have triggered the divide code. The main caller of smgropen() was
XLogReadBufferExtended().
So, it looks unlikely that any gains we are seeing are from improved
buffer lookups. It's more likely they're coming from more optimal
XLogReadBufferExtended()
I think we call smgropen() twice for every buffer referenced in the
WAL: XLogReadBufferExtended() and again in
ReadBufferWithoutRelcache(). We could reduce it to once with some
refactoring, but I am looking into whether I can reduce it to zero as
a side-effect of another change, more soon...
Attachments:
v2-0001-Remove-custom-fill-factor-support-from-dynahash.c.patch (text/x-patch)
From efecf68b159a3c65517e91076009cb4e5cc6f157 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 10 Sep 2020 12:27:25 +1200
Subject: [PATCH v2] Remove custom fill factor support from dynahash.c.
Since ancient times we have had support for a fill factor (maximum load
factor) to be set for a dynahash hash table, but:
1. It had to be an integer value >= 1, whereas for in memory hash
tables interesting load factor targets are probably somewhere near the
0.75-1.0 range.
2. It was implemented in a way that performed an expensive division
operation that regularly showed up in profiles.
3. We are not aware of anyone ever having used a non-default value.
Therefore, remove support, making fill factor 1 the implicit value.
Author: Jakub Wartak <Jakub.Wartak@tomtom.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/VI1PR0701MB696044FC35013A96FECC7AC8F62D0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
---
src/backend/utils/hash/dynahash.c | 20 ++++++--------------
src/include/utils/hsearch.h | 2 --
2 files changed, 6 insertions(+), 16 deletions(-)
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index f4fbccdd7e..1122e2e5e5 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -122,7 +122,6 @@
#define DEF_SEGSIZE 256
#define DEF_SEGSIZE_SHIFT 8 /* must be log2(DEF_SEGSIZE) */
#define DEF_DIRSIZE 256
-#define DEF_FFACTOR 1 /* default fill factor */
/* Number of freelists to be used for a partitioned hash table. */
#define NUM_FREELISTS 32
@@ -191,7 +190,6 @@ struct HASHHDR
Size keysize; /* hash key length in bytes */
Size entrysize; /* total user element size in bytes */
long num_partitions; /* # partitions (must be power of 2), or 0 */
- long ffactor; /* target fill factor */
long max_dsize; /* 'dsize' limit if directory is fixed size */
long ssize; /* segment size --- must be power of 2 */
int sshift; /* segment shift = log2(ssize) */
@@ -497,8 +495,6 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
/* ssize had better be a power of 2 */
Assert(hctl->ssize == (1L << hctl->sshift));
}
- if (flags & HASH_FFACTOR)
- hctl->ffactor = info->ffactor;
/*
* SHM hash tables have fixed directory size passed by the caller.
@@ -603,8 +599,6 @@ hdefault(HTAB *hashp)
hctl->num_partitions = 0; /* not partitioned */
- hctl->ffactor = DEF_FFACTOR;
-
/* table has no fixed maximum size */
hctl->max_dsize = NO_MAX_DSIZE;
@@ -670,11 +664,10 @@ init_htab(HTAB *hashp, long nelem)
SpinLockInit(&(hctl->freeList[i].mutex));
/*
- * Divide number of elements by the fill factor to determine a desired
- * number of buckets. Allocate space for the next greater power of two
- * number of buckets
+ * Allocate space for the next greater power of two number of buckets,
+ * assuming a desired maximum load factor of 1.
*/
- nbuckets = next_pow2_int((nelem - 1) / hctl->ffactor + 1);
+ nbuckets = next_pow2_int(nelem);
/*
* In a partitioned table, nbuckets must be at least equal to
@@ -733,7 +726,6 @@ init_htab(HTAB *hashp, long nelem)
"DIRECTORY SIZE ", hctl->dsize,
"SEGMENT SIZE ", hctl->ssize,
"SEGMENT SHIFT ", hctl->sshift,
- "FILL FACTOR ", hctl->ffactor,
"MAX BUCKET ", hctl->max_bucket,
"HIGH MASK ", hctl->high_mask,
"LOW MASK ", hctl->low_mask,
@@ -761,7 +753,7 @@ hash_estimate_size(long num_entries, Size entrysize)
elementAllocCnt;
/* estimate number of buckets wanted */
- nBuckets = next_pow2_long((num_entries - 1) / DEF_FFACTOR + 1);
+ nBuckets = next_pow2_long(num_entries);
/* # of segments needed for nBuckets */
nSegments = next_pow2_long((nBuckets - 1) / DEF_SEGSIZE + 1);
/* directory entries */
@@ -804,7 +796,7 @@ hash_select_dirsize(long num_entries)
nDirEntries;
/* estimate number of buckets wanted */
- nBuckets = next_pow2_long((num_entries - 1) / DEF_FFACTOR + 1);
+ nBuckets = next_pow2_long(num_entries);
/* # of segments needed for nBuckets */
nSegments = next_pow2_long((nBuckets - 1) / DEF_SEGSIZE + 1);
/* directory entries */
@@ -975,7 +967,7 @@ hash_search_with_hash_value(HTAB *hashp,
* order of these tests is to try to check cheaper conditions first.
*/
if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
- hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+ hctl->freeList[0].nentries > (long) (hctl->max_bucket + 1) &&
!has_seq_scans(hashp))
(void) expand_table(hashp);
}
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index f1deb9beab..bebf89b3c4 100644
--- a/src/include/utils/hsearch.h
+++ b/src/include/utils/hsearch.h
@@ -68,7 +68,6 @@ typedef struct HASHCTL
long ssize; /* segment size */
long dsize; /* (initial) directory size */
long max_dsize; /* limit to dsize if dir size is limited */
- long ffactor; /* fill factor */
Size keysize; /* hash key length in bytes */
Size entrysize; /* total user element size in bytes */
HashValueFunc hash; /* hash function */
@@ -83,7 +82,6 @@ typedef struct HASHCTL
#define HASH_PARTITION 0x0001 /* Hashtable is used w/partitioned locking */
#define HASH_SEGMENT 0x0002 /* Set segment size */
#define HASH_DIRSIZE 0x0004 /* Set directory size (initial and max) */
-#define HASH_FFACTOR 0x0008 /* Set fill factor */
#define HASH_ELEM 0x0010 /* Set keysize and entrysize */
#define HASH_BLOBS 0x0020 /* Select support functions for binary keys */
#define HASH_FUNCTION 0x0040 /* Set user defined hash function */
--
2.20.1
On Tue, Sep 15, 2020 at 9:56 AM Thomas Munro <thomas.munro@gmail.com> wrote:
I looked over the patch and the only thing I saw was that we might
also want to remove the following line:

#define DEF_FFACTOR 1 /* default fill factor */
Right, thanks. Fixed in the attached.
Pushed. Thanks Jakub, everyone.
Separately, we really should tidy up the int/long/uint32/size_t
confusion in this module. I know we have K&R C-era long-itude in
numerous other modules, but this one is a little more egregious in its
data type mixing.
Thomas Munro <thomas.munro@gmail.com> writes:
Pushed. Thanks Jakub, everyone.
BTW, looking over this patch, I wonder about
/*
* Can't split if running in partitioned mode, nor if frozen, nor if
* table is the subject of any active hash_seq_search scans. Strange
* order of these tests is to try to check cheaper conditions first.
*/
if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
hctl->freeList[0].nentries > (long) (hctl->max_bucket + 1) &&
!has_seq_scans(hashp))
(void) expand_table(hashp);
ISTM that getting rid of the division obviates the concern that the
nentries condition is too expensive, and therefore we should revert
to checking it first, on the grounds that that condition is most
likely to fail.
regards, tom lane
I wrote:
ISTM that getting rid of the division obviates the concern that the
nentries condition is too expensive,
Also, we could make it slightly cheaper yet, by changing the condition
to
hctl->freeList[0].nentries > (long) (hctl->max_bucket)
ie drop the +1. I'd argue that this is actually a more faithful
rendition of the previous condition, since what we had before was
basically
hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1)
regards, tom lane
On Sat, Sep 19, 2020 at 1:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
ISTM that getting rid of the division obviates the concern that the
nentries condition is too expensive,
True, that comment needed to go.
Also, we could make it slightly cheaper yet, by changing the condition to

hctl->freeList[0].nentries > (long) (hctl->max_bucket)
ie drop the +1. I'd argue that this is actually a more faithful
rendition of the previous condition, since what we had before was basically

hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1)
Agreed, and done. Thanks!