Comment simplehash/dynahash trade-offs
In another thread [1]/messages/by-id/CAAaqYe956a-zbm1qR8pqz=iLbF8LW5vBrQGrzXVHXdLk3at5_Q@mail.gmail.com I'd mused that "there might be some value in a
README or comments
addition that would be a guide to what the various hash
implementations are useful for...so that we have something to make the
code base a bit more discoverable."
I'd solicited feedback from Andres (as the author of the simplehash
implementation) and gotten further explanation from Tomas (both cc'd
here) and have tried to condense that into the comment changes in this
patch series.
v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch
Contains the summaries mentioned above.
v1-0002-Improve-simplehash-usage-notes.patch
I'd noticed while adding a simplehash implementation in that other
thread that the facts that:
- The element type requires a status member.
- Why storing a hash in the element type is useful (i.e., when to
define SH_STORE_HASH/SH_GET_HASH).
- The availability of private_data member for metadata access from callbacks.
are either undocumented or hard to discover, and so I've added the
information directly to the usage notes section.
v1-0003-Show-sample-simplehash-method-signatures.patch
I find it hard to read the macro code "templating" particularly for
seeing what the available API is and so added sample method signatures
in comments to the macro generated method signature defines.
James
[1]: /messages/by-id/CAAaqYe956a-zbm1qR8pqz=iLbF8LW5vBrQGrzXVHXdLk3at5_Q@mail.gmail.com
Attachments:
v1-0002-Improve-simplehash-usage-notes.patchtext/x-patch; charset=US-ASCII; name=v1-0002-Improve-simplehash-usage-notes.patchDownload
From 36f6b670b30194efb4b7d0e622afc3085ecd49b6 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Thu, 30 Apr 2020 21:39:04 -0400
Subject: [PATCH v1 2/3] Improve simplehash usage notes
---
src/include/lib/simplehash.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index e5101f2f1d..8daf7c4d1a 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -48,6 +48,19 @@
* - SH_STORE_HASH - if defined the hash is stored in the elements
* - SH_GET_HASH(tb, a) - return the field to store the hash in
*
+ * The element type is required to contain a "uint32 status" member.
+ *
+ * While SH_STORE_HASH (and subsequently SH_GET_HASH) are optional, because
+ * the hash table implementation needs to compare hashes to move elements
+ * (particularly when growing the hash), it's preferable, if possible, to
+ * store the element's hash in the element's data type. If the hash is so
+ * stored, the hash table will also compare hashes before calling SH_EQUAL
+ * when comparing two keys.
+ *
+ * For convenience the hash table create functions accept a void pointer
+ * will be stored in the hash table type's member private_data. This allows
+ * callbacks to reference caller provided data.
+ *
* For examples of usage look at tidbitmap.c (file local definition) and
* execnodes.h/execGrouping.c (exposed declaration, file local
* implementation).
--
2.17.1
v1-0003-Show-sample-simplehash-method-signatures.patchtext/x-patch; charset=US-ASCII; name=v1-0003-Show-sample-simplehash-method-signatures.patchDownload
From 9116329e51419cb71f00d31f6819294cf2608e74 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Thu, 30 Apr 2020 21:40:44 -0400
Subject: [PATCH v1 3/3] Show sample simplehash method signatures
---
src/include/lib/simplehash.h | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 8daf7c4d1a..d7f7134541 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -176,24 +176,47 @@ typedef struct SH_ITERATOR
/* externally visible function prototypes */
#ifdef SH_RAW_ALLOCATOR
+/* <prefix>_hash <prefix>_create(uint32 nelements, void *private_data) */
SH_SCOPE SH_TYPE *SH_CREATE(uint32 nelements, void *private_data);
#else
+/*
+ * <prefix>_hash <prefix>_create(MemoryContext ctx, uint32 nelements,
+ * void *private_data)
+ */
SH_SCOPE SH_TYPE *SH_CREATE(MemoryContext ctx, uint32 nelements,
void *private_data);
#endif
+/* void <prefix>_destroy(<prefix>_hash *tb) */
SH_SCOPE void SH_DESTROY(SH_TYPE * tb);
+/* void <prefix>_reset(<prefix>_hash *tb) */
SH_SCOPE void SH_RESET(SH_TYPE * tb);
+/* void <prefix>_grow(<prefix>_hash *tb) */
SH_SCOPE void SH_GROW(SH_TYPE * tb, uint32 newsize);
+/* <element> <prefix>_insert(<prefix>_hash *tb, <key> key, bool *found) */
SH_SCOPE SH_ELEMENT_TYPE *SH_INSERT(SH_TYPE * tb, SH_KEY_TYPE key, bool *found);
+/*
+ * <element> <prefix>_insert_hash(<prefix>_hash *tb, <key> key, uint32 hash,
+ * bool *found)
+ */
SH_SCOPE SH_ELEMENT_TYPE *SH_INSERT_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
uint32 hash, bool *found);
+/* <element> <prefix>_lookup(<prefix>_hash *tb, <key> key) */
SH_SCOPE SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
+/* <element> <prefix>_lookup_hash(<prefix>_hash *tb, <key> key, uint32 hash) */
SH_SCOPE SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
uint32 hash);
+/* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
+/* void <prefix>_start_iterate(<prefix>_hash *tb, <prefix>_iterator *iter) */
SH_SCOPE void SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter);
+/*
+ * void <prefix>_start_iterate_at(<prefix>_hash *tb, <prefix>_iterator *iter,
+ * uint32 at)
+ */
SH_SCOPE void SH_START_ITERATE_AT(SH_TYPE * tb, SH_ITERATOR * iter, uint32 at);
+/* <element> <prefix>_iterate(<prefix>_hash *tb, <prefix>_iterator *iter) */
SH_SCOPE SH_ELEMENT_TYPE *SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter);
+/* void <prefix>_stat(<prefix>_hash *tb */
SH_SCOPE void SH_STAT(SH_TYPE * tb);
#endif /* SH_DECLARE */
--
2.17.1
v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patchDownload
From 6033d30015c7c217d6e4d79591cf81ae63e7e9e2 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Thu, 30 Apr 2020 21:36:07 -0400
Subject: [PATCH v1 1/3] Summarize trade-offs between simplehash and dynahash
When reading the code it's not obvious when one should prefer dynahash
over simplehash and vice-versa, so, for programmer-friendliness,
add comments to inform that decision.
---
src/backend/utils/hash/dynahash.c | 10 +++++++++-
src/include/lib/simplehash.h | 22 ++++++++++++++++++----
2 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 5b4b9e487f..fc22b76223 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1,7 +1,7 @@
/*-------------------------------------------------------------------------
*
* dynahash.c
- * dynamic hash tables
+ * dynamic chained hash tables
*
* dynahash.c supports both local-to-a-backend hash tables and hash tables in
* shared memory. For shared hash tables, it is the caller's responsibility
@@ -41,6 +41,14 @@
* function must be supplied; comparison defaults to memcmp() and key copying
* to memcpy() when a user-defined hashing function is selected.
*
+ * Compared to simplehash, dynahash has the following benefits:
+ *
+ * - It supports partitioning, which is useful for shared memory access using
+ * locks.
+ * - Because entries don't need to be moved in the case of hash conflicts, has
+ * better performance for large entries
+ * - Guarantees stable pointers to entries.
+ *
* Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index f7af921f5a..e5101f2f1d 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -1,10 +1,24 @@
/*
* simplehash.h
*
- * Hash table implementation which will be specialized to user-defined
- * types, by including this file to generate the required code. It's
- * probably not worthwhile to do so for hash tables that aren't performance
- * or space sensitive.
+ * When included this file generates a "templated" (by way of macros)
+ * open-addressing hash table implementation specialized to user-defined
+ * types.
+ *
+ * It's probably not worthwhile to generate such a specialized implementation
+ * for hash tables that aren't performance or space sensitive.
+ *
+ * Compared to dynahash, simplehash has the following benefits:
+ *
+ * - Due to the "templated" code generation has known structure sizes and no
+ * indirect function calls (which show up substantially in dynahash
+ * profiles). These features considerably increase speed for small
+ * entries.
+ * - Open addressing has better CPU cache behavior than dynahash's chained
+ * hashtables.
+ * - The generated interface is type-safe and easier to use than dynahash,
+ * though at the cost of more complex setup.
+ * - Does not require the overhead of a separate memory context.
*
* Usage notes:
*
--
2.17.1
On Fri, May 1, 2020 at 1:53 PM James Coleman <jtc331@gmail.com> wrote:
In another thread [1] I'd mused that "there might be some value in a
README or comments
addition that would be a guide to what the various hash
implementations are useful for...so that we have something to make the
code base a bit more discoverable."
+1
I'd solicited feedback from Andres (as the author of the simplehash
implementation) and gotten further explanation from Tomas (both cc'd
here) and have tried to condense that into the comment changes in this
patch series.v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch
Contains the summaries mentioned above.
+ * - It supports partitioning, which is useful for shared memory access using
I wonder if we should say a bit more about the shared memory mode.
Shared memory dynahash tables are allocated in a fixed size area at
startup, and are discoverable by name in other other processes that
need to get access to them, while simplehash assumes that it can get
memory from a MemoryContext or an allocator with a malloc/free-style
interface, which isn't very well suited for use in shared memory.
(I'm sure you can convince it to work in shared memory with some
work.)
v1-0002-Improve-simplehash-usage-notes.patch
+ * For convenience the hash table create functions accept a void pointer
+ * will be stored in the hash table type's member private_data.
*that* will be stored?
v1-0003-Show-sample-simplehash-method-signatures.patch
I find it hard to read the macro code "templating" particularly for
seeing what the available API is and so added sample method signatures
in comments to the macro generated method signature defines.
I didn't double-check all the expansions of the macros but +1 for this
idea, it's very useful.
On Mon, Jul 20, 2020 at 1:29 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Fri, May 1, 2020 at 1:53 PM James Coleman <jtc331@gmail.com> wrote:
In another thread [1] I'd mused that "there might be some value in a
README or comments
addition that would be a guide to what the various hash
implementations are useful for...so that we have something to make the
code base a bit more discoverable."+1
I'd solicited feedback from Andres (as the author of the simplehash
implementation) and gotten further explanation from Tomas (both cc'd
here) and have tried to condense that into the comment changes in this
patch series.v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch
Contains the summaries mentioned above.+ * - It supports partitioning, which is useful for shared memory access using
I wonder if we should say a bit more about the shared memory mode.
Shared memory dynahash tables are allocated in a fixed size area at
startup, and are discoverable by name in other other processes that
need to get access to them, while simplehash assumes that it can get
memory from a MemoryContext or an allocator with a malloc/free-style
interface, which isn't very well suited for use in shared memory.
(I'm sure you can convince it to work in shared memory with some
work.)
Added.
v1-0002-Improve-simplehash-usage-notes.patch
+ * For convenience the hash table create functions accept a void pointer + * will be stored in the hash table type's member private_data.*that* will be stored?
Fixed.
v1-0003-Show-sample-simplehash-method-signatures.patch
I find it hard to read the macro code "templating" particularly for
seeing what the available API is and so added sample method signatures
in comments to the macro generated method signature defines.I didn't double-check all the expansions of the macros but +1 for this
idea, it's very useful.
James
Attachments:
v2-0003-Show-sample-simplehash-method-signatures.patchapplication/octet-stream; name=v2-0003-Show-sample-simplehash-method-signatures.patchDownload
From 7a3138298e3cef538374a1d742fb3a17941a2682 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Thu, 30 Apr 2020 21:40:44 -0400
Subject: [PATCH v2 3/3] Show sample simplehash method signatures
---
src/include/lib/simplehash.h | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 884a2baf2b..068fb2ba21 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -179,24 +179,47 @@ typedef struct SH_ITERATOR
/* externally visible function prototypes */
#ifdef SH_RAW_ALLOCATOR
+/* <prefix>_hash <prefix>_create(uint32 nelements, void *private_data) */
SH_SCOPE SH_TYPE *SH_CREATE(uint32 nelements, void *private_data);
#else
+/*
+ * <prefix>_hash <prefix>_create(MemoryContext ctx, uint32 nelements,
+ * void *private_data)
+ */
SH_SCOPE SH_TYPE *SH_CREATE(MemoryContext ctx, uint32 nelements,
void *private_data);
#endif
+/* void <prefix>_destroy(<prefix>_hash *tb) */
SH_SCOPE void SH_DESTROY(SH_TYPE * tb);
+/* void <prefix>_reset(<prefix>_hash *tb) */
SH_SCOPE void SH_RESET(SH_TYPE * tb);
+/* void <prefix>_grow(<prefix>_hash *tb) */
SH_SCOPE void SH_GROW(SH_TYPE * tb, uint32 newsize);
+/* <element> <prefix>_insert(<prefix>_hash *tb, <key> key, bool *found) */
SH_SCOPE SH_ELEMENT_TYPE *SH_INSERT(SH_TYPE * tb, SH_KEY_TYPE key, bool *found);
+/*
+ * <element> <prefix>_insert_hash(<prefix>_hash *tb, <key> key, uint32 hash,
+ * bool *found)
+ */
SH_SCOPE SH_ELEMENT_TYPE *SH_INSERT_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
uint32 hash, bool *found);
+/* <element> <prefix>_lookup(<prefix>_hash *tb, <key> key) */
SH_SCOPE SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
+/* <element> <prefix>_lookup_hash(<prefix>_hash *tb, <key> key, uint32 hash) */
SH_SCOPE SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
uint32 hash);
+/* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
+/* void <prefix>_start_iterate(<prefix>_hash *tb, <prefix>_iterator *iter) */
SH_SCOPE void SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter);
+/*
+ * void <prefix>_start_iterate_at(<prefix>_hash *tb, <prefix>_iterator *iter,
+ * uint32 at)
+ */
SH_SCOPE void SH_START_ITERATE_AT(SH_TYPE * tb, SH_ITERATOR * iter, uint32 at);
+/* <element> <prefix>_iterate(<prefix>_hash *tb, <prefix>_iterator *iter) */
SH_SCOPE SH_ELEMENT_TYPE *SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter);
+/* void <prefix>_stat(<prefix>_hash *tb */
SH_SCOPE void SH_STAT(SH_TYPE * tb);
#endif /* SH_DECLARE */
--
2.20.1 (Apple Git-117)
v2-0002-Improve-simplehash-usage-notes.patchapplication/octet-stream; name=v2-0002-Improve-simplehash-usage-notes.patchDownload
From 9e35e7119c7244e8fae94c3fa0cd90809539696d Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Thu, 30 Apr 2020 21:39:04 -0400
Subject: [PATCH v2 2/3] Improve simplehash usage notes
---
src/include/lib/simplehash.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index b0472a3fc7..884a2baf2b 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -51,6 +51,19 @@
* - SH_STORE_HASH - if defined the hash is stored in the elements
* - SH_GET_HASH(tb, a) - return the field to store the hash in
*
+ * The element type is required to contain a "uint32 status" member.
+ *
+ * While SH_STORE_HASH (and subsequently SH_GET_HASH) are optional, because
+ * the hash table implementation needs to compare hashes to move elements
+ * (particularly when growing the hash), it's preferable, if possible, to
+ * store the element's hash in the element's data type. If the hash is so
+ * stored, the hash table will also compare hashes before calling SH_EQUAL
+ * when comparing two keys.
+ *
+ * For convenience the hash table create functions accept a void pointer
+ * that will be stored in the hash table type's member private_data. This
+ * allows callbacks to reference caller provided data.
+ *
* For examples of usage look at tidbitmap.c (file local definition) and
* execnodes.h/execGrouping.c (exposed declaration, file local
* implementation).
--
2.20.1 (Apple Git-117)
v2-0001-Summarize-trade-offs-between-simplehash-and-dynah.patchapplication/octet-stream; name=v2-0001-Summarize-trade-offs-between-simplehash-and-dynah.patchDownload
From e01df35e8a2558b626bfbff861ff520ee5134b70 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Thu, 30 Apr 2020 21:36:07 -0400
Subject: [PATCH v2 1/3] Summarize trade-offs between simplehash and dynahash
When reading the code it's not obvious when one should prefer dynahash
over simplehash and vice-versa, so, for programmer-friendliness,
add comments to inform that decision.
---
src/backend/utils/hash/dynahash.c | 12 +++++++++++-
src/include/lib/simplehash.h | 25 +++++++++++++++++++++----
2 files changed, 32 insertions(+), 5 deletions(-)
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 5948b01abc..f4fbccdd7e 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1,7 +1,7 @@
/*-------------------------------------------------------------------------
*
* dynahash.c
- * dynamic hash tables
+ * dynamic chained hash tables
*
* dynahash.c supports both local-to-a-backend hash tables and hash tables in
* shared memory. For shared hash tables, it is the caller's responsibility
@@ -41,6 +41,16 @@
* function must be supplied; comparison defaults to memcmp() and key copying
* to memcpy() when a user-defined hashing function is selected.
*
+ * Compared to simplehash, dynahash has the following benefits:
+ *
+ * - It supports partitioning, which is useful for shared memory access using
+ * locks.
+ * - Shared memory hashes are allocated in a fixed size area at startup and
+ * are discoverable by name from other processes.
+ * - Because entries don't need to be moved in the case of hash conflicts, has
+ * better performance for large entries
+ * - Guarantees stable pointers to entries.
+ *
* Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 90dfa8a695..b0472a3fc7 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -1,10 +1,27 @@
/*
* simplehash.h
*
- * Hash table implementation which will be specialized to user-defined
- * types, by including this file to generate the required code. It's
- * probably not worthwhile to do so for hash tables that aren't performance
- * or space sensitive.
+ * When included this file generates a "templated" (by way of macros)
+ * open-addressing hash table implementation specialized to user-defined
+ * types.
+ *
+ * It's probably not worthwhile to generate such a specialized implementation
+ * for hash tables that aren't performance or space sensitive.
+ *
+ * Compared to dynahash, simplehash has the following benefits:
+ *
+ * - Due to the "templated" code generation has known structure sizes and no
+ * indirect function calls (which show up substantially in dynahash
+ * profiles). These features considerably increase speed for small
+ * entries.
+ * - Open addressing has better CPU cache behavior than dynahash's chained
+ * hashtables.
+ * - The generated interface is type-safe and easier to use than dynahash,
+ * though at the cost of more complex setup.
+ * - Allocates memory in a MemoryContext or another allocator with a
+ * malloc/free style interface (which isn't easily usable in a shared
+ * memory context)
+ * - Does not require the overhead of a separate memory context.
*
* Usage notes:
*
--
2.20.1 (Apple Git-117)
On Sat, Aug 1, 2020 at 7:22 AM James Coleman <jtc331@gmail.com> wrote:
[v2 patch set]
I ran it through pgindent which insisted on adding some newlines, I
manually replaced some spaces with tabs to match nearby lines, I added
some asterisks in your example function prototypes where <element> is
returned because they seemed to be missing, and I pushed this.
Thanks!
On Fri, Jul 31, 2020 at 8:17 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Sat, Aug 1, 2020 at 7:22 AM James Coleman <jtc331@gmail.com> wrote:
[v2 patch set]
I ran it through pgindent which insisted on adding some newlines, I
manually replaced some spaces with tabs to match nearby lines, I added
some asterisks in your example function prototypes where <element> is
returned because they seemed to be missing, and I pushed this.
Thanks!
Thanks for reviewing and committing!
James
On Sat, 1 Aug 2020 at 12:17, Thomas Munro <thomas.munro@gmail.com> wrote:
On Sat, Aug 1, 2020 at 7:22 AM James Coleman <jtc331@gmail.com> wrote:
[v2 patch set]
I ran it through pgindent which insisted on adding some newlines, I
manually replaced some spaces with tabs to match nearby lines, I added
some asterisks in your example function prototypes where <element> is
returned because they seemed to be missing, and I pushed this.
Thanks!
I was just reading over this and wondered about the following:
+ * The element type is required to contain a "uint32 status" member.
I see that PagetableEntry does not follow this and I also didn't
follow it when writing the Result Cache patch in [1]/messages/by-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com. I managed to
shrink the struct I was using for the hash table by 4 bytes by using a
char instead of an int. That sounds like a small amount of memory, but
it did result in much better cache hit ratios in the patch
Maybe it would be better just to get rid of the enum and just #define
the values. It seems unlikely that we're every going to need many more
states than what are there already, let along more than, say 127 of
them. It does look like manifest_file could be shrunk down a bit too
by making the status field a char.
[1]: /messages/by-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com
On Sun, 2 Aug 2020 at 11:16, David Rowley <dgrowleyml@gmail.com> wrote:
Maybe it would be better just to get rid of the enum and just #define
the values. It seems unlikely that we're every going to need many more
states than what are there already, let along more than, say 127 of
them. It does look like manifest_file could be shrunk down a bit too
by making the status field a char.
This didn't turn out quite as pretty as I had imagined. I needed to
leave the two statuses defined in simplehash.h so that callers could
make use of them. (Result Cache will do this).
The change here would be callers would need to use SH_STATUS_IN_USE
rather than <prefix>_SH_IN_USE.
I'm not really that sold on doing things this way. I really just don't
want to have to make my status field a uint32 in Result Cache per what
the new comment states we must do. If there's a nicer way, then
perhaps that's worth considering.
David
Attachments:
simplehash_fix.patchapplication/octet-stream; name=simplehash_fix.patchDownload
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 96f0c21f60..f93809a482 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -51,7 +51,8 @@
* - SH_STORE_HASH - if defined the hash is stored in the elements
* - SH_GET_HASH(tb, a) - return the field to store the hash in
*
- * The element type is required to contain a "uint32 status" member.
+ * The element type is required to contain a "status" member which must be
+ * an integer type of at least char in width.
*
* While SH_STORE_HASH (and subsequently SH_GET_HASH) are optional, because
* the hash table implementation needs to compare hashes to move elements
@@ -98,9 +99,6 @@
/* type declarations */
#define SH_TYPE SH_MAKE_NAME(hash)
-#define SH_STATUS SH_MAKE_NAME(status)
-#define SH_STATUS_EMPTY SH_MAKE_NAME(SH_EMPTY)
-#define SH_STATUS_IN_USE SH_MAKE_NAME(SH_IN_USE)
#define SH_ITERATOR SH_MAKE_NAME(iterator)
/* function declarations */
@@ -130,6 +128,17 @@
#define SH_INSERT_HASH_INTERNAL SH_MAKE_NAME(insert_hash_internal)
#define SH_LOOKUP_HASH_INTERNAL SH_MAKE_NAME(lookup_hash_internal)
+/* Undef these for cases where users include simplehash.h twice */
+#undef SH_STATUS_EMPTY
+#undef SH_STATUS_IN_USE
+
+/*
+ * Values for SH_ELEMENT_TYPE->status field. Values must fit in char data
+ * type.
+ */
+#define SH_STATUS_EMPTY 0
+#define SH_STATUS_IN_USE 1
+
/* generate forward declarations necessary to use the hash table */
#ifdef SH_DECLARE
@@ -164,12 +173,6 @@ typedef struct SH_TYPE
void *private_data;
} SH_TYPE;
-typedef enum SH_STATUS
-{
- SH_STATUS_EMPTY = 0x00,
- SH_STATUS_IN_USE = 0x01
-} SH_STATUS;
-
typedef struct SH_ITERATOR
{
uint32 cur; /* current element */
@@ -1090,9 +1093,6 @@ SH_STAT(SH_TYPE * tb)
/* types */
#undef SH_TYPE
-#undef SH_STATUS
-#undef SH_STATUS_EMPTY
-#undef SH_STATUS_IN_USE
#undef SH_ITERATOR
/* external function names */
On Sun, Aug 2, 2020 at 1:54 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Sun, 2 Aug 2020 at 11:16, David Rowley <dgrowleyml@gmail.com> wrote:
Maybe it would be better just to get rid of the enum and just #define
the values. It seems unlikely that we're every going to need many more
states than what are there already, let along more than, say 127 of
them. It does look like manifest_file could be shrunk down a bit too
by making the status field a char.This didn't turn out quite as pretty as I had imagined. I needed to
leave the two statuses defined in simplehash.h so that callers could
make use of them. (Result Cache will do this).The change here would be callers would need to use SH_STATUS_IN_USE
rather than <prefix>_SH_IN_USE.I'm not really that sold on doing things this way. I really just don't
want to have to make my status field a uint32 in Result Cache per what
the new comment states we must do. If there's a nicer way, then
perhaps that's worth considering.
Agreed that the new comment is wrong and should be changed.
I think you can probably go further, though, and make it require no
storage at all by making it optionally "intrusive", by using a special
value in an existing member, and supplying an expression to set and
test for that value. For example, maybe some users have a pointer but
never want to use NULL, and maybe some users already have a field
holding various flags that are one bit wide and can spare a bit.
On Mon, 3 Aug 2020 at 11:10, Thomas Munro <thomas.munro@gmail.com> wrote:
I think you can probably go further, though, and make it require no
storage at all by making it optionally "intrusive", by using a special
value in an existing member, and supplying an expression to set and
test for that value. For example, maybe some users have a pointer but
never want to use NULL, and maybe some users already have a field
holding various flags that are one bit wide and can spare a bit.
I agree that it would be good to allow users of simplehash.h that
additional flexibility. It may allow additional memory savings.
However, it would mean we'd need to do some additional work when we
create and grow the hash table to ensure that we mark new buckets as
empty. For now, we get that for free with the zeroing of the newly
allocated memory, but we couldn't rely on that if we allowed users to
define their own macro.
It looks like none of the current callers could gain from this
1. TupleHashEntryData does not have any reusable fields. The status
should fit in the padding on a 64-bit machine anyway.
2. PagetableEntry already has a status that fits into the padding.
3. manifest_file could have its status moved to the end of the struct
and made into a char and the struct would be the same size as if the
field did not exist.
So, with the current users, we'd stand to lose more than we'd gain
from doing it that way.
David
On Mon, 3 Aug 2020 at 11:36, David Rowley <dgrowleyml@gmail.com> wrote:
So, with the current users, we'd stand to lose more than we'd gain
from doing it that way.
FWIW, I'd be ok with just:
- * The element type is required to contain a "uint32 status" member.
+ * The element type is required to contain an integer-based
"status" member
+ * which can store the range of values defined in the SH_STATUS enum.
David
On Mon, Aug 3, 2020 at 11:42 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 3 Aug 2020 at 11:36, David Rowley <dgrowleyml@gmail.com> wrote:
So, with the current users, we'd stand to lose more than we'd gain
from doing it that way.FWIW, I'd be ok with just:
- * The element type is required to contain a "uint32 status" member. + * The element type is required to contain an integer-based "status" member + * which can store the range of values defined in the SH_STATUS enum.
Thanks for the correction. Pushed.