speed up unicode normalization quick check
Hi,
Attached is a patch to use perfect hashing to speed up Unicode
normalization quick check.
0001 changes the set of multipliers attempted when generating the hash
function. The set in HEAD works for the current set of NFC codepoints,
but not for the other types. Also, the updated multipliers now all
compile to shift-and-add on most platform/compiler combinations
available on godbolt.org (earlier experiments found in [1]). The
existing keyword lists are fine with the new set, and don't seem to be
very picky in general. As a test, it also successfully finds a
function for the OS "words" file, the "D" sets of codepoints, and for
sets of the first n built-in OIDs, where n > 5.
0002 builds on top of the existing normprops infrastructure to use a
hash function for NFC quick check. Below are typical numbers in a
non-assert build:
select count(*) from (select md5(i::text) as t from
generate_series(1,100000) as i) s where t is nfc normalized;
HEAD 411ms 413ms 409ms
patch 296ms 297ms 299ms
The addition of "const" was to silence a compiler warning. Also, I
changed the formatting of the output file slightly to match pgindent.
0003 uses hashing for NFKC and removes binary search. This is split
out for readability. I gather NFKC is a less common use case, so this
could technically be left out. Since this set is larger, the
performance gains are a bit larger as well, at the cost of 19kB of
binary space:
HEAD 439ms 440ms 442ms
patch 299ms 301ms 301ms
I'll add this to the July commitfest.
[1]: /messages/by-id/CACPNZCuVTiLhxAzXp9uCeHGUyHMa59h6_pmP+_W-SzXG0UyY9w@mail.gmail.com
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v1-0001-Tweak-the-set-of-candidate-multipliers-for-genera.patch
v1-0002-Use-perfect-hashing-for-NFC-Unicode-normalization.patch
v1-0003-Use-perfect-hashing-for-NFKC-Unicode-normalizatio.patch
On May 21, 2020, at 12:12 AM, John Naylor <john.naylor@2ndquadrant.com> wrote:
Hi,
Attached is a patch to use perfect hashing to speed up Unicode
normalization quick check.
0001 changes the set of multipliers attempted when generating the hash
function. The set in HEAD works for the current set of NFC codepoints,
but not for the other types. Also, the updated multipliers now all
compile to shift-and-add on most platform/compiler combinations
available on godbolt.org (earlier experiments found in [1]). The
existing keyword lists are fine with the new set, and don't seem to be
very picky in general. As a test, it also successfully finds a
function for the OS "words" file, the "D" sets of codepoints, and for
sets of the first n built-in OIDs, where n > 5.
Prior to this patch, src/tools/gen_keywordlist.pl is the only script that uses PerfectHash. Your patch adds a second. I'm not convinced that modifying the PerfectHash code directly each time a new caller needs different multipliers is the right way to go. Could you instead make them arguments such that gen_keywordlist.pl, generate-unicode_combining_table.pl, and future callers can pass in the numbers they want? Or is there some advantage to having it this way?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 29, 2020 at 5:59 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On May 21, 2020, at 12:12 AM, John Naylor <john.naylor@2ndquadrant.com> wrote:
very picky in general. As a test, it also successfully finds a
function for the OS "words" file, the "D" sets of codepoints, and for
sets of the first n built-in OIDs, where n > 5.
Prior to this patch, src/tools/gen_keywordlist.pl is the only script that uses PerfectHash. Your patch adds a second. I'm not convinced that modifying the PerfectHash code directly each time a new caller needs different multipliers is the right way to go.
Calling it "each time" with a sample size of two is a bit of a
stretch. The first implementation made a reasonable attempt to suit
future uses and I simply made it a bit more robust. In the text quoted
above you can see I tested some scenarios beyond the current use
cases, with key set sizes as low as 6 and as high as 250k.
Could you instead make them arguments such that gen_keywordlist.pl, generate-unicode_combining_table.pl, and future callers can pass in the numbers they want? Or is there some advantage to having it this way?
That is an implementation detail that callers have no business knowing about.
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On May 28, 2020, at 8:54 PM, John Naylor <john.naylor@2ndquadrant.com> wrote:
On Fri, May 29, 2020 at 5:59 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On May 21, 2020, at 12:12 AM, John Naylor <john.naylor@2ndquadrant.com> wrote:
very picky in general. As a test, it also successfully finds a
function for the OS "words" file, the "D" sets of codepoints, and for
sets of the first n built-in OIDs, where n > 5.
Prior to this patch, src/tools/gen_keywordlist.pl is the only script that uses PerfectHash. Your patch adds a second. I'm not convinced that modifying the PerfectHash code directly each time a new caller needs different multipliers is the right way to go.
I forgot in my first round of code review to mention, "thanks for the patch". I generally like what you are doing here, and am trying to review it so it gets committed.
Calling it "each time" with a sample size of two is a bit of a
stretch. The first implementation made a reasonable attempt to suit
future uses and I simply made it a bit more robust. In the text quoted
above you can see I tested some scenarios beyond the current use
cases, with key set sizes as low as 6 and as high as 250k.
I don't really have an objection to what you did in the patch. I'm not going to lose any sleep if it gets committed this way.
The reason I gave this feedback is that I saved the *kwlist_d.h files generated before applying the patch, and compared them with the same files generated after applying the patch, and noticed a very slight degradation. Most of the files changed without any expansion, but the largest of them, src/common/kwlist_d.h, changed from
static const int16 h[901]
to
static const int16 h[902]
suggesting that even with your reworking of the parameters for PerfectHash, you weren't able to find a single set of numbers that worked for the two datasets quite as well as different sets of numbers each tailored for a particular data set. I started to imagine that if we wanted to use PerfectHash for yet more stuff, the problem of finding numbers that worked across all N data sets (even if N is only 3 or 4) might be harder still. That's all I was referring to. 901 -> 902 is such a small expansion that it might not be worth worrying about.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, May 30, 2020 at 12:13 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I forgot in my first round of code review to mention, "thanks for the patch". I generally like what you are doing here, and am trying to review it so it gets committed.
And I forgot to say thanks for taking a look!
The reason I gave this feedback is that I saved the *kwlist_d.h files generated before applying the patch, and compared them with the same files generated after applying the patch, and noticed a very slight degradation. Most of the files changed without any expansion, but the largest of them, src/common/kwlist_d.h, changed from
static const int16 h[901]
to
static const int16 h[902]
Interesting, I hadn't noticed. With 450 keywords, we need at least 901
elements in the table. Since 901 is divisible by the new hash
multiplier 17, this gets triggered:
# However, it would be very bad if $nverts were exactly equal to either
# $hash_mult1 or $hash_mult2: effectively, that hash function would be
# sensitive to only the last byte of each key. Cases where $nverts is a
# multiple of either multiplier likewise lose information. (But $nverts
# can't actually divide them, if they've been intelligently chosen as
# primes.) We can avoid such problems by adjusting the table size.
while ($nverts % $hash_mult1 == 0
|| $nverts % $hash_mult2 == 0)
{
$nverts++;
}
This is harmless, and will go away next time we add a keyword.
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attached is version 4, which excludes the output file from pgindent,
to match recent commit 74d4608f5. Since it won't be indented again, I
also tweaked the generator script to match pgindent for the typedef,
since we don't want to lose what pgindent has fixed already. This last
part isn't new to v4, but I thought I'd highlight it anyway.
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v4-0001-Tweak-the-set-of-candidate-multipliers-for-genera.patch
v4-0002-Use-perfect-hashing-for-NFC-Unicode-normalization.patch
v4-0003-Use-perfect-hashing-for-NKFC-Unicode-normalizatio.patch
On Sep 18, 2020, at 9:41 AM, John Naylor <john.naylor@2ndquadrant.com> wrote:
Attached is version 4, which excludes the output file from pgindent,
to match recent commit 74d4608f5. Since it won't be indented again, I
also tweaked the generator script to match pgindent for the typedef,
since we don't want to lose what pgindent has fixed already. This last
part isn't new to v4, but I thought I'd highlight it anyway.
0001 looks ok to me. The change is quite minor. I reviewed it by comparing the assembly generated for perfect hash functions before and after applying the patch.
For 0001, the assembly code generated from the perfect hash functions in src/common/keywords.s and src/pl/plpgsql/src/pl_scanner.s does not appear to differ in any performance-significant way. The assembly code generated in src/interfaces/ecpg/preproc/ecpg_keywords.s and src/interfaces/ecpg/preproc/c_keywords.s changes enough that I wouldn't try to compare them just by visual inspection.
Compiled using -g -O2
Apple clang version 11.0.0 (clang-1100.0.33.17)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
I'm attaching the diffs of the old and new assembly files, if anyone cares to look.
Attachments:
assember.diffs
On Sep 18, 2020, at 9:41 AM, John Naylor <john.naylor@2ndquadrant.com> wrote:
Attached is version 4, which excludes the output file from pgindent,
to match recent commit 74d4608f5. Since it won't be indented again, I
also tweaked the generator script to match pgindent for the typedef,
since we don't want to lose what pgindent has fixed already. This last
part isn't new to v4, but I thought I'd highlight it anyway.
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
0002 and 0003 look good to me. I like the way you cleaned up a bit with the unicode_norm_props struct, which makes the code a bit more tidy, and on my compiler under -O2 it does not generate any extra runtime dereferences, as the compiler can see through the struct just fine. My only concern would be if some other compilers might not see through the struct, resulting in a runtime performance cost? I wouldn't even ask, except that qc_hash_lookup is called in a fairly tight loop.
To clarify, the following changes to the generated code which remove the struct and corresponding dereferences (not intended as a patch submission) cause zero bytes of change in the compiled output for me on mac/clang, which is good, and generate inconsequential changes on linux/gcc, which is also good, but I wonder if that is true for all compilers. In your commit message for 0001 you mentioned testing on a multiplicity of compilers. Did you do that for 0002 and 0003 as well?
diff --git a/src/common/unicode_norm.c b/src/common/unicode_norm.c
index 1714837e64..976b96e332 100644
--- a/src/common/unicode_norm.c
+++ b/src/common/unicode_norm.c
@@ -476,8 +476,11 @@ qc_compare(const void *p1, const void *p2)
return (v1 - v2);
}
-static const pg_unicode_normprops *
-qc_hash_lookup(pg_wchar ch, const unicode_norm_info * norminfo)
+static inline const pg_unicode_normprops *
+qc_hash_lookup(pg_wchar ch,
+ const pg_unicode_normprops *normprops,
+ qc_hash_func hash,
+ int num_normprops)
{
int h;
uint32 hashkey;
@@ -487,21 +490,21 @@ qc_hash_lookup(pg_wchar ch, const unicode_norm_info * norminfo)
* in network order.
*/
hashkey = htonl(ch);
- h = norminfo->hash(&hashkey);
+ h = hash(&hashkey);
/* An out-of-range result implies no match */
- if (h < 0 || h >= norminfo->num_normprops)
+ if (h < 0 || h >= num_normprops)
return NULL;
/*
* Since it's a perfect hash, we need only match to the specific codepoint
* it identifies.
*/
- if (ch != norminfo->normprops[h].codepoint)
+ if (ch != normprops[h].codepoint)
return NULL;
/* Success! */
- return &norminfo->normprops[h];
+ return &normprops[h];
}
/*
@@ -518,7 +521,10 @@ qc_is_allowed(UnicodeNormalizationForm form, pg_wchar ch)
switch (form)
{
case UNICODE_NFC:
- found = qc_hash_lookup(ch, &UnicodeNormInfo_NFC_QC);
+ found = qc_hash_lookup(ch,
+ UnicodeNormProps_NFC_QC,
+ NFC_QC_hash_func,
+ NFC_QC_num_normprops);
break;
case UNICODE_NFKC:
found = bsearch(&key,
diff --git a/src/include/common/unicode_normprops_table.h b/src/include/common/unicode_normprops_table.h
index 5e1e382af5..38300cfa12 100644
--- a/src/include/common/unicode_normprops_table.h
+++ b/src/include/common/unicode_normprops_table.h
@@ -13,13 +13,6 @@ typedef struct
signed int quickcheck:4; /* really UnicodeNormalizationQC */
} pg_unicode_normprops;
-typedef struct
-{
- const pg_unicode_normprops *normprops;
- qc_hash_func hash;
- int num_normprops;
-} unicode_norm_info;
-
static const pg_unicode_normprops UnicodeNormProps_NFC_QC[] = {
{0x0300, UNICODE_NORM_QC_MAYBE},
{0x0301, UNICODE_NORM_QC_MAYBE},
@@ -1583,12 +1576,6 @@ NFC_QC_hash_func(const void *key)
return h[a % 2463] + h[b % 2463];
}
-static const unicode_norm_info UnicodeNormInfo_NFC_QC = {
- UnicodeNormProps_NFC_QC,
- NFC_QC_hash_func,
- 1231
-};
-
static const pg_unicode_normprops UnicodeNormProps_NFKC_QC[] = {
{0x00A0, UNICODE_NORM_QC_NO},
{0x00A8, UNICODE_NORM_QC_NO},
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Sep 19, 2020 at 1:46 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
0002 and 0003 look good to me. I like the way you cleaned up a bit with the unicode_norm_props struct, which makes the code a bit more tidy, and on my compiler under -O2 it does not generate any extra runtime dereferences, as the compiler can see through the struct just fine. My only concern would be if some other compilers might not see through the struct, resulting in a runtime performance cost? I wouldn't even ask, except that qc_hash_lookup is called in a fairly tight loop.
(I assume you mean unicode_norm_info) Yeah, that usage was copied from
the keyword list code. I believe it was done for the convenience of
the callers. That is worth something, and so is consistency. That
said, I'd be curious if there is a measurable impact for some
platforms.
In your commit message for 0001 you mentioned testing on a multiplicity of compilers. Did you do that for 0002 and 0003 as well?
For that, I was simply using godbolt.org to test compiling the
multiplications down to shift-and-adds. Very widespread, I only
remember MSVC as not doing it. I'm not sure a few extra cycles would
have been noticeable here, but it can't hurt to have that guarantee.
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sep 19, 2020, at 3:58 PM, John Naylor <john.naylor@2ndquadrant.com> wrote:
On Sat, Sep 19, 2020 at 1:46 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:0002 and 0003 look good to me. I like the way you cleaned up a bit with the unicode_norm_props struct, which makes the code a bit more tidy, and on my compiler under -O2 it does not generate any extra runtime dereferences, as the compiler can see through the struct just fine. My only concern would be if some other compilers might not see through the struct, resulting in a runtime performance cost? I wouldn't even ask, except that qc_hash_lookup is called in a fairly tight loop.
(I assume you mean unicode_norm_info) Yeah, that usage was copied from
the keyword list code. I believe it was done for the convenience of
the callers. That is worth something, and so is consistency. That
said, I'd be curious if there is a measurable impact for some
platforms.
Right, unicode_norm_info. I'm not sure the convenience of the callers matters here, since the usage is restricted to just one file, but I also don't have a problem with the code as you have it.
In your commit message for 0001 you mentioned testing on a multiplicity of compilers. Did you do that for 0002 and 0003 as well?
For that, I was simply using godbolt.org to test compiling the
multiplications down to shift-and-adds. Very widespread, I only
remember MSVC as not doing it. I'm not sure a few extra cycles would
have been noticeable here, but it can't hurt to have that guarantee.
I am marking this ready for committer. I didn't object to the whitespace weirdness in your patch (about which `git apply` grumbles) since you seem to have done that intentionally. I have no further comments on the performance issue, since I don't have any other platforms at hand to test it on. Whichever committer picks this up can decide if the issue matters to them enough to punt it back for further performance testing.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Sep 19, 2020 at 04:09:27PM -0700, Mark Dilger wrote:
I am marking this ready for committer. I didn't object to the
whitespace weirdness in your patch (about which `git apply`
grumbles) since you seem to have done that intentionally. I have no
further comments on the performance issue, since I don't have any
other platforms at hand to test it on. Whichever committer picks
this up can decide if the issue matters to them enough to punt it
back for further performance testing.
About 0001, the new set of multipliers looks fine to me. Even if this
adds an extra item from 901 to 902 because this can be divided by 17
in kwlist_d.h, I also don't think that this is really much of a
bother. As mentioned, this impacts none of the other tables that are much
smaller in size, on top of coming back to normal once a new keyword
is added. Being able to generate perfect hash functions for much
larger sets is a nice property to have. While on it, I also looked at
the assembly code with gcc -O2 for keywords.c & co and I have not
spotted any huge difference. So I'd like to apply this first if there
are no objections.
I have tested 0002 and 0003, that had better be merged together at the
end, and I can see performance improvements with MSVC and gcc similar
to what is being reported upthread, with 20~30% gains for simple
data sample using IS NFC/NFKC. That's cool.
Including unicode_normprops_table.h in what gets ignored with pgindent
is also fine at the end, even with the changes to make the output of
the structures generated more in-line with what pgindent generates.
One tiny comment I have is that I would have added an extra comment in
the unicode header generated to document the set of structures
generated for the perfect hash, but that's easy enough to add.
--
Michael
On Wed, Oct 07, 2020 at 03:18:44PM +0900, Michael Paquier wrote:
About 0001, the new set of multipliers looks fine to me. Even if this
adds an extra item from 901 to 902 because this can be divided by 17
in kwlist_d.h, I also don't think that this is really much of a
bother. As mentioned, this impacts none of the other tables that are much
smaller in size, on top of coming back to normal once a new keyword
is added. Being able to generate perfect hash functions for much
larger sets is a nice property to have. While on it, I also looked at
the assembly code with gcc -O2 for keywords.c & co and I have not
spotted any huge difference. So I'd like to apply this first if there
are no objections.
I looked at this one again today, and applied it. I looked at what
MSVC compiler was able to do in terms of optimizations with
shift-and-add for multipliers, and it is by far not as good as gcc or
clang, applying imul for basically all the primes we could use for the
perfect hash generation.
I have tested 0002 and 0003, that had better be merged together at the
end, and I can see performance improvements with MSVC and gcc similar
to what is being reported upthread, with 20~30% gains for simple
data sample using IS NFC/NFKC. That's cool.
For these two, I have merged both together and did some adjustments as
per the attached. Not many tweaks, mainly some more comments for the
unicode header files as the number of structures generated gets
higher. FWIW, with the addition of the two hash tables,
libpgcommon_srv.a grows from 1032600B to 1089240B, which looks like a
small price to pay for the ~30% performance gains with the quick
checks.
--
Michael
Attachments:
uni-norm-hash-v5.patch
On Thu, Oct 8, 2020 at 2:48 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Oct 07, 2020 at 03:18:44PM +0900, Michael Paquier wrote:
I looked at this one again today, and applied it. I looked at what
MSVC compiler was able to do in terms of optimizations with
shift-and-add for multipliers, and it is by far not as good as gcc or
clang, applying imul for basically all the primes we could use for the
perfect hash generation.
Thanks for picking this up! As I recall, godbolt.org also showed MSVC
unable to do this optimization.
I have tested 0002 and 0003, that had better be merged together at the
end, and I can see performance improvements with MSVC and gcc similar
to what is being reported upthread, with 20~30% gains for simple
data sample using IS NFC/NFKC. That's cool.
For these two, I have merged both together and did some adjustments as
per the attached. Not many tweaks, mainly some more comments for the
unicode header files as the number of structures generated gets
higher.
Looks fine overall, but one minor nit: I'm curious why you made a separate
section in the pgindent exclusions. The style in that file seems to be one
comment per category.
--
John Naylor
On Thu, Oct 08, 2020 at 04:52:18AM -0400, John Naylor wrote:
Looks fine overall, but one minor nit: I'm curious why you made a separate
section in the pgindent exclusions. The style in that file seems to be one
comment per category.
Both parts indeed use PerfectHash.pm, but are generated by different
scripts so that looked better as separate items.
--
Michael
On Thu, Oct 8, 2020 at 8:29 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Oct 08, 2020 at 04:52:18AM -0400, John Naylor wrote:
Looks fine overall, but one minor nit: I'm curious why you made a
separate
section in the pgindent exclusions. The style in that file seems to be
one
comment per category.
Both parts indeed use PerfectHash.pm, but are generated by different
scripts so that looked better as separate items.
Okay, thanks.
--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 08, 2020 at 06:22:39PM -0400, John Naylor wrote:
Okay, thanks.
And applied. I did some more micro benchmarking with the quick
checks, and the numbers are cool, close to what you mentioned for the
quick checks of both NFC and NFKC.
Just wondering about something in the same area, did you look at the
decomposition table?
--
Michael
On Sun, 11 Oct 2020 at 19:27, Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Oct 08, 2020 at 06:22:39PM -0400, John Naylor wrote:
Okay, thanks.
And applied.
The following warning recently started to be shown in my
environment (FreeBSD clang 8.0.1). Maybe it is relevant to this
commit:
unicode_norm.c:478:12: warning: implicit declaration of function
'htonl' is invalid in C99 [-Wimplicit-function-declaration]
hashkey = htonl(ch);
^
1 warning generated.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Oct 12, 2020 at 02:43:06PM +0900, Masahiko Sawada wrote:
The following warning recently started to be shown in my
environment(FreeBSD clang 8.0.1). Maybe it is relevant with this
commit:
unicode_norm.c:478:12: warning: implicit declaration of function
'htonl' is invalid in C99 [-Wimplicit-function-declaration]
hashkey = htonl(ch);
^
Thanks, it is of course relevant to this commit. None of the
BSD animals complain here. So, while it would be tempting to have an
extra include with arpa/inet.h, I think that it would be better to
just use pg_hton32() in pg_bswap.h, as per the attached. Does that
take care of your problem?
--
Michael
Attachments:
htonl-warning.patch
On Mon, 12 Oct 2020 at 15:27, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Oct 12, 2020 at 02:43:06PM +0900, Masahiko Sawada wrote:
The following warning recently started to be shown in my
environment(FreeBSD clang 8.0.1). Maybe it is relevant with this
commit:
unicode_norm.c:478:12: warning: implicit declaration of function
'htonl' is invalid in C99 [-Wimplicit-function-declaration]
hashkey = htonl(ch);
^
Thanks, it is of course relevant to this commit. None of the
BSD animals complain here. So, while it would be tempting to have an
extra include with arpa/inet.h, I think that it would be better to
just use pg_hton32() in pg_bswap.h, as per the attached. Does that
take care of your problem?
Thank you for the patch!
Yes, this patch resolves the problem.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Oct 11, 2020 at 6:27 AM Michael Paquier <michael@paquier.xyz> wrote:
And applied. I did some more micro benchmarking with the quick
checks, and the numbers are cool, close to what you mentioned for the
quick checks of both NFC and NFKC.
Thanks for confirming.
Just wondering about something in the same area, did you look at the
decomposition table?
Hmm, I hadn't actually, but now that you mention it, that looks worth
optimizing as well, since there are multiple callers that search that
table -- thanks for the reminder. The attached patch was easy to whip up,
being similar to the quick check (doesn't include the pg_hton32 fix). I'll
do some performance testing soon. Note that a 25kB increase in size could
be present in frontend binaries as well in this case. While looking at
decomposition, I noticed that recomposition does a linear search through
all 6600+ entries, although it seems only about 800 are valid for that.
That could be optimized as well now, since with hashing we have more
flexibility in the ordering and can put the recomp-valid entries in front.
I'm not yet sure if it's worth the additional complexity. I'll take a look
and start a new thread.
--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company