A qsort template
Hello,
In another thread[1]/messages/by-id/CA+hUKGKMQFVpjr106gRhwk6R-nXv0qOcTreZuQzxgpHESAL6dw@mail.gmail.com, I proposed $SUBJECT, but then we found a better
solution to that thread's specific problem. The general idea is still
good though: it's possible to (1) replace several existing copies of
our qsort algorithm with one, and (2) make new specialised versions a
bit more easily than the existing Perl generator allows. So, I'm back
with a rebased stack of patches. I'll leave specific cases for new
worthwhile specialisations for separate proposals; I've heard about
several.
[1]: /messages/by-id/CA+hUKGKMQFVpjr106gRhwk6R-nXv0qOcTreZuQzxgpHESAL6dw@mail.gmail.com
Attachments:
0001-Add-sort_template.h-for-making-fast-sort-functions.patchtext/x-patch; charset=US-ASCII; name=0001-Add-sort_template.h-for-making-fast-sort-functions.patchDownload+431-1
0002-Use-sort_template.h-for-qsort-and-qsort_arg.patchtext/x-patch; charset=US-ASCII; name=0002-Use-sort_template.h-for-qsort-and-qsort_arg.patchDownload+15-441
0003-Use-sort_template.h-for-qsort_tuple-and-qsort_ssup.patchtext/x-patch; charset=US-ASCII; name=0003-Use-sort_template.h-for-qsort_tuple-and-qsort_ssup.patchDownload+21-297
Hi,
On 2021-02-18 16:09:49 +1300, Thomas Munro wrote:
In another thread[1], I proposed $SUBJECT, but then we found a better
solution to that thread's specific problem. The general idea is still
good though: it's possible to (1) replace several existing copies of
our qsort algorithm with one, and (2) make new specialised versions a
bit more easily than the existing Perl generator allows. So, I'm back
with a rebased stack of patches. I'll leave specific cases for new
worthwhile specialisations for separate proposals; I've heard about
several.
One place that could benefit is the qsort that BufferSync() does at the
start. I tried your patch for that, and it does reduce the sort time
considerably. For 64GB of mostly dirty shared_buffers from ~1.4s to
0.6s.
Now, obviously one can argue that that's not going to be the crucial
spot, and wouldn't be entirely wrong. OTOH, in my AIO branch I see
checkpointer doing ~10GB/s, leading to the sort being a measurable
portion of the overall time.
Greetings,
Andres Freund
On 18 Feb 2021, at 04:09, Thomas Munro <thomas.munro@gmail.com> wrote:
In another thread[1], I proposed $SUBJECT, but then we found a better
solution to that thread's specific problem. The general idea is still
good though: it's possible to (1) replace several existing copies of
our qsort algorithm with one, and (2) make new specialised versions a
bit more easily than the existing Perl generator allows. So, I'm back
with a rebased stack of patches. I'll leave specific cases for new
worthwhile specialisations for separate proposals; I've heard about
several.
Just to play around with this while reviewing I made a qsort_strcmp, like in
the attached, and tested it using a ~9M word [0]https://github.com/dwyl/english-words/ shuffled 20 times over randomly shuffled wordlist.
While being too small input to make any meaningful difference in runtime (it
shaved a hair off but it might well be within the error margin) there was no
regression either. More importantly, it was really simple and quick to make a
tailored qsort which is the intention with the patch. While still being a bit
of magic, moving from the Perl generator makes this slightly less magic IMO so
+1 on this approach.
A tiny nitpick on the patch itself:
+ * - ST_COMPARE(a, b) - a simple comparison expression
+ * - ST_COMPARE(a, b, arg) - variant that takes an extra argument
Indentation.
All tests pass and the documentation in the the sort_template.h is enough to go
on, but I would prefer to see a comment in port/qsort.c referring back to
sort_template.h for documentation.
--
Daniel Gustafsson https://vmware.com/
[0]: https://github.com/dwyl/english-words/ shuffled 20 times over
Attachments:
qsort_tmpl_strcmp-patchapplication/octet-stream; name=qsort_tmpl_strcmp-patch; x-unix-mode=0644Download+22-12
On Wed, Mar 3, 2021 at 10:25 AM Daniel Gustafsson <daniel@yesql.se> wrote:
On 18 Feb 2021, at 04:09, Thomas Munro <thomas.munro@gmail.com> wrote:
In another thread[1], I proposed $SUBJECT, but then we found a better
solution to that thread's specific problem. The general idea is still
good though: it's possible to (1) replace several existing copies of
our qsort algorithm with one, and (2) make new specialised versions a
bit more easily than the existing Perl generator allows. So, I'm back
with a rebased stack of patches. I'll leave specific cases for new
worthwhile specialisations for separate proposals; I've heard about
several.Just to play around with this while reviewing I made a qsort_strcmp, like in
the attached, and tested it using a ~9M word [0] randomly shuffled wordlist.
While being too small input to make any meaningful difference in runtime (it
shaved a hair off but it might well be within the error margin) there was no
regression either. More importantly, it was really simple and quick to make a
tailored qsort which is the intention with the patch. While still being a bit
of magic, moving from the Perl generator makes this slightly less magic IMO so
+1 on this approach.
Thanks for testing and reviewing!
A tiny nitpick on the patch itself:
+ * - ST_COMPARE(a, b) - a simple comparison expression + * - ST_COMPARE(a, b, arg) - variant that takes an extra argument Indentation.
Fixed. Also ran pgindent.
All tests pass and the documentation in the the sort_template.h is enough to go
on, but I would prefer to see a comment in port/qsort.c referring back to
sort_template.h for documentation.
I tried adding a comment along the lines "see lib/sort_template.h for
details", but it felt pretty redundant, when the file contains very
little other than #include "lib/sort_template.h" which should already
tell you to go and look there to find out what this is about...
I went ahead and pushed these.
I am sure there are plenty of opportunities to experiment with this
code. Here are some I recall Peter Geoghegan mentioning:
1. If you know that elements are unique, you could remove some
branches that deal with equal elements (see "r == 0").
2. Perhaps you might want to be able to disable the "presorted" check
in some cases?
3. The parameters 7, 7 and 40 were probably tuned for an ancient Vax
or similar[1]http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8162&rep=rep1&type=pdf. We see higher insertion sort thesholds such as 27 in
more recent sort algorithms[2]https://codeblab.com/wp-content/uploads/2009/09/DualPivotQuicksort.pdf used in eg the JVM. You could perhaps
speculate that the right answer depends in part on the element size; I
dunno, but if so, here we have that at compile time while traditional
qsort() does not.
As for which cases are actually worth specialising, I've attached the
example that Andres mentioned earlier; it seems like a reasonable
candidate to go ahead and commit too, but I realised that I'd
forgotten to attach it earlier.
It's possible that the existing support sorting tuples could be
further specialised for common sort key data types; I haven't tried
that.
[1]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8162&rep=rep1&type=pdf
[2]: https://codeblab.com/wp-content/uploads/2009/09/DualPivotQuicksort.pdf
Attachments:
0001-Specialize-checkpointer-sort-functions.patchtext/x-patch; charset=US-ASCII; name=0001-Specialize-checkpointer-sort-functions.patchDownload+22-16
Hi,
I wish we had the same for bsearch... :)
On 2021-03-03 17:17:13 +1300, Thomas Munro wrote:
As for which cases are actually worth specialising, I've attached the
example that Andres mentioned earlier; it seems like a reasonable
candidate to go ahead and commit too, but I realised that I'd
forgotten to attach it earlier.
From 4cec5cb9a2e0c50726b7337fb8221281e155c4cd Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 18 Feb 2021 14:47:28 +1300
Subject: [PATCH] Specialize checkpointer sort functions.When sorting a potentially large number of dirty buffers, the
checkpointer can benefit from a faster sort routine. One reported
improvement on a large buffer pool system was 1.4s -> 0.6s.Discussion: /messages/by-id/CA+hUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse+HG0Yig@mail.gmail.com
Looks good to me.
Greetings,
Andres Freund
On Fri, Mar 12, 2021 at 7:58 AM Andres Freund <andres@anarazel.de> wrote:
I wish we had the same for bsearch... :)
Glibc already has the definition of the traditional void-based
function in /usr/include/bits/stdlib-bsearch.h, so the generated code
when the compiler can see the comparator definition is already good in
eg lazy_tid_reaped() and eg some nbtree search routines. We could
probably expose more trivial comparators in headers to get more of
that, and we could perhaps put our own bsearch definition in a header
for other platforms that didn't think of that...
It might be worth doing type-safe macro templates as well, though (as
I already did in an earlier proposal[1]/messages/by-id/CA+hUKGLY47Cvu62mFDT53Ya0P95cGggcBN6R6aLpx6=Gm5j+1A@mail.gmail.com), just to have nice type safe
code though, not sure, I'm thinking about that...
[1]: /messages/by-id/CA+hUKGLY47Cvu62mFDT53Ya0P95cGggcBN6R6aLpx6=Gm5j+1A@mail.gmail.com
On Sat, Mar 13, 2021 at 3:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Fri, Mar 12, 2021 at 7:58 AM Andres Freund <andres@anarazel.de> wrote:
I wish we had the same for bsearch... :)
Glibc already has the definition of the traditional void-based
function in /usr/include/bits/stdlib-bsearch.h, so the generated code
when the compiler can see the comparator definition is already good in
eg lazy_tid_reaped() and eg some nbtree search routines. We could
probably expose more trivial comparators in headers to get more of
that, and we could perhaps put our own bsearch definition in a header
for other platforms that didn't think of that...It might be worth doing type-safe macro templates as well, though (as
I already did in an earlier proposal[1]), just to have nice type safe
code though, not sure, I'm thinking about that...
I remembered a very good reason to do this: the ability to do
branch-free comparators in more places by introducing optional wider
results. That's good for TIDs (needs 49 bits), and places that want
to "reverse" a traditional comparator (just doing -result on an int
comparator that might theoretically return INT_MIN requires at least
33 bits). So I rebased the relevant parts of my earlier version, and
went through and wrote a bunch of examples to demonstrate all this
stuff actually working.
There are two categories of change in these patches:
0002-0005: Places that sort/unique/search OIDs, BlockNumbers and TIDs,
which can reuse a small set of typed functions (a few more could be
added, if useful). See sortitemptr.h and sortscalar.h. Mostly this
is just a notational improvement, and an excuse to drop a bunch of
duplicated code. In a few places this might really speed something
important up! Like VACUUM's lazy_tid_reaped().
0006-0009. Places where a specialised function is generated for one
special purpose, such as ANALYZE's HeapTuple sort, tidbitmap.c's
pagetable sort, some places in nbtree code etc. These may require
some case-by-case research on whether the extra executable size is
worth the speedup, and there are surely more opportunities like that;
I just picked on these arbitrarily.
Attachments:
0001-Add-bsearch-and-unique-templates-to-sort_template.h.patchtext/x-patch; charset=US-ASCII; name=0001-Add-bsearch-and-unique-templates-to-sort_template.h.patchDownload+124-19
0002-Supply-sort-search-specializations-for-common-scalar.patchtext/x-patch; charset=US-ASCII; name=0002-Supply-sort-search-specializations-for-common-scalar.patchDownload+38-1
0003-Use-qsort_oid-and-friends-in-obvious-places.patchtext/x-patch; charset=US-ASCII; name=0003-Use-qsort_oid-and-friends-in-obvious-places.patchDownload+24-63
0004-Supply-specialized-sort-search-routines-for-ItemPtrD.patchtext/x-patch; charset=US-ASCII; name=0004-Supply-specialized-sort-search-routines-for-ItemPtrD.patchDownload+32-1
0005-Use-qsort_itemptr-and-friends-in-various-places.patchtext/x-patch; charset=US-ASCII; name=0005-Use-qsort_itemptr-and-friends-in-various-places.patchDownload+7-66
0006-Specialize-the-HeapTuple-sort-routine-for-ANALYZE.patchtext/x-patch; charset=US-ASCII; name=0006-Specialize-the-HeapTuple-sort-routine-for-ANALYZE.patchDownload+9-28
0007-Specialize-pagetable-sort-routines-in-tidbitmap.c.patchtext/x-patch; charset=US-ASCII; name=0007-Specialize-pagetable-sort-routines-in-tidbitmap.c.patchDownload+27-47
0008-Specialize-some-sort-search-routines-in-nbtree-code.patchtext/x-patch; charset=US-ASCII; name=0008-Specialize-some-sort-search-routines-in-nbtree-code.patchDownload+48-50
0009-Specialize-sort-routine-used-by-multixact.c.patchtext/x-patch; charset=US-ASCII; name=0009-Specialize-sort-routine-used-by-multixact.c.patchDownload+9-27
Hi,
For 0001-Add-bsearch-and-unique-templates-to-sort_template.h.patch :
+ * Remove duplicates from an array. Return the new size.
+ */
+ST_SCOPE size_t
+ST_UNIQUE(ST_ELEMENT_TYPE *array,
The array is supposed to be sorted, right ?
The comment should mention this.
Cheers
On Sat, Mar 13, 2021 at 6:36 PM Thomas Munro <thomas.munro@gmail.com> wrote:
Show quoted text
On Sat, Mar 13, 2021 at 3:49 PM Thomas Munro <thomas.munro@gmail.com>
wrote:On Fri, Mar 12, 2021 at 7:58 AM Andres Freund <andres@anarazel.de>
wrote:
I wish we had the same for bsearch... :)
Glibc already has the definition of the traditional void-based
function in /usr/include/bits/stdlib-bsearch.h, so the generated code
when the compiler can see the comparator definition is already good in
eg lazy_tid_reaped() and eg some nbtree search routines. We could
probably expose more trivial comparators in headers to get more of
that, and we could perhaps put our own bsearch definition in a header
for other platforms that didn't think of that...It might be worth doing type-safe macro templates as well, though (as
I already did in an earlier proposal[1]), just to have nice type safe
code though, not sure, I'm thinking about that...I remembered a very good reason to do this: the ability to do
branch-free comparators in more places by introducing optional wider
results. That's good for TIDs (needs 49 bits), and places that want
to "reverse" a traditional comparator (just doing -result on an int
comparator that might theoretically return INT_MIN requires at least
33 bits). So I rebased the relevant parts of my earlier version, and
went through and wrote a bunch of examples to demonstrate all this
stuff actually working.There are two categories of change in these patches:
0002-0005: Places that sort/unique/search OIDs, BlockNumbers and TIDs,
which can reuse a small set of typed functions (a few more could be
added, if useful). See sortitemptr.h and sortscalar.h. Mostly this
is just a notational improvement, and an excuse to drop a bunch of
duplicated code. In a few places this might really speed something
important up! Like VACUUM's lazy_tid_reaped().0006-0009. Places where a specialised function is generated for one
special purpose, such as ANALYZE's HeapTuple sort, tidbitmap.c's
pagetable sort, some places in nbtree code etc. These may require
some case-by-case research on whether the extra executable size is
worth the speedup, and there are surely more opportunities like that;
I just picked on these arbitrarily.
On Sun, Mar 14, 2021 at 5:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:
For 0001-Add-bsearch-and-unique-templates-to-sort_template.h.patch :
+ * Remove duplicates from an array. Return the new size. + */ +ST_SCOPE size_t +ST_UNIQUE(ST_ELEMENT_TYPE *array,The array is supposed to be sorted, right ?
The comment should mention this.
Good point, will update. Thanks!
On Mon, Mar 15, 2021 at 1:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Sun, Mar 14, 2021 at 5:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:
+ * Remove duplicates from an array. Return the new size. + */ +ST_SCOPE size_t +ST_UNIQUE(ST_ELEMENT_TYPE *array,The array is supposed to be sorted, right ?
The comment should mention this.Good point, will update. Thanks!
Rebased. Also fixed some formatting problems and updated
typedefs.list so they don't come back.
Attachments:
On Tue, Jun 15, 2021 at 10:55 PM Thomas Munro <thomas.munro@gmail.com>
wrote:
On Mon, Mar 15, 2021 at 1:09 PM Thomas Munro <thomas.munro@gmail.com>
wrote:On Sun, Mar 14, 2021 at 5:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:
+ * Remove duplicates from an array. Return the new size. + */ +ST_SCOPE size_t +ST_UNIQUE(ST_ELEMENT_TYPE *array,The array is supposed to be sorted, right ?
The comment should mention this.Good point, will update. Thanks!
Rebased. Also fixed some formatting problems and updated
typedefs.list so they don't come back.
Hi,
In 0001-Add-bsearch-and-unique-templates-to-sort_template.h.patch :
- const ST_ELEMENT_TYPE *
ST_SORT_PROTO_ARG);
+ const ST_ELEMENT_TYPE
*ST_SORT_PROTO_ARG);
It seems there is no real change in the line above. Better keep the
original formation.
* - ST_COMPARE_ARG_TYPE - type of extra argument
*
+ * To say that the comparator returns a type other than int, use:
+ *
+ * - ST_COMPARE_TYPE - an integer type
Since the ST_COMPARE_TYPE is meant to designate the type of the return
value, maybe ST_COMPARE_RET_TYPE would be better name.
It also goes with ST_COMPARE_ARG_TYPE preceding this.
- ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data,
- *pa,
- *pb,
- *pc,
- *pd,
- *pl,
- *pm,
- *pn;
+ ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data;
+ ST_POINTER_TYPE *pa;
There doesn't seem to be material change for the above hunk.
+ while (left <= right)
+ {
+ size_t mid = (left + right) / 2;
The computation for midpoint should be left + (right-left)/2.
Cheers
Hi Zhihong,
On Thu, Jun 17, 2021 at 8:13 AM Zhihong Yu <zyu@yugabyte.com> wrote:
In 0001-Add-bsearch-and-unique-templates-to-sort_template.h.patch :
- const ST_ELEMENT_TYPE * ST_SORT_PROTO_ARG); + const ST_ELEMENT_TYPE *ST_SORT_PROTO_ARG);It seems there is no real change in the line above. Better keep the original formation.
Hmm, well it was only recently damaged by commit def5b065, and that's
because I'd forgotten to put ST_ELEMENT_TYPE into typedefs.list, and I
was correcting that in this patch. (That file is used by
pg_bsd_indent to decide if an identifier is a type or a variable,
which affects whether '*' is formatted like a unary operator/type
syntax or a binary operator.)
* - ST_COMPARE_ARG_TYPE - type of extra argument * + * To say that the comparator returns a type other than int, use: + * + * - ST_COMPARE_TYPE - an integer typeSince the ST_COMPARE_TYPE is meant to designate the type of the return value, maybe ST_COMPARE_RET_TYPE would be better name.
It also goes with ST_COMPARE_ARG_TYPE preceding this.
Good idea, will do.
- ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data, - *pa, - *pb, - *pc, - *pd, - *pl, - *pm, - *pn; + ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data; + ST_POINTER_TYPE *pa;There doesn't seem to be material change for the above hunk.
In master, you can't write #define ST_ELEMENT_TYPE some_type *, which
seems like it would be quite useful. You can use pointers as element
types, but only with a typedef name due to C parsing rules. some_type
**a, *pa, ... declares some_type *pa, but we want some_type **pa. I
don't want to have to introduce extra typedefs. The change fixes that
problem by not using C's squirrelly variable declaration list syntax.
+ while (left <= right) + { + size_t mid = (left + right) / 2;The computation for midpoint should be left + (right-left)/2.
Right, my way can overflow. Will fix. Thanks!
On Wed, Jun 16, 2021 at 2:54 PM Thomas Munro <thomas.munro@gmail.com> wrote:
Hi Zhihong,
On Thu, Jun 17, 2021 at 8:13 AM Zhihong Yu <zyu@yugabyte.com> wrote:
In 0001-Add-bsearch-and-unique-templates-to-sort_template.h.patch :
- const ST_ELEMENT_TYPE *
ST_SORT_PROTO_ARG);
+ const ST_ELEMENT_TYPE
*ST_SORT_PROTO_ARG);
It seems there is no real change in the line above. Better keep the
original formation.
Hmm, well it was only recently damaged by commit def5b065, and that's
because I'd forgotten to put ST_ELEMENT_TYPE into typedefs.list, and I
was correcting that in this patch. (That file is used by
pg_bsd_indent to decide if an identifier is a type or a variable,
which affects whether '*' is formatted like a unary operator/type
syntax or a binary operator.)* - ST_COMPARE_ARG_TYPE - type of extra argument * + * To say that the comparator returns a type other than int, use: + * + * - ST_COMPARE_TYPE - an integer typeSince the ST_COMPARE_TYPE is meant to designate the type of the return
value, maybe ST_COMPARE_RET_TYPE would be better name.
It also goes with ST_COMPARE_ARG_TYPE preceding this.
Good idea, will do.
- ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data, - *pa, - *pb, - *pc, - *pd, - *pl, - *pm, - *pn; + ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data; + ST_POINTER_TYPE *pa;There doesn't seem to be material change for the above hunk.
In master, you can't write #define ST_ELEMENT_TYPE some_type *, which
seems like it would be quite useful. You can use pointers as element
types, but only with a typedef name due to C parsing rules. some_type
**a, *pa, ... declares some_type *pa, but we want some_type **pa. I
don't want to have to introduce extra typedefs. The change fixes that
problem by not using C's squirrelly variable declaration list syntax.+ while (left <= right) + { + size_t mid = (left + right) / 2;The computation for midpoint should be left + (right-left)/2.
Right, my way can overflow. Will fix. Thanks!
Hi,
Thanks for giving me background on typedefs.
The relevant changes look fine to me.
Cheers
Thomas Munro <thomas.munro@gmail.com> writes:
Hmm, well it was only recently damaged by commit def5b065, and that's
because I'd forgotten to put ST_ELEMENT_TYPE into typedefs.list, and I
was correcting that in this patch.
If ST_ELEMENT_TYPE isn't recognized as a typedef by the buildfarm's
typedef collectors, this sort of manual addition to typedefs.list
is not going to survive the next pgindent run. No, I will NOT
promise to manually add it back every time.
We do already have special provision for injecting additional typedefs
in the pgindent script, so one possibility is to add it there:
-my @additional = ("bool\n");
+my @additional = ("bool\nST_ELEMENT_TYPE\n");
On the whole I'm not sure that this is a big enough formatting
issue to justify a special hack, though. Is there any more than
the one line that gets misformatted?
regards, tom lane
On Thu, Jun 17, 2021 at 11:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
Hmm, well it was only recently damaged by commit def5b065, and that's
because I'd forgotten to put ST_ELEMENT_TYPE into typedefs.list, and I
was correcting that in this patch.If ST_ELEMENT_TYPE isn't recognized as a typedef by the buildfarm's
typedef collectors, this sort of manual addition to typedefs.list
is not going to survive the next pgindent run. No, I will NOT
promise to manually add it back every time.We do already have special provision for injecting additional typedefs
in the pgindent script, so one possibility is to add it there:-my @additional = ("bool\n"); +my @additional = ("bool\nST_ELEMENT_TYPE\n");On the whole I'm not sure that this is a big enough formatting
issue to justify a special hack, though. Is there any more than
the one line that gets misformatted?
Ohh. In that case, I won't bother with that hunk and will live with
the extra space. There are several other lines like this in the tree,
where people use caveman template macrology that is invisible to
whatever analyser is being used for that, and I can see that that's
just going to have to be OK for now. Perhaps one day we could add a
secondary file, not updated by that mechanism, that holds a manually
maintained list for cases like this.
Thomas Munro <thomas.munro@gmail.com> writes:
Perhaps one day we could add a
secondary file, not updated by that mechanism, that holds a manually
maintained list for cases like this.
Yeah, the comments in pgindent already speculate about that. For
now, those include and exclude lists are short enough that keeping
them inside the script seems a lot easier than building tooling
to get them from somewhere else.
The big problem in my mind, which would not be alleviated in the
slightest by having a separate file, is that it'd be easy to miss
removing entries if they ever become obsolete.
regards, tom lane
On Thu, Jun 17, 2021 at 1:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The big problem in my mind, which would not be alleviated in the
slightest by having a separate file, is that it'd be easy to miss
removing entries if they ever become obsolete.
I suppose you could invent some kind of declaration syntax in a
comment near the use of the pseudo-typename in the source tree that is
mechanically extracted.
On Wed, Jun 16, 2021 at 1:55 AM Thomas Munro <thomas.munro@gmail.com> wrote:
[v2 patch]
Hi Thomas,
I plan to do some performance testing with VACUUM, ANALYZE etc soon, to see
if I can detect any significant differences.
I did a quick check of the MacOS/clang binary size (no debug symbols):
master: 8108408
0001-0009: 8125224
Later, I'll drill down into the individual patches and see if anything
stands out.
There were already some comments for v2 upthread about formatting and an
overflow hazard, but I did find a few more things to ask about:
- For my curiosity, there are a lot of calls to qsort/qunique in the tree
-- without having looked exhaustively, do these patches focus on cases
where there are bespoke comparator functions and/or hot code paths?
- Aside from the qsort{_arg} precedence, is there a practical reason for
keeping the new global functions in their own files?
- 0002 / 0004
+/* Search and unique functions inline in header. */
The functions are pretty small, but is there some advantage for inlining
these?
- 0003
#include "lib/qunique.h" is not needed anymore.
This isn't quite relevant for the current patch perhaps, but I'm wondering
why we don't already call bsearch for RelationHasSysCache() and
RelationSupportsSysCache().
- 0008
+#define ST_COMPARE(a, b, cxt) \
+ DatumGetInt32(FunctionCall2Coll(&cxt->flinfo, cxt->collation, *a, *b))
This seems like a pretty heavyweight comparison, so I'm not sure inlining
buys us much, but it seems also there are fewer branches this way. I'll
come up with a test and see what happens.
--
John Naylor
EDB: http://www.enterprisedb.com
Hi John,
On Tue, Jun 29, 2021 at 7:13 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
I plan to do some performance testing with VACUUM, ANALYZE etc soon, to see if I can detect any significant differences.
Thanks!
I did a quick check of the MacOS/clang binary size (no debug symbols):
master: 8108408
0001-0009: 8125224
Not too bad.
Later, I'll drill down into the individual patches and see if anything stands out.
There were already some comments for v2 upthread about formatting and an overflow hazard, but I did find a few more things to ask about:
Right, here's an update with fixes discussed earlier with Zhihong and Tom:
* COMPARE_TYPE -> COMPARE_RET_TYPE
* quit fighting with pgindent (I will try to fix this problem generally later)
* fix overflow hazard
- For my curiosity, there are a lot of calls to qsort/qunique in the tree -- without having looked exhaustively, do these patches focus on cases where there are bespoke comparator functions and/or hot code paths?
Patches 0006-0009 are highly specialised for local usage by a single
module, and require some kind of evidence that they're worth their
bytes, and the onus is on me there of course -- but any ideas and
feedback are welcome. There are other opportunities like these, maybe
better ones. That reminds me: I recently had a perf report from
Andres that showed the qsort in compute_scalar_stats() as quite hot.
That's probably a good candidate, and is not yet done in the current
patch set.
The lower numbered patches are all things that are reused in many
places, and in my humble opinion improve the notation and type safety
and code deduplication generally when working with common types
ItemPtr, BlockNumber, Oid, aside from any performance arguments. At
least the ItemPtr stuff *might* also speed something useful up.
I tried to measure a speedup in vacuum, but so far I have not. I did
learn some things though: While doing that with an uncorrelated index
and a lot of deleted tuples, I found that adding more
maintenance_work_mem doesn't help beyond a few MB, because then cache
misses dominate to the point where it's not better than doing multiple
passes (and this is familiar to me from work on hash joins). If I
turned on huge pages on Linux and set min_dynamic_shared_memory so
that the parallel DSM used by vacuum lives in huge pages, then
parallel vacuum with a large maintenance_work_mem starts to do much
better than non-parallel vacuum by improving the TLB misses (as with
hash joins). I thought that was quite interesting! Perhaps
bsearch_itemptr might help with correlated indexes with a lot of
deleted indexes (so not dominated by cache misses), though?
(I wouldn't be suprised if someone comes up with a much better idea
than bsearch for that anyway... a few ideas have been suggested.)
- Aside from the qsort{_arg} precedence, is there a practical reason for keeping the new global functions in their own files?
Better idea for layout welcome. One thing I wondered while trying to
figure out where to put functions that operate on itemptr: why is
itemptr_encode() in src/include/catalog/index.h?!
- 0002 / 0004
+/* Search and unique functions inline in header. */
The functions are pretty small, but is there some advantage for inlining these?
Glibc's bsearch definition is already in a header for inlining (as is
our qunique), so I thought I should preserve that characteristic on
principle. I don't have any evidence though. Other libcs I looked at
didn't have bsearch in a header. So by doing this we make the
generated code the same across platforms (all other relevant things
being equal). I don't know if it really makes much difference,
especially since in this case the comparator and size would still be
inlined if we defined it in the .c (unlike standard bsearch)...
Probably only lazy_tid_reaped() calls it enough to potentially show
any difference in a non-microbenchmark workload, if anything does.
- 0003
#include "lib/qunique.h" is not needed anymore.
Fixed.
This isn't quite relevant for the current patch perhaps, but I'm wondering why we don't already call bsearch for RelationHasSysCache() and RelationSupportsSysCache().
Right, I missed that. Done. Nice to delete some more code.
- 0008
+#define ST_COMPARE(a, b, cxt) \ + DatumGetInt32(FunctionCall2Coll(&cxt->flinfo, cxt->collation, *a, *b))This seems like a pretty heavyweight comparison, so I'm not sure inlining buys us much, but it seems also there are fewer branches this way. I'll come up with a test and see what happens.
I will be very interested to see the results. Thanks!