A qsort template

Started by Thomas Munroabout 5 years ago85 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hello,

In another thread[1]/messages/by-id/CA+hUKGKMQFVpjr106gRhwk6R-nXv0qOcTreZuQzxgpHESAL6dw@mail.gmail.com, I proposed $SUBJECT, but then we found a better
solution to that thread's specific problem. The general idea is still
good though: it's possible to (1) replace several existing copies of
our qsort algorithm with one, and (2) make new specialised versions a
bit more easily than the existing Perl generator allows. So, I'm back
with a rebased stack of patches. I'll leave specific cases for new
worthwhile specialisations for separate proposals; I've heard about
several.

[1]: /messages/by-id/CA+hUKGKMQFVpjr106gRhwk6R-nXv0qOcTreZuQzxgpHESAL6dw@mail.gmail.com

Attachments:

0001-Add-sort_template.h-for-making-fast-sort-functions.patchtext/x-patch; charset=US-ASCII; name=0001-Add-sort_template.h-for-making-fast-sort-functions.patchDownload+431-1
0002-Use-sort_template.h-for-qsort-and-qsort_arg.patchtext/x-patch; charset=US-ASCII; name=0002-Use-sort_template.h-for-qsort-and-qsort_arg.patchDownload+15-441
0003-Use-sort_template.h-for-qsort_tuple-and-qsort_ssup.patchtext/x-patch; charset=US-ASCII; name=0003-Use-sort_template.h-for-qsort_tuple-and-qsort_ssup.patchDownload+21-297
#2Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#1)
Re: A qsort template

Hi,

On 2021-02-18 16:09:49 +1300, Thomas Munro wrote:

In another thread[1], I proposed $SUBJECT, but then we found a better
solution to that thread's specific problem. The general idea is still
good though: it's possible to (1) replace several existing copies of
our qsort algorithm with one, and (2) make new specialised versions a
bit more easily than the existing Perl generator allows. So, I'm back
with a rebased stack of patches. I'll leave specific cases for new
worthwhile specialisations for separate proposals; I've heard about
several.

One place that could benefit is the qsort that BufferSync() does at the
start. I tried your patch for that, and it does reduce the sort time
considerably. For 64GB of mostly dirty shared_buffers from ~1.4s to
0.6s.

Now, obviously one can argue that that's not going to be the crucial
spot, and wouldn't be entirely wrong. OTOH, in my AIO branch I see
checkpointer doing ~10GB/s, leading to the sort being a measurable
portion of the overall time.

Greetings,

Andres Freund

#3Daniel Gustafsson
daniel@yesql.se
In reply to: Thomas Munro (#1)
Re: A qsort template

On 18 Feb 2021, at 04:09, Thomas Munro <thomas.munro@gmail.com> wrote:

In another thread[1], I proposed $SUBJECT, but then we found a better
solution to that thread's specific problem. The general idea is still
good though: it's possible to (1) replace several existing copies of
our qsort algorithm with one, and (2) make new specialised versions a
bit more easily than the existing Perl generator allows. So, I'm back
with a rebased stack of patches. I'll leave specific cases for new
worthwhile specialisations for separate proposals; I've heard about
several.

Just to play around with this while reviewing I made a qsort_strcmp, like in
the attached, and tested it using a ~9M word [0]https://github.com/dwyl/english-words/ shuffled 20 times over randomly shuffled wordlist.
While being too small input to make any meaningful difference in runtime (it
shaved a hair off but it might well be within the error margin) there was no
regression either. More importantly, it was really simple and quick to make a
tailored qsort which is the intention with the patch. While still being a bit
of magic, moving from the Perl generator makes this slightly less magic IMO so
+1 on this approach.

A tiny nitpick on the patch itself:

+ *       - ST_COMPARE(a, b) - a simple comparison expression
+ *   - ST_COMPARE(a, b, arg) - variant that takes an extra argument
Indentation.

All tests pass and the documentation in the the sort_template.h is enough to go
on, but I would prefer to see a comment in port/qsort.c referring back to
sort_template.h for documentation.

--
Daniel Gustafsson https://vmware.com/

[0]: https://github.com/dwyl/english-words/ shuffled 20 times over

Attachments:

qsort_tmpl_strcmp-patchapplication/octet-stream; name=qsort_tmpl_strcmp-patch; x-unix-mode=0644Download+22-12
#4Thomas Munro
thomas.munro@gmail.com
In reply to: Daniel Gustafsson (#3)
Re: A qsort template

On Wed, Mar 3, 2021 at 10:25 AM Daniel Gustafsson <daniel@yesql.se> wrote:

On 18 Feb 2021, at 04:09, Thomas Munro <thomas.munro@gmail.com> wrote:
In another thread[1], I proposed $SUBJECT, but then we found a better
solution to that thread's specific problem. The general idea is still
good though: it's possible to (1) replace several existing copies of
our qsort algorithm with one, and (2) make new specialised versions a
bit more easily than the existing Perl generator allows. So, I'm back
with a rebased stack of patches. I'll leave specific cases for new
worthwhile specialisations for separate proposals; I've heard about
several.

Just to play around with this while reviewing I made a qsort_strcmp, like in
the attached, and tested it using a ~9M word [0] randomly shuffled wordlist.
While being too small input to make any meaningful difference in runtime (it
shaved a hair off but it might well be within the error margin) there was no
regression either. More importantly, it was really simple and quick to make a
tailored qsort which is the intention with the patch. While still being a bit
of magic, moving from the Perl generator makes this slightly less magic IMO so
+1 on this approach.

Thanks for testing and reviewing!

A tiny nitpick on the patch itself:

+ *       - ST_COMPARE(a, b) - a simple comparison expression
+ *   - ST_COMPARE(a, b, arg) - variant that takes an extra argument
Indentation.

Fixed. Also ran pgindent.

All tests pass and the documentation in the the sort_template.h is enough to go
on, but I would prefer to see a comment in port/qsort.c referring back to
sort_template.h for documentation.

I tried adding a comment along the lines "see lib/sort_template.h for
details", but it felt pretty redundant, when the file contains very
little other than #include "lib/sort_template.h" which should already
tell you to go and look there to find out what this is about...

I went ahead and pushed these.

I am sure there are plenty of opportunities to experiment with this
code. Here are some I recall Peter Geoghegan mentioning:

1. If you know that elements are unique, you could remove some
branches that deal with equal elements (see "r == 0").
2. Perhaps you might want to be able to disable the "presorted" check
in some cases?
3. The parameters 7, 7 and 40 were probably tuned for an ancient Vax
or similar[1]http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8162&amp;rep=rep1&amp;type=pdf. We see higher insertion sort thesholds such as 27 in
more recent sort algorithms[2]https://codeblab.com/wp-content/uploads/2009/09/DualPivotQuicksort.pdf used in eg the JVM. You could perhaps
speculate that the right answer depends in part on the element size; I
dunno, but if so, here we have that at compile time while traditional
qsort() does not.

As for which cases are actually worth specialising, I've attached the
example that Andres mentioned earlier; it seems like a reasonable
candidate to go ahead and commit too, but I realised that I'd
forgotten to attach it earlier.

It's possible that the existing support sorting tuples could be
further specialised for common sort key data types; I haven't tried
that.

[1]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8162&amp;rep=rep1&amp;type=pdf
[2]: https://codeblab.com/wp-content/uploads/2009/09/DualPivotQuicksort.pdf

Attachments:

0001-Specialize-checkpointer-sort-functions.patchtext/x-patch; charset=US-ASCII; name=0001-Specialize-checkpointer-sort-functions.patchDownload+22-16
#5Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#4)
Re: A qsort template

Hi,

I wish we had the same for bsearch... :)

On 2021-03-03 17:17:13 +1300, Thomas Munro wrote:

As for which cases are actually worth specialising, I've attached the
example that Andres mentioned earlier; it seems like a reasonable
candidate to go ahead and commit too, but I realised that I'd
forgotten to attach it earlier.

From 4cec5cb9a2e0c50726b7337fb8221281e155c4cd Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 18 Feb 2021 14:47:28 +1300
Subject: [PATCH] Specialize checkpointer sort functions.

When sorting a potentially large number of dirty buffers, the
checkpointer can benefit from a faster sort routine. One reported
improvement on a large buffer pool system was 1.4s -> 0.6s.

Discussion: /messages/by-id/CA+hUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse+HG0Yig@mail.gmail.com

Looks good to me.

Greetings,

Andres Freund

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#5)
Re: A qsort template

On Fri, Mar 12, 2021 at 7:58 AM Andres Freund <andres@anarazel.de> wrote:

I wish we had the same for bsearch... :)

Glibc already has the definition of the traditional void-based
function in /usr/include/bits/stdlib-bsearch.h, so the generated code
when the compiler can see the comparator definition is already good in
eg lazy_tid_reaped() and eg some nbtree search routines. We could
probably expose more trivial comparators in headers to get more of
that, and we could perhaps put our own bsearch definition in a header
for other platforms that didn't think of that...

It might be worth doing type-safe macro templates as well, though (as
I already did in an earlier proposal[1]/messages/by-id/CA+hUKGLY47Cvu62mFDT53Ya0P95cGggcBN6R6aLpx6=Gm5j+1A@mail.gmail.com), just to have nice type safe
code though, not sure, I'm thinking about that...

[1]: /messages/by-id/CA+hUKGLY47Cvu62mFDT53Ya0P95cGggcBN6R6aLpx6=Gm5j+1A@mail.gmail.com

#7Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#6)
Re: A qsort template

On Sat, Mar 13, 2021 at 3:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Mar 12, 2021 at 7:58 AM Andres Freund <andres@anarazel.de> wrote:

I wish we had the same for bsearch... :)

Glibc already has the definition of the traditional void-based
function in /usr/include/bits/stdlib-bsearch.h, so the generated code
when the compiler can see the comparator definition is already good in
eg lazy_tid_reaped() and eg some nbtree search routines. We could
probably expose more trivial comparators in headers to get more of
that, and we could perhaps put our own bsearch definition in a header
for other platforms that didn't think of that...

It might be worth doing type-safe macro templates as well, though (as
I already did in an earlier proposal[1]), just to have nice type safe
code though, not sure, I'm thinking about that...

I remembered a very good reason to do this: the ability to do
branch-free comparators in more places by introducing optional wider
results. That's good for TIDs (needs 49 bits), and places that want
to "reverse" a traditional comparator (just doing -result on an int
comparator that might theoretically return INT_MIN requires at least
33 bits). So I rebased the relevant parts of my earlier version, and
went through and wrote a bunch of examples to demonstrate all this
stuff actually working.

There are two categories of change in these patches:

0002-0005: Places that sort/unique/search OIDs, BlockNumbers and TIDs,
which can reuse a small set of typed functions (a few more could be
added, if useful). See sortitemptr.h and sortscalar.h. Mostly this
is just a notational improvement, and an excuse to drop a bunch of
duplicated code. In a few places this might really speed something
important up! Like VACUUM's lazy_tid_reaped().

0006-0009. Places where a specialised function is generated for one
special purpose, such as ANALYZE's HeapTuple sort, tidbitmap.c's
pagetable sort, some places in nbtree code etc. These may require
some case-by-case research on whether the extra executable size is
worth the speedup, and there are surely more opportunities like that;
I just picked on these arbitrarily.

Attachments:

0001-Add-bsearch-and-unique-templates-to-sort_template.h.patchtext/x-patch; charset=US-ASCII; name=0001-Add-bsearch-and-unique-templates-to-sort_template.h.patchDownload+124-19
0002-Supply-sort-search-specializations-for-common-scalar.patchtext/x-patch; charset=US-ASCII; name=0002-Supply-sort-search-specializations-for-common-scalar.patchDownload+38-1
0003-Use-qsort_oid-and-friends-in-obvious-places.patchtext/x-patch; charset=US-ASCII; name=0003-Use-qsort_oid-and-friends-in-obvious-places.patchDownload+24-63
0004-Supply-specialized-sort-search-routines-for-ItemPtrD.patchtext/x-patch; charset=US-ASCII; name=0004-Supply-specialized-sort-search-routines-for-ItemPtrD.patchDownload+32-1
0005-Use-qsort_itemptr-and-friends-in-various-places.patchtext/x-patch; charset=US-ASCII; name=0005-Use-qsort_itemptr-and-friends-in-various-places.patchDownload+7-66
0006-Specialize-the-HeapTuple-sort-routine-for-ANALYZE.patchtext/x-patch; charset=US-ASCII; name=0006-Specialize-the-HeapTuple-sort-routine-for-ANALYZE.patchDownload+9-28
0007-Specialize-pagetable-sort-routines-in-tidbitmap.c.patchtext/x-patch; charset=US-ASCII; name=0007-Specialize-pagetable-sort-routines-in-tidbitmap.c.patchDownload+27-47
0008-Specialize-some-sort-search-routines-in-nbtree-code.patchtext/x-patch; charset=US-ASCII; name=0008-Specialize-some-sort-search-routines-in-nbtree-code.patchDownload+48-50
0009-Specialize-sort-routine-used-by-multixact.c.patchtext/x-patch; charset=US-ASCII; name=0009-Specialize-sort-routine-used-by-multixact.c.patchDownload+9-27
#8Zhihong Yu
zyu@yugabyte.com
In reply to: Thomas Munro (#7)
Re: A qsort template

Hi,

For 0001-Add-bsearch-and-unique-templates-to-sort_template.h.patch :

+ * Remove duplicates from an array.  Return the new size.
+ */
+ST_SCOPE size_t
+ST_UNIQUE(ST_ELEMENT_TYPE *array,

The array is supposed to be sorted, right ?
The comment should mention this.

Cheers

On Sat, Mar 13, 2021 at 6:36 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Show quoted text

On Sat, Mar 13, 2021 at 3:49 PM Thomas Munro <thomas.munro@gmail.com>
wrote:

On Fri, Mar 12, 2021 at 7:58 AM Andres Freund <andres@anarazel.de>

wrote:

I wish we had the same for bsearch... :)

Glibc already has the definition of the traditional void-based
function in /usr/include/bits/stdlib-bsearch.h, so the generated code
when the compiler can see the comparator definition is already good in
eg lazy_tid_reaped() and eg some nbtree search routines. We could
probably expose more trivial comparators in headers to get more of
that, and we could perhaps put our own bsearch definition in a header
for other platforms that didn't think of that...

It might be worth doing type-safe macro templates as well, though (as
I already did in an earlier proposal[1]), just to have nice type safe
code though, not sure, I'm thinking about that...

I remembered a very good reason to do this: the ability to do
branch-free comparators in more places by introducing optional wider
results. That's good for TIDs (needs 49 bits), and places that want
to "reverse" a traditional comparator (just doing -result on an int
comparator that might theoretically return INT_MIN requires at least
33 bits). So I rebased the relevant parts of my earlier version, and
went through and wrote a bunch of examples to demonstrate all this
stuff actually working.

There are two categories of change in these patches:

0002-0005: Places that sort/unique/search OIDs, BlockNumbers and TIDs,
which can reuse a small set of typed functions (a few more could be
added, if useful). See sortitemptr.h and sortscalar.h. Mostly this
is just a notational improvement, and an excuse to drop a bunch of
duplicated code. In a few places this might really speed something
important up! Like VACUUM's lazy_tid_reaped().

0006-0009. Places where a specialised function is generated for one
special purpose, such as ANALYZE's HeapTuple sort, tidbitmap.c's
pagetable sort, some places in nbtree code etc. These may require
some case-by-case research on whether the extra executable size is
worth the speedup, and there are surely more opportunities like that;
I just picked on these arbitrarily.

#9Thomas Munro
thomas.munro@gmail.com
In reply to: Zhihong Yu (#8)
Re: A qsort template

On Sun, Mar 14, 2021 at 5:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:

For 0001-Add-bsearch-and-unique-templates-to-sort_template.h.patch :

+ * Remove duplicates from an array.  Return the new size.
+ */
+ST_SCOPE size_t
+ST_UNIQUE(ST_ELEMENT_TYPE *array,

The array is supposed to be sorted, right ?
The comment should mention this.

Good point, will update. Thanks!

#10Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#9)
Re: A qsort template

On Mon, Mar 15, 2021 at 1:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sun, Mar 14, 2021 at 5:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:

+ * Remove duplicates from an array.  Return the new size.
+ */
+ST_SCOPE size_t
+ST_UNIQUE(ST_ELEMENT_TYPE *array,

The array is supposed to be sorted, right ?
The comment should mention this.

Good point, will update. Thanks!

Rebased. Also fixed some formatting problems and updated
typedefs.list so they don't come back.

Attachments:

more-sort-search-specializations-v2.tgzapplication/x-compressed-tar; name=more-sort-search-specializations-v2.tgzDownload
#11Zhihong Yu
zyu@yugabyte.com
In reply to: Thomas Munro (#10)
Re: A qsort template

On Tue, Jun 15, 2021 at 10:55 PM Thomas Munro <thomas.munro@gmail.com>
wrote:

On Mon, Mar 15, 2021 at 1:09 PM Thomas Munro <thomas.munro@gmail.com>
wrote:

On Sun, Mar 14, 2021 at 5:03 PM Zhihong Yu <zyu@yugabyte.com> wrote:

+ * Remove duplicates from an array.  Return the new size.
+ */
+ST_SCOPE size_t
+ST_UNIQUE(ST_ELEMENT_TYPE *array,

The array is supposed to be sorted, right ?
The comment should mention this.

Good point, will update. Thanks!

Rebased. Also fixed some formatting problems and updated
typedefs.list so they don't come back.

Hi,
In 0001-Add-bsearch-and-unique-templates-to-sort_template.h.patch :

-                                       const ST_ELEMENT_TYPE *
ST_SORT_PROTO_ARG);
+                                       const ST_ELEMENT_TYPE
*ST_SORT_PROTO_ARG);

It seems there is no real change in the line above. Better keep the
original formation.

  *   - ST_COMPARE_ARG_TYPE - type of extra argument
  *
+ *   To say that the comparator returns a type other than int, use:
+ *
+ *       - ST_COMPARE_TYPE - an integer type

Since the ST_COMPARE_TYPE is meant to designate the type of the return
value, maybe ST_COMPARE_RET_TYPE would be better name.
It also goes with ST_COMPARE_ARG_TYPE preceding this.

-   ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data,
-              *pa,
-              *pb,
-              *pc,
-              *pd,
-              *pl,
-              *pm,
-              *pn;
+   ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data;
+   ST_POINTER_TYPE *pa;

There doesn't seem to be material change for the above hunk.

+   while (left <= right)
+   {
+       size_t      mid = (left + right) / 2;

The computation for midpoint should be left + (right-left)/2.

Cheers

#12Thomas Munro
thomas.munro@gmail.com
In reply to: Zhihong Yu (#11)
Re: A qsort template

Hi Zhihong,

On Thu, Jun 17, 2021 at 8:13 AM Zhihong Yu <zyu@yugabyte.com> wrote:

In 0001-Add-bsearch-and-unique-templates-to-sort_template.h.patch :

-                                       const ST_ELEMENT_TYPE * ST_SORT_PROTO_ARG);
+                                       const ST_ELEMENT_TYPE *ST_SORT_PROTO_ARG);

It seems there is no real change in the line above. Better keep the original formation.

Hmm, well it was only recently damaged by commit def5b065, and that's
because I'd forgotten to put ST_ELEMENT_TYPE into typedefs.list, and I
was correcting that in this patch. (That file is used by
pg_bsd_indent to decide if an identifier is a type or a variable,
which affects whether '*' is formatted like a unary operator/type
syntax or a binary operator.)

*   - ST_COMPARE_ARG_TYPE - type of extra argument
*
+ *   To say that the comparator returns a type other than int, use:
+ *
+ *       - ST_COMPARE_TYPE - an integer type

Since the ST_COMPARE_TYPE is meant to designate the type of the return value, maybe ST_COMPARE_RET_TYPE would be better name.
It also goes with ST_COMPARE_ARG_TYPE preceding this.

Good idea, will do.

-   ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data,
-              *pa,
-              *pb,
-              *pc,
-              *pd,
-              *pl,
-              *pm,
-              *pn;
+   ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data;
+   ST_POINTER_TYPE *pa;

There doesn't seem to be material change for the above hunk.

In master, you can't write #define ST_ELEMENT_TYPE some_type *, which
seems like it would be quite useful. You can use pointers as element
types, but only with a typedef name due to C parsing rules. some_type
**a, *pa, ... declares some_type *pa, but we want some_type **pa. I
don't want to have to introduce extra typedefs. The change fixes that
problem by not using C's squirrelly variable declaration list syntax.

+   while (left <= right)
+   {
+       size_t      mid = (left + right) / 2;

The computation for midpoint should be left + (right-left)/2.

Right, my way can overflow. Will fix. Thanks!

#13Zhihong Yu
zyu@yugabyte.com
In reply to: Thomas Munro (#12)
Re: A qsort template

On Wed, Jun 16, 2021 at 2:54 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Hi Zhihong,

On Thu, Jun 17, 2021 at 8:13 AM Zhihong Yu <zyu@yugabyte.com> wrote:

In 0001-Add-bsearch-and-unique-templates-to-sort_template.h.patch :

- const ST_ELEMENT_TYPE *

ST_SORT_PROTO_ARG);

+ const ST_ELEMENT_TYPE

*ST_SORT_PROTO_ARG);

It seems there is no real change in the line above. Better keep the

original formation.

Hmm, well it was only recently damaged by commit def5b065, and that's
because I'd forgotten to put ST_ELEMENT_TYPE into typedefs.list, and I
was correcting that in this patch. (That file is used by
pg_bsd_indent to decide if an identifier is a type or a variable,
which affects whether '*' is formatted like a unary operator/type
syntax or a binary operator.)

*   - ST_COMPARE_ARG_TYPE - type of extra argument
*
+ *   To say that the comparator returns a type other than int, use:
+ *
+ *       - ST_COMPARE_TYPE - an integer type

Since the ST_COMPARE_TYPE is meant to designate the type of the return

value, maybe ST_COMPARE_RET_TYPE would be better name.

It also goes with ST_COMPARE_ARG_TYPE preceding this.

Good idea, will do.

-   ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data,
-              *pa,
-              *pb,
-              *pc,
-              *pd,
-              *pl,
-              *pm,
-              *pn;
+   ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data;
+   ST_POINTER_TYPE *pa;

There doesn't seem to be material change for the above hunk.

In master, you can't write #define ST_ELEMENT_TYPE some_type *, which
seems like it would be quite useful. You can use pointers as element
types, but only with a typedef name due to C parsing rules. some_type
**a, *pa, ... declares some_type *pa, but we want some_type **pa. I
don't want to have to introduce extra typedefs. The change fixes that
problem by not using C's squirrelly variable declaration list syntax.

+   while (left <= right)
+   {
+       size_t      mid = (left + right) / 2;

The computation for midpoint should be left + (right-left)/2.

Right, my way can overflow. Will fix. Thanks!

Hi,
Thanks for giving me background on typedefs.
The relevant changes look fine to me.

Cheers

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#12)
Re: A qsort template

Thomas Munro <thomas.munro@gmail.com> writes:

Hmm, well it was only recently damaged by commit def5b065, and that's
because I'd forgotten to put ST_ELEMENT_TYPE into typedefs.list, and I
was correcting that in this patch.

If ST_ELEMENT_TYPE isn't recognized as a typedef by the buildfarm's
typedef collectors, this sort of manual addition to typedefs.list
is not going to survive the next pgindent run. No, I will NOT
promise to manually add it back every time.

We do already have special provision for injecting additional typedefs
in the pgindent script, so one possibility is to add it there:

-my @additional = ("bool\n");
+my @additional = ("bool\nST_ELEMENT_TYPE\n");

On the whole I'm not sure that this is a big enough formatting
issue to justify a special hack, though. Is there any more than
the one line that gets misformatted?

regards, tom lane

#15Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#14)
Re: A qsort template

On Thu, Jun 17, 2021 at 11:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

Hmm, well it was only recently damaged by commit def5b065, and that's
because I'd forgotten to put ST_ELEMENT_TYPE into typedefs.list, and I
was correcting that in this patch.

If ST_ELEMENT_TYPE isn't recognized as a typedef by the buildfarm's
typedef collectors, this sort of manual addition to typedefs.list
is not going to survive the next pgindent run. No, I will NOT
promise to manually add it back every time.

We do already have special provision for injecting additional typedefs
in the pgindent script, so one possibility is to add it there:

-my @additional = ("bool\n");
+my @additional = ("bool\nST_ELEMENT_TYPE\n");

On the whole I'm not sure that this is a big enough formatting
issue to justify a special hack, though. Is there any more than
the one line that gets misformatted?

Ohh. In that case, I won't bother with that hunk and will live with
the extra space. There are several other lines like this in the tree,
where people use caveman template macrology that is invisible to
whatever analyser is being used for that, and I can see that that's
just going to have to be OK for now. Perhaps one day we could add a
secondary file, not updated by that mechanism, that holds a manually
maintained list for cases like this.

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#15)
Re: A qsort template

Thomas Munro <thomas.munro@gmail.com> writes:

Perhaps one day we could add a
secondary file, not updated by that mechanism, that holds a manually
maintained list for cases like this.

Yeah, the comments in pgindent already speculate about that. For
now, those include and exclude lists are short enough that keeping
them inside the script seems a lot easier than building tooling
to get them from somewhere else.

The big problem in my mind, which would not be alleviated in the
slightest by having a separate file, is that it'd be easy to miss
removing entries if they ever become obsolete.

regards, tom lane

#17Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#16)
Re: A qsort template

On Thu, Jun 17, 2021 at 1:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The big problem in my mind, which would not be alleviated in the
slightest by having a separate file, is that it'd be easy to miss
removing entries if they ever become obsolete.

I suppose you could invent some kind of declaration syntax in a
comment near the use of the pseudo-typename in the source tree that is
mechanically extracted.

#18John Naylor
john.naylor@enterprisedb.com
In reply to: Thomas Munro (#10)
Re: A qsort template

On Wed, Jun 16, 2021 at 1:55 AM Thomas Munro <thomas.munro@gmail.com> wrote:
[v2 patch]

Hi Thomas,

I plan to do some performance testing with VACUUM, ANALYZE etc soon, to see
if I can detect any significant differences.

I did a quick check of the MacOS/clang binary size (no debug symbols):

master: 8108408
0001-0009: 8125224

Later, I'll drill down into the individual patches and see if anything
stands out.

There were already some comments for v2 upthread about formatting and an
overflow hazard, but I did find a few more things to ask about:

- For my curiosity, there are a lot of calls to qsort/qunique in the tree
-- without having looked exhaustively, do these patches focus on cases
where there are bespoke comparator functions and/or hot code paths?

- Aside from the qsort{_arg} precedence, is there a practical reason for
keeping the new global functions in their own files?

- 0002 / 0004

+/* Search and unique functions inline in header. */

The functions are pretty small, but is there some advantage for inlining
these?

- 0003

#include "lib/qunique.h" is not needed anymore.

This isn't quite relevant for the current patch perhaps, but I'm wondering
why we don't already call bsearch for RelationHasSysCache() and
RelationSupportsSysCache().

- 0008

+#define ST_COMPARE(a, b, cxt) \
+ DatumGetInt32(FunctionCall2Coll(&cxt->flinfo, cxt->collation, *a, *b))

This seems like a pretty heavyweight comparison, so I'm not sure inlining
buys us much, but it seems also there are fewer branches this way. I'll
come up with a test and see what happens.

--
John Naylor
EDB: http://www.enterprisedb.com

#19Thomas Munro
thomas.munro@gmail.com
In reply to: John Naylor (#18)
Re: A qsort template

Hi John,

On Tue, Jun 29, 2021 at 7:13 AM John Naylor
<john.naylor@enterprisedb.com> wrote:

I plan to do some performance testing with VACUUM, ANALYZE etc soon, to see if I can detect any significant differences.

Thanks!

I did a quick check of the MacOS/clang binary size (no debug symbols):

master: 8108408
0001-0009: 8125224

Not too bad.

Later, I'll drill down into the individual patches and see if anything stands out.

There were already some comments for v2 upthread about formatting and an overflow hazard, but I did find a few more things to ask about:

Right, here's an update with fixes discussed earlier with Zhihong and Tom:

* COMPARE_TYPE -> COMPARE_RET_TYPE
* quit fighting with pgindent (I will try to fix this problem generally later)
* fix overflow hazard

- For my curiosity, there are a lot of calls to qsort/qunique in the tree -- without having looked exhaustively, do these patches focus on cases where there are bespoke comparator functions and/or hot code paths?

Patches 0006-0009 are highly specialised for local usage by a single
module, and require some kind of evidence that they're worth their
bytes, and the onus is on me there of course -- but any ideas and
feedback are welcome. There are other opportunities like these, maybe
better ones. That reminds me: I recently had a perf report from
Andres that showed the qsort in compute_scalar_stats() as quite hot.
That's probably a good candidate, and is not yet done in the current
patch set.

The lower numbered patches are all things that are reused in many
places, and in my humble opinion improve the notation and type safety
and code deduplication generally when working with common types
ItemPtr, BlockNumber, Oid, aside from any performance arguments. At
least the ItemPtr stuff *might* also speed something useful up.

I tried to measure a speedup in vacuum, but so far I have not. I did
learn some things though: While doing that with an uncorrelated index
and a lot of deleted tuples, I found that adding more
maintenance_work_mem doesn't help beyond a few MB, because then cache
misses dominate to the point where it's not better than doing multiple
passes (and this is familiar to me from work on hash joins). If I
turned on huge pages on Linux and set min_dynamic_shared_memory so
that the parallel DSM used by vacuum lives in huge pages, then
parallel vacuum with a large maintenance_work_mem starts to do much
better than non-parallel vacuum by improving the TLB misses (as with
hash joins). I thought that was quite interesting! Perhaps
bsearch_itemptr might help with correlated indexes with a lot of
deleted indexes (so not dominated by cache misses), though?

(I wouldn't be suprised if someone comes up with a much better idea
than bsearch for that anyway... a few ideas have been suggested.)

- Aside from the qsort{_arg} precedence, is there a practical reason for keeping the new global functions in their own files?

Better idea for layout welcome. One thing I wondered while trying to
figure out where to put functions that operate on itemptr: why is
itemptr_encode() in src/include/catalog/index.h?!

- 0002 / 0004

+/* Search and unique functions inline in header. */

The functions are pretty small, but is there some advantage for inlining these?

Glibc's bsearch definition is already in a header for inlining (as is
our qunique), so I thought I should preserve that characteristic on
principle. I don't have any evidence though. Other libcs I looked at
didn't have bsearch in a header. So by doing this we make the
generated code the same across platforms (all other relevant things
being equal). I don't know if it really makes much difference,
especially since in this case the comparator and size would still be
inlined if we defined it in the .c (unlike standard bsearch)...
Probably only lazy_tid_reaped() calls it enough to potentially show
any difference in a non-microbenchmark workload, if anything does.

- 0003

#include "lib/qunique.h" is not needed anymore.

Fixed.

This isn't quite relevant for the current patch perhaps, but I'm wondering why we don't already call bsearch for RelationHasSysCache() and RelationSupportsSysCache().

Right, I missed that. Done. Nice to delete some more code.

- 0008

+#define ST_COMPARE(a, b, cxt) \
+ DatumGetInt32(FunctionCall2Coll(&cxt->flinfo, cxt->collation, *a, *b))

This seems like a pretty heavyweight comparison, so I'm not sure inlining buys us much, but it seems also there are fewer branches this way. I'll come up with a test and see what happens.

I will be very interested to see the results. Thanks!

Attachments:

more-sort-search-specializations-v3.tgzapplication/x-compressed-tar; name=more-sort-search-specializations-v3.tgzDownload+2-0
#20Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#19)
Re: A qsort template

I spotted a mistake in v3: I didn't rename ST_COMPARE_TYPE to
ST_COMPARE_RET_TYPE in the 0009 patch (well, I did, but forgot to
commit before I ran git format-patch). I won't send another tarball
just for that, but will correct it next time.

#21Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#20)
#22John Naylor
john.naylor@enterprisedb.com
In reply to: Thomas Munro (#21)
#23John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#22)
#24Thomas Munro
thomas.munro@gmail.com
In reply to: John Naylor (#23)
#25John Naylor
john.naylor@enterprisedb.com
In reply to: Thomas Munro (#24)
#26Thomas Munro
thomas.munro@gmail.com
In reply to: John Naylor (#25)
#27vignesh C
vignesh21@gmail.com
In reply to: Thomas Munro (#26)
#28John Naylor
john.naylor@enterprisedb.com
In reply to: vignesh C (#27)
#29Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#17)
#30John Naylor
john.naylor@enterprisedb.com
In reply to: Thomas Munro (#26)
In reply to: John Naylor (#30)
#32Ranier Vilela
ranier.vf@gmail.com
In reply to: John Naylor (#30)
#33John Naylor
john.naylor@enterprisedb.com
In reply to: Ranier Vilela (#32)
#34Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#31)
#35Thomas Munro
thomas.munro@gmail.com
In reply to: John Naylor (#30)
#36Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#35)
In reply to: Thomas Munro (#35)
#38John Naylor
john.naylor@enterprisedb.com
In reply to: Thomas Munro (#19)
#39John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#38)
In reply to: John Naylor (#39)
#41John Naylor
john.naylor@enterprisedb.com
In reply to: Peter Geoghegan (#40)
#42John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#41)
#43John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#42)
#44John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#43)
#45John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#44)
#46Thomas Munro
thomas.munro@gmail.com
In reply to: John Naylor (#45)
#47John Naylor
john.naylor@enterprisedb.com
In reply to: Thomas Munro (#46)
#48John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#45)
#49Thomas Munro
thomas.munro@gmail.com
In reply to: John Naylor (#47)
#50John Naylor
john.naylor@enterprisedb.com
In reply to: Thomas Munro (#49)
#51John Naylor
john.naylor@enterprisedb.com
In reply to: Thomas Munro (#49)
#52Thomas Munro
thomas.munro@gmail.com
In reply to: John Naylor (#51)
#53Justin Pryzby
pryzby@telsasoft.com
In reply to: John Naylor (#51)
#54Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#52)
#55Thomas Munro
thomas.munro@gmail.com
In reply to: Justin Pryzby (#53)
#56Andres Freund
andres@anarazel.de
In reply to: Justin Pryzby (#53)
#57Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#56)
#58Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#57)
#59Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#58)
#60Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#59)
#61Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#60)
#62John Naylor
john.naylor@enterprisedb.com
In reply to: Thomas Munro (#61)
#63Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#61)
In reply to: Thomas Munro (#63)
#65David Rowley
dgrowleyml@gmail.com
In reply to: Thomas Munro (#63)
#66David Rowley
dgrowleyml@gmail.com
In reply to: Thomas Munro (#63)
#67John Naylor
john.naylor@enterprisedb.com
In reply to: Peter Geoghegan (#64)
#68David Rowley
dgrowleyml@gmail.com
In reply to: John Naylor (#67)
#69David Rowley
dgrowleyml@gmail.com
In reply to: Thomas Munro (#63)
#70John Naylor
john.naylor@enterprisedb.com
In reply to: David Rowley (#69)
#71David Rowley
dgrowleyml@gmail.com
In reply to: John Naylor (#70)
#72John Naylor
john.naylor@enterprisedb.com
In reply to: David Rowley (#71)
#73John Naylor
john.naylor@enterprisedb.com
In reply to: David Rowley (#69)
#74David Rowley
dgrowleyml@gmail.com
In reply to: John Naylor (#73)
#75John Naylor
john.naylor@enterprisedb.com
In reply to: David Rowley (#74)
#76John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#75)
#77John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#75)
#78David Rowley
dgrowleyml@gmail.com
In reply to: John Naylor (#77)
#79John Naylor
john.naylor@enterprisedb.com
In reply to: David Rowley (#78)
#80Thomas Munro
thomas.munro@gmail.com
In reply to: John Naylor (#79)
#81Justin Pryzby
pryzby@telsasoft.com
In reply to: John Naylor (#79)
In reply to: Justin Pryzby (#81)
#83Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#82)
#84John Naylor
john.naylor@enterprisedb.com
In reply to: Tom Lane (#83)
#85John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#84)