Sort functions with specialized comparators

Started by Andrey Borodinalmost 2 years ago38 messageshackers
Jump to latest
#1Andrey Borodin
amborodin@acm.org

Hi!

In a thread about sorting comparators[0]/messages/by-id/20240209184014.sobshkcsfjix6u4r@awork3.anarazel.de Andres noted that we have infrastructure to help compiler optimize sorting. PFA attached PoC implementation. I've checked that it indeed works on the benchmark from that thread.

postgres=# CREATE TABLE arrays_to_sort AS
SELECT array_shuffle(a) arr
FROM
(SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
generate_series(1, 10);

postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
Time: 990.199 ms
postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
Time: 696.156 ms

The benefit seems to be on the order of magnitude with 30% speedup.

There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber, Oid etc. But this sorting routines never show up in perf top or something like that.

Seems like in most cases we do not spend much time in sorting. But specialization does not cost us much too, only some CPU cycles of a compiler. I think we can further improve speedup by converting inline comparator to value extractor: more compilers will see what is actually going on. But I have no proofs for this reasoning.

What do you think?

Best regards, Andrey Borodin.

[0]: /messages/by-id/20240209184014.sobshkcsfjix6u4r@awork3.anarazel.de

Attachments:

v1-0001-Use-specialized-sort-facilities.patchapplication/octet-stream; name=v1-0001-Use-specialized-sort-facilities.patch; x-unix-mode=0644Download+47-36
#2Ranier Vilela
ranier.vf@gmail.com
In reply to: Andrey Borodin (#1)
Re: Sort functions with specialized comparators

Em sáb., 18 de mai. de 2024 às 15:52, Andrey M. Borodin <
x4mmm@yandex-team.ru> escreveu:

Hi!

In a thread about sorting comparators[0] Andres noted that we have
infrastructure to help compiler optimize sorting. PFA attached PoC
implementation. I've checked that it indeed works on the benchmark from
that thread.

postgres=# CREATE TABLE arrays_to_sort AS
SELECT array_shuffle(a) arr
FROM
(SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
generate_series(1, 10);

postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
Time: 990.199 ms
postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
Time: 696.156 ms

The benefit seems to be on the order of magnitude with 30% speedup.

There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber, Oid
etc. But this sorting routines never show up in perf top or something like
that.

Seems like in most cases we do not spend much time in sorting. But
specialization does not cost us much too, only some CPU cycles of a
compiler. I think we can further improve speedup by converting inline
comparator to value extractor: more compilers will see what is actually
going on. But I have no proofs for this reasoning.

What do you think?

Makes sense.

Regarding the patch.
You could change the style to:

+sort_int32_asc_cmp(const int32 *a, const int32 *b)
+sort_int32_desc_cmp(const int32 *a, const int32 *b)

We must use const in all parameters that can be const.

best regards,
Ranier Vilela

#3Stepan Neretin
sncfmgg@gmail.com
In reply to: Andrey Borodin (#1)
Re: Sort functions with specialized comparators

Hello all.

I am interested in the proposed patch and would like to propose some
additional changes that would complement it. My changes would introduce
similar optimizations when working with a list of integers or object
identifiers. Additionally, my patch includes an extension for benchmarking,
which shows an average speedup of 30-40%.

postgres=# SELECT bench_oid_sort(1000000);
bench_oid_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 116990848 ns, Time taken by list_oid_sort:
80446640 ns, Percentage difference: 31.24%
(1 row)

postgres=# SELECT bench_int_sort(1000000);
bench_int_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 118168506 ns, Time taken by list_int_sort:
80523373 ns, Percentage difference: 31.86%
(1 row)

What do you think about these changes?

Best regards, Stepan Neretin.

On Fri, Jun 7, 2024 at 11:08 PM Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:

Show quoted text

Hi!

In a thread about sorting comparators[0] Andres noted that we have
infrastructure to help compiler optimize sorting. PFA attached PoC
implementation. I've checked that it indeed works on the benchmark from
that thread.

postgres=# CREATE TABLE arrays_to_sort AS
SELECT array_shuffle(a) arr
FROM
(SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
generate_series(1, 10);

postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
Time: 990.199 ms
postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
Time: 696.156 ms

The benefit seems to be on the order of magnitude with 30% speedup.

There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber, Oid
etc. But this sorting routines never show up in perf top or something like
that.

Seems like in most cases we do not spend much time in sorting. But
specialization does not cost us much too, only some CPU cycles of a
compiler. I think we can further improve speedup by converting inline
comparator to value extractor: more compilers will see what is actually
going on. But I have no proofs for this reasoning.

What do you think?

Best regards, Andrey Borodin.

[0]
/messages/by-id/20240209184014.sobshkcsfjix6u4r@awork3.anarazel.de

Attachments:

v42-0006-Implemented-benchmarking-for-optimized-sorting.patchtext/x-patch; charset=US-ASCII; name=v42-0006-Implemented-benchmarking-for-optimized-sorting.patchDownload+134-1
v42-0002-Optimized-Oid-List-Sorting-by-using-template-sor.patchtext/x-patch; charset=US-ASCII; name=v42-0002-Optimized-Oid-List-Sorting-by-using-template-sor.patchDownload+15-4
v42-0004-Optimized-Int-List-Sorting-by-using-template-sor.patchtext/x-patch; charset=US-ASCII; name=v42-0004-Optimized-Int-List-Sorting-by-using-template-sor.patchDownload+15-4
v42-0005-Enhanced-Sorting-Efficiency-for-Integer-Lists.patchtext/x-patch; charset=US-ASCII; name=v42-0005-Enhanced-Sorting-Efficiency-for-Integer-Lists.patchDownload+1-6
v42-0003-Enhanced-Sorting-Efficiency-for-Oid-Lists.patchtext/x-patch; charset=US-ASCII; name=v42-0003-Enhanced-Sorting-Efficiency-for-Oid-Lists.patchDownload+4-8
v42-0001-Use-specialized-sort-facilities.patchtext/x-patch; charset=US-ASCII; name=v42-0001-Use-specialized-sort-facilities.patchDownload+47-36
#4Stepan Neretin
sncfmgg@gmail.com
In reply to: Stepan Neretin (#3)
Re: Sort functions with specialized comparators

On Sat, Jun 8, 2024 at 1:50 AM Stepan Neretin <sncfmgg@gmail.com> wrote:

Hello all.

I am interested in the proposed patch and would like to propose some
additional changes that would complement it. My changes would introduce
similar optimizations when working with a list of integers or object
identifiers. Additionally, my patch includes an extension for benchmarking,
which shows an average speedup of 30-40%.

postgres=# SELECT bench_oid_sort(1000000);
bench_oid_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 116990848 ns, Time taken by list_oid_sort:
80446640 ns, Percentage difference: 31.24%
(1 row)

postgres=# SELECT bench_int_sort(1000000);
bench_int_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 118168506 ns, Time taken by list_int_sort:
80523373 ns, Percentage difference: 31.86%
(1 row)

What do you think about these changes?

Best regards, Stepan Neretin.

On Fri, Jun 7, 2024 at 11:08 PM Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:

Hi!

In a thread about sorting comparators[0] Andres noted that we have
infrastructure to help compiler optimize sorting. PFA attached PoC
implementation. I've checked that it indeed works on the benchmark from
that thread.

postgres=# CREATE TABLE arrays_to_sort AS
SELECT array_shuffle(a) arr
FROM
(SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
generate_series(1, 10);

postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
Time: 990.199 ms
postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
Time: 696.156 ms

The benefit seems to be on the order of magnitude with 30% speedup.

There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber,
Oid etc. But this sorting routines never show up in perf top or something
like that.

Seems like in most cases we do not spend much time in sorting. But
specialization does not cost us much too, only some CPU cycles of a
compiler. I think we can further improve speedup by converting inline
comparator to value extractor: more compilers will see what is actually
going on. But I have no proofs for this reasoning.

What do you think?

Best regards, Andrey Borodin.

[0]
/messages/by-id/20240209184014.sobshkcsfjix6u4r@awork3.anarazel.de

Hello all.

I have decided to explore more areas in which I can optimize and have added
two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
bench_int16_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 66354981 ns, Time taken by optimized sort:
52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
bench_float8_sort

------------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 121475231 ns, Time taken by optimized sort:
74458545 ns, Percentage difference: 38.70%
(1 row)

postgres=#

Best regards, Stepan Neretin.

Attachments:

v42-0008-Refactor-sorting-of-attribute-numbers-in-pg_publ.patchtext/x-patch; charset=US-ASCII; name=v42-0008-Refactor-sorting-of-attribute-numbers-in-pg_publ.patchDownload+2-5
v42-0009-Introduce-benchmarking-function-for-int16-array-.patchtext/x-patch; charset=US-ASCII; name=v42-0009-Introduce-benchmarking-function-for-int16-array-.patchDownload+57-5
v42-0010-Implement-Sorting-Template-for-float8-Arrays.patchtext/x-patch; charset=US-ASCII; name=v42-0010-Implement-Sorting-Template-for-float8-Arrays.patchDownload+11-1
v42-0012-Add-benchmark-comparing-float8-sorting-methods.patchtext/x-patch; charset=US-ASCII; name=v42-0012-Add-benchmark-comparing-float8-sorting-methods.patchDownload+72-1
v42-0011-Optimize-box-quad-picksplit-with-float8-array-so.patchtext/x-patch; charset=US-ASCII; name=v42-0011-Optimize-box-quad-picksplit-with-float8-array-so.patchDownload+4-5
v42-0005-Enhanced-Sorting-Efficiency-for-Integer-Lists.patchtext/x-patch; charset=US-ASCII; name=v42-0005-Enhanced-Sorting-Efficiency-for-Integer-Lists.patchDownload+1-6
v42-0007-Consolidate-and-optimize-int16-array-sorting.patchtext/x-patch; charset=US-ASCII; name=v42-0007-Consolidate-and-optimize-int16-array-sorting.patchDownload+34-27
v42-0004-Optimized-Int-List-Sorting-by-using-template-sor.patchtext/x-patch; charset=US-ASCII; name=v42-0004-Optimized-Int-List-Sorting-by-using-template-sor.patchDownload+15-4
v42-0006-Implemented-benchmarking-for-optimized-sorting.patchtext/x-patch; charset=US-ASCII; name=v42-0006-Implemented-benchmarking-for-optimized-sorting.patchDownload+134-1
v42-0003-Enhanced-Sorting-Efficiency-for-Oid-Lists.patchtext/x-patch; charset=US-ASCII; name=v42-0003-Enhanced-Sorting-Efficiency-for-Oid-Lists.patchDownload+4-8
v42-0001-Use-specialized-sort-facilities.patchtext/x-patch; charset=US-ASCII; name=v42-0001-Use-specialized-sort-facilities.patchDownload+47-36
v42-0002-Optimized-Oid-List-Sorting-by-using-template-sor.patchtext/x-patch; charset=US-ASCII; name=v42-0002-Optimized-Oid-List-Sorting-by-using-template-sor.patchDownload+15-4
#5Антуан Виолин
violin.antuan@gmail.com
In reply to: Stepan Neretin (#4)
Re: Sort functions with specialized comparators

Hello all.

I have decided to explore more areas in which I can optimize and have added
two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
bench_int16_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 66354981 ns, Time taken by optimized sort:
52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
bench_float8_sort

------------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 121475231 ns, Time taken by optimized sort:
74458545 ns, Percentage difference: 38.70%
(1 row)

Hello all
We would like to see the relationship between the length of the sorted array
and the performance gain, perhaps some graphs. We also want to see to set a
"worst case" test, sorting the array in ascending order when it is initially
descending

Best, regards, Antoine Violin

postgres=#

On Mon, Jul 15, 2024 at 10:32 AM Stepan Neretin <sncfmgg@gmail.com> wrote:

On Sat, Jun 8, 2024 at 1:50 AM Stepan Neretin <sncfmgg@gmail.com> wrote:

Hello all.

I am interested in the proposed patch and would like to propose some
additional changes that would complement it. My changes would introduce
similar optimizations when working with a list of integers or object
identifiers. Additionally, my patch includes an extension for
benchmarking, which shows an average speedup of 30-40%.

postgres=# SELECT bench_oid_sort(1000000);
bench_oid_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 116990848 ns, Time taken by list_oid_sort:
80446640 ns, Percentage difference: 31.24%
(1 row)

postgres=# SELECT bench_int_sort(1000000);
bench_int_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 118168506 ns, Time taken by list_int_sort:
80523373 ns, Percentage difference: 31.86%
(1 row)

What do you think about these changes?

Best regards, Stepan Neretin.

On Fri, Jun 7, 2024 at 11:08 PM Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:

Hi!

In a thread about sorting comparators[0] Andres noted that we have
infrastructure to help compiler optimize sorting. PFA attached PoC
implementation. I've checked that it indeed works on the benchmark from
that thread.

postgres=# CREATE TABLE arrays_to_sort AS
SELECT array_shuffle(a) arr
FROM
(SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
generate_series(1, 10);

postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
Time: 990.199 ms
postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
Time: 696.156 ms

The benefit seems to be on the order of magnitude with 30% speedup.

There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber,
Oid etc. But this sorting routines never show up in perf top or something
like that.

Seems like in most cases we do not spend much time in sorting. But
specialization does not cost us much too, only some CPU cycles of a
compiler. I think we can further improve speedup by converting inline
comparator to value extractor: more compilers will see what is actually
going on. But I have no proofs for this reasoning.

What do you think?

Best regards, Andrey Borodin.

[0]
/messages/by-id/20240209184014.sobshkcsfjix6u4r@awork3.anarazel.de

Hello all.

I have decided to explore more areas in which I can optimize and have
added two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
bench_int16_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 66354981 ns, Time taken by optimized sort:
52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
bench_float8_sort

------------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 121475231 ns, Time taken by optimized sort:
74458545 ns, Percentage difference: 38.70%
(1 row)

postgres=#

Best regards, Stepan Neretin.

#6Stepan Neretin
sncfmgg@gmail.com
In reply to: Антуан Виолин (#5)
Re: Sort functions with specialized comparators

On Mon, Jul 15, 2024 at 12:23 PM Антуан Виолин <violin.antuan@gmail.com>
wrote:

Hello all.

I have decided to explore more areas in which I can optimize and have
added
two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
bench_int16_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 66354981 ns, Time taken by optimized sort:
52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
bench_float8_sort

------------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 121475231 ns, Time taken by optimized sort:
74458545 ns, Percentage difference: 38.70%
(1 row)

Hello all
We would like to see the relationship between the length of the sorted
array and the performance gain, perhaps some graphs. We also want to see
to set a "worst case" test, sorting the array in ascending order when it
is initially descending

Best, regards, Antoine Violin

postgres=#

On Mon, Jul 15, 2024 at 10:32 AM Stepan Neretin <sncfmgg@gmail.com> wrote:

On Sat, Jun 8, 2024 at 1:50 AM Stepan Neretin <sncfmgg@gmail.com> wrote:

Hello all.

I am interested in the proposed patch and would like to propose some
additional changes that would complement it. My changes would introduce
similar optimizations when working with a list of integers or object
identifiers. Additionally, my patch includes an extension for
benchmarking, which shows an average speedup of 30-40%.

postgres=# SELECT bench_oid_sort(1000000);
bench_oid_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 116990848 ns, Time taken by list_oid_sort:
80446640 ns, Percentage difference: 31.24%
(1 row)

postgres=# SELECT bench_int_sort(1000000);
bench_int_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 118168506 ns, Time taken by list_int_sort:
80523373 ns, Percentage difference: 31.86%
(1 row)

What do you think about these changes?

Best regards, Stepan Neretin.

On Fri, Jun 7, 2024 at 11:08 PM Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:

Hi!

In a thread about sorting comparators[0] Andres noted that we have
infrastructure to help compiler optimize sorting. PFA attached PoC
implementation. I've checked that it indeed works on the benchmark from
that thread.

postgres=# CREATE TABLE arrays_to_sort AS
SELECT array_shuffle(a) arr
FROM
(SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
generate_series(1, 10);

postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
Time: 990.199 ms
postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
Time: 696.156 ms

The benefit seems to be on the order of magnitude with 30% speedup.

There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber,
Oid etc. But this sorting routines never show up in perf top or something
like that.

Seems like in most cases we do not spend much time in sorting. But
specialization does not cost us much too, only some CPU cycles of a
compiler. I think we can further improve speedup by converting inline
comparator to value extractor: more compilers will see what is actually
going on. But I have no proofs for this reasoning.

What do you think?

Best regards, Andrey Borodin.

[0]
/messages/by-id/20240209184014.sobshkcsfjix6u4r@awork3.anarazel.de

Hello all.

I have decided to explore more areas in which I can optimize and have
added two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
bench_int16_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 66354981 ns, Time taken by optimized sort:
52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
bench_float8_sort

------------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 121475231 ns, Time taken by optimized sort:
74458545 ns, Percentage difference: 38.70%
(1 row)

postgres=#

Best regards, Stepan Neretin.

I run benchmark with my patches:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.609 ms
initial connection time = 24.080 ms
tps = 6214.244789 (without initial connection time)

and without:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.731 ms
initial connection time = 15.177 ms
tps = 5776.173285 (without initial connection time)

tps with my patches increase. What do you think?

Best regards, Stepan Neretin.

#7Stepan Neretin
sncfmgg@gmail.com
In reply to: Stepan Neretin (#6)
Re: Sort functions with specialized comparators

On Mon, Jul 15, 2024 at 4:52 PM Stepan Neretin <sncfmgg@gmail.com> wrote:

On Mon, Jul 15, 2024 at 12:23 PM Антуан Виолин <violin.antuan@gmail.com>
wrote:

Hello all.

I have decided to explore more areas in which I can optimize and have
added
two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
bench_int16_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 66354981 ns, Time taken by optimized sort:
52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
bench_float8_sort

------------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 121475231 ns, Time taken by optimized sort:
74458545 ns, Percentage difference: 38.70%
(1 row)

Hello all
We would like to see the relationship between the length of the sorted
array and the performance gain, perhaps some graphs. We also want to see
to set a "worst case" test, sorting the array in ascending order when it
is initially descending

Best, regards, Antoine Violin

postgres=#

On Mon, Jul 15, 2024 at 10:32 AM Stepan Neretin <sncfmgg@gmail.com>
wrote:

On Sat, Jun 8, 2024 at 1:50 AM Stepan Neretin <sncfmgg@gmail.com> wrote:

Hello all.

I am interested in the proposed patch and would like to propose some
additional changes that would complement it. My changes would introduce
similar optimizations when working with a list of integers or object
identifiers. Additionally, my patch includes an extension for
benchmarking, which shows an average speedup of 30-40%.

postgres=# SELECT bench_oid_sort(1000000);
bench_oid_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 116990848 ns, Time taken by list_oid_sort:
80446640 ns, Percentage difference: 31.24%
(1 row)

postgres=# SELECT bench_int_sort(1000000);
bench_int_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 118168506 ns, Time taken by list_int_sort:
80523373 ns, Percentage difference: 31.86%
(1 row)

What do you think about these changes?

Best regards, Stepan Neretin.

On Fri, Jun 7, 2024 at 11:08 PM Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:

Hi!

In a thread about sorting comparators[0] Andres noted that we have
infrastructure to help compiler optimize sorting. PFA attached PoC
implementation. I've checked that it indeed works on the benchmark from
that thread.

postgres=# CREATE TABLE arrays_to_sort AS
SELECT array_shuffle(a) arr
FROM
(SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
generate_series(1, 10);

postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
Time: 990.199 ms
postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
Time: 696.156 ms

The benefit seems to be on the order of magnitude with 30% speedup.

There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber,
Oid etc. But this sorting routines never show up in perf top or something
like that.

Seems like in most cases we do not spend much time in sorting. But
specialization does not cost us much too, only some CPU cycles of a
compiler. I think we can further improve speedup by converting inline
comparator to value extractor: more compilers will see what is actually
going on. But I have no proofs for this reasoning.

What do you think?

Best regards, Andrey Borodin.

[0]
/messages/by-id/20240209184014.sobshkcsfjix6u4r@awork3.anarazel.de

Hello all.

I have decided to explore more areas in which I can optimize and have
added two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
bench_int16_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 66354981 ns, Time taken by optimized sort:
52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
bench_float8_sort

------------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 121475231 ns, Time taken by optimized sort:
74458545 ns, Percentage difference: 38.70%
(1 row)

postgres=#

Best regards, Stepan Neretin.

I run benchmark with my patches:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.609 ms
initial connection time = 24.080 ms
tps = 6214.244789 (without initial connection time)

and without:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.731 ms
initial connection time = 15.177 ms
tps = 5776.173285 (without initial connection time)

tps with my patches increase. What do you think?

Best regards, Stepan Neretin.

I implement reverse benchmarks:

postgres=# SELECT bench_oid_reverse_sort(1000);
bench_oid_reverse_sort

----------------------------------------------------------------------------------------------------------
Time taken by list_sort: 182557 ns, Time taken by list_oid_sort: 85864 ns,
Percentage difference: 52.97%
(1 row)

Time: 2,291 ms
postgres=# SELECT bench_oid_reverse_sort(100000);
bench_oid_reverse_sort

-------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 9064163 ns, Time taken by list_oid_sort: 4313448
ns, Percentage difference: 52.41%
(1 row)

Time: 17,146 ms
postgres=# SELECT bench_oid_reverse_sort(1000000);
bench_oid_reverse_sort

---------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 61990395 ns, Time taken by list_oid_sort:
23703380 ns, Percentage difference: 61.76%
(1 row)

postgres=# SELECT bench_int_reverse_sort(1000000);
bench_int_reverse_sort

---------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 50712416 ns, Time taken by list_int_sort:
24120417 ns, Percentage difference: 52.44%
(1 row)

Time: 89,359 ms

postgres=# SELECT bench_float8_reverse_sort(1000000);
bench_float8_reverse_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 57447775 ns, Time taken by optimized sort:
25214023 ns, Percentage difference: 56.11%
(1 row)

Time: 92,308 ms

Hello again. I want to show you the graphs of when we increase the length
vector/array sorting time (ns). What do you think about graphs?

Best regards, Stepan Neretin.

Attachments:

int_sort_bench.pngimage/png; name=int_sort_bench.pngDownload
int_sort_bench2.pngimage/png; name=int_sort_bench2.pngDownload
#8Stepan Neretin
sncfmgg@gmail.com
In reply to: Stepan Neretin (#7)
Re: Sort functions with specialized comparators

On Mon, Jul 15, 2024 at 5:47 PM Stepan Neretin <sncfmgg@gmail.com> wrote:

Show quoted text

On Mon, Jul 15, 2024 at 4:52 PM Stepan Neretin <sncfmgg@gmail.com> wrote:

On Mon, Jul 15, 2024 at 12:23 PM Антуан Виолин <violin.antuan@gmail.com>
wrote:

Hello all.

I have decided to explore more areas in which I can optimize and have
added
two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
bench_int16_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 66354981 ns, Time taken by optimized sort:
52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
bench_float8_sort

------------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 121475231 ns, Time taken by optimized sort:
74458545 ns, Percentage difference: 38.70%
(1 row)

Hello all
We would like to see the relationship between the length of the sorted
array and the performance gain, perhaps some graphs. We also want to see
to set a "worst case" test, sorting the array in ascending order when it
is initially descending

Best, regards, Antoine Violin

postgres=#

On Mon, Jul 15, 2024 at 10:32 AM Stepan Neretin <sncfmgg@gmail.com>
wrote:

On Sat, Jun 8, 2024 at 1:50 AM Stepan Neretin <sncfmgg@gmail.com>
wrote:

Hello all.

I am interested in the proposed patch and would like to propose some
additional changes that would complement it. My changes would
introduce similar optimizations when working with a list of integers
or object identifiers. Additionally, my patch includes an extension
for benchmarking, which shows an average speedup of 30-40%.

postgres=# SELECT bench_oid_sort(1000000);
bench_oid_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 116990848 ns, Time taken by list_oid_sort:
80446640 ns, Percentage difference: 31.24%
(1 row)

postgres=# SELECT bench_int_sort(1000000);
bench_int_sort

----------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 118168506 ns, Time taken by list_int_sort:
80523373 ns, Percentage difference: 31.86%
(1 row)

What do you think about these changes?

Best regards, Stepan Neretin.

On Fri, Jun 7, 2024 at 11:08 PM Andrey M. Borodin <
x4mmm@yandex-team.ru> wrote:

Hi!

In a thread about sorting comparators[0] Andres noted that we have
infrastructure to help compiler optimize sorting. PFA attached PoC
implementation. I've checked that it indeed works on the benchmark from
that thread.

postgres=# CREATE TABLE arrays_to_sort AS
SELECT array_shuffle(a) arr
FROM
(SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
generate_series(1, 10);

postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
Time: 990.199 ms
postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
Time: 696.156 ms

The benefit seems to be on the order of magnitude with 30% speedup.

There's plenty of sorting by TransactionId, BlockNumber,
OffsetNumber, Oid etc. But this sorting routines never show up in perf top
or something like that.

Seems like in most cases we do not spend much time in sorting. But
specialization does not cost us much too, only some CPU cycles of a
compiler. I think we can further improve speedup by converting inline
comparator to value extractor: more compilers will see what is actually
going on. But I have no proofs for this reasoning.

What do you think?

Best regards, Andrey Borodin.

[0]
/messages/by-id/20240209184014.sobshkcsfjix6u4r@awork3.anarazel.de

Hello all.

I have decided to explore more areas in which I can optimize and have
added two new benchmarks. Do you have any thoughts on this?

postgres=# select bench_int16_sort(1000000);
bench_int16_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 66354981 ns, Time taken by optimized sort:
52151523 ns, Percentage difference: 21.41%
(1 row)

postgres=# select bench_float8_sort(1000000);
bench_float8_sort

------------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 121475231 ns, Time taken by optimized sort:
74458545 ns, Percentage difference: 38.70%
(1 row)

postgres=#

Best regards, Stepan Neretin.

I run benchmark with my patches:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.609 ms
initial connection time = 24.080 ms
tps = 6214.244789 (without initial connection time)

and without:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.731 ms
initial connection time = 15.177 ms
tps = 5776.173285 (without initial connection time)

tps with my patches increase. What do you think?

Best regards, Stepan Neretin.

I implement reverse benchmarks:

postgres=# SELECT bench_oid_reverse_sort(1000);
bench_oid_reverse_sort

----------------------------------------------------------------------------------------------------------
Time taken by list_sort: 182557 ns, Time taken by list_oid_sort: 85864
ns, Percentage difference: 52.97%
(1 row)

Time: 2,291 ms
postgres=# SELECT bench_oid_reverse_sort(100000);
bench_oid_reverse_sort

-------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 9064163 ns, Time taken by list_oid_sort: 4313448
ns, Percentage difference: 52.41%
(1 row)

Time: 17,146 ms
postgres=# SELECT bench_oid_reverse_sort(1000000);
bench_oid_reverse_sort

---------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 61990395 ns, Time taken by list_oid_sort:
23703380 ns, Percentage difference: 61.76%
(1 row)

postgres=# SELECT bench_int_reverse_sort(1000000);
bench_int_reverse_sort

---------------------------------------------------------------------------------------------------------------
Time taken by list_sort: 50712416 ns, Time taken by list_int_sort:
24120417 ns, Percentage difference: 52.44%
(1 row)

Time: 89,359 ms

postgres=# SELECT bench_float8_reverse_sort(1000000);
bench_float8_reverse_sort

-----------------------------------------------------------------------------------------------------------------
Time taken by usual sort: 57447775 ns, Time taken by optimized sort:
25214023 ns, Percentage difference: 56.11%
(1 row)

Time: 92,308 ms

Hello again. I want to show you the graphs of when we increase the length
vector/array sorting time (ns). What do you think about graphs?

Best regards, Stepan Neretin.

Hello again :) I made a mistake in the benchmarks code. I am attaching new
corrected benchmarks for int sorting as example. And my stupid, simple
python script for making benchs and draw graphs. What do you think about
this graphs?

Best regards, Stepan Neretin.

Attachments:

int_sort_bench3.pngimage/png; name=int_sort_bench3.pngDownload
int_sort_bench.pngimage/png; name=int_sort_bench.pngDownload
int_sort_bench2.pngimage/png; name=int_sort_bench2.pngDownload
int_sort_bench5.pngimage/png; name=int_sort_bench5.pngDownload
int_sort_bench4.pngimage/png; name=int_sort_bench4.pngDownload
int_sort_bench6.pngimage/png; name=int_sort_bench6.pngDownload
main.pytext/x-python; charset=US-ASCII; name=main.pyDownload
#9Andrey Borodin
amborodin@acm.org
In reply to: Stepan Neretin (#6)
Re: Sort functions with specialized comparators

On 15 Jul 2024, at 12:52, Stepan Neretin <sncfmgg@gmail.com> wrote:

I run benchmark with my patches:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.609 ms
initial connection time = 24.080 ms
tps = 6214.244789 (without initial connection time)

and without:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.731 ms
initial connection time = 15.177 ms
tps = 5776.173285 (without initial connection time)

tps with my patches increase. What do you think?

Hi Stepan!

Thank you for implementing specialized sorting and doing this benchmarks.
I believe it's a possible direction for good improvement.
However, I doubt in correctness of your benchmarks.
Increasing TPC-B performance from 5776 TPS to 6214 TPS seems too good to be true.

Best regards, Andrey Borodin.

#10Stepan Neretin
sncfmgg@gmail.com
In reply to: Andrey Borodin (#9)
Re: Sort functions with specialized comparators

On Tue, Jul 16, 2024 at 1:47 AM Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:

On 15 Jul 2024, at 12:52, Stepan Neretin <sncfmgg@gmail.com> wrote:

I run benchmark with my patches:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.609 ms
initial connection time = 24.080 ms
tps = 6214.244789 (without initial connection time)

and without:
./pgbench -c 10 -j2 -t1000 -d postgres

pgbench (18devel)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 1.731 ms
initial connection time = 15.177 ms
tps = 5776.173285 (without initial connection time)

tps with my patches increase. What do you think?

Hi Stepan!

Thank you for implementing specialized sorting and doing this benchmarks.
I believe it's a possible direction for good improvement.
However, I doubt in correctness of your benchmarks.
Increasing TPC-B performance from 5776 TPS to 6214 TPS seems too good to
be true.

Best regards, Andrey Borodin.

Yes... I agree.. Very strange.. I restarted the tps measurement and see
this:

tps = 14291.893460 (without initial connection time) not patched
tps = 14669.624075 (without initial connection time) patched

What do you think about these measurements?
Best regards, Stepan Neretin.

#11Stepan Neretin
sndcppg@gmail.com
In reply to: Stepan Neretin (#10)
Re: Sort functions with specialized comparators

Hi! I rebase, clean and some refactor my patches.

Best regards, Stepan Neretin.

Attachments:

v2-0001-Use-specialized-sort-facilities.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Use-specialized-sort-facilities.patchDownload+47-36
v2-0004-Optimized-Integer-List-Sorting-by-Using-Template-.patchtext/x-patch; charset=US-ASCII; name=v2-0004-Optimized-Integer-List-Sorting-by-Using-Template-.patchDownload+15-4
v2-0005-Refactor-Grouping-Sets-Sorting-to-Use-list_int_so.patchtext/x-patch; charset=US-ASCII; name=v2-0005-Refactor-Grouping-Sets-Sorting-to-Use-list_int_so.patchDownload+1-2
v2-0003-Enhanced-Sorting-Efficiency-for-Oid-Lists.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Enhanced-Sorting-Efficiency-for-Oid-Lists.patchDownload+4-7
v2-0002-Optimized-Oid-List-Sorting-by-using-template-sort.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Optimized-Oid-List-Sorting-by-using-template-sort.patchDownload+15-4
v2-0006-Optimize-int16-Array-Sorting-in-CreateStatistics.patchtext/x-patch; charset=US-ASCII; name=v2-0006-Optimize-int16-Array-Sorting-in-CreateStatistics.patchDownload+8-5
v2-0007-Introduce-a-sorting-template-for-float8-arrays-in.patchtext/x-patch; charset=US-ASCII; name=v2-0007-Introduce-a-sorting-template-for-float8-arrays-in.patchDownload+11-1
v2-0009-Add-extenstion-to-bench-perfomance-improvements.patchtext/x-patch; charset=US-ASCII; name=v2-0009-Add-extenstion-to-bench-perfomance-improvements.patchDownload+262-1
v2-0008-Replace-qsort-calls-with-the-new-sorting-template.patchtext/x-patch; charset=US-ASCII; name=v2-0008-Replace-qsort-calls-with-the-new-sorting-template.patchDownload+4-5
v2-0010-Refactor-LSN-Sorting-to-Use-Template-Based-sort_c.patchtext/x-patch; charset=US-ASCII; name=v2-0010-Refactor-LSN-Sorting-to-Use-Template-Based-sort_c.patchDownload+13-17
#12David Rowley
dgrowleyml@gmail.com
In reply to: Stepan Neretin (#11)
Re: Sort functions with specialized comparators

On Sun, 8 Sept 2024 at 20:51, Stepan Neretin <sndcppg@gmail.com> wrote:

Hi! I rebase, clean and some refactor my patches.

I'm unsure what exactly is going on with this thread. It started with
Andrey proposing a patch to speed up intarray sorting and now it's
turned into you proposing 10 patches which implement a series of sort
specialisation functions without any justification as to why the
change is useful.

If you want to have a performance patch accepted, then you'll need to
show your test case and the performance results before and after.

What this patch series looks like to me is that you've just searched
the code base for qsort and just implemented a specialised qsort
version without any regard as to whether the change is useful or not.
For example, looking at v2-0006, you've added a specialisation to sort
the columns which are specified in the CREATE STATISTICS command. This
seems highly unlikely to be useful. The number of elements in this
array is limited by STATS_MAX_DIMENSIONS, which is 8. Are you claiming
the sort specialisation you've added makes a meaningful performance
improvement to sorting an 8 element array?

It looks to me like you've just derailed Andrey's proposal. I suggest
you validate which ones of these patches you can demonstrate produce a
meaningful performance improvement, ditch the remainder, and then
start your own thread showing your test case and results.

David

#13Stepan Neretin
sndcppg@gmail.com
In reply to: David Rowley (#12)
Re: Sort functions with specialized comparators

Hi, why do you think that I rejected Andrey's offer? I included his patch
first in my own. Yes, patch 2-0006 is the only patch to which I have not
attached any statistics and it looks really dubious. But the rest seem
useful. Above, I attached a speed graph for one of the patches and tps(
pgbench)
What do you think is the format in which to make benchmarks for my patches?
Best regards, Stepan Neretin.

#14David Rowley
dgrowleyml@gmail.com
In reply to: Stepan Neretin (#13)
Re: Sort functions with specialized comparators

On Mon, 9 Sept 2024 at 01:00, Stepan Neretin <sndcppg@gmail.com> wrote:

Hi, why do you think that I rejected Andrey's offer? I included his patch first in my own. Yes, patch 2-0006 is the only patch to which I have not attached any statistics and it looks really dubious. But the rest seem useful. Above, I attached a speed graph for one of the patches and tps(pgbench)

The difference with your patches and Andrey's patch is that he
included a benchmark which is targeted to the code he changed and his
results show a speed-up.

What it appears that you've done is made an assortment of changes and
picked the least effort thing that tests performance of something. You
by chance saw a performance increase so assumed it was due to your
changes.

What do you think is the format in which to make benchmarks for my patches?

You'll need a benchmark that exercises the code you've changed to some
degree where it has a positive impact on performance. As far as I can
see, you've not done that yet.

Just to give you the benefit of the doubt, I applied all 10 v2 patches
and adjusted the call sites to add a NOTICE to include the size of the
array being sorted. Here is the result of running your benchmark:

$ pgbench -t1000 -d postgres
pgbench (18devel)
NOTICE: RelationGetIndexList 1
NOTICE: RelationGetStatExtList 0
NOTICE: RelationGetIndexList 3
NOTICE: RelationGetStatExtList 0
NOTICE: RelationGetIndexList 2
NOTICE: RelationGetStatExtList 0
NOTICE: RelationGetIndexList 1
NOTICE: RelationGetStatExtList 0
NOTICE: RelationGetIndexList 2
NOTICE: RelationGetStatExtList 0
starting vacuum...NOTICE: RelationGetIndexList 1
NOTICE: RelationGetIndexList 0
end.
NOTICE: RelationGetIndexList 1
NOTICE: RelationGetStatExtList 0
NOTICE: RelationGetIndexList 1
NOTICE: RelationGetStatExtList 0
NOTICE: RelationGetIndexList 1
NOTICE: RelationGetStatExtList 0
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 1000/1000
number of failed transactions: 0 (0.000%)
latency average = 0.915 ms
initial connection time = 23.443 ms
tps = 1092.326732 (without initial connection time)

Note that -t1000 shows the same number of notices as -t1.

So, it seems everything you've changed that runs in your benchmark is
RelationGetIndexList() and RelationGetStatExtList(). In one of the
calls to RelationGetIndexList() we're sorting up to a maximum of 3
elements.

Just to be clear, I'm not stating that I think all of your changes are
useless. If you want these patches accepted, then you're going to need
to prove they're useful and you've not done that.

Also, unless Andrey is happy for you to tag onto the work he's doing,
I'd suggest another thread for that work. I don't think there's any
good reason for that work to delay Andrey's work.

David

#15Andrey Borodin
amborodin@acm.org
In reply to: David Rowley (#14)
Re: Sort functions with specialized comparators

On 9 Sep 2024, at 02:31, David Rowley <dgrowleyml@gmail.com> wrote:

Also, unless Andrey is happy for you to tag onto the work he's doing,
I'd suggest another thread for that work. I don't think there's any
good reason for that work to delay Andrey's work.

Stepan asked for mentoring project, so I handed him this patch set. We are working together, but the main goal is integrating Stepan into dev process. Well, the summer was really hot and we somehow were not advancing the project… So your thread bump is very timely!
Many thanks for your input about benchmarks! We will focus on measuring impact of changes. I totally share your concerns about optimization of sorts that are not used frequently.

Best regards, Andrey Borodin.

#16Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#1)
Re: Sort functions with specialized comparators

On 2 Dec 2024, at 08:39, John Naylor <johncnaylorls@gmail.com> wrote:

On Mon, Dec 2, 2024 at 1:12 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 25 Nov 2024, at 17:50, John Naylor <johncnaylorls@gmail.com> wrote:

I'd like to see the two sort specializations combined
into one, using a local comparator that knows when to reverse the
comparison result (hope that makes sense).

Sure, please find attached.
The prototype looks somewhat ugly (we pass bool* ascending instead of bool ascending) and it cost us ~2% of performance (on my MacBook Air M3).

I haven't tried to reproduce, but the comparison function has a
different style for DESC than the tuplesort comparators, and the style
here has worse code density, at least in isolation. You can see the
difference in the link below. I also found a way to make the cmp
reversal branch-free (last example). That may not survive once it's
inlined, of course, or could make things worse, but you can try these
if you like:

https://godbolt.org/z/nfPMT7Enr

On my machine this test
\timing on
SELECT (sort(arr))[1] FROM arrays_to_sort;\watch 0 c=5

produces
sort_int32_cmp Time: 543.690 ms
sort_int32_cmp_2 Time: 609.019 ms
sort_int32_cmp_4 Time: 612.219 ms

So, I'd stick with sort_int32_cmp. But, perhaps, on Intel we might have different results.

But is not more generic.

A lot of the churn is from v1, and made worse by v2, and that seems to
be from getting rid of the QSORT macro:

@@ -227,9 +228,10 @@ Datum
sort_asc(PG_FUNCTION_ARGS)
{
ArrayType *a = PG_GETARG_ARRAYTYPE_P_COPY(0);
+ bool ascending = true;

CHECKARRVALID(a);
- QSORT(a, 1);
+ sort_int32(ARRPTR(a), ARRNELEMS(a), &ascending);
PG_RETURN_POINTER(a);
}

The macro hides some details -- can we put "ascending" inside there?

Done.

Best regards, Andrey Borodin.

Attachments:

v3-0001-Use-specialized-sort-facilities.patchapplication/octet-stream; name=v3-0001-Use-specialized-sort-facilities.patch; x-unix-mode=0644Download+77-20
#17John Naylor
john.naylor@enterprisedb.com
In reply to: Andrey Borodin (#16)
Re: Sort functions with specialized comparators

On Wed, Dec 4, 2024 at 2:47 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

sort_int32_cmp Time: 543.690 ms
sort_int32_cmp_2 Time: 609.019 ms
sort_int32_cmp_4 Time: 612.219 ms

So, I'd stick with sort_int32_cmp. But, perhaps, on Intel we might have different results.

I tried on an older Intel chip and got similar results, so we'll go
with your original comparator:

master: latency average = 1867.878 ms
cmp1: latency average = 1189.225 ms
cmp2: latency average = 1341.153 ms
cmp3: latency average = 1270.053 ms

I believe src/port/qsort.c was meant to be just for the standard sort
interface as found in a C library. We do have one globally visible
special sort here:
src/backend/utils/sort/qsort_interruptible.c
...so that directory seems a better fit. The declaration is in
src/include/port.h, though. Note: that one doesn't have a global
wrapper around a static function -- it's declared global since
ST_SCOPE is not defined.

And one more bikeshedding bit that might get noticed: tuplesorts
express their boolean as "reversed". We don't necessarily need to
follow that, but consistency is good for readability.

--
John Naylor
Amazon Web Services

#18Andrey Borodin
amborodin@acm.org
In reply to: John Naylor (#17)
Re: Sort functions with specialized comparators

On 5 Dec 2024, at 15:16, John Naylor <johncnaylorls@gmail.com> wrote:

I tried on an older Intel chip and got similar results, so we'll go
with your original comparator:

Ack.

I believe src/port/qsort.c was meant to be just for the standard sort
interface as found in a C library. We do have one globally visible
special sort here:
src/backend/utils/sort/qsort_interruptible.c
...so that directory seems a better fit.

OK. BTW do we need ST_CHECK_FOR_INTERRUPTS?

The declaration is in
src/include/port.h, though. Note: that one doesn't have a global
wrapper around a static function -- it's declared global since
ST_SCOPE is not defined.

Added static.

And one more bikeshedding bit that might get noticed: tuplesorts
express their boolean as "reversed". We don't necessarily need to
follow that, but consistency is good for readability.

I do not know if "reversed sorting order" is more idiomatic than "ascending sorting order". If you think it is - let's switch argument's name to "reversed".

Best regards, Andrey Borodin.

Attachments:

v4-0001-Use-specialized-sort-facilities.patchapplication/octet-stream; name=v4-0001-Use-specialized-sort-facilities.patch; x-unix-mode=0644Download+37-19
#19John Naylor
john.naylor@enterprisedb.com
In reply to: Andrey Borodin (#18)
Re: Sort functions with specialized comparators

On Fri, Dec 6, 2024 at 1:32 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Dec 2024, at 15:16, John Naylor <johncnaylorls@gmail.com> wrote:

I believe src/port/qsort.c was meant to be just for the standard sort
interface as found in a C library. We do have one globally visible
special sort here:
src/backend/utils/sort/qsort_interruptible.c
...so that directory seems a better fit.

OK. BTW do we need ST_CHECK_FOR_INTERRUPTS?

That's a good thing to raise right now -- intarray currently doesn't
have one, and we haven't gotten complaints from people trying to sort
large arrays and cancel the query. This extension is not commonly
used, so that's not surprising. It could be that large arrays are even
less common, or no one bothered to report it. What's the largest size
that your customers use?

If we do need a check for interrupts, then this whole thing must
remain private to intarray. From reading e64cdab0030 , it's not safe
to interrupt in general.

And one more bikeshedding bit that might get noticed: tuplesorts
express their boolean as "reversed". We don't necessarily need to
follow that, but consistency is good for readability.

I do not know if "reversed sorting order" is more idiomatic than "ascending sorting order". If you think it is - let's switch argument's name to "reversed".

After sleeping on it, I actually think it's mildly ridiculous for this
module to force the comparator to know about the sort direction.
Tuplesorts must do that because each sort key could have a different
sort order. There is only one place in intarray that wants reversed
order -- maybe that place should reverse elements itself? It's fine to
keep thing as they are if the sort function stays private to intarray,
but this patch creates a global function, where the "ascending"
parameter is just noise. And if we don't have large int32 sorts
outside of intarray, then the path of least resistance may be to keep
it private.

I had a look at the files touched by this patch and noticed that there
is another sort used for making arrays unique. Were you going to look
at that as well? That reminded me of a patchset from Thomas Munro that
added bsearch and unique macros to the sort template -- see 0001-0003
in the link below. (That also includes a proposal to have a
specialization for uint32 -- I'm still not sure if that would have a
performance benefit for real workloads, but I believe the motivation
was mostly cosmetic):

/messages/by-id/CA+hUKGKztHEWm676csTFjYzortziWmOcf8HDss2Zr0muZ2xfEg@mail.gmail.com

--
John Naylor
Amazon Web Services

#20Andrey Borodin
amborodin@acm.org
In reply to: John Naylor (#19)
Re: Sort functions with specialized comparators

On 6 Dec 2024, at 08:49, John Naylor <johncnaylorls@gmail.com> wrote:

/messages/by-id/CA+hUKGKztHEWm676csTFjYzortziWmOcf8HDss2Zr0muZ2xfEg@mail.gmail.com

Wow, what a thread!
"Simpsons Already Did It"

On Fri, Dec 6, 2024 at 1:32 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Dec 2024, at 15:16, John Naylor <johncnaylorls@gmail.com> wrote:

I believe src/port/qsort.c was meant to be just for the standard sort
interface as found in a C library. We do have one globally visible
special sort here:
src/backend/utils/sort/qsort_interruptible.c
...so that directory seems a better fit.

OK. BTW do we need ST_CHECK_FOR_INTERRUPTS?

That's a good thing to raise right now -- intarray currently doesn't
have one, and we haven't gotten complaints from people trying to sort
large arrays and cancel the query. This extension is not commonly
used, so that's not surprising. It could be that large arrays are even
less common, or no one bothered to report it. What's the largest size
that your customers use?

If we do need a check for interrupts, then this whole thing must
remain private to intarray. From reading e64cdab0030 , it's not safe
to interrupt in general.

I think commit message states that it's better to opt-in for interruptible sort. So I do not think making sort interruptible is a blocker for making global specialized sorting routines.

And one more bikeshedding bit that might get noticed: tuplesorts
express their boolean as "reversed". We don't necessarily need to
follow that, but consistency is good for readability.

I do not know if "reversed sorting order" is more idiomatic than "ascending sorting order". If you think it is - let's switch argument's name to "reversed".

After sleeping on it, I actually think it's mildly ridiculous for this
module to force the comparator to know about the sort direction.
Tuplesorts must do that because each sort key could have a different
sort order. There is only one place in intarray that wants reversed
order -- maybe that place should reverse elements itself? It's fine to
keep thing as they are if the sort function stays private to intarray,
but this patch creates a global function, where the "ascending"
parameter is just noise. And if we don't have large int32 sorts
outside of intarray, then the path of least resistance may be to keep
it private.

We could use global function for oid lists which may be arbitrary large.
But if you think that local intarray function is better - let's go that route.

I had a look at the files touched by this patch and noticed that there
is another sort used for making arrays unique. Were you going to look
at that as well?

Done.

Best regards, Andrey Borodin.

Attachments:

v5-0001-Use-specialized-sort-facilities.patchapplication/octet-stream; name=v5-0001-Use-specialized-sort-facilities.patch; x-unix-mode=0644Download+54-57
#21John Naylor
john.naylor@enterprisedb.com
In reply to: Andrey Borodin (#20)
#22Andrey Borodin
amborodin@acm.org
In reply to: John Naylor (#21)
#23John Naylor
john.naylor@enterprisedb.com
In reply to: Andrey Borodin (#22)
#24John Naylor
john.naylor@enterprisedb.com
In reply to: Andrey Borodin (#22)
#25Andrey Borodin
amborodin@acm.org
In reply to: John Naylor (#24)
#26John Naylor
john.naylor@enterprisedb.com
In reply to: Andrey Borodin (#25)
#27Andrey Borodin
amborodin@acm.org
In reply to: John Naylor (#26)
#28John Naylor
john.naylor@enterprisedb.com
In reply to: Andrey Borodin (#27)
#29Andrey Borodin
amborodin@acm.org
In reply to: John Naylor (#28)
#30Nathan Bossart
nathandbossart@gmail.com
In reply to: John Naylor (#28)
#31John Naylor
john.naylor@enterprisedb.com
In reply to: Andrey Borodin (#29)
#32John Naylor
john.naylor@enterprisedb.com
In reply to: Nathan Bossart (#30)
#33Andrey Borodin
amborodin@acm.org
In reply to: John Naylor (#31)
#34John Naylor
john.naylor@enterprisedb.com
In reply to: Andrey Borodin (#33)
#35John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#34)
#36Andrey Borodin
amborodin@acm.org
In reply to: John Naylor (#35)
#37John Naylor
john.naylor@enterprisedb.com
In reply to: Andrey Borodin (#36)
#38John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#37)