Proposal for enabling auto-vectorization for checksum calculations

Started by Matthew Sterrett, 11 months ago, 42 messages

matthewsterrett2@gmail.com

11 months ago

Hello,
This patch enables more compiler autovectorization for the checksum
calculations.
This code is particularly well suited for autovectorization, so by just
adding pg_attribute_target and some simple dynamic dispatch logic we
can get improved vectorization.
This gives about a 2x speedup in a synthetic benchmark for
pg_checksum, which is also included as a separate patch file.
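
For readers following along, here is a minimal sketch of the general technique described above: compile the same loop twice, once with a target attribute, and pick a variant at run time. This is not the code from the patch. The function names and the HAVE_AVX2_TARGET macro are hypothetical, the loop is a stand-in rather than the real checksum, and the sketch assumes GCC or Clang on x86-64 for the attribute and for __builtin_cpu_supports.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard: the attribute and the CPU check below are
 * GCC/Clang extensions that only make sense on x86-64. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_AVX2_TARGET 1
#else
#define HAVE_AVX2_TARGET 0
#endif

/* Stand-in loop body (not the real checksum): a simple reduction that
 * an optimizing compiler is free to vectorize. */
static inline uint32_t
xor_words(const uint32_t *words, size_t n)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc ^= words[i] * 16777619u;
    return acc;
}

/* Baseline variant, built with the default target flags. */
uint32_t
checksum_default(const uint32_t *words, size_t n)
{
    return xor_words(words, n);
}

#if HAVE_AVX2_TARGET
/* Same source, but the compiler may use AVX2 instructions here when
 * it inlines and vectorizes the loop at -O2 or higher. */
__attribute__((target("avx2")))
uint32_t
checksum_avx2(const uint32_t *words, size_t n)
{
    return xor_words(words, n);
}
#endif

/* Runtime dispatch: use the AVX2 build only if the CPU reports it. */
uint32_t
checksum_dispatch(const uint32_t *words, size_t n)
{
#if HAVE_AVX2_TARGET
    if (__builtin_cpu_supports("avx2"))
        return checksum_avx2(words, n);
#endif
    return checksum_default(words, n);
}
```

Both variants compute the same value; only the instructions the compiler is allowed to emit differ, which is why no hand-written intrinsics are needed.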

Additionally, another 2x performance increase in the synthetic
benchmark with AVX2 can be obtained if N_SUMS is changed to 64.
However, this would change the results of the checksum. This isn't
included in this patch, but I think it is worth considering for the
future.
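
To illustrate why the lane count affects the result, here is a simplified sketch of a checksum that keeps several partial sums in parallel and folds them with XOR at the end. It is an assumption-laden toy: the mixing step is modeled on the FNV-style step used for page checksums as I understand it, but the per-lane starting offsets and the final mixing rounds are left out, and lane_checksum, mix and MAX_LANES are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV_PRIME 16777619u
#define MAX_LANES 64            /* hypothetical upper bound for this sketch */

/* One FNV-style mixing step on a single partial sum. */
static uint32_t
mix(uint32_t sum, uint32_t value)
{
    uint32_t tmp = sum ^ value;

    return (tmp * FNV_PRIME) ^ (tmp >> 17);
}

/* Checksum nwords words using `lanes` independent partial sums.
 * Word i is mixed into lane (i % lanes); the lanes are XORed together
 * at the end. nwords must be a multiple of lanes. */
uint32_t
lane_checksum(const uint32_t *words, size_t nwords, size_t lanes)
{
    uint32_t sums[MAX_LANES] = {0};
    uint32_t result = 0;

    assert(lanes >= 1 && lanes <= MAX_LANES && nwords % lanes == 0);

    /* The inner loop has no dependency between lanes, which is what
     * makes this shape friendly to vectorization. */
    for (size_t i = 0; i < nwords; i += lanes)
        for (size_t j = 0; j < lanes; j++)
            sums[j] = mix(sums[j], words[i + j]);

    for (size_t j = 0; j < lanes; j++)
        result ^= sums[j];
    return result;
}
```

With the two input words {1, 2}, one lane gives 0x240277C3 and two lanes give 0x030002B5. Changing the lane count changes which words are chained together, so the stored checksums of existing pages would no longer match.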

One additional factor: without explicitly passing an optimization
flag like -O2, the makefile build won't autovectorize any of the code.
The meson-based build, however, does this automatically.

Matthew Sterrett

matthewsterrett2@gmail.com

10 months ago

In reply to: Matthew Sterrett (#1)

Re: Proposal for enabling auto-vectorization for checksum calculations

Hello! I'm still trying to figure out those CI failures, I just wanted
to update things.

From my testing, with this patch repeatedly disabling/enabling
checksums is about 12.4% faster on an approximately 15 GB database.

By the way, I'd love it if anyone could help me figure out how to
replicate a CI failure in the Cirrus CI.
I haven't been able to figure out how to test CI runs locally, does
anyone know a good method to do that?

Stepan Neretin

slpmcf@gmail.com

10 months ago

In reply to: Matthew Sterrett (#2)

Re: Proposal for enabling auto-vectorization for checksum calculations

On Thu, May 8, 2025 at 6:57 AM Matthew Sterrett <matthewsterrett2@gmail.com>
wrote:

Hello! I'm still trying to figure out those CI failures, I just wanted
to update things.

From my testing, with this patch repeatedly disabling/enabling
checksums is about 12.4% faster on an approximately 15 GB database.

By the way, I'd love it if anyone could help me figure out how to
replicate a CI failure in the Cirrus CI.
I haven't been able to figure out how to test CI runs locally, does
anyone know a good method to do that?

Hi Matthew,

Thanks for the patch!

I ran some timing tests:

(without avx2)

Time: 4034.351 ms
SELECT drive_pg_checksum(512);

(with avx2)

Time: 3559.076 ms
SELECT drive_pg_checksum(512);

Also attached two patches that should fix the CI issues.

Best,

Stepan Neretin

slpmcf@gmail.com

10 months ago

In reply to: Stepan Neretin (#3)

Re: Proposal for enabling auto-vectorization for checksum calculations

On Sat, May 10, 2025 at 6:01 PM Stepan Neretin <slpmcf@gmail.com> wrote:

On Thu, May 8, 2025 at 6:57 AM Matthew Sterrett <
matthewsterrett2@gmail.com> wrote:

Hello! I'm still trying to figure out those CI failures, I just wanted
to update things.

From my testing, with this patch repeatedly disabling/enabling
checksums is about 12.4% faster on an approximately 15 GB database.

By the way, I'd love it if anyone could help me figure out how to
replicate a CI failure in the Cirrus CI.
I haven't been able to figure out how to test CI runs locally, does
anyone know a good method to do that?

Hi Matthew,

Thanks for the patch!

I ran some timing tests:

(without avx2)

Time: 4034.351 ms
SELECT drive_pg_checksum(512);

(with avx2)

Time: 3559.076 ms
SELECT drive_pg_checksum(512);

Also attached two patches that should fix the CI issues.

Best,

Stepan Neretin

Oops, forgot to attach patches :)

Best,

Stepan Neretin

Matthew Sterrett

matthewsterrett2@gmail.com

10 months ago

In reply to: Stepan Neretin (#4)

Re: Proposal for enabling auto-vectorization for checksum calculations

Hello! Thanks for helping me with this.
I'm still trying to figure out what is going on with the Bookworm test
failures. I'm pretty sure this patchset should resolve all the issues
with the macOS build, but I don't think it will help the Linux
failures, unfortunately.

Nazir Bilal Yavuz

byavuz81@gmail.com

10 months ago

In reply to: Matthew Sterrett (#5)

Re: Proposal for enabling auto-vectorization for checksum calculations

Hi,

On Tue, 20 May 2025 at 02:54, Matthew Sterrett
<matthewsterrett2@gmail.com> wrote:

Hello! Thanks for helping me with this.
I'm still trying to figure out what is going on with the Bookworm test
failures. I'm pretty sure this patchset should resolve all the issues
with the macOS build, but I don't think it will help the Linux
failures, unfortunately.

You can see the failure at the artifacts ->
'log/tmp_install/log/install.log' file on the CI web page [1].

If you want to replicate that on your local:

$ ./configure --with-llvm CLANG="ccache clang-16"
$ make -s -j8 world-bin
$ make -j8 check-world

should be enough. I was able to replicate it with these commands. I
hope these help.

[1]: https://cirrus-ci.com/task/4834162550505472

--
Regards,
Nazir Bilal Yavuz
Microsoft

Matthew Sterrett

matthewsterrett2@gmail.com

10 months ago

In reply to: Nazir Bilal Yavuz (#6)

Re: Proposal for enabling auto-vectorization for checksum calculations

You can see the failure at the artifacts ->
'log/tmp_install/log/install.log' file on the CI web page [1].

If you want to replicate that on your local:

$ ./configure --with-llvm CLANG="ccache clang-16"
$ make -s -j8 world-bin
$ make -j8 check-world

should be enough. I was able to replicate it with these commands. I
hope these help.

Thanks so much for helping me figure this out!

Okay, I've determined that versions of LLVM/Clang before 19 crash when
compiling this patch for some reason; it seems that both make
check-world and make install will crash with the affected LLVM
versions.
Unfortunately, what matters seems to be the version of the linker/LTO
optimizer, which I don't think we can check at compile time.
I added a check for Clang >= 19, which prevents the crash on my system.
I think it's possible some unusual combination of clang/LLVM might
still crash during the build, but I think this is a reasonable
solution.
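
A compile-time version check of the kind described above might be sketched as follows. This is only an illustration, not the check from the patch: USE_AVX2_CHECKSUM and avx2_checksum_compiled_in are hypothetical names, and, as noted above, the compiler version is only a proxy because the linker/LTO version is not visible to the preprocessor.

```c
/* Disable the AVX2 variant on Clang older than 19; otherwise enable it
 * on x86-64 with GCC or Clang. Everything else gets the portable path.
 * Clang also defines __GNUC__, so the Clang test must come first. */
#if defined(__clang__) && __clang_major__ < 19
#define USE_AVX2_CHECKSUM 0
#elif defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define USE_AVX2_CHECKSUM 1
#else
#define USE_AVX2_CHECKSUM 0
#endif

/* Lets callers (or tests) see which choice the preprocessor made. */
int
avx2_checksum_compiled_in(void)
{
    return USE_AVX2_CHECKSUM;
}
```

Code that defines the AVX2 variant would then be wrapped in #if USE_AVX2_CHECKSUM, so affected compilers never see the target attribute at all.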

John Naylor

john.naylor@enterprisedb.com

9 months ago

In reply to: Matthew Sterrett (#7)

Re: Proposal for enabling auto-vectorization for checksum calculations

On Fri, May 23, 2025 at 4:54 AM Matthew Sterrett
<matthewsterrett2@gmail.com> wrote:

Okay, I've determined that versions of LLVM/Clang before 19 crash when
compiling this patch for some reason; it seems that both make
check-world and make install will crash with the affected LLVM
versions.
Unfortunately, what matters seems to be the version of the linker/LTO
optimizer, which I don't think we can check at compile time.
I added a check for Clang >= 19, which prevents the crash on my system.
I think it's possible some unusual combination of clang/LLVM might
still crash during the build, but I think this is a reasonable
solution.

I don't know if this is related to the crashes, but it doesn't seem
like a good idea to #include the function pointer stuff everywhere,
that should probably go into src/port like the others.

--
John Naylor
Amazon Web Services
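
As a rough sketch of the layout suggested above, the function pointer can be declared in a header and defined in exactly one source file, with a "choose" function that resolves it on first use. The names below are hypothetical, the loop is a stand-in for the real checksum, and the x86 parts assume GCC or Clang; nothing here is taken from the patch or from the existing src/port code.

```c
#include <stddef.h>
#include <stdint.h>

/* Would live in a header: callers only ever see this declaration. */
extern uint32_t (*checksum_block_fn)(const uint32_t *words, size_t n);

/* Everything below would live in a single .c file. */
static uint32_t
checksum_block_portable(const uint32_t *words, size_t n)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc ^= words[i] * 16777619u;    /* stand-in loop, not the real one */
    return acc;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
/* Same loop, compiled with AVX2 allowed. */
__attribute__((target("avx2")))
static uint32_t
checksum_block_avx2(const uint32_t *words, size_t n)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc ^= words[i] * 16777619u;
    return acc;
}
#endif

static uint32_t checksum_block_choose(const uint32_t *words, size_t n);

/* The pointer starts out at the chooser, so the first call resolves it. */
uint32_t (*checksum_block_fn)(const uint32_t *words, size_t n) =
    checksum_block_choose;

static uint32_t
checksum_block_choose(const uint32_t *words, size_t n)
{
    checksum_block_fn = checksum_block_portable;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("avx2"))
        checksum_block_fn = checksum_block_avx2;
#endif
    /* Forward this first call to whichever variant was selected. */
    return checksum_block_fn(words, n);
}
```

Keeping the definition in one file means other translation units only need the extern declaration, rather than each including the dispatch logic.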
