Add RISC-V Zbb popcount optimization

Started by Greg Burd 25 days ago, 15 messages, hackers
#1 Greg Burd
greg@burd.me

Hello.

Attached is a small patch that enables hardware popcount on RISC-V when available and also sets the -march flag to 'rv64gc_zbb' when appropriate.

best.

-greg

Attachments:

v1-0001-Add-RISC-V-Zbb-popcount-optimization.patch (text/x-patch, +91/-3)
#2 Andres Freund
andres@anarazel.de
In reply to: Greg Burd (#1)
Re: Add RISC-V Zbb popcount optimization

Hi,

On 2026-03-21 12:54:10 -0400, Greg Burd wrote:

> Attached is a small patch that enables hardware popcount on RISC-V when
> available and also sets the -march flag to 'rv64gc_zbb' when appropriate.

Maybe I'm missing something: How is the latter approach safe without a runtime
check? Just because it compiled on the build machine with -march=rv64gc_zbb
added doesn't mean it runs on either the build machine or any other machine?

If this worked, the compiler could just always specify -march=rv64gc_zbb, no?

Greetings,

Andres Freund

#3 Greg Burd
greg@burd.me
In reply to: Andres Freund (#2)
Re: Add RISC-V Zbb popcount optimization

On Sat, Mar 21, 2026, at 2:36 PM, Andres Freund wrote:

> Hi,
>
> On 2026-03-21 12:54:10 -0400, Greg Burd wrote:
>> Attached is a small patch that enables hardware popcount on RISC-V when
>> available and also sets the -march flag to 'rv64gc_zbb' when appropriate.
>
> Maybe I'm missing something: How is the latter approach safe without a runtime
> check? Just because it compiled on the build machine with -march=rv64gc_zbb
> added doesn't mean it runs on either the build machine or any other machine?
>
> If this worked, the compiler could just always specify -march=rv64gc_zbb, no?

Hey Andres, thanks for taking a look.

You are correct; mea culpa for not catching this before I sent it out. If the second build-time test succeeds, the patch adds `-march=rv64gc_zbb` to `CFLAGS` globally, which means that, without a runtime check, the binary will crash with SIGILL on systems without Zbb as soon as it executes a Zbb instruction.
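A minimal sketch of the kind of runtime check being discussed, assuming Linux on RISC-V with kernel headers that provide the riscv_hwprobe syscall (`<asm/hwprobe.h>`, added in Linux 6.4; the Zbb bit arrived in a later release). The function-pointer-plus-chooser shape is loosely modeled on how PostgreSQL dispatches popcount on x86. The names `popcount64_zbb`, `SKETCH_ZBB_OBJECT_AVAILABLE`, and `SKETCH_HAVE_HWPROBE_ZBB` are hypothetical, and none of this is taken from the attached patches. On any other platform, the probe reports "no Zbb" and the portable path is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__riscv) && defined(__linux__) && defined(__has_include)
#if __has_include(<asm/hwprobe.h>)
#include <asm/hwprobe.h>		/* struct riscv_hwprobe, RISCV_HWPROBE_* */
#include <sys/syscall.h>		/* __NR_riscv_hwprobe */
#include <unistd.h>				/* syscall() */
#if defined(__NR_riscv_hwprobe) && defined(RISCV_HWPROBE_EXT_ZBB)
#define SKETCH_HAVE_HWPROBE_ZBB 1	/* hypothetical feature macro */
#endif
#endif
#endif

/* Portable SWAR popcount: safe on any RV64GC core (and any other CPU). */
static int
popcount64_portable(uint64_t x)
{
	x = x - ((x >> 1) & 0x5555555555555555ULL);
	x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
	x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
	return (int) ((x * 0x0101010101010101ULL) >> 56);
}

/*
 * Hypothetical Zbb variant, e.g. __builtin_popcountll() in a separate file
 * compiled with -march=rv64gc_zbb, so cpop appears only in that function.
 */
#ifdef SKETCH_ZBB_OBJECT_AVAILABLE
extern int	popcount64_zbb(uint64_t x);
#endif

/* Ask the kernel whether every online hart implements Zbb. */
static bool
cpu_has_zbb(void)
{
#ifdef SKETCH_HAVE_HWPROBE_ZBB
	struct riscv_hwprobe pair = {.key = RISCV_HWPROBE_KEY_IMA_EXT_0};

	/* cpusetsize = 0 and cpus = NULL: report for all online CPUs. */
	if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
		return false;
	return (pair.value & RISCV_HWPROBE_EXT_ZBB) != 0;
#else
	return false;				/* cannot tell: assume the baseline ISA only */
#endif
}

static int	popcount64_choose(uint64_t x);

/* Starts at the chooser; rewired to the selected implementation on first use. */
static int	(*popcount64) (uint64_t x) = popcount64_choose;

static int
popcount64_choose(uint64_t x)
{
#ifdef SKETCH_ZBB_OBJECT_AVAILABLE
	if (cpu_has_zbb())
		popcount64 = popcount64_zbb;
	else
#endif
		popcount64 = popcount64_portable;
	assert(popcount64 != popcount64_choose);
	return popcount64(x);
}
```

The key property is that no cpop instruction can be reached unless the probe succeeded. A global `-march=rv64gc_zbb` in `CFLAGS` cannot guarantee that, because the compiler is then free to emit Zbb instructions anywhere.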

I'll rework... :)

best.

-greg

#4 John Naylor
john.naylor@enterprisedb.com
In reply to: Greg Burd (#1)
Re: Add RISC-V Zbb popcount optimization

On Sat, Mar 21, 2026 at 11:56 PM Greg Burd <greg@burd.me> wrote:

> Attached is a small patch that enables hardware popcount on RISC-V when available and also sets the -march flag to 'rv64gc_zbb' when appropriate.

I have to ask what the point is -- isn't that like putting a 4-inch
exhaust tip on a go-kart?

--
John Naylor
Amazon Web Services

#5 Greg Burd
greg@burd.me
In reply to: John Naylor (#4)
Re: Add RISC-V Zbb popcount optimization

On Sat, Mar 21, 2026, at 10:14 PM, John Naylor wrote:

> On Sat, Mar 21, 2026 at 11:56 PM Greg Burd <greg@burd.me> wrote:
>> Attached is a small patch that enables hardware popcount on RISC-V when available and also sets the -march flag to 'rv64gc_zbb' when appropriate.
>
> I have to ask what the point is -- isn't that like putting a 4-inch
> exhaust tip on a go-kart?

Hey John,

The point is to go fast, right? And to look cool (with awesome 4-inch exhaust tips) if possible! ;-P

gburd@rv:~/ws/postgres$ gcc -O2 -o popcnt-wo-zbb riscv-popcnt.c
gburd@rv:~/ws/postgres$ gcc -O2 -march=rv64gc_zbb -o popcnt-zbb riscv-popcnt.c
gburd@rv:~/ws/postgres$ ./popcnt-wo-zbb && ./popcnt-zbb
sw popcount: 0.196 sec ( 510.08 MB/s)
hw popcount: 0.293 sec ( 341.48 MB/s)

diff: 0.67x
match: 406261900 bits counted
sw popcount: 0.182 sec ( 548.86 MB/s)
hw popcount: 0.044 sec ( 2279.89 MB/s)

diff: 4.15x
match: 406261900 bits counted
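For context on the two runs above, here is a hedged sketch of a buffer popcount built on the GCC/Clang builtin. riscv-popcnt.c itself isn't reproduced in the thread, so this is not its code. With `-march=rv64gc_zbb`, each `__builtin_popcountll()` can compile to a single `cpop` instruction. Without Zbb there is no popcount instruction, so the compiler must expand the builtin into a bit-twiddling sequence or, as GCC typically does, call a libgcc helper. That is one plausible explanation for the "hw" loop being slower (0.67x) in the build without Zbb, though the thread does not confirm it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Count the set bits in a buffer, 8 bytes at a time.  With Zbb enabled,
 * __builtin_popcountll() can become a single cpop instruction; without it,
 * the compiler must emit a multi-instruction sequence or a library call.
 */
static uint64_t
popcount_buffer(const unsigned char *buf, size_t len)
{
	uint64_t	total = 0;
	size_t		i = 0;

	assert(buf != NULL || len == 0);

	/* Whole 64-bit words; memcpy avoids any alignment assumptions. */
	for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t))
	{
		uint64_t	word;

		memcpy(&word, buf + i, sizeof(word));
		total += (uint64_t) __builtin_popcountll(word);
	}

	/* Remaining 0-7 tail bytes. */
	for (; i < len; i++)
		total += (uint64_t) __builtin_popcount(buf[i]);

	return total;
}
```

The function returns the same total whichever way the builtin is lowered. Only the speed changes, which is consistent with the "match" line agreeing in both runs.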

But my first email/patch was incomplete and rushed; I should have followed the pattern used for similar ARM-specific logic. v2 is attached, along with a test program.

best.

-greg

Attachments:

riscv-popcnt.c (text/x-csrc)
v2-0001-Add-RISC-V-Zbb-popcount-optimization.patch (text/x-patch, +229/-9)
#6 Andres Freund
andres@anarazel.de
In reply to: Greg Burd (#5)
Re: Add RISC-V Zbb popcount optimization

Hi,

On 2026-03-22 13:43:43 -0400, Greg Burd wrote:

> The point is to go fast, right? And to look cool (with awesome 4-inch exhaust tips) if possible! ;-P
>
> gburd@rv:~/ws/postgres$ gcc -O2 -o popcnt-wo-zbb riscv-popcnt.c
> gburd@rv:~/ws/postgres$ gcc -O2 -march=rv64gc_zbb -o popcnt-zbb riscv-popcnt.c
> gburd@rv:~/ws/postgres$ ./popcnt-wo-zbb && ./popcnt-zbb
> sw popcount: 0.196 sec ( 510.08 MB/s)
> hw popcount: 0.293 sec ( 341.48 MB/s)
>
> diff: 0.67x
> match: 406261900 bits counted
> sw popcount: 0.182 sec ( 548.86 MB/s)
> hw popcount: 0.044 sec ( 2279.89 MB/s)
>
> diff: 4.15x
> match: 406261900 bits counted
>
> But my first email/patch was incomplete and rushed; I should have followed the pattern used for similar ARM-specific logic. v2 is attached, along with a test program.

Sure, but what PG workloads are actually affected to a meaningful degree by
this? And are those, on riscv, actually most bottlenecked by popcount
performance?

I'm also pretty doubtful all the effort to e.g. add AVX 512 popcount was spent
all that effectively - hard to believe there's any real world workloads where
that gain is worth the squeeze. At least for aarch64 and x86-64 there's real
world use of those platforms, making niche-y perf improvements somewhat
worthwhile. Whereas there's afaict not yet a whole lot of riscv production
adoption.

Once you add CPU dispatch to the cost it gets a heck of a lot less clearly
worthwhile. You need heuristics to decide when the dispatch cost is worth it
and even then it's going to slow down your non-worthwhile case somewhat.
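To make the heuristic concrete, here is a minimal sketch of size-based dispatch: short inputs stay on an inline portable loop, and only long inputs pay for the indirect call. The threshold value and all names are hypothetical. The right cut-over would have to be measured on real RISC-V hardware, and the pointer here simply stands in for whatever implementation runtime detection would select.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-over point; a real value would need benchmarking. */
#define POPCOUNT_DISPATCH_THRESHOLD 64

/* Portable byte-at-a-time count (Kernighan's trick: clear the lowest set bit). */
static uint64_t
popcount_bytes_portable(const unsigned char *buf, size_t len)
{
	uint64_t	total = 0;

	for (size_t i = 0; i < len; i++)
	{
		unsigned int b = buf[i];

		while (b != 0)
		{
			b &= b - 1;
			total++;
		}
	}
	return total;
}

/*
 * Stand-in for an implementation picked at runtime (e.g. a Zbb loop).
 * Here it just points at the portable version so the sketch runs anywhere.
 */
static uint64_t (*popcount_bytes_dispatched) (const unsigned char *buf, size_t len) =
	popcount_bytes_portable;

static uint64_t
popcount_bytes(const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	/* Short input: an indirect call would cost more than it saves. */
	if (len < POPCOUNT_DISPATCH_THRESHOLD)
		return popcount_bytes_portable(buf, len);

	/* Long input: one indirect call amortized over many bytes. */
	return popcount_bytes_dispatched(buf, len);
}
```

Even with the threshold, every call pays for the length comparison, which is the "slow down your non-worthwhile case somewhat" cost.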

That's one of the things that makes riscv's decision to put so many crucial
features into optional extensions so annoying for people who write
non-embedded software.

- Andres

#7 Greg Burd
greg@burd.me
In reply to: Andres Freund (#6)
Re: Add RISC-V Zbb popcount optimization

On Sun, Mar 22, 2026, at 2:01 PM, Andres Freund wrote:

> Sure, but what PG workloads are actually affected to a meaningful degree by
> this? And are those, on riscv, actually most bottlenecked by popcount
> performance?
>
> I'm also pretty doubtful all the effort to e.g. add AVX 512 popcount was spent
> all that effectively - hard to believe there's any real world workloads where
> that gain is worth the squeeze. At least for aarch64 and x86-64 there's real
> world use of those platforms, making niche-y perf improvements somewhat
> worthwhile. Whereas there's afaict not yet a whole lot of riscv production
> adoption.
>
> Once you add CPU dispatch to the cost it gets a heck of a lot less clearly
> worthwhile. You need heuristics to decide when the dispatch cost is worth it
> and even then it's going to slow down your non-worthwhile case somewhat.
>
> That's one of the things that makes riscv's decision to put so many crucial
> features into optional extensions so annoying for people who write
> non-embedded software.

Hey Andres,

All fair points. RISC-V is annoying, and optional CPU extensions are just one reason. To be honest, I'm not sure it is worth it either! That said, this patch isn't a huge "squeeze" (nor is it unprecedented), and it does provide some "juice" (4x faster popcount in my test). It has the same shape as the ARM equivalent, so to me it fell into the category of things we'd commit.

But I get it, as I said to start - all fair points.

best.

-greg

#8 Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#6)
Re: Add RISC-V Zbb popcount optimization

On Sun, Mar 22, 2026 at 02:01:50PM -0400, Andres Freund wrote:

> I'm also pretty doubtful all the effort to e.g. add AVX 512 popcount was spent
> all that effectively - hard to believe there's any real world workloads where
> that gain is worth the squeeze. At least for aarch64 and x86-64 there's real
> world use of those platforms, making niche-y perf improvements somewhat
> worthwhile. Whereas there's afaict not yet a whole lot of riscv production
> adoption.

That work was partially motivated by vector stuff that used popcount
functions pretty heavily, but yeah, the complexity compared to the gains is
the main reason I've been pushing to just use simd.h elsewhere (i.e., SSE2
and Neon). I'd still consider using AVX-512, etc. for things if the impact
on real-world workloads was huge, though.

--
nathan

#9 Greg Burd
greg@burd.me
In reply to: Nathan Bossart (#8)
Re: Add RISC-V Zbb popcount optimization

On Mon, Mar 23, 2026, at 11:09 AM, Nathan Bossart wrote:

> That work was partially motivated by vector stuff that used popcount
> functions pretty heavily, but yeah, the complexity compared to the gains is
> the main reason I've been pushing to just use simd.h elsewhere (i.e., SSE2
> and Neon). I'd still consider using AVX-512, etc. for things if the impact
> on real-world workloads was huge, though.

Hey Nathan,

Yes, and this work was also motivated by research I did while trying to understand why my RISC-V buildfarm animal "greenfly" (OrangePi RV2 with a VisionFive 2 CPU: RISC-V RV64GC + Zba/Zbb/Zbc/Zbs) is failing consistently.

Forgive me: while $subject only mentions popcount, I couldn't help myself, so I added a few more RISC-V patches, including a bug fix that I hope makes greenfly happy again.

0001 - This is a bug fix for DES initialization with Clang on RISC-V.

------> Join me in "the rabbit hole" on this issue if you care to...

The existing software DES (as shown by the buildfarm animal "greenfly") fails because Clang 20 has an auto-vectorization bug that we trigger in the DES initialization code (the des_init() function), not in the DES encryption algorithm itself.

I searched the LLVM issue tracker, here are the issues that caught my eye:
1. Issue #176001 - "RISC-V Wrong code at -O1"
- Vector peephole optimization with vmerge folding
- Fixed by PR #176077 (merged Jan 2024)
- Link: https://github.com/llvm/llvm-project/issues/176001
2. Issue #187458 - "Wrong code for vector.extract.last.active"
- Large index issues with zvl1024b
- Partially fixed, still work ongoing
- Link: https://github.com/llvm/llvm-project/issues/187458
3. Issue #171978 - "RISC-V Wrong code at -O2/O3"
- Illegal instruction from mismatched EEW
- Under investigation
- Link: https://github.com/llvm/llvm-project/issues/171978
4. PR #176105 - "Fix i64 gather/scatter cost on rv32"
- Cost model fixes for scatter/gather (merged Jan 2026)
- Link: https://github.com/llvm/llvm-project/pull/176105

My fix in 0001 simply adds this in a few places in crypt-des.c:

#if defined(__riscv) && defined(__clang__)
pg_memory_barrier();
#endif
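Here is a standalone sketch of the pattern. It assumes the miscompiled code is a permutation-inversion loop of the kind des_init() runs to build `un_pbox`, which is the table named in the failing test output. It uses the standard DES P-box, whose inverse matches the expected `un_pbox` values the test program reports. A plain GCC/Clang compiler barrier stands in for PostgreSQL's `pg_memory_barrier()` so the sketch builds outside the tree. The thread doesn't show exactly where 0001 places its barriers, so placing one inside the loop body is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Standard DES P-box: 1-based source bit positions. */
static const uint8_t pbox[32] = {
	16, 7, 20, 21, 29, 12, 28, 17, 1, 15, 23, 26, 5, 18, 31, 10,
	2, 8, 24, 14, 32, 27, 3, 9, 19, 13, 30, 6, 22, 11, 4, 25
};

/* Inverse permutation, 0-based, filled in at initialization time. */
static uint8_t un_pbox[32];

/* GCC/Clang compiler barrier: an empty asm statement that clobbers memory. */
#define SKETCH_COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static void
init_un_pbox(void)
{
	for (int i = 0; i < 32; i++)
	{
		assert(pbox[i] >= 1 && pbox[i] <= 32);
		/* Scattered store: the kind of loop the vectorizer may transform. */
		un_pbox[pbox[i] - 1] = (uint8_t) i;
#if defined(__riscv) && defined(__clang__)
		/* Intended to keep Clang from vectorizing this loop on RISC-V. */
		SKETCH_COMPILER_BARRIER();
#endif
	}
}
```

A correct table satisfies pbox[un_pbox[i]] == i + 1 for every i, which is essentially what the "Permutation tables are correct" check in the test output verifies.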

While searching, I ran across a different solution: adding `-mllvm -riscv-v-vector-bits-min=0`, which sets the minimum vector bit width that LLVM assumes for the RISC-V vector extension to 0. That disables all vectorization and forces scalar code generation, so no RVV instructions are emitted. This would prevent the DES bug, at the cost of losing vectorization everywhere in the binary.

While that might also fix the other intermittent bug we'd been seeing on greenfly (not tested), disabling all RVV optimizations seems too heavy-handed to me.

------> Moving on.

0002 - (was "0001" in v2) Unchanged; implements popcount using the Zbb extension on RISC-V.

0003 - A small patch adapted from the Google Abseil project's RISC-V CRC32C implementation [1]. It is *a lot faster* than the software CRC32C we currently fall back to (see riscv-crc32c.c). The algorithm requires the Zbc (or Zbkc) extension for clmul, so the patch tests for that at build time and adds the '-march' flag when it is available. However, as with Zbb and popcount in 0002, the presence of Zbc (or Zbkc) must also be detected at runtime; that follows the pre-existing pattern used for ARM features. This does introduce some runtime overhead and complexity, though I hope no more than required.
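Some background on why Zbc/Zbkc matter here: CRC arithmetic is polynomial arithmetic over GF(2). `clmul` and `clmulh` compute the low and high 64 bits of a carry-less product in one instruction each, and that is what lets a CRC implementation fold 64 bits of input at a time. The following is a portable model of those two instructions as the RISC-V bit-manipulation specification defines them. It is an illustration only, not code from Abseil or the patch, and the CRC-32C folding constants are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Model of clmul: low 64 bits of the carry-less product of a and b. */
static uint64_t
clmul_lo(uint64_t a, uint64_t b)
{
	uint64_t	r = 0;

	for (int i = 0; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a << i;		/* XOR, not add: no carries propagate */
	return r;
}

/* Model of clmulh: high 64 bits of the same 128-bit carry-less product. */
static uint64_t
clmul_hi(uint64_t a, uint64_t b)
{
	uint64_t	r = 0;

	/* i starts at 1: the i == 0 partial product has no high bits. */
	for (int i = 1; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a >> (64 - i);
	return r;
}
```

For example, clmul_lo(3, 3) is 5. Over GF(2), (x + 1)(x + 1) = x^2 + x + x + 1 = x^2 + 1, because x + x = 0, so the ordinary product 9 becomes 0b101.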

I've attached test code; the results are at the end of this email:
* riscv-popcnt.c - unchanged
* riscv-crc32c.c - new, based on work in the Google Abseil project
* riscv-des.c - highlights the fix for DES using Clang on RISC-V

I guess the question for 0002 and/or 0003 is whether the "juice" is worth the "squeeze". There is a lot of performance juice to be had IMO. But some might argue that RISC-V isn't widely adopted yet, and they'd be right. Others might point out that RISC-V currently shows up in embedded systems more than in servers, desktops, laptops, or the cloud; also true. However, there is some evidence that this is changing: there are RISC-V servers [2][3], and there is a hosted (cloud) offering from Scaleway [4]. There exists a 64-core RISC-V desktop [5] and a Framework laptop mainboard [6] sporting RISC-V CPUs. And there is the OrangePi RV2 [7] I have, which is "greenfly".

Is it early days? Certainly! But too early? That's up for debate. :)

If nothing else, these patches can serve as a durable record, to be picked up later when RISC-V is a critical platform for Postgres, or as a reference for other projects.

best.

-greg

[1]: https://github.com/abseil/abseil-cpp/pull/1986 absl/crc/internal/crc_riscv.cc
[2]: https://www.firefly.store/products/rs-sra120-risc-v-server-2u-computing-server-cloud-storage-large-model-sg2042
[3]: https://edgeaicomputer.com/our-products/servers/risc-v-compute-server-sra1-20/
[4]: https://www.scaleway.com/en/news/scaleway-launches-its-risc-v-servers-in-the-cloud-a-world-first-and-a-firm-commitment-to-technological-independence/
[5]: https://milkv.io/pioneer and https://www.crowdsupply.com/milk-v/milk-v-pioneer/updates/current-status-of-production
[6]: https://deepcomputing.io/product/dc-roma-risc-v-mainboard/
[7]: http://www.orangepi.org/html/hardWare/computerAndMicrocontrollers/details/Orange-Pi-RV2.html

---- TEST PROGRAM OUTPUT:

gburd@rv:~/ws/postgres$ make -f Makefile.RISCV
gcc -O2 riscv-des.c -o des-gcc-sw
gcc -O2 riscv-des.c -march=rv64gcv -o des-gcc-hw
clang-20 -O1 riscv-des.c -o des-clang-o1-sw
clang-20 -O1 -march=rv64gcv riscv-des.c -o des-clang-o1-hw
clang-20 -O2 riscv-des.c -o des-clang-o2-sw
clang-20 -O2 -march=rv64gcv riscv-des.c -o des-clang-o2-hw
gcc -O2 -o popcnt-gcc-o2-sw riscv-popcnt.c
gcc -O2 -march=rv64gc_zbb -o popcnt-gcc-o2-hw riscv-popcnt.c
clang-20 -O2 -o popcnt-clang-o2-sw riscv-popcnt.c
clang-20 -O2 -march=rv64gc_zbb -o popcnt-clang-o2-hw riscv-popcnt.c
gcc -O2 -o crc32c-gcc-o2-sw riscv-crc32c.c
gcc -O2 -march=rv64gc_zbc -o crc32c-gcc-o2-hw riscv-crc32c.c
clang-20 -O2 -o crc32c-clang-o2-sw riscv-crc32c.c
clang-20 -O2 -march=rv64gc_zbc -o crc32c-clang-o2-hw riscv-crc32c.c
gburd@rv:~/ws/postgres$ make -f Makefile.RISCV test
./des-gcc-sw
Compiler: GCC 13.3.0
Target: RISC-V 64-bit
Vector extension: Not enabled

Testing WITHOUT compiler barriers:
PASS: Permutation tables are correct

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.409 seconds (409 ns/iter)
With barriers: 0.416 seconds (416 ns/iter)
Overhead: 1.6%
./des-gcc-hw
Compiler: GCC 13.3.0
Target: RISC-V 64-bit
Vector extension: Enabled (RVV)

Testing WITHOUT compiler barriers:
PASS: Permutation tables are correct

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.410 seconds (410 ns/iter)
With barriers: 0.410 seconds (410 ns/iter)
Overhead: Negligible
./des-clang-o1-sw
Compiler: Clang 20.1.2
Target: RISC-V 64-bit
Vector extension: Not enabled

Testing WITHOUT compiler barriers:
PASS: Permutation tables are correct

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.517 seconds (517 ns/iter)
With barriers: 0.516 seconds (516 ns/iter)
Overhead: Negligible
./des-clang-o1-hw
Compiler: Clang 20.1.2
Target: RISC-V 64-bit
Vector extension: Enabled (RVV)

Testing WITHOUT compiler barriers:
PASS: Permutation tables are correct

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.405 seconds (405 ns/iter)
With barriers: 0.405 seconds (405 ns/iter)
Overhead: Negligible
./des-clang-o2-sw
Compiler: Clang 20.1.2
Target: RISC-V 64-bit
Vector extension: Not enabled

Testing WITHOUT compiler barriers:
PASS: Permutation tables are correct

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.517 seconds (517 ns/iter)
With barriers: 0.518 seconds (518 ns/iter)
Overhead: Negligible
./des-clang-o2-hw
Compiler: Clang 20.1.2
Target: RISC-V 64-bit
Vector extension: Enabled (RVV)

Testing WITHOUT compiler barriers:
ERROR: un_pbox mismatch:
un_pbox[0] = 15, expected 8
un_pbox[1] = 6, expected 16
un_pbox[2] = 19, expected 22
un_pbox[3] = 20, expected 30
un_pbox[4] = 28, expected 12
... and 27 more errors
FAIL: Permutation tables are incorrect

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.093 seconds (93 ns/iter)
With barriers: 0.407 seconds (407 ns/iter)
Overhead: 335.5%
./popcnt-gcc-o2-sw
sw popcount: 0.183 sec ( 547.89 MB/s)
hw popcount: 0.274 sec ( 365.40 MB/s)

diff: 0.67x
match: 406261900 bits counted
./popcnt-gcc-o2-hw
sw popcount: 0.182 sec ( 548.17 MB/s)
hw popcount: 0.044 sec ( 2287.82 MB/s)

diff: 4.17x
match: 406261900 bits counted
./popcnt-clang-o2-sw
sw popcount: 0.188 sec ( 531.96 MB/s)
hw popcount: 0.207 sec ( 482.84 MB/s)

diff: 0.91x
match: 406261900 bits counted
./popcnt-clang-o2-hw
sw popcount: 0.224 sec ( 446.46 MB/s)
hw popcount: 0.056 sec ( 1794.83 MB/s)

diff: 4.02x
match: 406261900 bits counted
./crc32c-gcc-o2-sw
sw crc32c: 0.651 sec ( 153.68 MB/s)
hw crc32c: 0.651 sec ( 153.72 MB/s)

diff: 1.00x
match: 0x0B141F2D

validation: CRC32C("123456789") = 0xE3069283 (correct)
./crc32c-gcc-o2-hw
sw crc32c: 0.651 sec ( 153.70 MB/s)
hw crc32c: 0.000 sec ( 308052.33 MB/s)

diff: 2004.21x
match: 0x0B141F2D

validation: CRC32C("123456789") = 0xE3069283 (correct)
./crc32c-clang-o2-sw
sw crc32c: 0.584 sec ( 171.10 MB/s)
hw crc32c: 0.584 sec ( 171.17 MB/s)

diff: 1.00x
match: 0x0B141F2D

validation: CRC32C("123456789") = 0xE3069283 (correct)
./crc32c-clang-o2-hw
sw crc32c: 0.584 sec ( 171.15 MB/s)
hw crc32c: 0.000 sec ( 309282.38 MB/s)

diff: 1807.08x
match: 0x0B141F2D

validation: CRC32C("123456789") = 0xE3069283 (correct)
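The validation lines in the output above check the standard CRC-32C check value, CRC32C("123456789") = 0xE3069283. The following sketch is a slow, bit-at-a-time reference using the reflected Castagnoli polynomial 0x82F63B78, with an initial value and final XOR of 0xFFFFFFFF. It is a convenient oracle when testing a faster clmul-based implementation. It is much simpler, and much slower, than a table-driven fallback, and is not taken from riscv-crc32c.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t
crc32c_bitwise(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t	crc = 0xFFFFFFFFu;	/* standard initial value */

	assert(data != NULL || len == 0);
	while (len--)
	{
		crc ^= *p++;
		/* Shift out one bit at a time; XOR in the polynomial when it was 1. */
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;	/* standard final XOR */
}
```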

Attachments:

Makefile.RISCV (application/octet-stream)
riscv-crc32c.c (text/x-csrc)
riscv-des.c (text/x-csrc)
riscv-popcnt.c (text/x-csrc)
v3-0001-Avoid-Clang-RISC-V-auto-vectorization-bug-in-DES.patch (text/x-patch, +22/-2)
v3-0002-Add-RISC-V-popcount-using-Zbb-extension.patch (text/x-patch, +226/-9)
v3-0003-Add-RISC-V-CRC32C-using-the-Zbc-extension.patch (text/x-patch, +482/-8)
#10 John Naylor
john.naylor@enterprisedb.com
In reply to: Greg Burd (#9)
clang bug affecting greenfly

[new subject]

On Sat, Mar 28, 2026 at 3:22 AM Greg Burd <greg@burd.me> wrote:

> 0001 - This is a bug fix for DES initialization with Clang on RISC-V.
>
> ------> Join me in "the rabbit hole" on this issue if you care to...
>
> The existing software DES (as shown by the buildfarm animal "greenfly") fails because Clang 20 has an auto-vectorization bug that we trigger in the DES initialization code (the des_init() function), not in the DES encryption algorithm itself.
>
> [disable vectorization entirely]
> While that might also fix the other intermittent bug we'd been seeing on greenfly (not tested), disabling all RVV optimizations seems too heavy-handed to me.

The first thing I notice is that not very long ago the buildfarm had 3
gcc RISC-V members, but not anymore. If you care about having coverage
for this hardware, I'd suggest picking up gcc again if that's still
working, and wait and see about clang. Clang has shipped broken code
generation for obscure platforms in the past, and it seems here we're
not even sure of the extent of the breakage.

--
John Naylor
Amazon Web Services

#11 Greg Burd
greg@burd.me
In reply to: John Naylor (#10)
Re: clang bug affecting greenfly

On Mon, Mar 30, 2026, at 2:39 AM, John Naylor wrote:

> The first thing I notice is that not very long ago the buildfarm had 3
> gcc RISC-V members, but not anymore. If you care about having coverage
> for this hardware, I'd suggest picking up gcc again if that's still
> working, and wait and see about clang. Clang has shipped broken code
> generation for obscure platforms in the past, and it seems here we're
> not even sure of the extent of the breakage.

Hey John,

All fair points. I've changed greenfly to use GCC 13.3.0, thanks for the suggestion.

best.

-greg

#12 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Greg Burd (#11)
Re: clang bug affecting greenfly

Hi Greg,

On 2026-Mar-30, Greg Burd wrote:

> All fair points. I've changed greenfly to use GCC 13.3.0, thanks for
> the suggestion.

Hmm, the 'update_personality.pl' script supplied with the buildfarm
client script allows you to change the compiler version, but not the
compiler itself -- the rationale being that a machine with a different
compiler should be a different animal. So I suggest putting greenfly to
rest until the clang situation is resolved (at which time you're welcome
to turn it back on), and requesting a new animal to use on the same machine
running gcc. Right now, greenfly is reporting that it is running clang
13.3.0, which AFAIK makes it inconsistent.

--
Álvaro Herrera 48°01'N 7°57'E - https://www.EnterpriseDB.com/
"I agree with you that absolute truth does not exist...
The problem is that lies do exist, and you are lying" (G. Lama)

#13 Greg Burd
greg@burd.me
In reply to: Alvaro Herrera (#12)
Re: clang bug affecting greenfly

On Mon, Mar 30, 2026, at 12:14 PM, Álvaro Herrera wrote:

> Hi Greg,
>
> Hmm, the 'update_personality.pl' script supplied with the buildfarm
> client script allows you to change the compiler version, but not the
> compiler itself -- the rationale being that a machine with a different
> compiler should be a different animal. So I suggest putting greenfly to
> rest until the clang situation is resolved (at which time you're welcome
> to turn it back on), and requesting a new animal to use on the same machine
> running gcc. Right now, greenfly is reporting that it is running clang
> 13.3.0, which AFAIK makes it inconsistent.

Interesting; I was just looking for a way to change that after discovering the update_personality.pl limitation.

Sure, I'll apply for a new animal and change greenfly back.

-greg

#14 John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#10)
Re: clang bug affecting greenfly

On Mon, Mar 30, 2026 at 1:39 PM John Naylor <johncnaylorls@gmail.com> wrote:

> The first thing I notice is that not very long ago the buildfarm had 3
> gcc RISC-V members, but not anymore.

Of course, as soon as I said that, two of them reappeared fairly
quickly after a month's absence...

--
John Naylor
Amazon Web Services

#15 Greg Burd
greg@burd.me
In reply to: John Naylor (#14)
Re: clang bug affecting greenfly

On Tue, Mar 31, 2026, at 3:32 AM, John Naylor wrote:

> On Mon, Mar 30, 2026 at 1:39 PM John Naylor <johncnaylorls@gmail.com> wrote:
>> The first thing I notice is that not very long ago the buildfarm had 3
>> gcc RISC-V members, but not anymore.
>
> Of course, as soon as I said that, two of them reappeared fairly
> quickly after a month's absence...

Well, add one more to that list: on the same box as "greenfly" (clang) is "mollusk" (gcc). Their configurations differ only in the compiler. Right now greenfly is not reporting results (I pass the "--test" flag), given the known issues with clang. I'm happy to re-enable it if that's worthwhile.

best.

-greg
