centralize CPU feature detection

Started by John Naylor20 days ago13 messages
Jump to latest
#1John Naylor
john.naylor@enterprisedb.com

We have accrued duplicate bits of hardware detection logic in
different places, and the the AVX2 page checksum patch is about to add
more. It seems like a good time to try again to centralize things,
before that happens. The attached only touches x86, but that's enough
to demonstrate, and there's no point in trying to do everything at
once. Arm should get the same treatment at some point.

0001 starts by renaming pg_crc32c_sse42_choose.c to something more
general and does just enough to fix the build. Without a separate
rename step, there's too much change for git to call it a rename.
Humans can still see some carryover, so it seems right to keep git
history continuous.

0002 adds an array of bool indexed by an enum of feature names, and
adjusts the CRC and popcount code to use it.

0003 refactors detection of support for ZMM registers in preparation
of doing same for YMM (needed by AVX2).

0004 rebases the latest page checksum patch on top of the above for
demonstration (review of that is happening in its own thread [1]/messages/by-id/CA+vA85_5GTu+HHniSbvvP+8k3=xZO=WE84NPwiKyxztqvpfZ3Q@mail.gmail.com). Not
counting autoconf/meson and the pointer juggling, the additional
feature detection is now only 2 lines of code, which is nice.

For PG20, we can build on this to simplify the rat's nest of #ifdefs
that a couple of src/include/port headers have. We should also be able
to arrange so that packagers that pass relevant flags to common
compilers will automatically get some branches/indirection eliminated
via the compiler's standard dead code elimination, in a simple way,
rather than our having to kluge it together in multiple places. That
will make irrelevant the question that occasionally come up about
moving hardware requirements. Anyway, 0001-3 is doable for PG19.

[1]: /messages/by-id/CA+vA85_5GTu+HHniSbvvP+8k3=xZO=WE84NPwiKyxztqvpfZ3Q@mail.gmail.com

--
John Naylor
Amazon Web Services

Attachments:

v1-0002-Centralize-detection-of-CPU-features.patchapplication/x-patch; name=v1-0002-Centralize-detection-of-CPU-features.patchDownload+119-133
v1-0004-Enable-autovectorizing-page-checksums-with-AVX2-w.patchapplication/x-patch; name=v1-0004-Enable-autovectorizing-page-checksums-with-AVX2-w.patchDownload+181-36
v1-0001-Rename-CRC-choose-files-for-future-general-purpos.patchapplication/x-patch; name=v1-0001-Rename-CRC-choose-files-for-future-general-purpos.patchDownload+6-7
v1-0003-Refactor-the-detection-of-ZMM-registers.patchapplication/x-patch; name=v1-0003-Refactor-the-detection-of-ZMM-registers.patchDownload+23-21
#2Zsolt Parragi
zsolt.parragi@percona.com
In reply to: John Naylor (#1)
Re: centralize CPU feature detection

Hello!

Seems like a file (checksum_block_internal.h) is missing from the patch?

For the entire src/include/port/pg_x86_feature.h:

Shouldn't it have an

+#if defined(USE_SSE2) || defined(__i386__)
...
#endif

block around the file, to skip everything on other platforms?

In src/include/port/pg_x86_feature.h:33

+
+extern PGDLLEXPORT bool X86Feature[];
+

Shouldn't that be PGDLLIMPORT?

+typedef enum X86FeatureId
+{
+ init,
+
+ PG_SSE4_2,
+ PG_POPCNT,

Shouldn't that be INIT?

#3John Naylor
john.naylor@enterprisedb.com
In reply to: Zsolt Parragi (#2)
Re: centralize CPU feature detection

On Tue, Feb 17, 2026 at 3:15 AM Zsolt Parragi <zsolt.parragi@percona.com> wrote:

Hello!

Seems like a file (checksum_block_internal.h) is missing from the patch?

Should work now, and thanks for looking.

For the entire src/include/port/pg_x86_feature.h:

Shouldn't it have an

+#if defined(USE_SSE2) || defined(__i386__)
...
#endif

block around the file, to skip everything on other platforms?

Done. I haven't tried Arm support yet, but now I realize the header
should be named generically, so it's now "pg_cpu.h". Then it can be
included everywhere.

I've also gone with "pg_cpu_x86.c" for better consistency within this
directory, and used the plural for the array name.

In src/include/port/pg_x86_feature.h:33

+
+extern PGDLLEXPORT bool X86Feature[];
+

Shouldn't that be PGDLLIMPORT?

Fixed.

+typedef enum X86FeatureId
+{
+ init,
+
+ PG_SSE4_2,
+ PG_POPCNT,

Shouldn't that be INIT?

I don't know. The instruction family names are conventionally all in
caps, but this is just our signal that we've populated the array. That
said, a less generic name would better for grep-ability.

I added some quick comments here where the instruction families are
split apart. I'm not sure what info is relevent, but it seemed good to
separate them.

--
John Naylor
Amazon Web Services

Attachments:

v2-0001-Rename-CRC-choose-files-for-future-general-purpos.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Rename-CRC-choose-files-for-future-general-purpos.patchDownload+12-9
v2-0002-Centralize-detection-of-CPU-features.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Centralize-detection-of-CPU-features.patchDownload+112-125
v2-0004-Enable-autovectorizing-page-checksums-with-AVX2-w.patchtext/x-patch; charset=US-ASCII; name=v2-0004-Enable-autovectorizing-page-checksums-with-AVX2-w.patchDownload+224-36
v2-0003-Refactor-the-detection-of-ZMM-registers.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Refactor-the-detection-of-ZMM-registers.patchDownload+24-23
#4John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#3)
Re: centralize CPU feature detection

v3 removes some debug code that was causing CI to fail.

--
John Naylor
Amazon Web Services

Attachments:

v3-0002-Centralize-detection-of-CPU-features.patchtext/x-patch; charset=US-ASCII; name=v3-0002-Centralize-detection-of-CPU-features.patchDownload+107-125
v3-0004-Enable-autovectorizing-page-checksums-with-AVX2-w.patchtext/x-patch; charset=US-ASCII; name=v3-0004-Enable-autovectorizing-page-checksums-with-AVX2-w.patchDownload+224-36
v3-0003-Refactor-the-detection-of-ZMM-registers.patchtext/x-patch; charset=US-ASCII; name=v3-0003-Refactor-the-detection-of-ZMM-registers.patchDownload+24-23
v3-0001-Rename-CRC-choose-files-for-future-general-purpos.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Rename-CRC-choose-files-for-future-general-purpos.patchDownload+12-9
#5Zsolt Parragi
zsolt.parragi@percona.com
In reply to: John Naylor (#4)
Re: centralize CPU feature detection

Done. I haven't tried Arm support yet, but now I realize the header
should be named generically, so it's now "pg_cpu.h". Then it can be
included everywhere.

That makes sense, and simplifies the usage of the header. (However,
the include guard still refers to the old name)

I don't know. The instruction family names are conventionally all in
caps, but this is just our signal that we've populated the array. That
said, a less generic name would better for grep-ability.

Yes, that could work too. But reserving the lowercase "init" symbol in
a very generic header seems like a bad idea (especially for a use case
that isn't used globally), even if Postgres itself doesn't use the
symbol for anything else. "INIT" at least would be unlikely to
conflict with something else.

#6John Naylor
john.naylor@enterprisedb.com
In reply to: Zsolt Parragi (#5)
Re: centralize CPU feature detection

On Thu, Feb 19, 2026 at 1:47 AM Zsolt Parragi <zsolt.parragi@percona.com> wrote:

Done. I haven't tried Arm support yet, but now I realize the header
should be named generically, so it's now "pg_cpu.h". Then it can be
included everywhere.

That makes sense, and simplifies the usage of the header. (However,
the include guard still refers to the old name)

Oops, fixed.

I don't know. The instruction family names are conventionally all in
caps, but this is just our signal that we've populated the array. That
said, a less generic name would better for grep-ability.

Yes, that could work too. But reserving the lowercase "init" symbol in
a very generic header seems like a bad idea (especially for a use case
that isn't used globally), even if Postgres itself doesn't use the
symbol for anything else. "INIT" at least would be unlikely to
conflict with something else.

Still seems pretty generic, so I went with INIT_PG_X86.

I've also made a quick attempt at Arm support just to make sure I
didn't paint myself into a corner (v4-0005-6), and it compiles and
passes tests on a Debian aarch64 system with gcc 8.3. I'll put that
aside for later. v4-0001-3 are still the main focus now, and seem in
decent shape, maybe needs a bit more polish. (not to mention formal
commit messages)

--
John Naylor
Amazon Web Services

Attachments:

v4-0002-Centralize-detection-of-CPU-features.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Centralize-detection-of-CPU-features.patchDownload+107-125
v4-0001-Rename-CRC-choose-files-for-future-general-purpos.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Rename-CRC-choose-files-for-future-general-purpos.patchDownload+12-9
v4-0003-Refactor-the-detection-of-ZMM-registers.patchtext/x-patch; charset=US-ASCII; name=v4-0003-Refactor-the-detection-of-ZMM-registers.patchDownload+24-23
v4-0004-Enable-autovectorizing-page-checksums-with-AVX2-w.patchtext/x-patch; charset=US-ASCII; name=v4-0004-Enable-autovectorizing-page-checksums-with-AVX2-w.patchDownload+224-36
v4-0005-Rename-CRC-Arm-v8-choose-file-for-future-general-.patchtext/x-patch; charset=US-ASCII; name=v4-0005-Rename-CRC-Arm-v8-choose-file-for-future-general-.patchDownload+10-5
v4-0006-Centralize-detection-of-CPU-features-Arm-take-1.patchtext/x-patch; charset=US-ASCII; name=v4-0006-Centralize-detection-of-CPU-features-Arm-take-1.patchDownload+80-66
#7Zsolt Parragi
zsolt.parragi@percona.com
In reply to: John Naylor (#6)
Re: centralize CPU feature detection

1-3 looks good to me, other than the need for the proper commit messages.

There's a typo in " Are ZMM registeres enabled?" in 3.

#8John Naylor
john.naylor@enterprisedb.com
In reply to: Zsolt Parragi (#7)
Re: centralize CPU feature detection

On Fri, Feb 20, 2026 at 3:26 PM Zsolt Parragi <zsolt.parragi@percona.com> wrote:

1-3 looks good to me, other than the need for the proper commit messages.

I've committed 0001.

There's a typo in " Are ZMM registeres enabled?" in 3.

Fixed. 0002 and 0003 are attached with draft commit messages. There
was also a cosmetic mistake in an enum member name, whose correction
was squashed in the wrong direction -- this has been fixed. I also
added the new typedef for pgindent and restored a lost comment for
pg_comp_crc32c_choose(). This seems committable, but will double-check
everything works correctly.

--
John Naylor
Amazon Web Services

Attachments:

v5-0002-Centralize-detection-of-x86-CPU-features.patchtext/x-patch; charset=US-ASCII; name=v5-0002-Centralize-detection-of-x86-CPU-features.patchDownload+113-125
v5-0003-Refactor-detection-of-x86-ZMM-registers.patchtext/x-patch; charset=US-ASCII; name=v5-0003-Refactor-detection-of-x86-ZMM-registers.patchDownload+22-21
#9Zsolt Parragi
zsolt.parragi@percona.com
In reply to: John Naylor (#8)
Re: centralize CPU feature detection

2 and 3 looks good too, I only found two more typos:

+ return pg_comp_crc32c(crc, data, len);
+};

That semicolon is not needed

And in the commit message:

"it has been intialized and if"

That should be initialized

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Naylor (#8)
Re: centralize CPU feature detection

John Naylor <johncnaylorls@gmail.com> writes:

I've committed 0001.

BF animal rhinoceros isn't happy. I can reproduce that locally by
doing

$ ./configure ... USE_SLICING_BY_8_CRC32C=1
$ make
...
pg_cpu_x86.c: In function 'pg_comp_crc32c_choose':
pg_cpu_x86.c:85:3: error: 'pg_comp_crc32c' undeclared (first use in this function); did you mean 'pg_comp_crc32c_sb8'?
pg_comp_crc32c = pg_comp_crc32c_sse42;
^~~~~~~~~~~~~~
pg_comp_crc32c_sb8
pg_cpu_x86.c:85:3: note: each undeclared identifier is reported only once for each function it appears in
pg_cpu_x86.c:85:20: error: 'pg_comp_crc32c_sse42' undeclared (first use in this function); did you mean 'pg_comp_crc32c_sb8'?
pg_comp_crc32c = pg_comp_crc32c_sse42;
^~~~~~~~~~~~~~~~~~~~
pg_comp_crc32c_sb8
pg_cpu_x86.c:108:9: warning: implicit declaration of function 'pg_comp_crc32c'; did you mean 'pg_comp_crc32c_sb8'? [-Wimplicit-function-declaration]
return pg_comp_crc32c(crc, data, len);
^~~~~~~~~~~~~~
pg_comp_crc32c_sb8

It appears that if you want to build pg_cpu_x86.o unconditionally,
you need to make it more proof against the cases it wasn't getting
built in before.

regards, tom lane

#11John Naylor
john.naylor@enterprisedb.com
In reply to: Tom Lane (#10)
Re: centralize CPU feature detection

On Wed, Feb 25, 2026 at 2:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It appears that if you want to build pg_cpu_x86.o unconditionally,
you need to make it more proof against the cases it wasn't getting
built in before.

Thanks, I must have stopped watching the buildfarm too early. I've
pushed a fix which will get undone as part of v6-0002.

On Wed, Feb 25, 2026 at 2:57 AM Zsolt Parragi <zsolt.parragi@percona.com> wrote:

2 and 3 looks good too, I only found two more typos:

+ return pg_comp_crc32c(crc, data, len);
+};

That semicolon is not needed

And in the commit message:

"it has been intialized and if"

That should be initialized

Also fixed, thanks.

--
John Naylor
Amazon Web Services

Attachments:

v6-0002-Centralize-detection-of-x86-CPU-features.patchtext/x-patch; charset=US-ASCII; name=v6-0002-Centralize-detection-of-x86-CPU-features.patchDownload+112-128
v6-0003-Refactor-detection-of-x86-ZMM-registers.patchtext/x-patch; charset=US-ASCII; name=v6-0003-Refactor-detection-of-x86-ZMM-registers.patchDownload+22-21
#12Zsolt Parragi
zsolt.parragi@percona.com
In reply to: John Naylor (#11)
Re: centralize CPU feature detection

Both look good to me.

This isn't part of the patch, and it seems harmless, but while
reviewing the CRC functions, I noticed that pg_crc32c.h is
inconsistent with its dllimport markers, pg_comp_crc32c has 3
different declarations, and only 1 of them is marked PGDLLIMPORT.

#13John Naylor
john.naylor@enterprisedb.com
In reply to: Zsolt Parragi (#12)
Re: centralize CPU feature detection

On Wed, Feb 25, 2026 at 8:09 PM Zsolt Parragi <zsolt.parragi@percona.com> wrote:

Both look good to me.

Pushed 0002 after making sure AVX-512 detection still worked, thanks
for the review! I think 0003 needs a link to the Intel manual for the
XCR symbol values, and I'll push shortly after I add that.

This isn't part of the patch, and it seems harmless, but while
reviewing the CRC functions, I noticed that pg_crc32c.h is
inconsistent with its dllimport markers, pg_comp_crc32c has 3
different declarations, and only 1 of them is marked PGDLLIMPORT.

Yeah, I think that crept in during development to keep Windows CI
building with a not-for-commit test module. I'll remove it soon.

--
John Naylor
Amazon Web Services