Broken build on macOS (Universal / Intel): cpuid instruction not available
Hello!
We have been investigating recent build failures of the master branch.
Currently both Intel and Universal builds are broken on macOS.
We tested building on macOS 26.4.1 with Xcode 26.2 and on macOS 14.5.
Universal builds
============
Universal builds are broken since this commit:
16743db: Centralize detection of x86 CPU features
To reproduce:
export CFLAGS="-arch arm64 -arch x86_64"
./configure --without-icu
make
This results in an error "cpuid instruction not available"
Universal builds used to work; they are broken since commit 16743db.
Intel Builds
========
Intel-only builds (using Rosetta) are also broken in master since the following commit:
5e13b0f: Use AVX2 for calculating page checksums where available
To reproduce:
arch -x86_64 zsh
./configure --without-icu
make
This results in an error:
checksum.c:57:6: error: call to undeclared function 'x86_feature_available'
Best regards,
Jakob Egger and Tobias Bussmann
On Thu, May 7, 2026 at 6:41 PM Jakob Egger <jakob@eggerapps.at> wrote:
Universal builds
============
This results in an error "cpuid instruction not available"
Hmm, I imagine that may not work on normal Intel builds either, but
maybe it didn't get that far.
Intel Builds
========
Intel-only builds (using Rosetta) are also broken in master since the following commit:
5e13b0f: Use AVX2 for calculating page checksums where available
This results in an error:
checksum.c:57:6: error: call to undeclared function 'x86_feature_available'
That's strange. I have an Intel MacBook laying around -- I'll see what I can do.
--
John Naylor
Amazon Web Services
On 7 May 2026, at 14:42, John Naylor <johncnaylorls@gmail.com> wrote:
This results in an error:
checksum.c:57:6: error: call to undeclared function 'x86_feature_available'
That's strange. I have an Intel MacBook laying around -- I'll see what I can do.
I use an Intel MacBook Pro as my daily driver (currently running macOS 14.7)
and have not had any issues.
--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
On 7 May 2026, at 14:42, John Naylor <johncnaylorls@gmail.com> wrote:
This results in an error:
checksum.c:57:6: error: call to undeclared function 'x86_feature_available'
That's strange. I have an Intel MacBook laying around -- I'll see what I can do.
I use an Intel MacBook Pro as my daily driver (currently running macOS 14.7)
and have not had any issues.
BF member longfin is an Intel Mac Mini and has not shown issues
(although I think it's one macOS rev behind). I can believe that
universal builds are busted, because we don't test that. But
claiming that a plain Intel build is broken is contrary to
available evidence.
regards, tom lane
Am 07.05.2026 um 15:59 schrieb Tom Lane <tgl@sss.pgh.pa.us>:
Daniel Gustafsson <daniel@yesql.se> writes:
On 7 May 2026, at 14:42, John Naylor <johncnaylorls@gmail.com> wrote:
This results in an error:
checksum.c:57:6: error: call to undeclared function 'x86_feature_available'
That's strange. I have an Intel MacBook laying around -- I'll see what I can do.
I use an Intel MacBook Pro as my daily driver (currently running macOS 14.7)
and have not had any issues.
BF member longfin is an Intel Mac Mini and has not shown issues
(although I think it's one macOS rev behind). I can believe that
universal builds are busted, because we don't test that. But
claiming that a plain Intel build is broken is contrary to
available evidence.
I've set up a fresh VM with macOS 26.4.1 and command line tools.
Universal builds are still broken, but Intel builds (with Rosetta) work in the new VM.
So the Intel issue might be something unique to my machine. I'll try to figure it out.
Jakob
Am 07.05.2026 um 15:59 schrieb Tom Lane <tgl@sss.pgh.pa.us>:
I can believe that
universal builds are busted, because we don't test that.
I've already noticed that there isn't much variety in the buildfarm animals
available for Darwin. Given the availability of a fairly beefy macOS VM server
here, I've considered hosting some VMs to provide coverage of things like
universal builds and cross-compilation using different versions of the Apple
toolkits for the build farm. This issue provides the motivation to take it further.
regards
Tobias
Tobias Bussmann <t.bussmann@gmx.net> writes:
I've already noticed that there isn't much variety in the buildfarm animals
available for Darwin. Given the availability of a fairly beefy macOS VM server
here, I've considered hosting some VMs to provide coverage of things like
universal builds and cross-compilation using different versions of the Apple
toolkits for the build farm. This issue provides the motivation to take it further.
+1. If you're relying on such edge cases working, hosting a buildfarm
animal is by far the best way to ensure they keep working.
regards, tom lane
Am 07.05.2026 um 16:38 schrieb Jakob Egger <jakob@eggerapps.at>:
Am 07.05.2026 um 15:59 schrieb Tom Lane <tgl@sss.pgh.pa.us>:
Daniel Gustafsson <daniel@yesql.se> writes:
On 7 May 2026, at 14:42, John Naylor <johncnaylorls@gmail.com> wrote:
This results in an error:
checksum.c:57:6: error: call to undeclared function 'x86_feature_available'
That's strange. I have an Intel MacBook laying around -- I'll see what I can do.
I use an Intel MacBook Pro as my daily driver (currently running macOS 14.7)
and have not had any issues.
BF member longfin is an Intel Mac Mini and has not shown issues
(although I think it's one macOS rev behind). I can believe that
universal builds are busted, because we don't test that. But
claiming that a plain Intel build is broken is contrary to
available evidence.
I've set up a fresh VM with macOS 26.4.1 and command line tools.
Universal builds are still broken, but Intel builds (with Rosetta) work in the new VM.
So the Intel issue might be something unique to my machine. I'll try to figure it out.
Jakob
The Intel build error only occurs when building in Rosetta with Xcode 26.2.
I've switched to building with Command Line Tools 26.4 and it builds fine.
The Universal build issue on the other hand is reproducible across multiple Xcode and macOS versions.
Jakob
Jakob Egger <jakob@eggerapps.at> writes:
Universal builds are broken since this commit:
16743db: Centralize detection of x86 CPU features
To reproduce:
export CFLAGS="-arch arm64 -arch x86_64"
./configure --without-icu
make
This results in an error "cpuid instruction not available"
I can reproduce this here. But after looking at it briefly,
I think it's purely accidental that universal builds ever worked,
and they could have done so only for small values of "work".
The fundamental problem is that we make only one probe at
configure time to set flags such as HAVE__GET_CPUID.
With a single arch selected in CFLAGS, you get the appropriate
answer for that arch, and everything's fine. With both selected,
one build or the other will fail such probes with errors like
In file included from conftest.c:135:
/Library/Developer/CommandLineTools/usr/lib/clang/21/include/cpuid.h:14:2: error: this header is for x86 only
14 | #error this header is for x86 only
| ^
so that we end up with essentially no CPU-specific optimizations
enabled. That's not a place that you really want to be. (I've
not checked into how s_lock.h manages to work at all under these
conditions, but maybe it ends up picking a compiler-intrinsic
implementation?)
The proximate reason that it broke is that v18 and before had
code like
#ifdef HAVE_X86_64_POPCNTQ
#if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
#define TRY_POPCNT_X86_64 1
#endif
#endif
and didn't try to compile the cpuid-dependent function unless
TRY_POPCNT_X86_64 was set. The code in HEAD doesn't have
that guard, and is essentially assuming that every x86 platform
will provide HAVE__GET_CPUID or HAVE__CPUID.
To make this actually work well, we'd have to do two sets of configure
probes, one for each arch, and somehow apply the correct set of
#defines depending on arch. That's a lot of work that I personally
have no interest in doing, seeing that the handwriting is on the wall
for Apple's support of x86.
I wonder whether we shouldn't just disclaim support for multi-arch
builds. If it's easy to un-break, then sure we might as well restore
the status quo ante, but I don't think it's worth putting a ton of
work into.
regards, tom lane
On Thu, May 07, 2026 at 11:48:04AM -0400, Tom Lane wrote:
To make this actually work well, we'd have to do two sets of configure
probes, one for each arch, and somehow apply the correct set of
#defines depending on arch. That's a lot of work that I personally
have no interest in doing, seeing that the handwriting is on the wall
for Apple's support of x86.
I wonder whether we shouldn't just disclaim support for multi-arch
builds. If it's easy to un-break, then sure we might as well restore
the status quo ante, but I don't think it's worth putting a ton of
work into.
+1. I'm not mortally opposed to the idea if someone wants to do the work,
but at the moment I can't get excited about it. Perhaps multi-arch builds
will become more common down the road, though.
--
nathan
I wrote:
... The code in HEAD doesn't have
that guard, and is essentially assuming that every x86 platform
will provide HAVE__GET_CPUID or HAVE__CPUID.
Independently of whether macOS multi-arch is something we consider
supportable, I think the aforesaid assumption is a bad idea.
Can't we make pg_cpuid() return zeroes if it doesn't know how to
get the info, analogously to what pg_cpuid_subleaf() does?
regards, tom lane
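A minimal sketch of the "return zeroes" idea, not the actual pg_cpuid() code. The wrapper name sketch_cpuid(), its signature, and the SKETCH_HAVE_GET_CPUID macro are hypothetical stand-ins; the real tree keys off configure-set symbols such as HAVE__GET_CPUID, whereas this sketch assumes compiler-predefined macros are enough to decide. The registers are zeroed first and only overwritten when a CPUID mechanism is known to exist, so a caller on a slice without one simply sees no feature bits.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Assumption for this sketch: decide from compiler-predefined macros
 * rather than from configure output.
 */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define SKETCH_HAVE_GET_CPUID 1
#endif

/*
 * Hypothetical wrapper: fill reg[0..3] (EAX, EBX, ECX, EDX) for the given
 * leaf.  Returns true only if real CPUID data was obtained; otherwise the
 * registers are left as zeroes.
 */
static bool
sketch_cpuid(unsigned int leaf, unsigned int reg[4])
{
    memset(reg, 0, 4 * sizeof(unsigned int));

#ifdef SKETCH_HAVE_GET_CPUID
    /* __get_cpuid() returns 1 on success, 0 if the leaf is unsupported */
    return __get_cpuid(leaf, &reg[0], &reg[1], &reg[2], &reg[3]) == 1;
#else
    (void) leaf;                /* no CPUID mechanism known on this target */
    return false;
#endif
}
```

A caller can then test feature bits unconditionally: on a target without CPUID every bit reads as zero, which is the same answer it would get from a CPU that lacks the feature.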
On Thu, May 7, 2026 at 9:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
... The code in HEAD doesn't have
that guard, and is essentially assuming that every x86 platform
will provide HAVE__GET_CPUID or HAVE__CPUID.
Independently of whether macOS multi-arch is something we consider
supportable, I think the aforesaid assumption is a bad idea.
Can't we make pg_cpuid() return zeroes if it doesn't know how to
get the info, analogously to what pg_cpuid_subleaf() does?
Having worked in that area in this cycle, I think returning zeroes (or
adding a return value that confirms we got data) could work for the
TSC related functionality, since we just fall back to the system clock
and disallow TSC use if we can't get CPUID data. I'll let John confirm
if there are any other optimizations that require CPUID data.
CCing Andres as well, since he reviewed some of those patches for pg_cpu_x86.c.
Thanks,
Lukas
--
Lukas Fittl
On Fri, May 8, 2026 at 12:34 AM Lukas Fittl <lukas@fittl.com> wrote:
On Thu, May 7, 2026 at 9:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
... The code in HEAD doesn't have
that guard, and is essentially assuming that every x86 platform
will provide HAVE__GET_CPUID or HAVE__CPUID.
Independently of whether macOS multi-arch is something we consider
supportable, I think the aforesaid assumption is a bad idea.
Can't we make pg_cpuid() return zeroes if it doesn't know how to
get the info, analogously to what pg_cpuid_subleaf() does?
Yeah, that seems like a good assumption to remove.
Jakob and Tobias, how far do you get with the attached, at least for
the target x86 case? And it might help if you sent the configure
output in a file so we can see what we're dealing with. Note also that
we have two build systems, since we'll likely transition away from
autotools:
https://wiki.postgresql.org/wiki/Meson
Having worked in that area in this cycle, I think returning zeroes (or
adding a return value that confirms we got data)
Speaking of return values, the TSC feature as committed doesn't
actually use the return value of pg_cpuid_subleaf(), but it seems like
something we could use someday. I don't see such a use for pg_cpuid()
beyond having zero values in the 4 variables.
could work for the
TSC related functionality, since we just fall back to the system clock
and disallow TSC use if we can't get CPUID data. I'll let John confirm
if there are any other optimizations that require CPUID data.
Yes, there are, but everything should still fall back gracefully without it.
--
John Naylor
Amazon Web Services
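To illustrate why zeroed CPUID output leads to a graceful fallback, here is a small sketch. All names (sketch_decode_features, sketch_pick_crc_impl, the SKETCH_FEAT_* flags) are hypothetical and not taken from pg_cpu_x86.c; only the CPUID bit positions are architectural facts. With all-zero registers, every feature tests as absent and the caller ends up on the portable path.

```c
#include <stdint.h>

/* Hypothetical feature flags for this sketch only. */
enum
{
    SKETCH_FEAT_SSE42 = 1 << 0,
    SKETCH_FEAT_POPCNT = 1 << 1,
    SKETCH_FEAT_AVX2 = 1 << 2
};

/*
 * Decode a few CPUID bits.  leaf1_ecx is ECX from leaf 1; leaf7_ebx is EBX
 * from leaf 7, subleaf 0.  Simplification: a real AVX2 check must also
 * confirm OS support for the YMM state (OSXSAVE/XGETBV), omitted here.
 */
static unsigned int
sketch_decode_features(uint32_t leaf1_ecx, uint32_t leaf7_ebx)
{
    unsigned int features = 0;

    if (leaf1_ecx & (UINT32_C(1) << 20))    /* SSE4.2 */
        features |= SKETCH_FEAT_SSE42;
    if (leaf1_ecx & (UINT32_C(1) << 23))    /* POPCNT */
        features |= SKETCH_FEAT_POPCNT;
    if (leaf7_ebx & (UINT32_C(1) << 5))     /* AVX2 */
        features |= SKETCH_FEAT_AVX2;
    return features;
}

/* Callers pick the portable path whenever the flag is absent. */
static const char *
sketch_pick_crc_impl(unsigned int features)
{
    return (features & SKETCH_FEAT_SSE42) ? "sse42" : "slicing-by-8";
}
```

Passing zeroes for both registers yields an empty feature set, so sketch_pick_crc_impl() returns the slicing-by-8 choice.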
Attachments:
dont-assume-cpuid-available.patch (text/x-patch; charset=US-ASCII), +1 -2
Am 08.05.2026 um 05:48 schrieb John Naylor <johncnaylorls@gmail.com>:
Jakob and Tobias, how far do you get with the attached, at least for
the target x86 case?
thanks! I tried the patch and it fixes the universal build that broke with
16743db (and make check passes for both architectures). It remains to be
analysed how useful these universal builds are given the lack of
optimisations for one of the architectures, but at least they are possible
again, as they were previously.
And it might help if you sent the configure
output in a file so we can see what we're dealing with.
Please find the configure output attached.
Note also that
we have two build systems, since we'll likely transition away from
autotools:
sure, I haven't yet evaluated the meson build system but I plan to do so.
Best regards
Tobias
On Fri, May 8, 2026 at 3:26 PM Tobias Bussmann <t.bussmann@gmx.net> wrote:
Am 08.05.2026 um 05:48 schrieb John Naylor <johncnaylorls@gmail.com>:
Jakob and Tobias, how far do you get with the attached, at least for
the target x86 case?
thanks! I tried the patch and it fixes the universal build that broke with
16743db (and make check passes for both architectures). It remains to be
Great! I've pushed that fix.
analysed how useful these universal builds are given the lack of
optimisations for one of the architectures, but at least they are possible
again, as they were previously.
Taking a quick look at the configure output you provided, certain
optimizations will be lacking on both architectures:
checking for _mm_crc32_u8 and _mm_crc32_u32... no
checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=... no
checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with
CFLAGS=-march=armv8-a+crc+simd... no
checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with
CFLAGS=-march=armv8-a+crc... no
...
checking which CRC-32C implementation to use... slicing-by-8
But compiler builtins seem to work:
checking for builtin __atomic int32 atomic operations... yes
checking for builtin __atomic int64 atomic operations... yes
--
John Naylor
Amazon Web Services
John Naylor <johncnaylorls@gmail.com> writes:
On Fri, May 8, 2026 at 3:26 PM Tobias Bussmann <t.bussmann@gmx.net> wrote:
analysed how useful these universal builds are given the lack of
optimisations for one of the architectures, but at least they are possible
again, as they were previously.
Taking a quick look at the configure output you provided, certain
optimizations will be lacking on both architectures:
Yeah. I think the main practical problem will be the
lowest-common-denominator CRC code. However, AFAICS a universal build
would have been selecting that from day one, so if the performance has
been acceptable so far it won't be any worse in v19.
regards, tom lane
Am 08.05.2026 um 12:04 schrieb John Naylor <johncnaylorls@gmail.com>:
Taking a quick look at the configure output you provided, certain
optimizations will be lacking on both architectures:
Indeed. The universal builds seem to disable optimisations that are supported
natively. I did a comparison on the attached configure outputs with different
CFLAGS on arm64 - the first three ran the compiler in x86_64 emulation. I'd expect
this to be similar to a native x86_64 environment, but need to verify.
1. configure.x86_64.out: -arch x86_64 using rosetta2
2. configure.universal-cross.out: -arch arm64 -arch x86_64 using rosetta2
3. configure.arm64-cross.out: -arch arm64 using rosetta2
4. configure.x86_64-cross.out: -arch x86_64
5. configure.universal.out: -arch arm64 -arch x86_64
6. configure.arm64.out: -arch arm64
The differences in active features, in detail, are:
Only with arm64 native (6):
* svcnt_x
* pmull and pmull2
Both with arm64 native (6) and cross (3):
* __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS
Only with x86_64 emulation (1):
* assembler supports x86_64 popcntq
* _mm512_popcnt_epi64
* _mm512_clmulepi64_epi128
Both with x86_64 emulation/native (1) and arm64 cross (3) and universal cross (2):
* AVX2 target attribute support
Both with x86_64 cross (4) and emulation (1):
* __get_cpuid
* __get_cpuid_count
* _xgetbv
* _mm_crc32_u8 and _mm_crc32_u32
This results in the following CRC-32 decisions:
CRC-32C implementation:
1. x86_64 emulation: SSE 4.2 with runtime check
2. universal emulation: slicing-by-8
3. arm64 cross emulation: ARMv8 CRC instructions
4. x86_64 cross: SSE 4.2 with runtime check
5. universal: slicing-by-8
6. arm64: ARMv8 CRC instructions
Vectorized CRC-32C:
1. x86_64 emulation: AVX-512 with runtime check
2. universal emulation: none
3. arm64 cross emulation: none
4. x86_64 cross: none
5. universal: none
6. arm64: CRYPTO PMULL with runtime check
The differences between cross compilation and emulation (native) may be
worth looking into.
Cross-compiling arm64 or universal from x86_64 does not work, however:
checksum.c:57:6: error: call to undeclared function 'x86_feature_available'
if (x86_feature_available(PG_AVX2))
^
checksum.c:57:28: error: use of undeclared identifier 'PG_AVX2'
if (x86_feature_available(PG_AVX2))
This may have to do with the detection of "AVX2 target attribute support".
As Sandeep Thakkar just confirmed on the packagers list, this also affects building on a native x86_64 host, not only an emulated one. The AVX2 target attribute support check, when run on x86_64 while cross-compiling to arm64 (3) (or universal (2)), seems to give the wrong answer.
Best regards
Tobias
Tobias Bussmann <t.bussmann@gmx.net> writes:
Indeed. The universal builds seem to disable optimisations that are supported
natively.
Yes, we realized that upthread. It's been like that since day 1,
not something that's new in v19. I don't see that we're likely
to do anything about it: it'd require a fairly fundamental
re-architecting of our configure tests, and the end result would
only be to improve support for machines that are going to be dead
to macOS within a year.
However, it definitely is a regression that the build fails
altogether. Too bad nobody tried the x86 -> ARM case earlier.
regards, tom lane
Am 02.06.2026 um 16:48 schrieb Tom Lane <tgl@sss.pgh.pa.us>:
Tobias Bussmann <t.bussmann@gmx.net> writes:
Indeed. The universal builds seem to disable optimisations that are supported
natively.
Yes, we realized that upthread. It's been like that since day 1,
not something that's new in v19.
I wasn't intending to say that. Rather, I was wondering whether it would be worth testing two independent builds, then manually combining all the binaries and libraries with lipo to construct a properly optimised universal build.
However, it definitely is a regression that the build fails
altogether. Too bad nobody tried the x86 -> ARM case earlier.
We already touched on the issue briefly before but didn't nail it down. I wish I had more resources to contribute.
Tobias
I wrote:
However, it definitely is a regression that the build fails
altogether. Too bad nobody tried the x86 -> ARM case earlier.
I replicated this on longfin's host (x86_64 mac mini).
It seems there are two problems:
1. pg_cpu.h believes that x86-specific code can be conditional on
#if defined(USE_SSE2) || defined(__i386__)
but macOS doesn't define __i386__, only __x86_64__. It works
anyway on single-arch builds because the test to set USE_SSE2
succeeds, but not on multi-arch builds.
2. checksum.c believes that it's okay to call x86_feature_available
if USE_AVX2_WITH_RUNTIME_CHECK is set. I didn't track down just
why that's getting set in a multi-arch build when USE_SSE2 is not,
but it is, and that's probably good since it means we get at least
some optimization for x86 Macs. But we have to disregard it when
we're doing the ARM side.
The attached quick hack makes the build work on my machine.
I'm hesitant to shove it into the tree though because I'm
not too certain whether there could be side-effects on
other platforms. I think the way to proceed for now is for
EDB to apply this patch in their build of beta1, and we can
review the patch at leisure afterwards.
regards, tom lane
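The two problems described above can be pictured with the following sketch. It is not the attached patch, whose contents are not shown in this thread; every SKETCH_* name is hypothetical, and SKETCH_USE_AVX2_WITH_RUNTIME_CHECK merely stands in for a configure result that is shared by all slices. The point is that a configure-time symbol is combined with a per-slice architecture test that checks __x86_64__ as well as __i386__.

```c
/*
 * Per-slice test: in a multi-arch build the compiler runs once per -arch,
 * so these predefined macros differ between slices even though the
 * configure-generated header is shared.  Note that __x86_64__ is checked
 * as well as __i386__.
 */
#if defined(__x86_64__) || defined(__i386__)
#define SKETCH_TARGET_X86 1
#endif

/* Stand-in for a configure result shared by every slice (assumption). */
#define SKETCH_USE_AVX2_WITH_RUNTIME_CHECK 1

/* Only trust the configure result when this slice really is x86. */
#if defined(SKETCH_USE_AVX2_WITH_RUNTIME_CHECK) && defined(SKETCH_TARGET_X86)
#define SKETCH_TRY_AVX2 1
#endif

enum sketch_impl
{
    SKETCH_IMPL_GENERIC = 0,
    SKETCH_IMPL_AVX2 = 1
};

/*
 * cpu_has_avx2 stands in for a runtime feature check; it is only consulted
 * on slices where SKETCH_TRY_AVX2 is defined.
 */
static enum sketch_impl
sketch_choose_checksum_impl(int cpu_has_avx2)
{
#ifdef SKETCH_TRY_AVX2
    if (cpu_has_avx2)
        return SKETCH_IMPL_AVX2;
#else
    (void) cpu_has_avx2;        /* non-x86 slice: ignore the x86 result */
#endif
    return SKETCH_IMPL_GENERIC;
}
```

On the arm64 slice the x86-only branch is never compiled, so no reference to an x86 feature-detection function survives there, while the x86_64 slice keeps its runtime-checked fast path.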