Not-terribly-safe checks for CRC intrinsic support

Started by Tom Lane10 months ago7 messages
#1Tom Lane
tgl@sss.pgh.pa.us
1 attachment(s)

I noticed that our configuration-time checks for the presence
of CRC intrinsics generally look like

unsigned int crc = 0;
crc = __crc32cb(crc, 0);
crc = __crc32ch(crc, 0);
crc = __crc32cw(crc, 0);
crc = __crc32cd(crc, 0);
/* return computed value, to prevent the above being optimized away */
return crc == 0;

The trouble with this is that "crc" is a local variable, so the
compiler would be perfectly within its rights to optimize the whole
thing down to "return some_constant". While that outcome sufficiently
proves that the compiler has heard of these intrinsics, it fails to
prove that the platform has any necessary library infrastructure,
assembler support for the opcodes, etc etc. Whoever originally
wrote this evidently had concern for that hazard, or they'd not
have bothered with forcing a dependency on the final value; but
that seems insufficient. We have other nearby tests that try
to avoid this problem by making the functions-under-test operate
on global variables, so I think we should do likewise here.

In connection with bug #18839[1]/messages/by-id/18839-7615d0f8267dc015@postgresql.org, I checked to see if this might
already be happening. At least with gcc 12.2 on armhf Debian,
it doesn't seem to: the compiler still generates the crc opcodes.
But the same compiler is perfectly willing to optimize a call to
sin(3) down to a constant under similar conditions. So I think this
is just a matter of they didn't get round to it, not that there's a
principled reason to think they won't ever get round to it. There
might be other cases where these probes are already missing something,
and we've not noticed because there's-compiler-support-but-no-
library-support is surely a very rare case in the field.

In short, I think we ought to apply and perhaps back-patch something
like the attached.

BTW, it looks to me like PGAC_AVX512_POPCNT_INTRINSICS is at similar
hazard, but I'm not entirely sure how to fix that one.

Thoughts?

regards, tom lane

[1]: /messages/by-id/18839-7615d0f8267dc015@postgresql.org

Attachments:

be-more-paranoid-in-CRC-configure-checks.patchtext/x-diff; charset=us-ascii; name=be-more-paranoid-in-CRC-configure-checks.patchDownload
diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 8534cc54c13..2d33d919585 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -553,19 +553,23 @@ fi])# PGAC_HAVE_GCC__ATOMIC_INT64_CAS
 # the other ones are, on x86-64 platforms)
 #
 # If the intrinsics are supported, sets pgac_sse42_crc32_intrinsics.
+#
+# To detect the case where the compiler knows the function but library support
+# is missing, we must link not just compile, and store the results in global
+# variables so the compiler doesn't optimize away the call.
 AC_DEFUN([PGAC_SSE42_CRC32_INTRINSICS],
 [define([Ac_cachevar], [AS_TR_SH([pgac_cv_sse42_crc32_intrinsics])])dnl
 AC_CACHE_CHECK([for _mm_crc32_u8 and _mm_crc32_u32], [Ac_cachevar],
 [AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <nmmintrin.h>
+    unsigned int crc;
     #if defined(__has_attribute) && __has_attribute (target)
     __attribute__((target("sse4.2")))
     #endif
     static int crc32_sse42_test(void)
     {
-      unsigned int crc = 0;
       crc = _mm_crc32_u8(crc, 0);
       crc = _mm_crc32_u32(crc, 0);
-      /* return computed value, to prevent the above being optimized away */
+      /* else this function could get optimized away altogether: */
       return crc == 0;
     }],
   [return crc32_sse42_test();])],
@@ -593,13 +597,13 @@ AC_DEFUN([PGAC_ARMV8_CRC32C_INTRINSICS],
 AC_CACHE_CHECK([for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=$1], [Ac_cachevar],
 [pgac_save_CFLAGS=$CFLAGS
 CFLAGS="$pgac_save_CFLAGS $1"
-AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <arm_acle.h>],
-  [unsigned int crc = 0;
-   crc = __crc32cb(crc, 0);
+AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <arm_acle.h>
+unsigned int crc;],
+  [crc = __crc32cb(crc, 0);
    crc = __crc32ch(crc, 0);
    crc = __crc32cw(crc, 0);
    crc = __crc32cd(crc, 0);
-   /* return computed value, to prevent the above being optimized away */
+   /* return computed value, just to be extra sure this isn't optimized away */
    return crc == 0;])],
   [Ac_cachevar=yes],
   [Ac_cachevar=no])
@@ -628,13 +632,12 @@ AC_DEFUN([PGAC_LOONGARCH_CRC32C_INTRINSICS],
 AC_CACHE_CHECK(
   [for __builtin_loongarch_crcc_w_b_w, __builtin_loongarch_crcc_w_h_w, __builtin_loongarch_crcc_w_w_w and __builtin_loongarch_crcc_w_d_w],
   [Ac_cachevar],
-[AC_LINK_IFELSE([AC_LANG_PROGRAM([],
-  [unsigned int crc = 0;
-   crc = __builtin_loongarch_crcc_w_b_w(0, crc);
+[AC_LINK_IFELSE([AC_LANG_PROGRAM([unsigned int crc;],
+  [crc = __builtin_loongarch_crcc_w_b_w(0, crc);
    crc = __builtin_loongarch_crcc_w_h_w(0, crc);
    crc = __builtin_loongarch_crcc_w_w_w(0, crc);
    crc = __builtin_loongarch_crcc_w_d_w(0, crc);
-   /* return computed value, to prevent the above being optimized away */
+   /* return computed value, just to be extra sure this isn't optimized away */
    return crc == 0;])],
   [Ac_cachevar=yes],
   [Ac_cachevar=no])])
diff --git a/configure b/configure
index 93fddd69981..1e685514777 100755
--- a/configure
+++ b/configure
@@ -17391,15 +17391,15 @@ else
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include <nmmintrin.h>
+    unsigned int crc;
     #if defined(__has_attribute) && __has_attribute (target)
     __attribute__((target("sse4.2")))
     #endif
     static int crc32_sse42_test(void)
     {
-      unsigned int crc = 0;
       crc = _mm_crc32_u8(crc, 0);
       crc = _mm_crc32_u32(crc, 0);
-      /* return computed value, to prevent the above being optimized away */
+      /* else this function could get optimized away altogether: */
       return crc == 0;
     }
 int
@@ -17463,15 +17463,15 @@ CFLAGS="$pgac_save_CFLAGS "
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include <arm_acle.h>
+unsigned int crc;
 int
 main ()
 {
-unsigned int crc = 0;
-   crc = __crc32cb(crc, 0);
+crc = __crc32cb(crc, 0);
    crc = __crc32ch(crc, 0);
    crc = __crc32cw(crc, 0);
    crc = __crc32cd(crc, 0);
-   /* return computed value, to prevent the above being optimized away */
+   /* return computed value, just to be extra sure this isn't optimized away */
    return crc == 0;
   ;
   return 0;
@@ -17504,15 +17504,15 @@ CFLAGS="$pgac_save_CFLAGS -march=armv8-a+crc+simd"
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include <arm_acle.h>
+unsigned int crc;
 int
 main ()
 {
-unsigned int crc = 0;
-   crc = __crc32cb(crc, 0);
+crc = __crc32cb(crc, 0);
    crc = __crc32ch(crc, 0);
    crc = __crc32cw(crc, 0);
    crc = __crc32cd(crc, 0);
-   /* return computed value, to prevent the above being optimized away */
+   /* return computed value, just to be extra sure this isn't optimized away */
    return crc == 0;
   ;
   return 0;
@@ -17545,15 +17545,15 @@ CFLAGS="$pgac_save_CFLAGS -march=armv8-a+crc"
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include <arm_acle.h>
+unsigned int crc;
 int
 main ()
 {
-unsigned int crc = 0;
-   crc = __crc32cb(crc, 0);
+crc = __crc32cb(crc, 0);
    crc = __crc32ch(crc, 0);
    crc = __crc32cw(crc, 0);
    crc = __crc32cd(crc, 0);
-   /* return computed value, to prevent the above being optimized away */
+   /* return computed value, just to be extra sure this isn't optimized away */
    return crc == 0;
   ;
   return 0;
@@ -17589,16 +17589,15 @@ if ${pgac_cv_loongarch_crc32c_intrinsics+:} false; then :
 else
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
-
+unsigned int crc;
 int
 main ()
 {
-unsigned int crc = 0;
-   crc = __builtin_loongarch_crcc_w_b_w(0, crc);
+crc = __builtin_loongarch_crcc_w_b_w(0, crc);
    crc = __builtin_loongarch_crcc_w_h_w(0, crc);
    crc = __builtin_loongarch_crcc_w_w_w(0, crc);
    crc = __builtin_loongarch_crcc_w_d_w(0, crc);
-   /* return computed value, to prevent the above being optimized away */
+   /* return computed value, just to be extra sure this isn't optimized away */
    return crc == 0;
   ;
   return 0;
diff --git a/meson.build b/meson.build
index 13c13748e5d..599942800fd 100644
--- a/meson.build
+++ b/meson.build
@@ -2323,16 +2323,15 @@ if host_cpu == 'x86' or host_cpu == 'x86_64'
 
     prog = '''
 #include <nmmintrin.h>
-
+unsigned int crc;
 #if defined(__has_attribute) && __has_attribute (target)
 __attribute__((target("sse4.2")))
 #endif
 int main(void)
 {
-    unsigned int crc = 0;
     crc = _mm_crc32_u8(crc, 0);
     crc = _mm_crc32_u32(crc, 0);
-    /* return computed value, to prevent the above being optimized away */
+    /* else this function could get optimized away altogether: */
     return crc == 0;
 }
 '''
@@ -2358,16 +2357,15 @@ elif host_cpu == 'arm' or host_cpu == 'aarch64'
 
   prog = '''
 #include <arm_acle.h>
-
+unsigned int crc;
 int main(void)
 {
-    unsigned int crc = 0;
     crc = __crc32cb(crc, 0);
     crc = __crc32ch(crc, 0);
     crc = __crc32cw(crc, 0);
     crc = __crc32cd(crc, 0);
 
-    /* return computed value, to prevent the above being optimized away */
+    /* return computed value, just to be extra sure this isn't optimized away */
     return crc == 0;
 }
 '''
@@ -2396,15 +2394,15 @@ int main(void)
 elif host_cpu == 'loongarch64'
 
   prog = '''
+unsigned int crc;
 int main(void)
 {
-    unsigned int crc = 0;
     crc = __builtin_loongarch_crcc_w_b_w(0, crc);
     crc = __builtin_loongarch_crcc_w_h_w(0, crc);
     crc = __builtin_loongarch_crcc_w_w_w(0, crc);
     crc = __builtin_loongarch_crcc_w_d_w(0, crc);
 
-    /* return computed value, to prevent the above being optimized away */
+    /* return computed value, just to be extra sure this isn't optimized away */
     return crc == 0;
 }
 '''
#2Steven Niu
niushiji@gmail.com
In reply to: Tom Lane (#1)
Re: Not-terribly-safe checks for CRC intrinsic support

+# is missing, we must link not just compile, and store the results in
global

The "compile" should be "compiler"?

Regards,
Steven

在 2025/3/15 7:04, Tom Lane 写道:

Show quoted text

I noticed that our configuration-time checks for the presence
of CRC intrinsics generally look like

unsigned int crc = 0;
crc = __crc32cb(crc, 0);
crc = __crc32ch(crc, 0);
crc = __crc32cw(crc, 0);
crc = __crc32cd(crc, 0);
/* return computed value, to prevent the above being optimized away */
return crc == 0;

The trouble with this is that "crc" is a local variable, so the
compiler would be perfectly within its rights to optimize the whole
thing down to "return some_constant". While that outcome sufficiently
proves that the compiler has heard of these intrinsics, it fails to
prove that the platform has any necessary library infrastructure,
assembler support for the opcodes, etc etc. Whoever originally
wrote this evidently had concern for that hazard, or they'd not
have bothered with forcing a dependency on the final value; but
that seems insufficient. We have other nearby tests that try
to avoid this problem by making the functions-under-test operate
on global variables, so I think we should do likewise here.

In connection with bug #18839[1], I checked to see if this might
already be happening. At least with gcc 12.2 on armhf Debian,
it doesn't seem to: the compiler still generates the crc opcodes.
But the same compiler is perfectly willing to optimize a call to
sin(3) down to a constant under similar conditions. So I think this
is just a matter of they didn't get round to it, not that there's a
principled reason to think they won't ever get round to it. There
might be other cases where these probes are already missing something,
and we've not noticed because there's-compiler-support-but-no-
library-support is surely a very rare case in the field.

In short, I think we ought to apply and perhaps back-patch something
like the attached.

BTW, it looks to me like PGAC_AVX512_POPCNT_INTRINSICS is at similar
hazard, but I'm not entirely sure how to fix that one.

Thoughts?

regards, tom lane

[1] /messages/by-id/18839-7615d0f8267dc015@postgresql.org

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steven Niu (#2)
Re: Not-terribly-safe checks for CRC intrinsic support

Steven Niu <niushiji@gmail.com> writes:

+# is missing, we must link not just compile, and store the results in
global

The "compile" should be "compiler"?

I think it's okay as-is: "link" and "compile" are both being used
as verbs. We could say "run the compiler", but that's longer
without being better.

Besides which, I stole this comment verbatim from elsewhere
in the same file ;-)

regards, tom lane

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Steven Niu (#2)
Re: Not-terribly-safe checks for CRC intrinsic support

On Sunday, March 16, 2025, Steven Niu <niushiji@gmail.com> wrote:

+# is missing, we must link not just compile, and store the results in
global

The "compile" should be "compiler"?

No. Compile is the verb that pairs with link. Compiler is a noun, its
compliment being the linker.

I’d probably add a comma before the “not” though. Or maybe: we must also
link and store the results in global

Doesn’t link imply compilation?

David J.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#4)
Re: Not-terribly-safe checks for CRC intrinsic support

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Sunday, March 16, 2025, Steven Niu <niushiji@gmail.com> wrote:

+# is missing, we must link not just compile, and store the results in
global

I’d probably add a comma before the “not” though. Or maybe: we must also
link and store the results in global

A comma there wouldn't be wrong, but in context that would make for
an overabundance of commas. Or so it seems to me anyway.

Doesn’t link imply compilation?

Yes.

regards, tom lane

#6John Naylor
johncnaylorls@gmail.com
In reply to: Tom Lane (#1)
Re: Not-terribly-safe checks for CRC intrinsic support

On Sat, Mar 15, 2025 at 6:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In short, I think we ought to apply and perhaps back-patch something
like the attached.

Seems like reasonable defensive coding and consistency.

-    /* return computed value, to prevent the above being optimized away */
+    /* else this function could get optimized away altogether: */
-    /* return computed value, to prevent the above being optimized away */
+    /* return computed value, just to be extra sure this isn't
optimized away */

I'd be okay with keeping the original comment, though, since it seems
to be explaining the choice well enough.

BTW, it looks to me like PGAC_AVX512_POPCNT_INTRINSICS is at similar
hazard, but I'm not entirely sure how to fix that one.

"buf" is the variable there that we're loading from, so that would be
the one to make global.

--
John Naylor
Amazon Web Services

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Naylor (#6)
Re: Not-terribly-safe checks for CRC intrinsic support

John Naylor <johncnaylorls@gmail.com> writes:

On Sat, Mar 15, 2025 at 6:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In short, I think we ought to apply and perhaps back-patch something
like the attached.

Seems like reasonable defensive coding and consistency.

Thanks for looking at it.

I'd be okay with keeping the original comment, though, since it seems
to be explaining the choice well enough.

Okay.

BTW, it looks to me like PGAC_AVX512_POPCNT_INTRINSICS is at similar
hazard, but I'm not entirely sure how to fix that one.

"buf" is the variable there that we're loading from, so that would be
the one to make global.

Ah. I was confused by the "const" decoration, but we can remove that.

After thinking for a bit, I pushed this just to master rather than
back-patching. We can do a back-patch if anyone discovers that this
is a live issue on any current platform, but I rather suspect that
it isn't. Compiler not matched to platform is a situation that's
gone away for most people.

regards, tom lane