[PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

Started by Zidenberg, Tsahiover 5 years ago14 messages
#1Zidenberg, Tsahi
tsahee@amazon.com
1 attachment(s)

Outline-atomics is a gcc compilation flag that adds runtime detection of weather or not the cpu supports atomic instructions. CPUs that don't support atomic instructions will use the old load-exclusive/store-exclusive instructions. If a different compilation flag defined an architecture that unconditionally supports atomic instructions (e.g. -march=armv8.2), the outline-atomic flag will have no effect.

The patch was tested to improve pgbench simple-update by 10% and sysbench write-only by 3% on a 64-core armv8.2 machine (AWS m6g.16xlarge). Select-only and read-only benchmarks were not significantly affected, and neither was performance on a 16-core armv8.0 machine that does not support atomic instructions (AWS a1.4xlarge).

The patch uses an existing configure.in macro to detect compiler support of the flag. Checking for aarch64 machine is not strictly necessary, but was added for readability.

Thank you!
Tsahi

Attachments:

0001-Support-outline-atomics-on-aarch64.patchapplication/octet-stream; name=0001-Support-outline-atomics-on-aarch64.patchDownload
From a876395d5cf36e417e6910bb4b03a4650799218d Mon Sep 17 00:00:00 2001
From: Tsahi Zidenberg <tsahee@amazon.com>
Date: Mon, 22 Jun 2020 08:26:47 +0000
Subject: [PATCH] Support outline-atomics on aarch64

outline-atomics is a gcc compilation flag that enables runtime detection
of CPU support for atomic instructions.
Performance on CPUs that do support atomic instructions is improved,
while compatibility and performance on CPUs without atomic instructions
is not hurt.
---
 configure    | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 configure.in |  4 +++
 2 files changed, 97 insertions(+)

diff --git a/configure b/configure
index 2feff37fe3..f7e1822e9a 100755
--- a/configure
+++ b/configure
@@ -6223,6 +6223,99 @@ fi
   if test -n "$NOT_THE_CFLAGS"; then
     CFLAGS="$CFLAGS -Wno-stringop-truncation"
   fi
+  if test x"$host_cpu" == x"aarch64"; then
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -moutline-atomics, for CFLAGS" >&5
+$as_echo_n "checking whether ${CC} supports -moutline-atomics, for CFLAGS... " >&6; }
+if ${pgac_cv_prog_CC_cflags__moutline_atomics+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+pgac_save_CC=$CC
+CC=${CC}
+CFLAGS="${CFLAGS} -moutline-atomics"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_prog_CC_cflags__moutline_atomics=yes
+else
+  pgac_cv_prog_CC_cflags__moutline_atomics=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS="$pgac_save_CFLAGS"
+CC="$pgac_save_CC"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CC_cflags__moutline_atomics" >&5
+$as_echo "$pgac_cv_prog_CC_cflags__moutline_atomics" >&6; }
+if test x"$pgac_cv_prog_CC_cflags__moutline_atomics" = x"yes"; then
+  CFLAGS="${CFLAGS} -moutline-atomics"
+fi
+
+
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -moutline-atomics, for CXXFLAGS" >&5
+$as_echo_n "checking whether ${CXX} supports -moutline-atomics, for CXXFLAGS... " >&6; }
+if ${pgac_cv_prog_CXX_cxxflags__moutline_atomics+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CXXFLAGS=$CXXFLAGS
+pgac_save_CXX=$CXX
+CXX=${CXX}
+CXXFLAGS="${CXXFLAGS} -moutline-atomics"
+ac_save_cxx_werror_flag=$ac_cxx_werror_flag
+ac_cxx_werror_flag=yes
+ac_ext=cpp
+ac_cpp='$CXXCPP $CPPFLAGS'
+ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_cxx_compiler_gnu
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  pgac_cv_prog_CXX_cxxflags__moutline_atomics=yes
+else
+  pgac_cv_prog_CXX_cxxflags__moutline_atomics=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+ac_cxx_werror_flag=$ac_save_cxx_werror_flag
+CXXFLAGS="$pgac_save_CXXFLAGS"
+CXX="$pgac_save_CXX"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CXX_cxxflags__moutline_atomics" >&5
+$as_echo "$pgac_cv_prog_CXX_cxxflags__moutline_atomics" >&6; }
+if test x"$pgac_cv_prog_CXX_cxxflags__moutline_atomics" = x"yes"; then
+  CXXFLAGS="${CXXFLAGS} -moutline-atomics"
+fi
+
+
+  fi
 elif test "$ICC" = yes; then
   # Intel's compiler has a bug/misoptimization in checking for
   # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS.
diff --git a/configure.in b/configure.in
index 0188c6ff07..3657628399 100644
--- a/configure.in
+++ b/configure.in
@@ -532,6 +532,10 @@ if test "$GCC" = yes -a "$ICC" = no; then
   if test -n "$NOT_THE_CFLAGS"; then
     CFLAGS="$CFLAGS -Wno-stringop-truncation"
   fi
+  if test x"$host_cpu" == x"aarch64"; then
+    PGAC_PROG_CC_CFLAGS_OPT([-moutline-atomics])
+    PGAC_PROG_CXX_CFLAGS_OPT([-moutline-atomics])
+  fi
 elif test "$ICC" = yes; then
   # Intel's compiler has a bug/misoptimization in checking for
   # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS.
-- 
2.25.1

#2Zidenberg, Tsahi
tsahee@amazon.com
In reply to: Zidenberg, Tsahi (#1)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

On 01/07/2020, 18:40, "Zidenberg, Tsahi" <tsahee@amazon.com> wrote:

Outline-atomics is a gcc compilation flag that adds runtime detection of weather or not the cpu
supports atomic instructions. CPUs that don't support atomic instructions will use the old
load-exclusive/store-exclusive instructions. If a different compilation flag defined an architecture
that unconditionally supports atomic instructions (e.g. -march=armv8.2), the outline-atomic flag
will have no effect.

The patch was tested to improve pgbench simple-update by 10% and sysbench write-only by 3%
on a 64-core armv8.2 machine (AWS m6g.16xlarge). Select-only and read-only benchmarks were
not significantly affected, and neither was performance on a 16-core armv8.0 machine that does
not support atomic instructions (AWS a1.4xlarge).

The patch uses an existing configure.in macro to detect compiler support of the flag. Checking for
aarch64 machine is not strictly necessary, but was added for readability.

Added a commitfest entry:
https://commitfest.postgresql.org/29/2637/

Thank you!
Tsahi

#3Andres Freund
andres@anarazel.de
In reply to: Zidenberg, Tsahi (#1)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

Hi,

On 2020-07-01 15:40:38 +0000, Zidenberg, Tsahi wrote:

Outline-atomics is a gcc compilation flag that adds runtime detection
of weather or not the cpu supports atomic instructions. CPUs that
don't support atomic instructions will use the old
load-exclusive/store-exclusive instructions. If a different
compilation flag defined an architecture that unconditionally supports
atomic instructions (e.g. -march=armv8.2), the outline-atomic flag
will have no effect.

Sounds attractive.

The patch was tested to improve pgbench simple-update by 10% and
sysbench write-only by 3% on a 64-core armv8.2 machine (AWS
m6g.16xlarge). Select-only and read-only benchmarks were not
significantly affected, and neither was performance on a 16-core
armv8.0 machine that does not support atomic instructions (AWS
a1.4xlarge).

What does "not significantly affected" exactly mean? Could you post the
raw numbers? I'm a bit concerned that the additional conditional
branches on platforms without non ll/sc atomics could hurt noticably.

I'm surprised that read-only didn't benefit - with ll/sc that ought to
have pretty high contention on a few lwlocks.

Could you post the numbers?

Greetings,

Andres Freund

#4Zidenberg, Tsahi
tsahee@amazon.com
In reply to: Andres Freund (#3)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

Hello!

First, I apologize for taking so long to answer. This e-mail regretfully got lost in my inbox.

On 24/07/2020, 4:17, "Andres Freund" <andres@anarazel.de> wrote:

What does "not significantly affected" exactly mean? Could you post the
raw numbers?

The following tests show benchmark behavior on m6g.8xl instance (32-core with LSE support)
and a1.4xlarge (16-core, no LSE support) with and without the patch, based on postgresql 12.4.
Tests are pgbench select-only/simple-update, and sysbench read-only/write only.

. select-only. simple-update. read-only. write-only
m6g.8xlarge/vanila. 482130. 56275. 273327. 33364
m6g.8xlarge/patch. 493748. 59681. 262702. 33024
a1.4xlarge/vanila. 82437. 13978. 62489. 2928
a1.4xlarge/patch. 79499. 13932. 62796. 2945

Results obviously change with OS / parameters /etc. I have attempted ensure a fair comparison,
But I don't think these numbers should be taken as absolute.
As reference points, m6g instance compiled with -march=native flag, and m5g (x86) instances:

m6g.8xlarge/native. 522771. 60354. 261366. 33582
m5.8xlarge. 362908. 58732. 147730. 32750

I'm a bit concerned that the additional conditional
branches on platforms without non ll/sc atomics could hurt noticably.

As can be seen in a1 results - the difference for CPUSs with no LSE atomic support is low.
Locks have one branch added, which is always taken the same way and thus easy to predict.

I'm surprised that read-only didn't benefit - with ll/sc that ought to
have pretty high contention on a few lwlocks.

These results show only about 6% performance increase in simple-update, and very close
performance in other results, most of which could be attributed to benchmark result jitter.
These results from "well behaved" benchmarks do not show the full importance of using
outline-atomics. I have observed in some experiments with other values and larger systems
a crush of performance including read-only tests, which was caused by continuously failing to
commit strx instructions. In such cases, outline-atomics improved performance by more
than 2x factor. These cases are not always easy to replicate.

Thank you!
and sorry again for the delay
Tsahi Zidenberg

#5Michael Paquier
michael@paquier.xyz
In reply to: Zidenberg, Tsahi (#4)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

On Sun, Sep 06, 2020 at 09:00:02PM +0000, Zidenberg, Tsahi wrote:

These results show only about 6% performance increase in simple-update, and very close
performance in other results, most of which could be attributed to benchmark result jitter.
These results from "well behaved" benchmarks do not show the full importance of using
outline-atomics. I have observed in some experiments with other values and larger systems
a crush of performance including read-only tests, which was caused by continuously failing to
commit strx instructions. In such cases, outline-atomics improved performance by more
than 2x factor. These cases are not always easy to replicate.

Interesting stuff. ARM-related otimizations is not something you see
a lot around here. Please note that your latest patch fails to apply,
could you provide a rebase? I am marking the patch as waiting on
author for the time being.
--
Michael

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zidenberg, Tsahi (#1)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

"Zidenberg, Tsahi" <tsahee@amazon.com> writes:

Outline-atomics is a gcc compilation flag that adds runtime detection of weather or not the cpu supports atomic instructions. CPUs that don't support atomic instructions will use the old load-exclusive/store-exclusive instructions. If a different compilation flag defined an architecture that unconditionally supports atomic instructions (e.g. -march=armv8.2), the outline-atomic flag will have no effect.

I wonder what version of gcc you intend this for. AFAICS, older
gcc versions lack this flag at all, while newer ones have it on
by default. Docs I can find on the net suggest that it would only
help to supply the flag when using gcc 10.0.x. Is there a sufficient
population of production systems using such gcc releases to make it
worth expending configure cycles on? (That's sort of a trick question,
because the GCC docs make it look like 10.0.x was never considered
to be production ready.)

regards, tom lane

#7Zidenberg, Tsahi
tsahee@amazon.com
In reply to: Michael Paquier (#5)
1 attachment(s)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

On 07/09/2020, 6:11, "Michael Paquier" <michael@paquier.xyz> wrote:

Interesting stuff. ARM-related otimizations is not something you see
a lot around here.

Let's hope that will change :)

could you provide a rebase? I am marking the patch as waiting on
author for the time being.

Of course. Attached.

--
Thank you!
Tsahi.

Attachments:

v2-0001-Support-outline-atomics-on-aarch64.patchapplication/octet-stream; name=v2-0001-Support-outline-atomics-on-aarch64.patchDownload
From 1974532ca5223adc1615e0f81c0cc330ce4a41ca Mon Sep 17 00:00:00 2001
From: Tsahi Zidenberg <tsahee@amazon.com>
Date: Mon, 22 Jun 2020 08:26:47 +0000
Subject: [PATCH v2] Support outline-atomics on aarch64

outline-atomics is a gcc compilation flag that enables runtime detection
of CPU support for atomic instructions.
Performance on CPUs that do support atomic instructions is improved,
while compatibility and performance on CPUs without atomic instructions
is not hurt.
---
 configure    | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 configure.ac |  4 +++
 2 files changed, 97 insertions(+)

diff --git a/configure b/configure
index 19a3cd09a0..75c1ffc038 100755
--- a/configure
+++ b/configure
@@ -6319,6 +6319,99 @@ fi
   if test -n "$NOT_THE_CFLAGS"; then
     CFLAGS="$CFLAGS -Wno-stringop-truncation"
   fi
+  if test x"$host_cpu" == x"aarch64"; then
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -moutline-atomics, for CFLAGS" >&5
+$as_echo_n "checking whether ${CC} supports -moutline-atomics, for CFLAGS... " >&6; }
+if ${pgac_cv_prog_CC_cflags__moutline_atomics+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+pgac_save_CC=$CC
+CC=${CC}
+CFLAGS="${CFLAGS} -moutline-atomics"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_prog_CC_cflags__moutline_atomics=yes
+else
+  pgac_cv_prog_CC_cflags__moutline_atomics=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS="$pgac_save_CFLAGS"
+CC="$pgac_save_CC"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CC_cflags__moutline_atomics" >&5
+$as_echo "$pgac_cv_prog_CC_cflags__moutline_atomics" >&6; }
+if test x"$pgac_cv_prog_CC_cflags__moutline_atomics" = x"yes"; then
+  CFLAGS="${CFLAGS} -moutline-atomics"
+fi
+
+
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -moutline-atomics, for CXXFLAGS" >&5
+$as_echo_n "checking whether ${CXX} supports -moutline-atomics, for CXXFLAGS... " >&6; }
+if ${pgac_cv_prog_CXX_cxxflags__moutline_atomics+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CXXFLAGS=$CXXFLAGS
+pgac_save_CXX=$CXX
+CXX=${CXX}
+CXXFLAGS="${CXXFLAGS} -moutline-atomics"
+ac_save_cxx_werror_flag=$ac_cxx_werror_flag
+ac_cxx_werror_flag=yes
+ac_ext=cpp
+ac_cpp='$CXXCPP $CPPFLAGS'
+ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_cxx_compiler_gnu
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  pgac_cv_prog_CXX_cxxflags__moutline_atomics=yes
+else
+  pgac_cv_prog_CXX_cxxflags__moutline_atomics=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+ac_cxx_werror_flag=$ac_save_cxx_werror_flag
+CXXFLAGS="$pgac_save_CXXFLAGS"
+CXX="$pgac_save_CXX"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CXX_cxxflags__moutline_atomics" >&5
+$as_echo "$pgac_cv_prog_CXX_cxxflags__moutline_atomics" >&6; }
+if test x"$pgac_cv_prog_CXX_cxxflags__moutline_atomics" = x"yes"; then
+  CXXFLAGS="${CXXFLAGS} -moutline-atomics"
+fi
+
+
+  fi
 elif test "$ICC" = yes; then
   # Intel's compiler has a bug/misoptimization in checking for
   # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS.
diff --git a/configure.ac b/configure.ac
index 6b9d0487a8..434313db5c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -538,6 +538,10 @@ if test "$GCC" = yes -a "$ICC" = no; then
   if test -n "$NOT_THE_CFLAGS"; then
     CFLAGS="$CFLAGS -Wno-stringop-truncation"
   fi
+  if test x"$host_cpu" == x"aarch64"; then
+    PGAC_PROG_CC_CFLAGS_OPT([-moutline-atomics])
+    PGAC_PROG_CXX_CFLAGS_OPT([-moutline-atomics])
+  fi
 elif test "$ICC" = yes; then
   # Intel's compiler has a bug/misoptimization in checking for
   # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS.
-- 
2.25.1

#8Zidenberg, Tsahi
tsahee@amazon.com
In reply to: Tom Lane (#6)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

On 08/09/2020, 1:01, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I wonder what version of gcc you intend this for. AFAICS, older
gcc versions lack this flag at all, while newer ones have it on
by default.

(previously sent private reply, sorry)

The moutline-atomics flag showed substantial enough improvements
that it has been backported to GCC 9, 8 and there is a gcc-7 branch in
the works.
Ubuntu has integrated this in 20.04, Amazon Linux 2 supports it,
with other distributions including Ubuntu 18.04 and Debian on the way.
all distributions, including the upcoming Ubuntu with GCC-10, have
moutline-atomics turned off by default.

#9Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Zidenberg, Tsahi (#8)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

On 10/09/2020 09:37, Zidenberg, Tsahi wrote:

On 08/09/2020, 1:01, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I wonder what version of gcc you intend this for. AFAICS, older
gcc versions lack this flag at all, while newer ones have it on
by default.

(previously sent private reply, sorry)

The moutline-atomics flag showed substantial enough improvements
that it has been backported to GCC 9, 8 and there is a gcc-7 branch in
the works.
Ubuntu has integrated this in 20.04, Amazon Linux 2 supports it,
with other distributions including Ubuntu 18.04 and Debian on the way.
all distributions, including the upcoming Ubuntu with GCC-10, have
moutline-atomics turned off by default.

If it's a good idea to use -moutline-atomics, I would expect the
compiler or distribution to enable it by default. And as you pointed
out, many have. For the others, there are probably reasons they haven't,
like begin conservative in general. Whatever the reasons, IMHO we should
not second-guess them.

I'm marking this as Rejected in the commitfest. But thanks for the
benchmarking, that is valuable information nevertheless.

- Heikki

#10Zidenberg, Tsahi
tsahee@amazon.com
In reply to: Heikki Linnakangas (#9)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

On 29/09/2020, 10:21, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote:

If it's a good idea to use -moutline-atomics, I would expect the
compiler or distribution to enable it by default. And as you pointed
out, many have.

-moutline-atomics is only enabled by default on the gcc-10 branch where
it was originally developed. It was important enough to be backported
to previous versions and picked up by e.g. ubuntu and amazon-linux.
However, outline-atomics is not enabled by default in any backports that
I'm aware of. Ubuntu 20.04 even turned it off by default for gcc-10,
which seems like a compatibility step with the main gcc-9 compiler.
Always-enabled outline-atomic is, sadly, many years in the
future for release systems.

For the others, there are probably reasons they haven't,
like begin conservative in general. Whatever the reasons, IMHO we should
not second-guess them.

I assume GCC chose conservatively not to add code by default that
won't help old CPUs when increasing minor versions (although I see
no performance degradation in real software).
On the other hand, the feature was important enough to be
back-ported to allow software to take advantage of it.
Postgresql should be the most advanced open source database.
As I understand it, it should be able to handle as well as possible
large workloads on large modern machines like Graviton2, and
that means using LSE.

I'm marking this as Rejected in the commitfest. But thanks for the
benchmarking, that is valuable information nevertheless.

Could additional data change your mind?

#11Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Zidenberg, Tsahi (#10)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

On 30/09/2020 19:04, Zidenberg, Tsahi wrote:

Ubuntu 20.04 even turned it off by default for gcc-10, which seems
like a compatibility step with the main gcc-9 compiler.

Ok, I definitely don't want to override that decision.

I'm marking this as Rejected in the commitfest. But thanks for the
benchmarking, that is valuable information nevertheless.

Could additional data change your mind?

I doubt it. IMO we shouldn't second-guess decisions made by compiler and
distribution vendors, and more performance data won't change that
principle. For comparison, we also don't set -O or -funroll-loops or any
other flags to enable/disable specific optimizations. (Except for a few
specific source files, like checksum.c, but those are exceptions the rule.)

If some other committer thinks differently, I won't object, but I'm not
going to commit this. Maybe you should speak to the distribution vendors
or the folk packaging PostgreSQL for those distributions, instead.

- Heikki

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zidenberg, Tsahi (#10)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

"Zidenberg, Tsahi" <tsahee@amazon.com> writes:

On 29/09/2020, 10:21, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote:

If it's a good idea to use -moutline-atomics, I would expect the
compiler or distribution to enable it by default. And as you pointed
out, many have.

-moutline-atomics is only enabled by default on the gcc-10 branch where
it was originally developed. It was important enough to be backported
to previous versions and picked up by e.g. ubuntu and amazon-linux.
However, outline-atomics is not enabled by default in any backports that
I'm aware of. Ubuntu 20.04 even turned it off by default for gcc-10,
which seems like a compatibility step with the main gcc-9 compiler.
Always-enabled outline-atomic is, sadly, many years in the
future for release systems.

I don't find this argument terribly convincing. I agree that it'll be
a couple years before gcc 10 is in use in "stable production" systems.
But it seems to me that big-iron aarch64 is also some way off from
appearing in stable production systems. By the time this actually
matters to any measurable fraction of our users, distros will have
converged on reasonable default settings for this option.

In the meantime, you are asking that we more or less permanently expend
configure cycles on checking for an option that seems to have a pretty
short expected useful life, even on the small minority of builds where
it'll do anything at all. The cost/benefit ratio doesn't seem very
attractive.

None of this prevents somebody from applying the switch in their own
builds, of course. But I concur with Heikki's reasoning that it's
probably not a great idea for us to, by default, override the distro's
default on this.

regards, tom lane

#13Alexander Korotkov
aekorotkov@gmail.com
In reply to: Zidenberg, Tsahi (#4)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

Hi!

On Mon, Sep 7, 2020 at 1:12 AM Zidenberg, Tsahi <tsahee@amazon.com> wrote:

First, I apologize for taking so long to answer. This e-mail regretfully got lost in my inbox.

On 24/07/2020, 4:17, "Andres Freund" <andres@anarazel.de> wrote:

What does "not significantly affected" exactly mean? Could you post the
raw numbers?

The following tests show benchmark behavior on m6g.8xl instance (32-core with LSE support)
and a1.4xlarge (16-core, no LSE support) with and without the patch, based on postgresql 12.4.
Tests are pgbench select-only/simple-update, and sysbench read-only/write only.

. select-only. simple-update. read-only. write-only
m6g.8xlarge/vanila. 482130. 56275. 273327. 33364
m6g.8xlarge/patch. 493748. 59681. 262702. 33024
a1.4xlarge/vanila. 82437. 13978. 62489. 2928
a1.4xlarge/patch. 79499. 13932. 62796. 2945

Results obviously change with OS / parameters /etc. I have attempted ensure a fair comparison,
But I don't think these numbers should be taken as absolute.
As reference points, m6g instance compiled with -march=native flag, and m5g (x86) instances:

m6g.8xlarge/native. 522771. 60354. 261366. 33582
m5.8xlarge. 362908. 58732. 147730. 32750

I'd like to resurrect this thread, if there is still interest from
your side. What number of clients and jobs did you use with pgbench?

I've noticed far more dramatic effect on high number of clients.
Could you verify that?
https://akorotkov.github.io/blog/2021/04/30/arm/

------
Regards,
Alexander Korotkov

#14Thomas Munro
thomas.munro@gmail.com
In reply to: Alexander Korotkov (#13)
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

Hi,

FYI, people interested in this thread might be interested in
pgsql-bugs #18610. There are two related issues here:

1. Some people want to use LSE on modern ARM servers so they want to
use -moutline-atomics, which IIUC adds auto-detection logic with
fallback code so it can still run on the first generation of aarch64
ARMv8 (without .1) hardware. That was being discussed as a feature
proposal for master. (People could already access that if they
compile from source by using -march=something_modern, but the big
distributions are in charge of what they target and AFAIK mostly still
choose ARMv8, so this outline atomics idea is a nice workaround to
make everyone happy, I haven't studied exactly how it works.)

2. Clang has started assuming -moutline-atomics in some version, so
it's already compiling .bc files that way, so it breaks if our JIT
system decides to inline SQL-callable functions, so we'll need to
decide what to do about that and back-patch something. Conservative
choice would be to stop it from doing that with -mno-outline-atomics,
until this thread makes progress, but perhaps people closer to the
subject have another idea...

[1]: /messages/by-id/18610-37bf303f904fede3@postgresql.org