test failure with gcc-12 -O3 -march=native

Started by Andres Freund over 3 years ago | 5 messages | hackers
#1 Andres Freund
andres@anarazel.de

Hi,

For my optimized builds I've long used -O3 -march=native. After one of the
recent package updates (I'm not certain when exactly yet), the main regression
tests started to fail for me with that. Oddly enough in opr_sanity:

 -- Ask access methods to validate opclasses
 -- (this replaces a lot of SQL-level checks that used to be done in this file)
 SELECT oid, opcname FROM pg_opclass WHERE NOT amvalidate(oid);
- oid | opcname
------+---------
-(0 rows)
+INFO:  operator family "array_ops" of access method hash contains function hash_array_extended(anyarray,bigint) with wrong signature for support number 2
+INFO:  operator family "bpchar_ops" of access method hash contains function hashbpcharextended(character,bigint) with wrong signature for support number 2
...
+ 16492 | part_test_int4_ops
+ 16497 | part_test_text_ops
+(43 rows)

Given that I did not encounter this problem with gcc-12 before, and that
gcc-12 has been released, it seems less likely to be a bug in our code
highlighted by a new optimization and more likely to be a bug in a gcc bugfix,
but it's definitely not clear.

I only investigated this a tiny bit so far. What fails is the
procform->prorettype != restype comparison in check_hash_func_signature().

Greetings,

Andres Freund

#2 Justin Pryzby
pryzby@telsasoft.com
In reply to: Andres Freund (#1)
Re: test failure with gcc-12 -O3 -march=native

On Thu, Aug 11, 2022 at 01:03:43PM -0700, Andres Freund wrote:

> Hi,
>
> For my optimized builds I've long used -O3 -march=native. After one of the

On what kind of arch ?

> Given that I did not encounter this problem with gcc-12 before, and that
> gcc-12 has been released, it seems less likely to be a bug in our code
> highlighted by a new optimization and more likely to be a bug in a gcc bugfix,
> but it's definitely not clear.

debian testing is now defaulting to gcc-12.
https://tracker.debian.org/news/1348007/accepted-gcc-defaults-1198-source-into-unstable/

Are you sure you were building with gcc-12 and not gcc(default) which, until 3
weeks ago, was gcc-11 ?

--
Justin

#3 Andres Freund
andres@anarazel.de
In reply to: Justin Pryzby (#2)
Re: test failure with gcc-12 -O3 -march=native

Hi,

On 2022-08-11 20:06:02 -0500, Justin Pryzby wrote:

> On Thu, Aug 11, 2022 at 01:03:43PM -0700, Andres Freund wrote:
>> Hi,
>>
>> For my optimized builds I've long used -O3 -march=native. After one of the
>
> On what kind of arch ?

x86-64 cascadelake. I've since debugged this further. It's not even -march
that's the problem, it's the difference between -mtune=broadwell and
-mtune=skylake, even with -march=x86-64.

>> Given that I did not encounter this problem with gcc-12 before, and that
>> gcc-12 has been released, it seems less likely to be a bug in our code
>> highlighted by a new optimization and more likely to be a bug in a gcc bugfix,
>> but it's definitely not clear.
>
> debian testing is now defaulting to gcc-12.
> https://tracker.debian.org/news/1348007/accepted-gcc-defaults-1198-source-into-unstable/
>
> Are you sure you were building with gcc-12 and not gcc(default) which, until 3
> weeks ago, was gcc-11 ?

Yes.

I'm now bisecting...

Greetings,

Andres Freund

#4 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#3)
Re: test failure with gcc-12 -O3 -march=native

Hi,

On 2022-08-11 18:24:16 -0700, Andres Freund wrote:

>>> Given that I did not encounter this problem with gcc-12 before, and that
>>> gcc-12 has been released, it seems less likely to be a bug in our code
>>> highlighted by a new optimization and more likely to be a bug in a gcc bugfix,
>>> but it's definitely not clear.
>>
>> debian testing is now defaulting to gcc-12.
>> https://tracker.debian.org/news/1348007/accepted-gcc-defaults-1198-source-into-unstable/
>>
>> Are you sure you were building with gcc-12 and not gcc(default) which, until 3
>> weeks ago, was gcc-11 ?
>
> Yes.
>
> I'm now bisecting...

I found the commit triggering it [1]. Oddly it's a change from a few months
ago, and I can reconstruct from dpkg.log and shell history that I definitely
ran the tests many times since upgrading the compiler. I did however clean my
ccache cache yesterday, I wonder if somehow the 'old' version got stuck in
it. ccache says it checks the compiler's mtime though.

Greetings,

Andres Freund

[1]: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1ceddd7497e

#5 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#4)
Re: test failure with gcc-12 -O3 -march=native

Hi,

On 2022-08-11 19:08:14 -0700, Andres Freund wrote:

> On 2022-08-11 18:24:16 -0700, Andres Freund wrote:
>> I'm now bisecting...
>
> I found the commit triggering it [1]. Oddly it's a change from a few months
> ago, and I can reconstruct from dpkg.log and shell history that I definitely
> ran the tests many times since upgrading the compiler. I did however clean my
> ccache cache yesterday, I wonder if somehow the 'old' version got stuck in
> it. ccache says it checks the compiler's mtime though.

Spent a fair bit of time reducing the problem to something triggering the
problem in isolation. This is somewhat scary - I'd be quite surprised if this
were the only place triggering the bug.

And I suspect that it doesn't actually require -mtune=skylake, but I'm not
sure.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

Greetings,

Andres Freund