buildfarm animal shoveler failing with "Illegal instruction"

Started by Andres Freund over 5 years ago, 6 messages
#1 Andres Freund
andres@anarazel.de

Hi Mark,

shoveler has been failing for a while with an odd error. E.g.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=shoveler&dt=2020-09-18%2014%3A01%3A48

Illegal instruction
pg_dumpall: error: pg_dump failed on database "template1", exiting
waiting for server to shut down.... done
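
For context on the first line of that log: "Illegal instruction" is the text a shell usually prints when a child process has been killed by SIGILL. The following is a minimal sketch of how a parent process can detect that case. It is not taken from pg_dumpall or the buildfarm client; the helper names are hypothetical, and the child here raises SIGILL deliberately rather than executing a bad opcode.

```c
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that terminates itself with SIGILL and
 * report how it died. Returns the terminating signal number, 0 if the child
 * exited normally, or -1 if fork() or waitpid() failed.
 */
int child_termination_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: restore the default action, then deliver SIGILL to itself. */
        signal(SIGILL, SIG_DFL);
        raise(SIGILL);
        _exit(0); /* only reached if the signal did not terminate us */
    }

    /* Parent: collect the child's status and decode it. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}

/* Hypothetical helper: human-readable description of a signal number. */
const char *describe_signal(int signo)
{
    return strsignal(signo);
}
```

Calling `describe_signal(child_termination_signal())` yields the platform's description of SIGILL, which is the kind of message that ends up in a log like the one above.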

None of the changes in that time frame look like they're likely causing
illegal instructions to be emitted that weren't before. So I am
wondering if anything changed on that machine around 2020-09-18
14:01:48 ?

Greetings,

Andres Freund

#2 Mark Wong
mark@2ndquadrant.com
In reply to: Andres Freund (#1)
Re: buildfarm animal shoveler failing with "Illegal instruction"

On Thu, Oct 01, 2020 at 12:12:44PM -0700, Andres Freund wrote:

> Hi Mark,
>
> shoveler has been failing for a while with an odd error. E.g.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=shoveler&dt=2020-09-18%2014%3A01%3A48
>
> Illegal instruction
> pg_dumpall: error: pg_dump failed on database "template1", exiting
> waiting for server to shut down.... done
>
> None of the changes in that time frame look like they're likely causing
> illegal instructions to be emitted that weren't before. So I am
> wondering if anything changed on that machine around 2020-09-18
> 14:01:48 ?

It looks like the last package update was 2020-06-10 06:59:26, according
to the apt logs.

I'm getting Tom set up with access too, in case he has time before me to
get a stack trace to see what's happening...

Regards,
Mark

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Wong (#2)
Re: buildfarm animal shoveler failing with "Illegal instruction"

Mark Wong <mark@2ndquadrant.com> writes:

> I'm getting Tom set up with access too, in case he has time before me to
> get a stack trace to see what's happening...

tl;dr: it's hard to conclude that this is anything but a compiler bug.

I was able to reproduce this on shoveler's host, but only when using
the compiler shoveler uses (clang-3.9), not the 6.3 gcc that's also
on there and is of similar vintage. Observations:

* You don't need any complicated test case; "pg_dump template1"
is enough.

* Reverting 1ed6b8956's addition of a "postfix operators are not supported
anymore" warning to dumpOpr() makes it go away. This despite the fact
that that code is never reached when dumping template1. (We do enter
dumpOpr, but the oprinfo->dobj.dump test always fails.)

* Reducing the optimization level to -O1 or -O0 makes it go away.

* Inserting a debugging fprintf in dumpOpr makes it go away.
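
The second observation is easier to picture with a sketch of the control flow. What follows is a heavily simplified, hypothetical stand-in, not the actual pg_dump source: only the function name dumpOpr and the oprinfo->dobj.dump test come from the description above, while the struct layouts, the `oprkind == 'r'` check, the `name` field, and the return value are assumptions made for illustration. It shows why the warning is unreachable when the dump flag is false; it does not reproduce the miscompilation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for pg_dump's structures. */
typedef struct {
    bool dump;              /* should this object be dumped at all? */
} DumpableObject;

typedef struct {
    DumpableObject dobj;
    char oprkind;           /* assumption: 'r' marks a postfix operator */
    const char *name;
} OprInfo;

/*
 * Sketch of the shape of dumpOpr(): an early-return guard followed by the
 * warning. Returns the number of warnings written to 'log'.
 */
int dumpOpr_sketch(const OprInfo *oprinfo, FILE *log)
{
    /* Guard: when the object is not selected for dumping, nothing below runs. */
    if (!oprinfo->dobj.dump)
        return 0;

    /* Only reached for operators that are actually being dumped. */
    if (oprinfo->oprkind == 'r') {
        fprintf(log,
                "warning: postfix operators are not supported anymore (operator \"%s\")\n",
                oprinfo->name);
        return 1;
    }
    return 0;
}
```

With `dobj.dump` set to false, as described for every operator in template1, the function returns before the warning branch, so at the source level the added code cannot execute in the failing run.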

Since clang 3.9 is several years old, maybe we could move shoveler
up to a newer version? Or dial it down to -O1 optimization?

regards, tom lane

#4 Mark Wong
mark@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: buildfarm animal shoveler failing with "Illegal instruction"

On Thu, Oct 01, 2020 at 09:12:53PM -0400, Tom Lane wrote:

> Mark Wong <mark@2ndquadrant.com> writes:
>
> > I'm getting Tom set up with access too, in case he has time before me to
> > get a stack trace to see what's happening...
>
> tl;dr: it's hard to conclude that this is anything but a compiler bug.
>
> I was able to reproduce this on shoveler's host, but only when using
> the compiler shoveler uses (clang-3.9), not the 6.3 gcc that's also
> on there and is of similar vintage. Observations:
>
> * You don't need any complicated test case; "pg_dump template1"
> is enough.
>
> * Reverting 1ed6b8956's addition of a "postfix operators are not supported
> anymore" warning to dumpOpr() makes it go away. This despite the fact
> that that code is never reached when dumping template1. (We do enter
> dumpOpr, but the oprinfo->dobj.dump test always fails.)
>
> * Reducing the optimization level to -O1 or -O0 makes it go away.
>
> * Inserting a debugging fprintf in dumpOpr makes it go away.
>
> Since clang 3.9 is several years old, maybe we could move shoveler
> up to a newer version? Or dial it down to -O1 optimization?

There is ayu, same system with clang 4.0, so covered on that front.

I went ahead and stopped the jobs to run with clang 3.9. This is also
the same system that was running clang 3.8 too. I tried looking for EOL
dates, but had trouble finding anything... But I can change the
optimization flag if we want it back.

Regards,
Mark
--
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/

#5 Andres Freund
andres@anarazel.de
In reply to: Mark Wong (#4)
Re: buildfarm animal shoveler failing with "Illegal instruction"

On 2020-10-02 10:45:58 -0700, Mark Wong wrote:

> I went ahead and stopped the jobs to run with clang 3.9. This is also
> the same system that was running clang 3.8 too. I tried looking for EOL
> dates, but had trouble finding anything... But I can change the
> optimization flag if we want it back.

llvm officially only supports the last minor version, and only does one
or two point releases for them. 3.9 and 3.8 are long past EOL.

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#5)
Re: buildfarm animal shoveler failing with "Illegal instruction"

Andres Freund <andres@anarazel.de> writes:

> On 2020-10-02 10:45:58 -0700, Mark Wong wrote:
>
> > I went ahead and stopped the jobs to run with clang 3.9. This is also
> > the same system that was running clang 3.8 too. I tried looking for EOL
> > dates, but had trouble finding anything... But I can change the
> > optimization flag if we want it back.
>
> llvm officially only supports the last minor version, and only does one
> or two point releases for them. 3.9 and 3.8 are long past EOL.

I thought about asking Mark to re-enable it at -O1, but we have recent
experience reminding us that non-default optimization levels are likely
to be even buggier than the default [1]. So that's probably not a
productive answer. We might as well just retire the animal.

regards, tom lane

[1]: /messages/by-id/1934344.1596305790@sss.pgh.pa.us