gothic_moth, codlin_moth failures on REL8_2_STABLE

Started by Tom Lane, almost 16 years ago, 10 messages
#1Tom Lane
tgl@sss.pgh.pa.us

Since the buildfarm is mostly green these days, I took some time to look
into the few remaining consistent failures. One is that gothic_moth and
codlin_moth fail on contrib/tsearch2 in the 8.2 branch, with a
regression diff like this:

*** 2453,2459 ****
   <body>
   <b>Sea</b> view wow <u><b>foo</b> bar</u> <i>qq</i>
   <a href="http://www.google.com/foo.bar.html" target="_blank">YES &nbsp;</a>
!   ff-bg
   <script>
          document.write(15);
   </script>
--- 2453,2459 ----
   <body>
   <b>Sea</b> view wow <u><b>foo</b> bar</u> <i>qq</i>
   <a href="http://www.google.com/foo.bar.html" target="_blank">YES &nbsp;</a>
!  ff-bgff-bg
   <script>
          document.write(15);
   </script>

These animals are not testing any branches older than 8.2. The same
test appears in newer branches and passes, but the code involved got
migrated to core and probably changed around a bit.

I traced through this test on my own machine and determined that the
way it's supposed to work is like this: the tsearch parser breaks the
string into a series of tokens that include these:

ff-bg compound word
ff compound word element
- punctuation
bg compound word element

The highlight function is then supposed to set skip = 1 on the compound
word, causing it to be skipped when genhl() reconstructs the text.
The failure looks to me like the compound word is not getting skipped.
Both the setting and the testing of the flag seem to be absolutely
straightforward portable code; although the "skip" struct field is a
bitfield, which is a C feature we don't use very heavily.

My conclusion is that this is probably a compiler bug. Both buildfarm
animals appear to be using Sun Studio, although on different
architectures which weakens the compiler-bug theory a bit. Even though
we are not seeing the failure in later PG branches, it would probably be
worth looking into more closely, because if it's biting 8.2 it could
start biting newer code as well. But without access to a machine
showing the problem it is difficult to do much.

regards, tom lane

#2Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#1)
Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

My conclusion is that this is probably a compiler bug.  Both buildfarm
animals appear to be using Sun Studio, although on different
architectures which weakens the compiler-bug theory a bit.  Even though
we are not seeing the failure in later PG branches, it would probably be
worth looking into more closely, because if it's biting 8.2 it could
start biting newer code as well.  But without access to a machine
showing the problem it is difficult to do much.

Could be this:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087

It's fixed in patch 124861-11 which came out Feb 23, 2009. Is this
patch missing on both gothic-moth and codlin-moth?

I suppose it's possible to have a configure test to check for whether
this patch is present but I'm not sure how much it's worthwhile given
that it'll only help people who happen to recompile their 8.2 server
after the next Postgres patch. And I'm not sure we can check for
patches without assuming the CC is the OS-shipped cc. Does cc itself
have an option to list which patches it has applied to it?

--
greg

#3Greg Stark
gsstark@mit.edu
In reply to: Greg Stark (#2)
Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

Incidentally Zdenek came to the same conclusion that it was a compiler
bug in <4AA775A9.80702@sun.com>

--
greg

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#3)
Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

Greg Stark <gsstark@mit.edu> writes:

Incidentally Zdenek came to the same conclusion that it was a compiler
bug in <4AA775A9.80702@sun.com>

Drat, I had forgotten that exchange. I reconstructed Teodor's advice
the hard way :-(

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#2)
Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

Greg Stark <gsstark@mit.edu> writes:

On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

My conclusion is that this is probably a compiler bug.

Could be this:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087

Hmmm ... that doesn't seem to be quite an exact match, because the
setting and testing of the bitfield is in different functions in
different files in our case. Still, it seems related. It would
be useful to verify whether these two buildfarm animals are fully
up-to-date on compiler patches.

regards, tom lane

#6Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Tom Lane (#1)
Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

Hi Tom,

I'm sorry that I did not look at this earlier. I played with it and here
are some facts. gothic (sparc) and codlin (x86) use Sun Studio 12, and I
set them up to use very high optimization.

Gothic:
-------
-xalias_level=basic -xarch=native -xdepend -xmemalign=8s -xO5
-xprefetch=auto,explicit

Codlin:
-------
-xalias_level=basic -xarch=native -xdepend -xO4 -xprefetch=auto,explicit

-xO5 is the highest optimization level; -xO4 is one level below it.

I played with the flags and found that:

"-xO4 -xalias_level=basic" generates problem.

"-xO3 -xalias_level=basic" works fine

"-xO5" works fine

As the documentation says:

Quoted from the Sun Studio compiler guide:
http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view

------------------------------------------------------------------------
xalias_level=basic
------------------
If you use the -xalias_level=basic option, the compiler assumes that
memory references that involve different C basic types do not alias each
other. The compiler also assumes that references to all other types can
alias each other as well as any C basic type. The compiler assumes that
references using char * can alias any other type.

For example, at the -xalias_level=basic level, the compiler assumes that
a pointer variable of type int * is not going to access a float object.
Therefore it is safe for the compiler to perform optimizations that
assume a pointer of type float * will not alias the same memory that is
referenced with a pointer of type int *.

-xO4
-----
Performs automatic inlining of functions contained in the same file in
addition to performing -xO3 optimizations. This automatic inlining
usually improves execution speed, but sometimes makes it worse. In
general, this level results in increased code size.

------------------------------------------------------------------------

I redefined the bitfields in HLWORD as char and it works, so your guess
is correct. But the question remains where exactly the bitfields go wrong.
Any idea where I should look?

IIRC, I also had this problem on HEAD when I tried to fix the tsearch
regression test for the Czech locale. The problem appears and disappears.

Zdenek

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zdenek Kotala (#6)
Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

"-xO4 -xalias_level=basic" generates problem.
"-xO3 -xalias_level=basic" works fine
"-xO5" works fine

As documentation say:

Cite from Sun studio compiler guide:
http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view

xalias_level=basic
------------------
If you use the -xalias_level=basic option, the compiler assumes that
memory references that involve different C basic types do not alias each
other. The compiler also assumes that references to all other types can
alias each other as well as any C basic type. The compiler assumes that
references using char * can alias any other type.

For example, at the -xalias_level=basic level, the compiler assumes that
a pointer variable of type int * is not going to access a float object.
Therefore it is safe for the compiler to perform optimizations that
assume a pointer of type float * will not alias the same memory that is
referenced with a pointer of type int *.

I think you need to turn that off. On gcc we use -fno-strict-aliasing
which disables the type of compiler assumption that this is talking about.
I'm not sure exactly how that might create the specific failure we are
seeing here, but I can point you to lots and lots of places in the
sources where such an assumption would break things.

regards, tom lane

#8Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Greg Stark (#2)
Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

On 11.03.10 16:24, Greg Stark wrote:

On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

My conclusion is that this is probably a compiler bug. Both buildfarm
animals appear to be using Sun Studio, although on different
architectures which weakens the compiler-bug theory a bit. Even though
we are not seeing the failure in later PG branches, it would probably be
worth looking into more closely, because if it's biting 8.2 it could
start biting newer code as well. But without access to a machine
showing the problem it is difficult to do much.

Could be this:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087

It's fixed in patch 124861-11 which came out Feb 23, 2009. Is this
patch missing on both gothic-moth and codlin-moth?

This looks like our case. See the compiler versions:

Ghost:
-bash-3.2$ cc -V
cc: Sun C 5.9 SunOS_sparc Patch 124867-09 2008/11/25

Codlin
-bash-4.0$ cc -V
cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30

I can apply the patch on Ghost, but Codlin has to wait, because I don't
control the compiler version there. I will try to find an updated SS12
somewhere on the disk/network.

The patch you refer to does not fix cc itself but some other
binaries/libs that cc uses.

I will try to update Ghost and we will see what happens.

I suppose it's possible to have a configure test to check for whether
this patch is present but I'm not sure how much it's worthwhile given
that it'll only help people who happen to recompile their 8.2 server
after the next Postgres patch. And I'm not sure we can check for
patches without assuming the CC is the OS-shipped cc. Does cc itself
have an option to list which patches it has applied to it?

cc is not shipped with Solaris; you have to install it separately. And
the bug appears only when you use high optimization (see my email). You
can see the patch version when you run cc -V, but that shows only the
version of the compiler itself.

Zdenek

#9Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Tom Lane (#7)
Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

On 11.03.10 17:37, Tom Lane wrote:

Zdenek Kotala<Zdenek.Kotala@Sun.COM> writes:

"-xO4 -xalias_level=basic" generates problem.
"-xO3 -xalias_level=basic" works fine
"-xO5" works fine

As documentation say:

Cite from Sun studio compiler guide:
http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view

xalias_level=basic
------------------
If you use the -xalias_level=basic option, the compiler assumes that
memory references that involve different C basic types do not alias each
other. The compiler also assumes that references to all other types can
alias each other as well as any C basic type. The compiler assumes that
references using char * can alias any other type.

For example, at the -xalias_level=basic level, the compiler assumes that
a pointer variable of type int * is not going to access a float object.
Therefore it is safe for the compiler to perform optimizations that
assume a pointer of type float * will not alias the same memory that is
referenced with a pointer of type int *.

I think you need to turn that off. On gcc we use -fno-strict-aliasing
which disables the type of compiler assumption that this is talking about.
I'm not sure exactly how that might create the specific failure we are
seeing here, but I can point you to lots and lots of places in the
sources where such an assumption would break things.

OK. I will first try updating the compiler to the latest version to see
if it helps, and as a last step I will remove the aliasing option.

Thanks Zdenek

#10Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Tom Lane (#7)
Re: gothic_moth, codlin_moth failures on REL8_2_STABLE

Tom Lane wrote on Thu, 11 Mar 2010 at 11:37 -0500:

Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

"-xO4 -xalias_level=basic" generates problem.
"-xO3 -xalias_level=basic" works fine
"-xO5" works fine

As documentation say:

Cite from Sun studio compiler guide:
http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view

xalias_level=basic
------------------
If you use the -xalias_level=basic option, the compiler assumes that
memory references that involve different C basic types do not alias each
other. The compiler also assumes that references to all other types can
alias each other as well as any C basic type. The compiler assumes that
references using char * can alias any other type.

For example, at the -xalias_level=basic level, the compiler assumes that
a pointer variable of type int * is not going to access a float object.
Therefore it is safe for the compiler to perform optimizations that
assume a pointer of type float * will not alias the same memory that is
referenced with a pointer of type int *.

I think you need to turn that off. On gcc we use -fno-strict-aliasing
which disables the type of compiler assumption that this is talking about.
I'm not sure exactly how that might create the specific failure we are
seeing here, but I can point you to lots and lots of places in the
sources where such an assumption would break things.

Reconfigured, and both animals are green.

Zdenek