gothic_moth, codlin_moth failures on REL8_2_STABLE
Since the buildfarm is mostly green these days, I took some time to look
into the few remaining consistent failures. One is that gothic_moth and
codlin_moth fail on contrib/tsearch2 in the 8.2 branch, with a
regression diff like this:
*** 2453,2459 ****
<body>
<b>Sea</b> view wow <u><b>foo</b> bar</u> <i>qq</i>
<a href="http://www.google.com/foo.bar.html" target="_blank">YES </a>
! ff-bg
<script>
document.write(15);
</script>
--- 2453,2459 ----
<body>
<b>Sea</b> view wow <u><b>foo</b> bar</u> <i>qq</i>
<a href="http://www.google.com/foo.bar.html" target="_blank">YES </a>
! ff-bgff-bg
<script>
document.write(15);
</script>
These animals are not testing any branches older than 8.2. The same
test appears in newer branches and passes, but the code involved got
migrated to core and probably changed around a bit.
I traced through this test on my own machine and determined that the
way it's supposed to work is like this: the tsearch parser breaks the
string into a series of tokens that include these:
	ff-bg	compound word
	ff	compound word element
	-	punctuation
	bg	compound word element
The highlight function is then supposed to set skip = 1 on the compound
word, causing it to be skipped when genhl() reconstructs the text.
The failure looks to me like the compound word is not getting skipped.
Both the setting and the testing of the flag seem to be absolutely
straightforward portable code; although the "skip" struct field is a
bitfield, which is a C feature we don't use very heavily.
My conclusion is that this is probably a compiler bug. Both buildfarm
animals appear to be using Sun Studio, although on different
architectures which weakens the compiler-bug theory a bit. Even though
we are not seeing the failure in later PG branches, it would probably be
worth looking into more closely, because if it's biting 8.2 it could
start biting newer code as well. But without access to a machine
showing the problem it is difficult to do much.
regards, tom lane
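A minimal C sketch of the mechanism described in the message above may help make it concrete. The struct HlToken and the functions mark_compound() and rebuild_text() are hypothetical stand-ins, not the actual tsearch2 definitions of HLWORD or genhl(); the only thing taken from the report is the idea of a one-bit "skip" flag that is set in one place and tested in another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token record; the real tsearch2 struct has other fields. */
typedef struct
{
    const char *text;
    unsigned int is_compound:1; /* token is a whole compound word */
    unsigned int skip:1;        /* omit this token when rebuilding text */
} HlToken;

/* Flag every compound-word token so that only its elements are output. */
void
mark_compound(HlToken *tokens, size_t ntokens)
{
    size_t i;

    for (i = 0; i < ntokens; i++)
        if (tokens[i].is_compound)
            tokens[i].skip = 1;
}

/* Concatenate all tokens that are not flagged as skipped into out. */
void
rebuild_text(const HlToken *tokens, size_t ntokens, char *out, size_t outlen)
{
    size_t i;
    size_t used = 0;

    assert(outlen > 0);
    out[0] = '\0';
    for (i = 0; i < ntokens; i++)
    {
        size_t len;

        if (tokens[i].skip)
            continue;
        len = strlen(tokens[i].text);
        assert(used + len < outlen);    /* caller must supply enough room */
        memcpy(out + used, tokens[i].text, len + 1);
        used += len;
    }
}
```

With the four tokens listed in the report, rebuilding the text while the flag is unset gives "ff-bgff-bg", which is the failing output; rebuilding after mark_compound() gives "ff-bg", which is the expected output.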
On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My conclusion is that this is probably a compiler bug. Both buildfarm
> animals appear to be using Sun Studio, although on different
> architectures which weakens the compiler-bug theory a bit. Even though
> we are not seeing the failure in later PG branches, it would probably be
> worth looking into more closely, because if it's biting 8.2 it could
> start biting newer code as well. But without access to a machine
> showing the problem it is difficult to do much.
Could be this:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087
It's fixed in patch 124861-11 which came out Feb 23, 2009. Is this
patch missing on both gothic-moth and codlin-moth?
I suppose it's possible to have a configure test that checks whether
this patch is present, but I'm not sure it's worthwhile, given
that it'll only help people who happen to recompile their 8.2 server
after the next Postgres patch. And I'm not sure we can check for
patches without assuming the CC is the OS-shipped cc. Does cc itself
have an option to list which patches have been applied to it?
--
greg
Incidentally Zdenek came to the same conclusion that it was a compiler
bug in <4AA775A9.80702@sun.com>
--
greg
Greg Stark <gsstark@mit.edu> writes:
> Incidentally Zdenek came to the same conclusion that it was a compiler
> bug in <4AA775A9.80702@sun.com>
Drat, I had forgotten that exchange. I reconstructed Teodor's advice
the hard way :-(
regards, tom lane
Greg Stark <gsstark@mit.edu> writes:
> On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My conclusion is that this is probably a compiler bug.
> Could be this:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087
Hmmm ... that doesn't seem to be quite an exact match, because the
setting and testing of the bitfield are in different functions in
different files in our case. Still, it seems related. It would
be useful to verify whether these two buildfarm animals are fully
up-to-date on compiler patches.
regards, tom lane
Hi Tom,
I'm sorry that I did not look at it earlier. I played with it and here
are some facts. gothic (sparc) and codlin (x86) use Sun Studio 12, and I
set them up to use very high optimization.
Gothic:
-------
-xalias_level=basic -xarch=native -xdepend -xmemalign=8s -xO5
-xprefetch=auto,explicit
Codlin:
-------
-xalias_level=basic -xarch=native -xdepend -xO4 -xprefetch=auto,explicit
-xO5 is the highest optimization level; -xO4 is slightly lower.
I played with the flags and found that
"-xO4 -xalias_level=basic" reproduces the problem.
"-xO3 -xalias_level=basic" works fine
"-xO5" works fine
As the documentation says (quoting the Sun Studio compiler guide):
http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view
------------------------------------------------------------------------
xalias_level=basic
------------------
If you use the -xalias_level=basic option, the compiler assumes that
memory references that involve different C basic types do not alias each
other. The compiler also assumes that references to all other types can
alias each other as well as any C basic type. The compiler assumes that
references using char * can alias any other type.
For example, at the -xalias_level=basic level, the compiler assumes that
a pointer variable of type int * is not going to access a float object.
Therefore it is safe for the compiler to perform optimizations that
assume a pointer of type float * will not alias the same memory that is
referenced with a pointer of type int *.
-xO4
-----
Performs automatic inlining of functions contained in the same file in
addition to performing -xO3 optimizations. This automatic inlining
usually improves execution speed, but sometimes makes it worse. In
general, this level results in increased code size.
------------------------------------------------------------------------
I redefined the bitfields as char in HLWORD and it works, so your guess
is correct. But the question remains: where exactly do the bitfields go
wrong? Any idea where I should look?
IIRC, I also had this problem on head when I tried to fix the tsearch
regression test for the Czech locale. The problem appears and disappears.
Zdenek
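The workaround mentioned in the message above, replacing bitfields with char members, can be sketched as follows. The struct names FlagsBitfield and FlagsChar and the field names are hypothetical and do not reproduce the actual HLWORD layout. The point of the sketch is only that each flag gets its own addressable byte, so setting one flag no longer requires a read-modify-write of a storage unit shared with the other flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag layout: three one-bit fields packed into one unit. */
typedef struct
{
    unsigned int selected:1;
    unsigned int replace:1;
    unsigned int skip:1;
} FlagsBitfield;

/* The same flags as separate bytes, one per flag. */
typedef struct
{
    char selected;
    char replace;
    char skip;
} FlagsChar;

/* Setting a char flag is a plain one-byte store. */
void
set_skip(FlagsChar *f)
{
    assert(f != NULL);
    f->skip = 1;
}

/* Testing a char flag is a plain one-byte load. */
int
should_skip(const FlagsChar *f)
{
    assert(f != NULL);
    return f->skip != 0;
}
```

The cost is a slightly larger struct, which is usually an acceptable trade when the struct is not stored in very large arrays.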
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> "-xO4 -xalias_level=basic" generates problem.
> "-xO3 -xalias_level=basic" works fine
> "-xO5" works fine
> As documentation say:
> Cite from Sun studio compiler guide:
> http://docs.sun.com/app/docs/doc/819-5265/bjapp?a=view
> xalias_level=basic
> ------------------
> If you use the -xalias_level=basic option, the compiler assumes that
> memory references that involve different C basic types do not alias each
> other. The compiler also assumes that references to all other types can
> alias each other as well as any C basic type. The compiler assumes that
> references using char * can alias any other type.
> For example, at the -xalias_level=basic level, the compiler assumes that
> a pointer variable of type int * is not going to access a float object.
> Therefore it is safe for the compiler to perform optimizations that
> assume a pointer of type float * will not alias the same memory that is
> referenced with a pointer of type int *.
I think you need to turn that off. On gcc we use -fno-strict-aliasing
which disables the type of compiler assumption that this is talking about.
I'm not sure exactly how that might create the specific failure we are
seeing here, but I can point you to lots and lots of places in the
sources where such an assumption would break things.
regards, tom lane
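As an illustration of the kind of code that type-based alias assumptions can break, consider reading the bit pattern of a float. This example is generic and is not taken from the PostgreSQL sources; the function name float_bits_copy() is hypothetical, and the sketch assumes that float is a 32-bit IEEE 754 type. The comment shows the pointer-cast pattern that an optimizer working under -xalias_level=basic (or gcc without -fno-strict-aliasing) is allowed to mishandle, and the function shows a byte-copy version that stays well defined under any alias setting.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * The fragile pattern looks like this:
 *
 *     uint32_t bits = *(uint32_t *) &f;
 *
 * It reads a float object through an integer pointer.  A compiler that
 * assumes references of different basic types never overlap may reorder
 * or drop such an access.
 *
 * Copying the bytes avoids the problem, because no object is ever
 * accessed through a pointer of the wrong type.
 */
uint32_t
float_bits_copy(float f)
{
    uint32_t bits;

    assert(sizeof bits == sizeof f);    /* assumption: 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code written in the first style needs the aliasing assumption switched off; code written in the second style does not care either way.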
On 11.03.10 16:24, Greg Stark wrote:
> On Wed, Mar 10, 2010 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My conclusion is that this is probably a compiler bug. Both buildfarm
>> animals appear to be using Sun Studio, although on different
>> architectures which weakens the compiler-bug theory a bit. Even though
>> we are not seeing the failure in later PG branches, it would probably be
>> worth looking into more closely, because if it's biting 8.2 it could
>> start biting newer code as well. But without access to a machine
>> showing the problem it is difficult to do much.
> Could be this:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6750087
> It's fixed in patch 124861-11 which came out Feb 23, 2009. Is this
> patch missing on both gothic-moth and codlin-moth?
It looks like our case. See the compiler versions:
Ghost:
-bash-3.2$ cc -V
cc: Sun C 5.9 SunOS_sparc Patch 124867-09 2008/11/25
Codlin:
-bash-4.0$ cc -V
cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30
I should apply the patch on Ghost, but Codlin has to wait, because I don't
control the compiler version there. I will try to find an updated SS12
somewhere on the disk/network.
The patch you refer to does not fix cc itself but some other
binaries/libs that cc uses.
I will try to update Ghost and we will see what happens.
> I suppose it's possible to have a configure test to check for whether
> this patch is present but I'm not sure how much it's worthwhile given
> that it'll only help people who happen to recompile their 8.2 server
> after the next Postgres patch. And I'm not sure we can check for
> patches without assuming the CC is the OS-shipped cc. Does cc itself
> have an option to list which patches it has applied to it?
cc is not shipped with Solaris; you have to install it separately. And
the bug appears only when you use high optimization (see my email). You
can see the patch version when you run cc -V, but that shows only the
version of the compiler itself.
Zdenek
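For what it is worth, pulling the patch number out of a "cc -V" banner like the two shown in the message above is simple. The following C sketch is only an illustration: the function parse_cc_patch() is hypothetical, the banner format is assumed from those two sample lines alone, and, as noted in the message, the number found this way describes the compiler itself and not the other binaries and libraries it uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the "Patch NNNNNN-RR" pair from a cc -V banner line.
 * Returns 1 and fills *patch_id and *revision on success, 0 if the
 * banner contains no recognizable patch string.
 */
int
parse_cc_patch(const char *banner, long *patch_id, int *revision)
{
    const char *p;

    assert(banner != NULL);
    p = strstr(banner, "Patch ");
    if (p == NULL)
        return 0;
    /* %d (not %i) so that a revision such as "09" is read as decimal 9 */
    return sscanf(p, "Patch %ld-%d", patch_id, revision) == 2;
}
```

Applied to the Ghost banner this yields patch 124867, revision 9; applied to the Codlin banner it yields patch 124868, revision 10.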
On 11.03.10 17:37, Tom Lane wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> "-xO4 -xalias_level=basic" generates problem.
>> "-xO3 -xalias_level=basic" works fine
>> "-xO5" works fine
> I think you need to turn that off. On gcc we use -fno-strict-aliasing
> which disables the type of compiler assumption that this is talking about.
> I'm not sure exactly how that might create the specific failure we are
> seeing here, but I can point you to lots and lots of places in the
> sources where such an assumption would break things.
OK. I will first try updating the compiler to the latest version to see
if it helps, and after that I will remove the aliasing option.
Thanks, Zdenek
Tom Lane wrote on Thu, 11 Mar 2010 at 11:37 -0500:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> "-xO4 -xalias_level=basic" generates problem.
>> "-xO3 -xalias_level=basic" works fine
>> "-xO5" works fine
> I think you need to turn that off. On gcc we use -fno-strict-aliasing
> which disables the type of compiler assumption that this is talking about.
> I'm not sure exactly how that might create the specific failure we are
> seeing here, but I can point you to lots and lots of places in the
> sources where such an assumption would break things.
Reconfigured, and both animals are green.
Zdenek