Back branches vs. gcc 4.8.0

Started by Tom Laneabout 13 years ago18 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

After quite a bit of hair-pulling trying to install Fedora 19 Alpha,
I've been able to reproduce the initdb-time failure that's currently
being seen on buildfarm member anchovy, and was also complained of
recently by an end user:
/messages/by-id/CAOD=oQ-kq3Eg5SOvRYOVxDuqibVWC8R0wEivPsMGcyzZY-nfzA@mail.gmail.com

It is exactly what I suspected, namely that gcc 4.8.0 is applying an
optimization that breaks our code; the odd thing though is that it's
not breaking 9.2 or HEAD, just the older branches.

It turns out that what's happening is that with
-faggressive-loop-optimizations turned on (as it is by default),
gcc decides that loops that iterate over the elements of an int2vector
can iterate at most once, because int2vector is declared with a fixed
size values[] array:

int16 values[1]; /* VARIABLE LENGTH ARRAY */
} int2vector; /* VARIABLE LENGTH STRUCT */

Now, gcc does know better than to make such an assumption
unconditionally, but what I discovered is that it *will* assume this if
the int2vector is declared as a non-last element of a larger struct,
so that (in gcc's little mind anyway) it couldn't possibly really be
a variable-length array.

In other words, the reason 9.2 and up don't break is commit
8137f2c32322c624e0431fac1621e8e9315202f9, which arranged to hide
non-fixed-offset catalog columns from the compiler. Without that,
gcc decides that for instance pg_index.indkey cannot have more than one
member. That breaks the loop in BuildIndexInfo() that copies the key
column numbers into an IndexInfo, leading to the observed failure.

Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x. It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

I'm a bit leaning towards (1), mainly because I'm not excited about
fighting a compiler arms race in the back branches.

It also strikes me that we ought to take this as a warning sign
that we need to work on getting rid of coding like the above in favor
of genuine "flexible arrays", before the gcc boys think of some other
overly-cute optimization based on the assumption that an array declared
with a fixed size really is fixed.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Tom Lane (#1)
Re: Back branches vs. gcc 4.8.0

On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It also strikes me that we ought to take this as a warning sign
that we need to work on getting rid of coding like the above in favor
of genuine "flexible arrays", before the gcc boys think of some other
overly-cute optimization based on the assumption that an array declared
with a fixed size really is fixed.

The traditional argument against that has been that that's a C99
feature. However, since it appears that even MSVC supports flexible
arrays (which are described as a "Microsoft extension", so may not
have identical semantics), it might be possible to do this across the
board without contorting the code with preprocessor hacks. That's
something that I'd certainly be in favor of pursuing.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Tom Lane (#1)
Re: Back branches vs. gcc 4.8.0

On 06/04/13 11:14, Tom Lane wrote:

After quite a bit of hair-pulling trying to install Fedora 19 Alpha,
I've been able to reproduce the initdb-time failure that's currently
being seen on buildfarm member anchovy, and was also complained of
recently by an end user:
/messages/by-id/CAOD=oQ-kq3Eg5SOvRYOVxDuqibVWC8R0wEivPsMGcyzZY-nfzA@mail.gmail.com

It is exactly what I suspected, namely that gcc 4.8.0 is applying an
optimization that breaks our code; the odd thing though is that it's
not breaking 9.2 or HEAD, just the older branches.

It turns out that what's happening is that with
-faggressive-loop-optimizations turned on (as it is by default),
gcc decides that loops that iterate over the elements of an int2vector
can iterate at most once, because int2vector is declared with a fixed
size values[] array:

int16 values[1]; /* VARIABLE LENGTH ARRAY */
} int2vector; /* VARIABLE LENGTH STRUCT */

Now, gcc does know better than to make such an assumption
unconditionally, but what I discovered is that it *will* assume this if
the int2vector is declared as a non-last element of a larger struct,
so that (in gcc's little mind anyway) it couldn't possibly really be
a variable-length array.

In other words, the reason 9.2 and up don't break is commit
8137f2c32322c624e0431fac1621e8e9315202f9, which arranged to hide
non-fixed-offset catalog columns from the compiler. Without that,
gcc decides that for instance pg_index.indkey cannot have more than one
member. That breaks the loop in BuildIndexInfo() that copies the key
column numbers into an IndexInfo, leading to the observed failure.

Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x. It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

I'm a bit leaning towards (1), mainly because I'm not excited about
fighting a compiler arms race in the back branches.

It also strikes me that we ought to take this as a warning sign
that we need to work on getting rid of coding like the above in favor
of genuine "flexible arrays", before the gcc boys think of some other
overly-cute optimization based on the assumption that an array declared
with a fixed size really is fixed.

regards, tom lane

I am probably missing something here!

I would have thought it reasonable for a compiler assume 'an array
declared with a fixed size really is fixed.'!

Seems dangerous to play tricks like that (though I admit to doing nasty
things like that in COBOL many many years ago!).

Cheers,
Gavin

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#2)
Re: Back branches vs. gcc 4.8.0

Peter Geoghegan <pg@heroku.com> writes:

On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It also strikes me that we ought to take this as a warning sign
that we need to work on getting rid of coding like the above in favor
of genuine "flexible arrays", before the gcc boys think of some other
overly-cute optimization based on the assumption that an array declared
with a fixed size really is fixed.

The traditional argument against that has been that that's a C99
feature.

Well, we already have a solution for that, see FLEXIBLE_ARRAY_MEMBER.
But up to now we've just supposed that that was a code beautification
thing and there was no particular urgency to convert all applicable
places to use that notation.

Since there's a potential to break code with such changes (we'd have to
fix any uses of sizeof on the struct type), it's been very far down the
to-do list. But now it appears that we're taking risks if we *don't*
change it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#2)
Re: Back branches vs. gcc 4.8.0

On 2013-04-05 23:28:03 +0100, Peter Geoghegan wrote:

On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It also strikes me that we ought to take this as a warning sign
that we need to work on getting rid of coding like the above in favor
of genuine "flexible arrays", before the gcc boys think of some other
overly-cute optimization based on the assumption that an array declared
with a fixed size really is fixed.

The traditional argument against that has been that that's a C99
feature. However, since it appears that even MSVC supports flexible
arrays (which are described as a "Microsoft extension", so may not
have identical semantics), it might be possible to do this across the
board without contorting the code with preprocessor hacks. That's
something that I'd certainly be in favor of pursuing.

The respective macro magic is already in place, its just not used in all
places. The problem is more that we can't easily use it in all places
because e.g. in the one case mentioned here the array isn't in the last
place *in the back branches*.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Andres Freund (#5)
Re: Back branches vs. gcc 4.8.0

On Fri, Apr 5, 2013 at 11:39 PM, Andres Freund <andres@2ndquadrant.com> wrote:

The respective macro magic is already in place, its just not used in all
places. The problem is more that we can't easily use it in all places
because e.g. in the one case mentioned here the array isn't in the last
place *in the back branches*.

Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every
single place where we currently use the one element array pattern? I
count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be
pretty ugly to have that everywhere, in my opinion.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#5)
Re: Back branches vs. gcc 4.8.0

Andres Freund <andres@2ndquadrant.com> writes:

The respective macro magic is already in place, its just not used in all
places. The problem is more that we can't easily use it in all places
because e.g. in the one case mentioned here the array isn't in the last
place *in the back branches*.

I don't think we should try to back-patch such changes; there seems too
much risk of breaking third-party code because of the sizeof() issue.
But it'd be a good idea to have it in place before we find ourselves
having to do -fno-aggressive-loop-optimizations or some such even in
up-to-date branches.

(I'm actually even more worried about gcc bugs that make this type of
assumption than about intentional changes on their part.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#6)
Re: Back branches vs. gcc 4.8.0

Peter Geoghegan <pg@heroku.com> writes:

Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every
single place where we currently use the one element array pattern?

Yup, exactly.

I count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be
pretty ugly to have that everywhere, in my opinion.

Hm, I see 4 places in HEAD. But in any case, is

int16 values[1]; /* VARIABLE LENGTH ARRAY */
} int2vector; /* VARIABLE LENGTH STRUCT */

really better than

int16 values[FLEXIBLE_ARRAY_MEMBER];
} int2vector;

? I don't think so. Relying on comments to tell about critical
semantics of a data structure isn't really nice if you can do it
in a way that is standards-blessed and (some) compilers understand.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Tom Lane (#8)
Re: Back branches vs. gcc 4.8.0

On Fri, Apr 5, 2013 at 11:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm, I see 4 places in HEAD. But in any case, is

My mistake. I had REL9_2_STABLE checked out.

int16 values[1]; /* VARIABLE LENGTH ARRAY */
} int2vector; /* VARIABLE LENGTH STRUCT */

really better than

int16 values[FLEXIBLE_ARRAY_MEMBER];
} int2vector;

? I don't think so.

I can see your point. Now that I look at the comments beside
FLEXIBLE_ARRAY_MEMBER, I see that indeed, as I suspected, the
Microsoft flexible array members are not completely compatible with
C99 style flexible arrays, so this may be the least-worst option.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#1)
Re: Back branches vs. gcc 4.8.0

On Fri, 2013-04-05 at 18:14 -0400, Tom Lane wrote:

Since gcc 4.8 is going to be on a lot of people's machines pretty
soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x. It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

Using a fixed-size struct member as a flexible one has always been a
violation of the C standard, although a widely tolerated one. Doing
that in the middle of a struct, however, is totally wrong, and the
compiler is perfectly in its right to make a mess of it. Even flexible
array members are not allowed in the middle of a struct.

So I think this is not a compiler bug or an arms race. We just need to
fix the code. So I'm in favor of backporting.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Geoghegan (#6)
Re: Back branches vs. gcc 4.8.0

On Fri, 2013-04-05 at 23:44 +0100, Peter Geoghegan wrote:

On Fri, Apr 5, 2013 at 11:39 PM, Andres Freund <andres@2ndquadrant.com> wrote:

The respective macro magic is already in place, its just not used in all
places. The problem is more that we can't easily use it in all places
because e.g. in the one case mentioned here the array isn't in the last
place *in the back branches*.

Are you proposing that we use the FLEXIBLE_ARRAY_MEMBER macro in every
single place where we currently use the one element array pattern? I
count one place where we currently use FLEXIBLE_ARRAY_MEMBER. It'd be
pretty ugly to have that everywhere, in my opinion.

Background: The reason I put in that one use of FLEXIBLE_ARRAY_MEMBER is
that at one point clang threw a warning about the old coding. There
were no warnings about the other sites that use array[1].

The reason that the whole code wasn't converted right away was (besides
a lot of legwork with sizeof and offsetoff) that flexible array members
aren't allowed in the middle of structs. Which eventually led to the
mentioned commit 8137f2c32322c624e0431fac1621e8e9315202f9.

If someone wants to go through and change the rest of the code to use
FLEXIBLE_ARRAY_MEMBER, I won't mind. But I think it actually has
nothing to do with the current bug or future-proofing anything. All
compilers tolerate the current coding.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#11)
Re: Back branches vs. gcc 4.8.0

Peter Eisentraut <peter_e@gmx.net> writes:

The reason that the whole code wasn't converted right away was (besides
a lot of legwork with sizeof and offsetoff) that flexible array members
aren't allowed in the middle of structs. Which eventually led to the
mentioned commit 8137f2c32322c624e0431fac1621e8e9315202f9.

If someone wants to go through and change the rest of the code to use
FLEXIBLE_ARRAY_MEMBER, I won't mind. But I think it actually has
nothing to do with the current bug or future-proofing anything. All
compilers tolerate the current coding.

The reason I'm thinking it's a good idea is that it would expose any
remaining places where we have nominally var-length arrays embedded in
larger structs. Now that I've seen the failures with gcc 4.8.0, I'm
quite worried that there might be some more declarations like that
which we've not identified yet, but that by chance aren't causing
obvious failures today. (This is also why I'm not that excited about
trying to fix things "properly" in the back branches compared to
selecting -fno-aggressive-loop-optimizations: I'm afraid there might
be more to it than just the one commit.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#10)
Re: Back branches vs. gcc 4.8.0

On Fri, Apr 5, 2013 at 9:45 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

So I think this is not a compiler bug or an arms race. We just need to
fix the code. So I'm in favor of backporting.

I can certainly see this argument. I understand Tom's point about an
arms race, but back-porting this doesn't feel terribly risky to me.
The thing is, if the arms race is escalating faster than we're
comfortable with, we can always opt opt at a later time; it's not as
if back-porting this fix now commits us irrevocably.

Then, too, I tend to think this is more our fault than gcc's - for a change.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#1)
Re: Back branches vs. gcc 4.8.0

On 4/5/13 6:14 PM, Tom Lane wrote:

Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x. It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

I'm a bit leaning towards (1), mainly because I'm not excited about
fighting a compiler arms race in the back branches.

At the moment, I wouldn't do anything. At least until we have converted
master to use flexible array members completely, and we have learned the
extent of the issue.

The problem manifests itself easily through the regression tests, so
there is no guessing about whether a particular combination of versions
will work. Someone who uses a cutting edge compiler with a somewhat old
PG release is doing something special anyway, so they should have the
required skills to put in the workaround.

I would rather avoid patching in specific compiler options for specific
versions. These things come and go, but releases live a long time. How
do we know -fno-aggressive-loop-optimizations is the only option we need
in the long run? I'd rather see a direct crash or a known code fix.

As an aside, we already have -fno-strict-aliasing and -fwrapv. Add more
and it will begin to read like

-fmy-code -fis-broken -fhelp-me

;-)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Peter Eisentraut (#14)
Re: Back branches vs. gcc 4.8.0

On 09/04/13 08:41, Peter Eisentraut wrote:

On 4/5/13 6:14 PM, Tom Lane wrote:

Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x. It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

I'm a bit leaning towards (1), mainly because I'm not excited about
fighting a compiler arms race in the back branches.

At the moment, I wouldn't do anything. At least until we have converted
master to use flexible array members completely, and we have learned the
extent of the issue.

The problem manifests itself easily through the regression tests, so
there is no guessing about whether a particular combination of versions
will work. Someone who uses a cutting edge compiler with a somewhat old
PG release is doing something special anyway, so they should have the
required skills to put in the workaround.

I would rather avoid patching in specific compiler options for specific
versions. These things come and go, but releases live a long time. How
do we know -fno-aggressive-loop-optimizations is the only option we need
in the long run? I'd rather see a direct crash or a known code fix.

As an aside, we already have -fno-strict-aliasing and -fwrapv. Add more
and it will begin to read like

-fmy-code -fis-broken -fhelp-me

;-)

-fno-break-my-code

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#12)
Re: Back branches vs. gcc 4.8.0

On Sat, 2013-04-06 at 12:59 -0400, Tom Lane wrote:

The reason I'm thinking it's a good idea is that it would expose any
remaining places where we have nominally var-length arrays embedded in
larger structs. Now that I've seen the failures with gcc 4.8.0, I'm
quite worried that there might be some more declarations like that
which we've not identified yet, but that by chance aren't causing
obvious failures today.

Here is a rough patch that replaces almost all occurrences of
something[1] in a struct with FLEXIBLE_ARRAY_MEMBER. It crashes left
and right (because of sizeof issues, probably), but at least so far the
compiler hasn't complained about any flexible-array members not at the
end of the struct, which is what it did last time. So the answer to
your concern so far is negative.

Completing this patch will be quite a bit more debugging work. Some
kind of electric fence for palloc would be helpful.

Attachments:

flexible-array-member.patchtext/x-patch; charset=UTF-8; name=flexible-array-member.patchDownload+77-82
#17Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: Back branches vs. gcc 4.8.0

On Fri, Apr 5, 2013 at 11:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x. It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

I'm a bit leaning towards (1), mainly because I'm not excited about

I'm confused. I would have described (1) as entering an arms race.
Each new optimization related to arrays and structs would need a new
flag. Whereas (2) makes the code pretty common traditional code that
gcc is going to need to tolerate for the foreseeable future

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#16)
Re: Back branches vs. gcc 4.8.0

On 2013-04-29 23:37:43 -0400, Peter Eisentraut wrote:

On Sat, 2013-04-06 at 12:59 -0400, Tom Lane wrote:

The reason I'm thinking it's a good idea is that it would expose any
remaining places where we have nominally var-length arrays embedded in
larger structs. Now that I've seen the failures with gcc 4.8.0, I'm
quite worried that there might be some more declarations like that
which we've not identified yet, but that by chance aren't causing
obvious failures today.

Here is a rough patch that replaces almost all occurrences of
something[1] in a struct with FLEXIBLE_ARRAY_MEMBER. It crashes left
and right (because of sizeof issues, probably), but at least so far the
compiler hasn't complained about any flexible-array members not at the
end of the struct, which is what it did last time. So the answer to
your concern so far is negative.

I think this point in the cycle would be a good one to apply something
like this.

Completing this patch will be quite a bit more debugging work. Some
kind of electric fence for palloc would be helpful.

Noah's recently added valgrind mode should provide this.

Do you have an updated version of this patch already? I'd be willing to
make a pass over it to check whether I find any missed updates...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers