[commit][3753] Do not use -O3 optimization, -O2 is safer and works well.


[commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

commits-3
 
Revision: 3753
Author:   lewis
Date:     2016-12-20 17:03:04 -0800 (Tue, 20 Dec 2016)
Log Message:
-----------
Do not use -O3 optimization, -O2 is safer and works well.

gcc 4.9.2 gives crashes and heisenbugs with OSPP compiled with -O3.

Symptoms:
(1 to: 10) collect: [ :e | (CommandShell new pipeline: 'ps > /dev/null | cat') output]
  ==> segfaults in some of the spawned child processes

This appears to be an actual compiler bug, although it goes away when print
statements are added, so I cannot say for sure (and it is difficult to attach
gdb to the newly spawned child process before it crashes). The bad behavior
happens only with -O3 and there are no real performance benefits compared
to -O2 (bytecodes faster, sends slower).

Modified Paths:
--------------
    trunk/platforms/unix/cmake/Makefile.example

Modified: trunk/platforms/unix/cmake/Makefile.example
===================================================================
--- trunk/platforms/unix/cmake/Makefile.example 2016-12-11 16:49:35 UTC (rev 3752)
+++ trunk/platforms/unix/cmake/Makefile.example 2016-12-21 01:03:04 UTC (rev 3753)
@@ -9,7 +9,7 @@
 # Assume platforms is ../platforms and src is ../src
 
 # CFLAGS setting to pass to cmake configure. If undefined, use compiler defaults.
-CFLAGS_PARAM="--CFLAGS='-O3 -D_FILE_OFFSET_BITS=64'"
+CFLAGS_PARAM="--CFLAGS='-O2 -D_FILE_OFFSET_BITS=64'"
 #CFLAGS_PARAM="--CFLAGS='-O0 -g'"
 
 squeakvm: build/squeakvm build64/squeakvm64


Re: [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

Ben Coman
 
On Wed, Dec 21, 2016 at 9:03 AM,  <[hidden email]> wrote:

>
> Revision: 3753
> Author:   lewis
> Date:     2016-12-20 17:03:04 -0800 (Tue, 20 Dec 2016)
> Log Message:
> -----------
> Do not use -O3 optimization, -O2 is safer and works well.
>
> gcc 4.9.2 gives crashes and heisenbugs with OSPP compiled with -O3.
>
> Symptoms:
> (1 to: 10) collect: [ :e | (CommandShell new pipeline: 'ps > /dev/null | cat') output]
>   ==> segfaults in some of the spawned child processes
>
> This appears to be an actual compiler bug, although it goes away when print
> statements are added, so I cannot say for sure (and it is difficult to attach
> gdb to the newly spawned child process before it crashes). The bad behavior
> happens only with -O3 and there are no real performance benefits compared
> to -O2 (bytecodes faster, sends slower).

I'd be interested to know the reason sends get slower (if known).
cheers -ben

Re: [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

David T. Lewis
 
On Wed, Dec 21, 2016 at 10:40:32AM +0800, Ben Coman wrote:

>  
> On Wed, Dec 21, 2016 at 9:03 AM,  <[hidden email]> wrote:
> >
> > Revision: 3753
> > Author:   lewis
> > Date:     2016-12-20 17:03:04 -0800 (Tue, 20 Dec 2016)
> > Log Message:
> > -----------
> > Do not use -O3 optimization, -O2 is safer and works well.
> >
> > gcc 4.9.2 gives crashes and heisenbugs with OSPP compiled with -O3.
> >
> > Symptoms:
> > (1 to: 10) collect: [ :e | (CommandShell new pipeline: 'ps > /dev/null | cat') output]
> >   ==> segfaults in some of the spawned child processes
> >
> > This appears to be an actual compiler bug, although it goes away when print
> > statements are added, so I cannot say for sure (and it is difficult to attach
> > gdb to the newly spawned child process before it crashes). The bad behavior
> > happens only with -O3 and there are no real performance benefits compared
> > to -O2 (bytecodes faster, sends slower).
>
> I'd be interested to know the reason sends get slower (if known).
> cheers -ben

No clue, I was just sanity checking to make sure that -O2 was not horribly
worse. It was not. But I suspect all of this is likely to vary depending on
gcc compiler version and phase of the moon.

Dave



Re: [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

Jan Vrany
 
On Tue, 2016-12-20 at 21:54 -0500, David T. Lewis wrote:

>  
> On Wed, Dec 21, 2016 at 10:40:32AM +0800, Ben Coman wrote:
> >  
> > On Wed, Dec 21, 2016 at 9:03 AM,  <[hidden email]> wrote:
> > >
> > > Revision: 3753
> > > Author:   lewis
> > > Date:     2016-12-20 17:03:04 -0800 (Tue, 20 Dec 2016)
> > > Log Message:
> > > -----------
> > > Do not use -O3 optimization, -O2 is safer and works well.
> > >
> > > gcc 4.9.2 gives crashes and heisenbugs with OSPP compiled with
> > > -O3.
> > >
> > > Symptoms:
> > > (1 to: 10) collect: [ :e | (CommandShell new pipeline: 'ps >
> > > /dev/null | cat') output]
> > >   ==> segfaults in some of the spawned child processes
> > >
> > > This appears to be an actual compiler bug, although it goes away
> > > when print
> > > statements are added, so I cannot say for sure (and it is
> > > difficult to attach
> > > gdb to the newly spawned child process before it crashes). The
> > > bad behavior
> > > happens only with -O3 and there are no real performance benefits
> > > compared
> > > to -O2 (bytecodes faster, sends slower).
> >
> > I'd be interested to know the reason sends get slower (if known).
> > cheers -ben
>
> No clue, I was just sanity checking to make sure that -O2 was not
> horribly
> worse. It was not. But I suspect all of this is likely to vary
> depending on
> gcc compiler version and phase of the moon.

This post:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html

explains the dependency on moon phases (and compiler versions) :-)

Jan


Re: [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

David T. Lewis
 
On Wed, Dec 21, 2016 at 10:11:39AM +0000, Jan Vrany wrote:

>  
> On Tue, 2016-12-20 at 21:54 -0500, David T. Lewis wrote:
> >
> > On Wed, Dec 21, 2016 at 10:40:32AM +0800, Ben Coman wrote:
> > >
> > > I'd be interested to know the reason sends get slower (if known).
> > > cheers -ben
> >
> > No clue, I was just sanity checking to make sure that -O2 was not
> > horribly
> > worse. It was not. But I suspect all of this is likely to vary
> > depending on
> > gcc compiler version and phase of the moon.
>
> This post:
>
> http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
>
> explains the dependency on moon phases (and compiler versions) :-)
>

Thanks. Indeed there is probably some undefined C behavior in there, although
I was not able to spot it. If anyone is interested in lending their eyes to
the problem, I was able to localize the crash to intermittent segfaults that
occurred in OSProcessPlugin>>fixPointersInArrayOfStrings: which is generated
in C (for a V3 image, not Spur) as:

/*      Use the address offsets in offsetArray to fix up the pointers in cStringArray.
        The result is a C array of pointers to char, used for argv and env vectors. */

static sqInt fixPointersInArrayOfStringswithOffsetscount(char *flattenedArrayOfStrings, sqInt *offsetArray, sqInt count) {
    sqInt idx;
    char **ptr;

        ptr = ((char **) flattenedArrayOfStrings);
        idx = 0;
        while (idx < count) {
                ptr[idx] = (flattenedArrayOfStrings + (((offsetArray[idx]) >> 1)));
                idx += 1;
        }
        return null;
}



Re: [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

Ben Coman
 
On Wed, Dec 21, 2016 at 10:35 PM, David T. Lewis <[hidden email]> wrote:

>
> On Wed, Dec 21, 2016 at 10:11:39AM +0000, Jan Vrany wrote:
>>
>> On Tue, 2016-12-20 at 21:54 -0500, David T. Lewis wrote:
>> >
>> > On Wed, Dec 21, 2016 at 10:40:32AM +0800, Ben Coman wrote:
>> > >
>> > > I'd be interested to know the reason sends get slower (if known).
>> > > cheers -ben
>> >
>> > No clue, I was just sanity checking to make sure that -O2 was not
>> > horribly
>> > worse. It was not. But I suspect all of this is likely to vary
>> > depending on
>> > gcc compiler version and phase of the moon.
>>
>> This post:
>>
>> http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
>>
>> explains the dependency on moon phases (and compiler versions) :-)
>>
>
> Thanks. Indeed there is probably some undefined C behavior in there, although
> I was not able to spot it. If anyone is interested in lending their eyes to
> the problem, I was able to localize the crash to intermittent segfaults that
> occurred in OSProcessPlugin>>fixPointersInArrayOfStrings: which is generated
> in C (for a V3 image, not Spur) as:
>
> /*      Use the address offsets in offsetArray to fix up the pointers in cStringArray.
>         The result is a C array of pointers to char, used for argv and env vectors. */
>
> static sqInt fixPointersInArrayOfStringswithOffsetscount(char *flattenedArrayOfStrings, sqInt *offsetArray, sqInt count) {
>     sqInt idx;
>     char **ptr;
>
>         ptr = ((char **) flattenedArrayOfStrings);
>         idx = 0;
>         while (idx < count) {
>                 ptr[idx] = (flattenedArrayOfStrings + (((offsetArray[idx]) >> 1)));
>                 idx += 1;
>         }
>         return null;
> }
>
>

I've taken a guess at its usage and turned it into an executable test case.
Could you confirm this...

#include <stdio.h>
typedef int sqInt;
int null = 0;
static sqInt fixPointersInArrayOfStringswithOffsetscount(char *flattenedArrayOfStrings, sqInt *offsetArray, sqInt count) {
    sqInt idx;
    char **ptr;
        ptr = ((char **) flattenedArrayOfStrings);
        idx = 0;
        while (idx < count) {
                ptr[idx] = (flattenedArrayOfStrings + (((offsetArray[idx]) >> 1)));
                idx += 1;
        }
        return null;
}
int main()
{
        char *flattenedArrayOfStrings = "abcd\0efgh\0ijkl\0";
        sqInt offsetArray[] = {0, 5, 10};
        printf("%s\n", flattenedArrayOfStrings);
        printf("%d %d %d\n", offsetArray[0], offsetArray[1], offsetArray[2]);
        fixPointersInArrayOfStringswithOffsetscount( flattenedArrayOfStrings, offsetArray, 2 );
}

$ cc test.c ; ./a.out
abcd
0 5 10
Segmentation fault

ptr is being defined as a pointer to "one" string, but it's being
accessed as a consecutive list of strings.
Even the first assignment to ptr seems wrong.  flattenedArrayOfStrings
is a string. The first dereference is a char, and then when it assigns
to ptr[idx], it tries to dereference the char and boom!


Further, when idx is 1, the assignment "ptr[idx] = ...."
is going to be taking

Just shooting the breeze...

Re: [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

Ben Coman
In reply to this post by David T. Lewis
 
Sorry, ignore that last one.  It fired off incomplete.

On Wed, Dec 21, 2016 at 10:35 PM, David T. Lewis <[hidden email]> wrote:

>
> On Wed, Dec 21, 2016 at 10:11:39AM +0000, Jan Vrany wrote:
>>
>> On Tue, 2016-12-20 at 21:54 -0500, David T. Lewis wrote:
>> >
>> > On Wed, Dec 21, 2016 at 10:40:32AM +0800, Ben Coman wrote:
>> > >
>> > > I'd be interested to know the reason sends get slower (if known).
>> > > cheers -ben
>> >
>> > No clue, I was just sanity checking to make sure that -O2 was not
>> > horribly
>> > worse. It was not. But I suspect all of this is likely to vary
>> > depending on
>> > gcc compiler version and phase of the moon.
>>
>> This post:
>>
>> http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
>>
>> explains the dependency on moon phases (and compiler versions) :-)
>>
>
> Thanks. Indeed there is probably some undefined C behavior in there, although
> I was not able to spot it. If anyone is interested in lending their eyes to
> the problem, I was able to localize the crash to intermittent segfaults that
> occurred in OSProcessPlugin>>fixPointersInArrayOfStrings: which is generated
> in C (for a V3 image, not Spur) as:
>
> /*      Use the address offsets in offsetArray to fix up the pointers in cStringArray.
>         The result is a C array of pointers to char, used for argv and env vectors. */
>
> static sqInt fixPointersInArrayOfStringswithOffsetscount(char *flattenedArrayOfStrings, sqInt *offsetArray, sqInt count) {
>     sqInt idx;
>     char **ptr;
>
>         ptr = ((char **) flattenedArrayOfStrings);
>         idx = 0;
>         while (idx < count) {
>                 ptr[idx] = (flattenedArrayOfStrings + (((offsetArray[idx]) >> 1)));
>                 idx += 1;
>         }
>         return null;
> }
>
>


I've taken a guess at its usage and turned it into an executable test case.
Could you confirm this...

#include <stdio.h>
typedef int sqInt;
int null = 0;
static sqInt fixPointersInArrayOfStringswithOffsetscount(char *flattenedArrayOfStrings, sqInt *offsetArray, sqInt count) {
    sqInt idx;
    char **ptr;
        ptr = ((char **) flattenedArrayOfStrings);
        idx = 0;
        while (idx < count) {
                ptr[idx] = (flattenedArrayOfStrings + (((offsetArray[idx]) >> 1)));
                idx += 1;
        }
        return null;
}
int main()
{
        char *flattenedArrayOfStrings = "abcd\0efgh\0ijkl\0";
        sqInt offsetArray[] = {0, 5, 10};
        printf("%s\n", flattenedArrayOfStrings);
        printf("%d %d %d\n", offsetArray[0], offsetArray[1], offsetArray[2]);
        fixPointersInArrayOfStringswithOffsetscount( flattenedArrayOfStrings, offsetArray, 2 );
        printf("%s\n", flattenedArrayOfStrings);
}

$ cc test.c ; ./a.out
abcd
0 5 10
Segmentation fault

ptr is being defined as a pointer to "one" string, but it's being
accessed as a consecutive list of strings.
Even the first assignment to ptr seems wrong.  flattenedArrayOfStrings
is a string. The first dereference is a char, and then when it assigns
to ptr[idx], it tries to dereference the char and boom!

If the first ptr assignment is changed as follows, it doesn't crash...
       ptr = &flattenedArrayOfStrings;
 
But I'm not sure it does what it's supposed to do:
$ cc test.c ; ./a.out 
abcd
0 5 10
abcd
cd
abcd


Now if I do this...
static sqInt fixPointersInArrayOfStringswithOffsetscount(char *flattenedArrayOfStrings, sqInt offsetArray[], sqInt count) {
    sqInt idx;
    char *ptr[count+1];
        idx = 0;
        while (idx < count) {
                ptr[idx] = flattenedArrayOfStrings + offsetArray[idx];
                idx += 1;
        }
    idx = 0;
    while (idx < count) {
            printf("%s\n", ptr[idx]);
            idx += 1;
    }
    return null;
}

I can call it with count=3 and get this...
$ cc test.c ; ./a.out 
abcd
0 5 10
abcd
efgh
ijkl
abcd

but the original flattenedArrayOfStrings is unmodified, so I'm not sure if that is how it's meant to behave?

cheers -ben 


Re: [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

Holger Freyther
 

> On 21 Dec 2016, at 18:21, Ben Coman <[hidden email]> wrote:
>
> Sorry, ignore that last one.  It fired off incomplete.

-fsanitize=undefined for the lazy... but the code might end up breaking the aliasing rules.

holger

Re: [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

Andres Valloud-4
In reply to this post by commits-3
 
Compiler manuals usually state -O3 and higher do not preserve language
semantics.  It's going to be hard to prove "bug" in the non-default
presence of a switch known to potentially produce undefined behavior.

I'd stick to defined behavior and -O2.  Moreover, if as noted -O3 seems
to result in no benefit... -O2 is a pure win.

On 12/20/16 17:03 , [hidden email] wrote:
> This appears to be an actual compiler bug, although it goes away when print
> statements are added, so I cannot say for sure (and it is difficult to attach
> gdb to the newly spawned child process before it crashes). The bad behavior
> happens only with -O3 and there are no real performance benefits compared
> to -O2 (bytecodes faster, sends slower).

Re: [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

Holger Freyther
 

> On 23 Dec 2016, at 22:30, Andres Valloud <[hidden email]> wrote:
>
> Compiler manuals usually state -O3 and higher do not preserve language semantics.  It's going to be hard to prove "bug" in the non-default presence of a switch known to potentially produce undefined behavior.

Where did you get that from? I tried reading up on it but don't see anything.


Clang:

-O3 Like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).

=> so longer compilation time, bigger binary



gcc:

-O3
Optimize yet more.  -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -fsplit-paths -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre, -fpeel-loops and -fipa-cp-clone options.

=> None of them says that the language standard is broken



Intel ICC

-O3

Performs O2 optimizations and enables more aggressive loop transformations such as Fusion, Block-Unroll-and-Jam, and collapsing IF statements. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.



So both Intel ICC and GCC start doing more auto-vectorization with -O3. None of that breaks the language semantics. So in most cases, if -O3 breaks things, the code has undefined behavior...


holger



Re: [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

Andres Valloud-4
 
Huh, I couldn't find GCC examples quickly.  However, here's something I
actually ran into.  The manual for the IBM XL C 10.x compiler for AIX on
POWER explicitly states -O3 causes comparatively common functions to
stop setting errno.  Critically, however, this statement is *not* in
every reference to the -O3 switch.  Rather, it was in a manual set that
was much more detailed than these kinds of links:

http://www.ibm.com/support/knowledgecenter/en/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/proguide/levelthree.html

http://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/proguide/advancedoptimization.html

Notice how the above pages do not say anything about errno?  IIRC the
man page didn't say anything about errno with -O3 either.  However, this
one other page does:

http://www.ibm.com/support/knowledgecenter/en/SSGH2K_11.1.0/com.ibm.xlc111.aix.doc/compiler_ref/opt_optimize.html

And in there, there are other statements such as "with -O3 some
instructions are placed in code paths where they always execute, while
with -O2 that's not the case".  Looks like asking for undefined behavior
in the general case to me.

This particular example was the breaking point for me, and from then on
I started assuming -O3 is not behavior preserving, even if the manual
doesn't immediately say there are problems.  Also, manuals are not
necessarily complete.  That's not to say that -O3 can't be useful, provided
the necessary verification steps are taken.  And while I couldn't quickly find
similar examples for other compilers, I do remember reading them.  From
a VM engineering perspective however, and given all the gray area
semi-undefined behavior things one is basically forced to do, my opinion
is -O2 is the way to go.

Then again, even with -O2 one always has to read the manual and be aware
of what's going on.  For instance, in the latest GCC with -O2, the
optimization -fdelete-null-pointer-checks is enabled.  However, the
manual also says:

========================
Assume that programs cannot safely dereference null pointers, and that
no code or data element resides at address zero. This option enables
simple constant folding optimizations at all optimization levels. In
addition, other optimization passes in GCC use this flag to control
global dataflow analyses that eliminate useless checks for null
pointers; these assume that a memory access to address zero always
results in a trap, so that if a pointer is checked after it has already
been dereferenced, it cannot be null.

Note however that in some environments this assumption is not true.
========================

And yes, there are realistic places where NULL can be dereferenced.
Again on the AIX versions I worked on, the memory at 0x0 was mapped.
Due to some bugs at the time, HPS overwrote that memory.  And yet, the
system continued to appear to work.  Thus, some null pointer segfaults
for some platforms did not happen on AIX, even though the code and the
wrong behavior were exactly the same.

Andres.


Re: [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

Andres Valloud-4
In reply to this post by Holger Freyther
 
I'm going to spend some time on Google to see if I can find what I
remember reading.  In the meantime, consider this statement from the
GCC manual:

"Not all optimizations are controlled directly by a flag. Only
optimizations that have a flag are listed in this section."

Ok so, just like the errno case with IBM's XL C compiler, does that mean
e.g. -O3 enables optimizations that are not behavior preserving that
also do not have a flag, thus are unlisted, and thus -O3 doesn't have to
say it's not behavior preserving?
