GCC optimization levels [was new Cog VMs available [please read]]

GCC optimization levels [was new Cog VMs available [please read]]

Eliot Miranda-2
 
Hi All,

    responding to Andrew here because this is generally of interest to the vm-list.

On Mon, Sep 26, 2011 at 11:06 AM, Andrew Gaylard <[hidden email]> wrote:
Hmmm.  Thanks for the advice -- we now build with -O3, and all's well.
I've run the VM at full load (mostly compiling) for 30 hours without a
hiccup.  Interesting that -O2 is problematic, but -O3 isn't; I assumed
that higher optimisations would make things less stable, not more so.
And we get a 17% speed increase.

       My GCC is:
       $ gcc --version
       gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3

So this really surprises me, since we see exactly the same thing with gcc version 3.4.6 20060404 (Red Hat 3.4.6-3).  If we compile with -O1 or -O3 we get functional Cog VMs, but -O2 crashes on start-up or soon thereafter.  I'm surprised that two very different versions of gcc show the same behaviour, but I guess I shouldn't be.  At some point some of us (me included) really ought to put in the effort to understand what the issue is.  It could be a gcc bug, or it could be that we're generating C code with ill-defined behaviour.  I have to say that I suspect the latter, given how different gcc 3.4.x and gcc 4.4.x are (BTW Andrew also sees the same issue with gcc 4.1.x).



- Andrew

On 2011.09.25 23:12:50 -0700, Eliot Miranda <[hidden email]> wrote:
> On Sat, Sep 24, 2011 at 9:02 AM, Andrew Gaylard <[hidden email]> wrote:
>
> > Actually, it looks like I was wrong.  After rebuilding everything from
> > scratch, I've been unable to reproduce these crashes, except for the
> > one with unix-4.4.7.image.
> >
> > Sorry for the false alarm.  r2495 looks pretty good, at both -O0 and
> > -O1.  It still crashes at -O2, but that's not a huge concern.
> >
>
> Which gcc are you using?  Here at Cadence on a much older 32-bit machine
> using gcc 3.4.x we see crashes at -O2 but no crashes at -O0 -O1 & -O3 :)
>
>
> >
> >
> > On 2011.09.24 08:07:47 +0200, Andrew Gaylard <[hidden email]> wrote:
> > > On 2011.09.23 13:26:06 -0700, Eliot Miranda <[hidden email]>
> > wrote:
> > > > Thank you, Andrew, you nailed it.  I've found the bug via your stack
> > trace
> > > > below.  Huge relief.  Thanks!  New VMs and explanation to the list
> > soon.
> > >
> > > Alas, we spoke too soon.  -2495 exhibits the same symptoms; traces and
> > > gdb transcripts are attached.
> > >
> > > - vm-*-2495.0.txt are from our basic.image, running the test-runner.
> > > - vm-*-2495.1.txt are from Squeak4.2-10966.image, running the
> > test-runner.
> > > - vm-*-2495.2.txt are from unix-4.4.7.image, having just started up the
> > VM.
> > >
> > > The first two of these appear to be the same problem I encountered
> > > with -2493.  The backtraces certainly look very similar.
> > >
> > > The third one is rather different. Looking at the stack trace, the
> > > 'rcvr' variable in ceSendsupertonumArgs is 17039140, which is
> > > de-referenced in line 10733, causing a SEGV; the handler duly confirms
> > > the faulting address as si_addr = 0x103ff24:
> > >
> > >       $ perl -e 'print 0x103ff24'
> > >       17039140



--
best,
Eliot


Re: GCC optimization levels [was new Cog VMs available [please read]]

Nicolas Cellier

2011/9/26 Eliot Miranda <[hidden email]>:

>
> Hi All,
>     responding to Andrew here because this is generally of interest to the vm-list.
>
> On Mon, Sep 26, 2011 at 11:06 AM, Andrew Gaylard <[hidden email]> wrote:
>>
>> Hmmm.  Thanks for the advice -- we now build with -O3, and all's well.
>> I've run the VM at full load (mostly compiling) for 30 hours without a
>> hiccup.  Interesting that -O2 is problematic, but -O3 isn't; I assumed
>> that higher optimisations would make things less stable, not more so.
>> And we get a 17% speed increase.
>>
>>        My GCC is:
>>        $ gcc --version
>>        gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3
>
> So this really surprises me, since we see exactly the same thing with gcc version 3.4.6 20060404 (Red Hat 3.4.6-3).  If we compile with -O1 or -O3 we get functional Cog VMs, but -O2 crashes on start-up or soon thereafter.  I'm surprised that two very different versions of gcc show the same behaviour, but I guess I shouldn't be.  At some point some of us (me included) really ought to put in the effort to understand what the issue is.  It could be a gcc bug, or it could be that we're generating C code with ill-defined behaviour.  I have to say that I suspect the latter, given how different gcc 3.4.x and gcc 4.4.x are (BTW Andrew also sees the same issue with gcc 4.1.x).
>

Again some nasty kind of undefined shifts on signed/unsigned ints?
Or a macro expansion leading to subtle subexpression ordering (++i/++i) ?
Or one of the many dark zones here:
http://www.vmunix.com/~gabor/c/draft.html#601

Being able to enumerate such a long list without any omission is
already something!
What a beautiful language !
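
To make the first two suspects concrete, here is a minimal sketch in C of what they look like and how they can be avoided. The names shift_left_wrapping, SQUARE_UNSAFE, square and square_of_next are hypothetical and chosen only for illustration; none of them comes from the Cog or RoarVM sources, and nothing here claims that these are the constructs that actually break at -O2.

```c
#include <assert.h>
#include <stdint.h>

/* Left-shifting a negative signed int, or shifting a bit into or past the
   sign bit, is undefined behaviour in C99, so an optimizer may assume it
   never happens. Doing the shift on the unsigned representation is well
   defined (it wraps modulo 2^32). */
int32_t shift_left_wrapping(int32_t value, unsigned count)
{
    assert(count < 32);               /* shifting by >= the width is undefined */
    uint32_t bits = (uint32_t)value;  /* signed -> unsigned is always defined */
    /* unsigned -> signed of an out-of-range value is implementation-defined
       (not undefined); gcc documents it as two's-complement wrap-around. */
    return (int32_t)(bits << count);
}

/* A macro that mentions its argument twice: SQUARE_UNSAFE(++i) expands to
   ((++i) * (++i)), which modifies i twice without an intervening sequence
   point and is therefore undefined. */
#define SQUARE_UNSAFE(x) ((x) * (x))

/* A function evaluates its argument exactly once, so the problem goes away. */
int square(int x)
{
    return x * x;
}

/* Increment first, in its own statement, then use the value. */
int square_of_next(int *i)
{
    ++*i;
    return square(*i);
}
```

With these definitions, shift_left_wrapping(1, 4) yields 16 and square_of_next on a variable holding 2 yields 9 and leaves 3 in the variable, whatever the optimization level.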

Nicolas


Re: GCC optimization levels [was new Cog VMs available [please read]]

Stefan Marr

Hi:

On 27 Sep 2011, at 08:45, Nicolas Cellier wrote:

> Again some nasty kind of undefined shifts on signed/unsigned ints?
> Or a macro expansion leading to subtle subexpression ordering (++i/++i) ?
> Or one of the many dark zones here:
> http://www.vmunix.com/~gabor/c/draft.html#601
>
> Being able to enumerate such a long list without any omission is
> already something!
> What a beautiful language !

Kind of similar to the problems previously mentioned in this thread, we have an optimization bug in the RoarVM codebase too.

It is biting me with an infinite loop when I use a GCC >4.2 or Intel compiler and enable optimization >O1 on two specific files.

Still, I think the bug is somewhere else entirely.

I tried to go through the files and disable optimization for particular functions, without useful result... The failure keeps jumping around. Perhaps there are multiple places where we rely on unspecified behaviour.

Are there any tools that could be useful to find such things?

I tried the Clang static analyzer, but without much success.
It also gives a lot of warnings about potentially uninitialized variables in the primitives. It does not like the primitiveFail/successFlag checks at all.

Best regards
Stefan


--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525


Re: GCC optimization levels [was new Cog VMs available [please read]]

Eliot Miranda-2
 


On Tue, Sep 27, 2011 at 12:05 AM, Stefan Marr <[hidden email]> wrote:

Hi:

On 27 Sep 2011, at 08:45, Nicolas Cellier wrote:

> Again some nasty kind of undefined shifts on signed/unsigned ints?
> Or a macro expansion leading to subtle subexpression ordering (++i/++i) ?
> Or one of the many dark zones here:
> http://www.vmunix.com/~gabor/c/draft.html#601
>
> Being able to enumerate such a long list without any omission is
> already something!
> What a beautiful language !

Kind of similar to the problems previously mentioned in this thread, we have an optimization bug in the RoarVM codebase too.

It is biting me with an infinite loop when I use a GCC >4.2 or Intel compiler and enable optimization >O1 on two specific files.

Still, I think the bug is somewhere else entirely.

I tried to go through the files and disable optimization for particular functions, without useful result... The failure keeps jumping around. Perhaps there are multiple places where we rely on unspecified behaviour.

Are there any tools that could be useful to find such things?

The only way I know to do this quickly is to get a reproducible case that runs to failure from start-up without user intervention and run the two VMs side-by-side on that case.  If the bug shows itself when tracing is turned on then it can be relatively easy to find the point at which the two diverge and backtrack from there.
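
As a rough sketch of the "find where the two diverge" step, assuming each VM has written its trace to a plain text file with one event per line, something like the following would report the first line at which the traces differ. The function name first_divergence and the trace format are assumptions made for illustration; this is not an existing VM tool.

```c
#include <stdio.h>
#include <string.h>

/* Compare two trace streams line by line.
   Returns the 1-based number of the first line that differs, or 0 if the
   traces are identical. If one trace simply ends early, the first missing
   line counts as the divergence point.
   Lines longer than the buffer are compared in chunks, so for such traces
   the returned number counts chunks rather than true lines. */
long first_divergence(FILE *left, FILE *right)
{
    char lbuf[4096];
    char rbuf[4096];
    long line = 0;

    for (;;) {
        char *l = fgets(lbuf, sizeof lbuf, left);
        char *r = fgets(rbuf, sizeof rbuf, right);

        if (l == NULL && r == NULL)
            return 0;                 /* both traces ended together */
        line++;
        if (l == NULL || r == NULL)
            return line;              /* one trace is shorter */
        if (strcmp(lbuf, rbuf) != 0)
            return line;              /* contents differ here */
    }
}
```

One would open the trace from the working build and the trace from the failing build with fopen, pass both to first_divergence, and then backtrack from the reported line in the failing trace.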

I tried the Clang static analyzer, but without much success.
It also gives a lot of warnings about potentially uninitialized variables in the primitives. It does not like the primitiveFail/successFlag checks at all.

:)
 

Best regards
Stefan


--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525




--
best,
Eliot