Re: [squeak-dev] Re: [Pharo-project] new Cog VMs

Eliot Miranda-2
 
Hi Igor,  Hi All,

On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[hidden email]> wrote:
> On 3 January 2011 00:15, Eliot Miranda <[hidden email]> wrote:
>> Hi Martin, Hi All,
>>     so find new VMs in VM.r2341/.  The linux crashes (certainly the one you
>> suffered from Martin) seem to be caused by an optimization bug (but they
>> could be caused by bad code generation, creating something that assumes
>> ordering constraints which C doesn't guarantee).  I suspect the former
>> because I don't see the crash when running exactly the same VM and image
>> from a different directory; provoking the crash requires a particular path
>> (go figure; I haven't pinned this down yet).
>> So my "fix" is preventing a complex function being inlined into the main
>> interpreter loop, removing the sources of some warnings, and lowering the
>> optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my build
>> environment, CentOS Linux 5.3, uses gcc 4.1.2).  I'm not proud of this
>> "fix".  I've violated the Deutsch criterion by not diagnosing the cause of
>> the bug so I can't stand behind this fix; it's a hack that appears to work
>> and may have merely pushed the real bug further underground.  Alas I don't
>> have time to do a better job. Hopefully it'll get those of you on linux
>> going again.
>
> Eliot, if you remember, i also had crash issues on linux, and pinned down it to
> removing optimization from <something>heartbeat.c while keeping
> gcc3x-cointerp.c to use
> same optimization flags as for rest of files.

Indeed, quite right.  I happened to add a flag to turn off the heartbeat so I could debug the crash Matthew was seeing in starting up Squeak4.2-10856-beta.image (since single-stepping through machine code always gets interrupted by the heartbeat, it being an interval timer) and lo and behold the bug went away.  This is very worrying because it appears to imply that there's a serious bug in the linux kernel/gcc since delivering a software interrupt shouldn't corrupt registers, but it clearly does.  I'll try and pass it by someone who's an expert in this area.

Anyway, now find a new linux VM in VM.r2346/ that seems fine with the interpreter and the cogit compiled at -O2 but the heartbeat compiled at -O2.  Running this VM on CentOS 5.3 under Parallels I get
    2839 run, 2796 passes, 7 expected failures, 24 failures, 11 errors, 0 unexpected passes
for the full test suite in Squeak4.2-10856-beta.image.

So have at it.

best
Eliot


> I will start coding cmake config for Cog during next week and will be
> able to check my previous
> observations again.
>
>> best
>> Eliot
>> On Sat, Jan 1, 2011 at 11:16 PM, <[hidden email]> wrote:
>
> --
> Best regards,
> Igor Stasenko AKA sig.



Re: [squeak-dev] Re: [Pharo-project] new Cog VMs

Igor Stasenko

On 11 January 2011 04:42, Eliot Miranda <[hidden email]> wrote:

> Hi Igor,  Hi All,
>
> On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[hidden email]> wrote:
>>
>> On 3 January 2011 00:15, Eliot Miranda <[hidden email]> wrote:
>> > Hi Martin, Hi All,
>> >     so find new VMs in VM.r2341/.  The linux crashes (certainly the one
>> > you
>> > suffered from Martin) seem to be caused by an optimization bug (but they
>> > could be caused by bad code generation, creating something that assumes
>> > ordering constraints which C doesn't guarantee).  I suspect the former
>> > because I don't see the crash when running exactly the same VM and image
>> > from a different directory; provoking the crash requires a particular
>> > path
>> > (go figure; I haven't pinned this down yet).
>> > So my "fix" is preventing a complex function being inlined into the main
>> > interpreter loop, removing the sources of some warnings, and lowering
>> > the
>> > optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my
>> > build
>> > environment, CentOS Linux 5.3, uses gcc 4.1.2).  I'm not proud of this
>> > "fix".  I've violated the Deutsch criterion by not diagnosing the cause
>> > of
>> > the bug so I can't stand behind this fix; it's a hack that appears to
>> > work
>> > and may have merely pushed the real bug further underground.  Alas I
>> > don't
>> > have time to do a better job. Hopefully it'll get those of you on linux
>> > going again.
>>
>> Eliot, if you remember, i also had crash issues on linux, and pinned down
>> it to
>> removing optimization from <something>heartbeat.c while keeping
>> gcc3x-cointerp.c to use
>>  same optimization flags as for rest of files.
>
> Indeed, quite right.  I happened to add a flag to turn off the heartbeat so
> I could debug the crash Matthew was seeing in starting up
> Squeak4.2-10856-beta.image (since single-stepping through machine code
> always gets interrupted by the heartbeat, it being an interval timer) and lo
> and behold the bug went away.  This is very worrying because it appears to
> imply that there's a serious bug in the linux kernel/gcc since delivering a
> software interrupt shouldn't corrupt registers, but it clearly does.  I'll
> try and pass it by someone who's an expert in this area.
> Anyway, now find a new linux VM in VM.r2346/ that seems fine with the
> interpreter and the cogit compiled at -O2 but the heartbeat compiled at -O2.
>  Running this VM on CentOS 5.3 under Parallels I get
>     2839 run, 2796 passes, 7 expected failures, 24 failures, 11 errors, 0
> unexpected passes
> for the full test suite in Squeak4.2-10856-beta.image.
> So have at it.

Great! Nice to see that we can deal with elusive bugs :)

> best
> Eliot
>>
>> I will start coding cmake config for Cog during next week and will be
>> able to check my previous
>> observations again.
>>
>> > best
>> > Eliot
>> > On Sat, Jan 1, 2011 at 11:16 PM, <[hidden email]> wrote:
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>
>



--
Best regards,
Igor Stasenko AKA sig.

Re: [Pharo-project] [squeak-dev] Re: new Cog VMs

Eliot Miranda-2
In reply to this post by Eliot Miranda-2
 


On Mon, Jan 10, 2011 at 11:31 PM, Andres Valloud <[hidden email]> wrote:
>> Indeed, quite right.  I happened to add a flag to turn off the heartbeat
>> so I could debug the crash Matthew was seeing in starting up
>> Squeak4.2-10856-beta.image (since single-stepping through machine code
>> always gets interrupted by the heartbeat, it being an interval timer)
>> and lo and behold the bug went away.  This is very worrying because it
>> appears to imply that there's a serious bug in the linux kernel/gcc
>> since delivering a software interrupt shouldn't corrupt registers, but
>> it clearly does.  I'll try and pass it by someone who's an expert in
>> this area.
>
> Or the signal handler functions do not comply with the relevant specifications, e.g.: signal handlers that do not preserve the value of errno, signal handlers that use a function not in the list of approved safe functions you can call from a signal handler as per the Single Unix Specification, etc.

Good point.  The signal handler essentially calls gettimeofday (not on the approved list but time is) and sets a couple of variables (current 64-bit microsecond time, stackLimit).  But it does not preserve errno.  I don't want to avoid calling gettimeofday and can see no harm in it; it's equivalent internally to time providing it doesn't use its TIMEZONE arg (which it doesn't).  But the signal handler /doesn't/ preserve errno and doing so is a very good idea and easy to do.  Thanks.


> Andres.