Hi Igor, Hi All,

On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[hidden email]> wrote:
Indeed, quite right. I happened to add a flag to turn off the heartbeat so I could debug the crash Matthew was seeing in starting up Squeak4.2-10856-beta.image (since single-stepping through machine code always gets interrupted by the heartbeat, it being an interval timer) and lo and behold the bug went away. This is very worrying because it appears to imply that there's a serious bug in the linux kernel/gcc since delivering a software interrupt shouldn't corrupt registers, but it clearly does. I'll try and pass it by someone who's an expert in this area.
Anyway, now find a new linux VM in VM.r2346/ that seems fine with the interpreter and the cogit compiled at -O2 but the heartbeat compiled at -O2. Running this VM on CentOS 5.3 under Parallels I get
2839 run, 2796 passes, 7 expected failures, 24 failures, 11 errors, 0 unexpected passes
for the full test suite in Squeak4.2-10856-beta.image.

So have at it.

best
Eliot
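Eliot's remark that single-stepping keeps getting interrupted follows from the heartbeat being an interval timer. The sketch below assumes a setitimer(ITIMER_REAL)/SIGALRM design; it is not the Cog source, and `set_heartbeat` and `heartbeat_ticks` are hypothetical names. Each timer expiry delivers SIGALRM asynchronously to whatever code is running. Passing a period of 0 disarms the timer, roughly analogous to the flag Eliot added to turn the heartbeat off.

```c
#define _XOPEN_SOURCE 700  /* expose sigaction, SA_RESTART and setitimer */
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical tick counter; a real VM heartbeat would update VM state instead. */
static volatile sig_atomic_t heartbeat_ticks = 0;

/* Runs asynchronously on every timer expiry, interrupting whatever code
   (or single-stepping session) happens to be executing at that moment. */
static void heartbeat_handler(int sig)
{
    (void)sig;
    heartbeat_ticks++;  /* only the handler writes this counter */
}

/* Hypothetical helper: period_us > 0 starts a periodic heartbeat,
   period_us == 0 disarms it. Returns 0 on success, -1 on error. */
int set_heartbeat(long period_us)
{
    struct sigaction sa;
    struct itimerval timer;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = heartbeat_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  /* restart interrupted system calls where possible */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    timer.it_interval.tv_sec = period_us / 1000000;
    timer.it_interval.tv_usec = period_us % 1000000;
    timer.it_value = timer.it_interval;  /* an all-zero it_value disarms the timer */
    return setitimer(ITIMER_REAL, &timer, NULL);
}
```

For example, `set_heartbeat(1000)` requests a 1 ms tick and `set_heartbeat(0)` stops it.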
On 11 January 2011 04:42, Eliot Miranda <[hidden email]> wrote:
> Hi Igor, Hi All,
>
> On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[hidden email]> wrote:
>>
>> On 3 January 2011 00:15, Eliot Miranda <[hidden email]> wrote:
>> > Hi Martin, Hi All,
>> > so find new VMs in VM.r2341/. The linux crashes (certainly the one you
>> > suffered from Martin) seem to be caused by an optimization bug (but they
>> > could be caused by bad code generation, creating something that assumes
>> > ordering constraints which C doesn't guarantee). I suspect the former
>> > because I don't see the crash when running exactly the same VM and image
>> > from a different directory; provoking the crash requires a particular path
>> > (go figure; I haven't pinned this down yet).
>> > So my "fix" is preventing a complex function being inlined into the main
>> > interpreter loop, removing the sources of some warnings, and lowering the
>> > optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my build
>> > environment, CentOS Linux 5.3, uses gcc 4.1.2). I'm not proud of this
>> > "fix". I've violated the Deutsch criterion by not diagnosing the cause of
>> > the bug so I can't stand behind this fix; it's a hack that appears to work
>> > and may have merely pushed the real bug further underground. Alas I don't
>> > have time to do a better job. Hopefully it'll get those of you on linux
>> > going again.
>>
>> Eliot, if you remember, I also had crash issues on linux, and pinned it down
>> to removing optimization from <something>heartbeat.c while keeping
>> gcc3x-cointerp.c using the same optimization flags as the rest of the files.
>
> Indeed, quite right. I happened to add a flag to turn off the heartbeat so
> I could debug the crash Matthew was seeing in starting up
> Squeak4.2-10856-beta.image (since single-stepping through machine code
> always gets interrupted by the heartbeat, it being an interval timer) and lo
> and behold the bug went away. This is very worrying because it appears to
> imply that there's a serious bug in the linux kernel/gcc since delivering a
> software interrupt shouldn't corrupt registers, but it clearly does. I'll
> try and pass it by someone who's an expert in this area.
> Anyway, now find a new linux VM in VM.r2346/ that seems fine with the
> interpreter and the cogit compiled at -O2 but the heartbeat compiled at -O2.
> Running this VM on CentOS 5.3 under Parallels I get
> 2839 run, 2796 passes, 7 expected failures, 24 failures, 11 errors, 0
> unexpected passes
> for the full test suite in Squeak4.2-10856-beta.image.
> So have at it.

Great! Nice to see that we can deal with elusive bugs :)

> best
> Eliot
>>
>> I will start coding cmake config for Cog during next week and will be
>> able to check my previous observations again.
>>
>> > best
>> > Eliot
>> > On Sat, Jan 1, 2011 at 11:16 PM, <[hidden email]> wrote:
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.

--
Best regards,
Igor Stasenko AKA sig.
In reply to this post by Eliot Miranda-2
On Mon, Jan 10, 2011 at 11:31 PM, Andres Valloud <[hidden email]> wrote:
Good point. The signal handler essentially calls gettimeofday (which is not on the POSIX list of async-signal-safe functions, although time is) and sets a couple of variables (the current 64-bit microsecond time and stackLimit). I don't want to avoid calling gettimeofday, and I can see no harm in it; internally it's equivalent to time, provided it doesn't use its timezone argument (which it doesn't). But the signal handler /doesn't/ preserve errno, and doing so is a very good idea and easy to do. Thanks.
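A minimal sketch of the errno fix discussed above, assuming a SIGALRM handler shaped like the one described: it saves errno on entry and restores it on exit, so a failing call inside the handler cannot clobber the errno value seen by the code the signal interrupted. The names `utc_microseconds`, `stack_limit`, `FORCE_INTERRUPT_CHECK` and `install_heartbeat_handler` are hypothetical stand-ins, not Cog identifiers, and the "force the next stack check to trip" convention is an assumption. Like the original handler, it calls gettimeofday; POSIX does list clock_gettime as async-signal-safe if that matters on a given platform.

```c
#define _XOPEN_SOURCE 700  /* expose sigaction and gettimeofday */
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical VM state updated by the heartbeat. A 64-bit store is not
   atomic on 32-bit targets; a real VM needs extra care there. */
static volatile uint64_t utc_microseconds = 0;
static volatile uintptr_t stack_limit = 0;
/* Assumed convention: an all-ones limit makes the interpreter's next
   stack-limit check trip, so it notices the heartbeat. */
#define FORCE_INTERRUPT_CHECK ((uintptr_t)-1)

static void heartbeat_handler(int sig)
{
    int saved_errno = errno;  /* preserve the interrupted code's errno */
    struct timeval tv;

    (void)sig;
    if (gettimeofday(&tv, NULL) == 0)  /* timezone argument unused */
        utc_microseconds = (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
    stack_limit = FORCE_INTERRUPT_CHECK;

    errno = saved_errno;      /* restore it before returning */
}

/* Hypothetical installer; returns 0 on success, -1 on error. */
int install_heartbeat_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = heartbeat_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGALRM, &sa, NULL);
}
```

Because `raise(SIGALRM)` runs the handler before returning, setting errno to a known value and then raising the signal is a quick way to check that the value survives.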