I'm probably doing something obviously wrong, but I have no luck with the Cog VM. It crashes immediately on startup, even with the stock OneClick image. I downloaded the Pharo-1.1.1 OneClick images, fetched the latest coglinux.tgz (r2340), untarred it into the pharo directory, and am just trying to run it from the top-level pharo directory as:

    coglinux/squeak Contents/Resources/pharo.image

It crashes immediately (with or without the -vm-display-X11 option). This is on the latest Fedora 14. The old VM seems to have no problem when I run it the same way:

    Contents/Linux/squeakvm Contents/Resources/pharo.image

seems to start just fine. I can copy the stack dump from the crash, but since most people seem to be running fine, I suspect I'm just not doing it right. What am I missing?

Thanks,
Martin

"Eliot Miranda" <[hidden email]> wrote:
> there are new versions of both the SimpleStackBasedCogit and the
> StackToRegisterMappingCogit Cog VMs in
> VM.r2339/<http://www.mirandabanda.org/files/Cog/VM/VM.r2339/>
> & VM.r2340/ <http://www.mirandabanda.org/files/Cog/VM/VM.r2340/> respectively.
> These contain fixes for rounding bug causing underestimate of openPICSize
> and resultant hard crashes, seen e.g. by trying to recover lost changes in a
> Pharo 1.2 image installed on c:\pharo. If you're trying to reproduce Cog
> crashes please upgrade to one of these two VMs.
|
Hi Martin,
crashing immediately on startup could be to do with the UUIDPlugin and libuuid. Try removing coglinux/lib/squeak/3.9-7/UUIDPlugin and see if it makes a difference. If it does I believe the fix is to ensure that there's a 32-bit libuuid.so.1 installed. e.g.
[eliot@mcqfes bld]$ ldd coglinux/lib/squeak/3.9-7/UUIDPlugin
        linux-gate.so.1 => (0x00e2e000)
        libuuid.so.1 => /lib/libuuid.so.1 (0x0067e000)
        libc.so.6 => /lib/libc.so.6 (0x008d6000)
        /lib/ld-linux.so.2 (0x0045a000)
[eliot@mcqfes bld]$ file /lib/libuuid.so.1
/lib/libuuid.so.1: symbolic link to `libuuid.so.1.2'
[eliot@mcqfes bld]$ file /lib/libuuid.so.1.2
/lib/libuuid.so.1.2: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), stripped

If your problem is nothing to do with libuuid then I need info like the exact OS version, what directories you've installed things in, and a gdb backtrace for the crash.

HTH
Eliot
|
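The check above relies on `file` reporting the library as a 32-bit ELF object. As a rough sketch (not part of the Cog distribution; the function names are made up for illustration), the same information can be read straight from the ELF identification bytes: the file starts with the magic `\x7fELF`, and byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit ones.

```python
import sys

ELF_MAGIC = b"\x7fELF"


def elf_class_from_ident(ident):
    """Classify the first bytes of a file as '32-bit', '64-bit', or None.

    Hypothetical helper: None means the bytes are not an ELF header or
    the class byte holds an unknown value.
    """
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None
    # EI_CLASS lives at offset 4: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: "32-bit", 2: "64-bit"}.get(ident[4])


def elf_class(path):
    """Read the ELF identification bytes of `path` (symlinks are followed)."""
    with open(path, "rb") as f:
        return elf_class_from_ident(f.read(5))


if __name__ == "__main__":
    # Example: python elfclass.py /lib/libuuid.so.1
    for path in sys.argv[1:]:
        print(path, elf_class(path))
```

A 32-bit VM plugin such as UUIDPlugin needs the libraries it links against to report "32-bit" here; a "64-bit" result for the only installed libuuid would be consistent with the failure Eliot describes.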
In reply to this post by mkobetic
Hi Martin, Hi All,
so find new VMs in VM.r2341/. The linux crashes (certainly the one you suffered from Martin) seem to be caused by an optimization bug (but they could be caused by bad code generation, creating something that assumes ordering constraints which C doesn't guarantee). I suspect the former because I don't see the crash when running exactly the same VM and image from a different directory; provoking the crash requires a particular path (go figure; I haven't pinned this down yet). So my "fix" is preventing a complex function being inlined into the main interpreter loop, removing the sources of some warnings, and lowering the optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my build environment, CentOS Linux 5.3, uses gcc 4.1.2). I'm not proud of this "fix". I've violated the Deutsch criterion by not diagnosing the cause of the bug so I can't stand behind this fix; it's a hack that appears to work and may have merely pushed the real bug further underground. Alas I don't have time to do a better job. Hopefully it'll get those of you on linux going again.
best
Eliot
|
On 02-01-2011 21:15, Eliot Miranda wrote:
> Hi Martin, Hi All,

Ok, today I downloaded the new version of Cog (svn) and compiled it. The crashes are fixed. The unixbuild fails to install automatically (make install): somewhere it execs squeak.sh, which fails (no image and no sources), and then the rest of the installation process must be done by hand (make install-plugins; make install-doc; cp inisqueak squeak <cogsrc>/bin etc...).

Regards,
CdAB
|
In reply to this post by Eliot Miranda-2
On 3 January 2011 00:15, Eliot Miranda <[hidden email]> wrote:
> So my "fix" is preventing a complex function being inlined into the main
> interpreter loop, removing the sources of some warnings, and lowering the
> optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my build
> environment, CentOS Linux 5.3, uses gcc 4.1.2).

Eliot, if you remember, I also had crash issues on linux, and pinned them down to removing optimization from <something>heartbeat.c while keeping gcc3x-cointerp.c on the same optimization flags as the rest of the files.

I will start coding a cmake config for Cog during next week and will be able to check my previous observations again.

--
Best regards,
Igor Stasenko AKA sig.
|
In reply to this post by mkobetic
"Eliot Miranda" <[hidden email]> wrote:
> so find new VMs in
> VM.r2341/<http://www.mirandabanda.org/files/Cog/VM/VM.r2341/>.

Yup, much better on my end. Thanks!

Martin
|
In reply to this post by CdAB63
Hi Casimiro,
On Sun, Jan 2, 2011 at 4:23 PM, Casimiro de Almeida Barreto <[hidden email]> wrote:
That should be fixed now. I had applied only half of the suggested fix to installing shell files. But the current svn sources contain both halves.
cheers Eliot
|
In reply to this post by Igor Stasenko
On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[hidden email]> wrote:
That would be really great. The crash Martin and others were seeing went like this: in evaluating Process>>priority:, #>= ended up being sent to the Processor, and in the subsequent attempt to raise a notifier for the doesNotUnderstand: the VM crashed in MethodContext>>tempNames. I only saw this crash using the Pharo1.1 one click installed in /pub/mkobetic/st/pharo /and/ the VM installed in /pub/mkobetic/st/pharo/coglinux. This was compiled with gcc 4.1.2, -O2.
|
In reply to this post by Eliot Miranda-2
Hi, Eliot and all,
With Cog VM.r2341 (Cog VM 4.0.0 (release) from Jan 2 2011) and a vanilla 4.2-10779 image on Windows Vista, I have an interesting issue. File in the attached change set, put the following line in a workspace:

    BooleanArrayUser new loop

and do-it a couple of times. Then, say, switch to a web browser, surf the web for a bit, come back, and try the cycle again. Eventually I get a mustBeBoolean error in the #loop method. In the debugger, the array is often displayed as:

    a BooleanArray(true true 1 1 1 1 1 1)

when all of the slots should be printed as 'true'.

It is as if my version of #at: is bypassed, say, right after a GC or something like that.

In a real "application", I get the error more consistently.

-- Yoshiki

BooleanArrayTest.1.cs (1K)
|
On 4 January 2011 23:46, Yoshiki Ohshima <[hidden email]> wrote:
> I get mustBeBoolean error
> in the #loop method eventually. In the debugger, the array often is
> displayed as:
>
> a BooleanArray(true true 1 1 1 1 1 1)
>
> when all of slots should be printed as 'true'.
>
> It is as if my version of #at: is bypassed and, say, right after GC or
> something like that.

I have observed similar issues, where references to some stable object(s) randomly flipped to small integers. My guess is that something is broken in, or interferes with, the GC mark phase, because the only place where the GC plays with references is during marking. Here is a potential place that could do it:

    ObjectMemory>>startField
        ....
        parentField := field - BytesPerWord bitOr: 1.
        ^ StartObj

and then, in the same method, a little above:

    typeBits = 0 ifTrue: "normal oop, go down"
        [self longAt: field put: parentField.
         parentField := field.
         ^ StartObj].

So the place where it stores the small int is

    self longAt: field put: parentField.

and parentField, formed previously as (anything - something) bitOr: 1, is a small int. Somehow it is not properly restored when the object is fully traced. In this way, SmallIntegers may appear in place of valid references.

--
Best regards,
Igor Stasenko AKA sig.
|
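To make the tagging argument above concrete, here is a small sketch of the arithmetic only. It is not VM code: the helper names are hypothetical, and it assumes a 32-bit object memory where a word with its low bit set is read as a SmallInteger whose value is the word shifted right by one.

```python
BYTES_PER_WORD = 4  # assumption: 32-bit object memory


def is_small_integer(oop):
    """A word with the low tag bit set is interpreted as a SmallInteger."""
    return (oop & 1) == 1


def small_integer_value(oop):
    """Decode a tagged word (non-negative case only, for illustration)."""
    return oop >> 1


def reversed_parent_pointer(field):
    """Mirror of `field - BytesPerWord bitOr: 1` from the quoted method.

    In Smalltalk the binary `-` binds tighter than the keyword `bitOr:`,
    so the expression is (field - BytesPerWord) bitOr: 1.
    """
    return (field - BYTES_PER_WORD) | 1


# A word-aligned field address is an ordinary pointer (low bit clear)...
field = 0x1000
assert not is_small_integer(field)

# ...but the back-pointer written during marking carries the tag bit, so if
# it were ever left in a slot it would be indistinguishable from an integer.
leftover = reversed_parent_pointer(field)
print(hex(leftover), is_small_integer(leftover), small_integer_value(leftover))
```

This only shows why a stale marking word would print as an integer; it says nothing about whether that is what actually happened in the BooleanArray case.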
In reply to this post by Igor Stasenko
Hi Igor, Hi All,
On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[hidden email]> wrote:
Indeed, quite right. I happened to add a flag to turn off the heartbeat so I could debug the crash Matthew was seeing in starting up Squeak4.2-10856-beta.image (since single-stepping through machine code always gets interrupted by the heartbeat, it being an interval timer) and lo and behold the bug went away. This is very worrying because it appears to imply that there's a serious bug in the linux kernel/gcc since delivering a software interrupt shouldn't corrupt registers, but it clearly does. I'll try and pass it by someone who's an expert in this area.
Anyway, now find a new linux VM in VM.r2346/ that seems fine with the interpreter and the cogit compiled at -O2 but the heartbeat compiled at -O2. Running this VM on CentOS 5.3 under Parallels I get

    2839 run, 2796 passes, 7 expected failures, 24 failures, 11 errors, 0 unexpected passes

for the full test suite in Squeak4.2-10856-beta.image. So have at it.

best
Eliot
|
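The heartbeat described above is an interval timer that periodically interrupts whatever the VM is doing, which is why it gets in the way of single-stepping. The following is only an illustration of that mechanism on a Unix system, not Cog's implementation: it assumes a `SIGALRM`-based timer armed with `setitimer`, and `count_heartbeats` is a made-up name.

```python
import signal
import time


def count_heartbeats(interval=0.01, wanted=3, timeout=2.0):
    """Arm a repeating interval timer and count how often it fires.

    Must be called from the main thread on a Unix system.
    """
    ticks = 0

    def on_tick(signum, frame):
        # Runs asynchronously with respect to the loop below,
        # much like a heartbeat interrupting an interpreter loop.
        nonlocal ticks
        ticks += 1

    previous = signal.signal(signal.SIGALRM, on_tick)
    # First expiry after `interval` seconds, then every `interval` seconds.
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    deadline = time.monotonic() + timeout
    try:
        while ticks < wanted and time.monotonic() < deadline:
            time.sleep(0.001)  # the "main loop" being interrupted
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(
            signal.SIGALRM,
            previous if previous is not None else signal.SIG_DFL,
        )
    return ticks


if __name__ == "__main__":
    print("heartbeats seen:", count_heartbeats())
```

Disarming the timer in the `finally` block plays the same role as the flag Eliot added to turn the heartbeat off: with no timer armed, nothing interrupts the main loop.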
On 11 January 2011 04:42, Eliot Miranda <[hidden email]> wrote:
> Anyway, now find a new linux VM in VM.r2346/ that seems fine with the
> interpreter and the cogit compiled at -O2 but the heartbeat compiled at -O2.
> Running this VM on CentOS 5.3 under Parallels I get
> 2839 run, 2796 passes, 7 expected failures, 24 failures, 11 errors, 0
> unexpected passes
> for the full test suite in Squeak4.2-10856-beta.image.
> So have at it.

Great! Nice to see that we can deal with elusive bugs :)

--
Best regards,
Igor Stasenko AKA sig.
|