Re: [Pharo-project] new Cog VMs

mkobetic
I'm probably doing something obviously wrong, but I'm having no luck with the Cog VM. It crashes immediately on startup, even with the stock OneClick image. I downloaded the Pharo-1.1.1 OneClick images, fetched the latest coglinux.tgz (r2340), and untarred it into the pharo directory. Running it from the top-level pharo directory as:

        coglinux/squeak Contents/Resources/pharo.image

crashes immediately (with or without the -vm-display-X11 option). This is on the latest Fedora 14. The old VM has no such problem; run the same way:

        Contents/Linux/squeakvm Contents/Resources/pharo.image

it starts just fine. I can copy the stack dump from the crash, but since most people seem to be running fine, I suspect I'm just not doing it right. What am I missing?

Thanks,

Martin

"Eliot Miranda"<[hidden email]> wrote:
>    there are new versions of both the SimpleStackBasedCogit and the
> StackToRegisterMappingCogit Cog VMs in
> VM.r2339/<http://www.mirandabanda.org/files/Cog/VM/VM.r2339/>
>  & VM.r2340/ <http://www.mirandabanda.org/files/Cog/VM/VM.r2340/> respectively.
>  These contain fixes for a rounding bug causing an underestimate of openPICSize
> and resultant hard crashes, seen e.g. by trying to recover lost changes in a
> Pharo 1.2 image installed on c:\pharo.  If you're trying to reproduce Cog
> crashes please upgrade to one of these two VMs.



Re: [Pharo-project] new Cog VMs

Eliot Miranda-2
Hi Martin,

    Crashing immediately on startup could be to do with the UUIDPlugin and libuuid.  Try removing coglinux/lib/squeak/3.9-7/UUIDPlugin and see if it makes a difference.  If it does, I believe the fix is to ensure that there's a 32-bit libuuid.so.1 installed, e.g.

[eliot@mcqfes bld]$ ldd coglinux/lib/squeak/3.9-7/UUIDPlugin 
        linux-gate.so.1 =>  (0x00e2e000)
        libuuid.so.1 => /lib/libuuid.so.1 (0x0067e000)
        libc.so.6 => /lib/libc.so.6 (0x008d6000)
        /lib/ld-linux.so.2 (0x0045a000)
[eliot@mcqfes bld]$ file /lib/libuuid.so.1
/lib/libuuid.so.1: symbolic link to `libuuid.so.1.2'
[eliot@mcqfes bld]$ file /lib/libuuid.so.1.2
/lib/libuuid.so.1.2: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), stripped

Search the Squeak, vm-dev and Pharo archives for info on how to install a 32-bit version.  I think it was mentioned within the last month.
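
The 32-bit check shown in the transcript above can also be scripted by reading the ELF header directly rather than parsing `file` output. This is a minimal Python sketch, not something from the thread. It assumes only the standard ELF layout: the magic bytes `\x7fELF`, followed by the EI_CLASS byte, which is 1 for 32-bit and 2 for 64-bit. The helper names `elf_class` and `elf_class_of_file` are made up for illustration.

```python
ELF_MAGIC = b"\x7fELF"
# EI_CLASS values defined by the ELF specification.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(header: bytes) -> str:
    """Return '32-bit' or '64-bit' for the first bytes of an ELF file."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    word_size = ELF_CLASSES.get(header[4])
    if word_size is None:
        raise ValueError("unknown EI_CLASS value %d" % header[4])
    return word_size


def elf_class_of_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    # open() follows symbolic links, so a path such as libuuid.so.1
    # resolves to the real shared object.
    with open(path, "rb") as f:
        return elf_class(f.read(5))
```

On a machine set up like the one in the transcript, `elf_class_of_file("/lib/libuuid.so.1")` would be expected to report a 32-bit object. A 64-bit result, or a missing file, would point at the libuuid problem described above.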

If your problem has nothing to do with libuuid then I need info such as the exact OS version, which directories you've installed things in, and a gdb backtrace for the crash.


HTH
Eliot

On Sat, Jan 1, 2011 at 11:16 PM, <[hidden email]> wrote:
I'm probably doing something obviously wrong, but I have no luck with the Cog VM. It crashes immediately on startup even with the stock OneClick image. I downloaded the Pharo-1.1.1 OneClick images. Fetched the latest coglinux.tgz (r2340), untarred it into the pharo directory and just trying to run it from the top-level pharo directory as:

       coglinux/squeak Contents/Resources/pharo.image

crashes immediately (with or without the -vm-display-X11 option). This is on latest Fedora 14. The old VM seems to have no problem when I try to run it the same way:

       Contents/Linux/squeakvm Contents/Resources/pharo.image

seems to start just fine. I can copy the stack dump from the crash, but since most seem to be running fine, I suspect I'm just not doing it right. What am I missing ?

Thanks,

Martin

"Eliot Miranda"<[hidden email]> wrote:
>    there are new versions of both the SimpleStackBasedCogit and the
> StackToRegisterMappingCogit Cog VMs in
> VM.r2339/<http://www.mirandabanda.org/files/Cog/VM/VM.r2339/>
>  & VM.r2340/ <http://www.mirandabanda.org/files/Cog/VM/VM.r2340/> respectively.
>  These contain fixes for a rounding bug causing an underestimate of openPICSize
> and resultant hard crashes, seen e.g. by trying to recover lost changes in a
> Pharo 1.2 image installed on c:\pharo.  If you're trying to reproduce Cog
> crashes please upgrade to one of these two VMs.





Re: [Pharo-project] new Cog VMs

Eliot Miranda-2
In reply to this post by mkobetic
Hi Martin, Hi All,

    so find new VMs in VM.r2341/.  The linux crashes (certainly the one you suffered from, Martin) seem to be caused by an optimization bug (but they could be caused by bad code generation, creating something that assumes ordering constraints which C doesn't guarantee).  I suspect the former because I don't see the crash when running exactly the same VM and image from a different directory; provoking the crash requires a particular path (go figure; I haven't pinned this down yet).

So my "fix" is preventing a complex function being inlined into the main interpreter loop, removing the sources of some warnings, and lowering the optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my build environment, CentOS Linux 5.3, uses gcc 4.1.2).  I'm not proud of this "fix".  I've violated the Deutsch criterion by not diagnosing the cause of the bug so I can't stand behind this fix; it's a hack that appears to work and may have merely pushed the real bug further underground.  Alas I don't have time to do a better job. Hopefully it'll get those of you on linux going again.

best
Eliot



Re: [Pharo-project] new Cog VMs

CdAB63
On 02-01-2011 21:15, Eliot Miranda wrote:
> Hi Martin, Hi All,

Ok,

Today I downloaded the new version of Cog (svn) and compiled it. The crashes are fixed.

unixbuild fails to install automatically (make install). Somewhere it execs squeak.sh, which fails (no image and no sources), and then the rest of the installation process must be done by hand (make install-plugins; make install-doc; cp inisqueak squeak <cogsrc>/bin etc...).

Regards,

CdAB


Re: [Pharo-project] new Cog VMs

Igor Stasenko
In reply to this post by Eliot Miranda-2
On 3 January 2011 00:15, Eliot Miranda <[hidden email]> wrote:

> Hi Martin, Hi All,
>     so find new VMs in VM.r2341/.  The linux crashes (certainly the one you
> suffered from Martin) seem to be caused by an optimization bug (but they
> could be caused by bad code generation, creating something that assumes
> ordering constraints which C doesn't guarantee).  I suspect the former
> because I don't see the crash when running exactly the same VM and image
> from a different directory; provoking the crash requires a particular path
> (go figure; I haven't pinned this down yet).
> So my "fix" is preventing a complex function being inlined into the main
> interpreter loop, removing the sources of some warnings, and lowering the
> optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my build
> environment, CentOS Linux 5.3, uses gcc 4.1.2).  I'm not proud of this
> "fix".  I've violated the Deutsch criterion by not diagnosing the cause of
> the bug so I can't stand behind this fix; it's a hack that appears to work
> and may have merely pushed the real bug further underground.  Alas I don't
> have time to do a better job. Hopefully it'll get those of you on linux
> going again.

Eliot, if you remember, I also had crash issues on linux, and pinned the fix
down to removing optimization from <something>heartbeat.c while keeping
gcc3x-cointerp.c on the same optimization flags as the rest of the files.

I will start coding the cmake config for Cog during the next week and will be
able to check my previous observations again.

--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] new Cog VMs

mkobetic
In reply to this post by mkobetic
"Eliot Miranda"<[hidden email]> wrote:
>     so find new VMs in
> VM.r2341/<http://www.mirandabanda.org/files/Cog/VM/VM.r2341/>.

Yup, much better on my end. Thanks!

Martin


Re: [Pharo-project] new Cog VMs

Eliot Miranda-2
In reply to this post by CdAB63
Hi Casimiro,

On Sun, Jan 2, 2011 at 4:23 PM, Casimiro de Almeida Barreto <[hidden email]> wrote:
> On 02-01-2011 21:15, Eliot Miranda wrote:
>> Hi Martin, Hi All,
> Ok,
>
> Today I downloaded the new version of Cog (svn) and compiled it. The crashes are fixed.
>
> unixbuild fails to install automatically (make install). Somewhere it execs squeak.sh, which fails (no image and no sources), and then the rest of the installation process must be done by hand (make install-plugins; make install-doc; cp inisqueak squeak <cogsrc>/bin etc...).

That should be fixed now.  I had applied only half of the suggested fix to installing shell files.  But the current svn sources contain both halves.

cheers
Eliot 


Re: [Pharo-project] new Cog VMs

Eliot Miranda-2
In reply to this post by Igor Stasenko


On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[hidden email]> wrote:
On 3 January 2011 00:15, Eliot Miranda <[hidden email]> wrote:
> Hi Martin, Hi All,
>     so find new VMs in VM.r2341/.  The linux crashes (certainly the one you
> suffered from Martin) seem to be caused by an optimization bug (but they
> could be caused by bad code generation, creating something that assumes
> ordering constraints which C doesn't guarantee).  I suspect the former
> because I don't see the crash when running exactly the same VM and image
> from a different directory; provoking the crash requires a particular path
> (go figure; I haven't pinned this down yet).
> So my "fix" is preventing a complex function being inlined into the main
> interpreter loop, removing the sources of some warnings, and lowering the
> optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my build
> environment, CentOS Linux 5.3, uses gcc 4.1.2).  I'm not proud of this
> "fix".  I've violated the Deutsch criterion by not diagnosing the cause of
> the bug so I can't stand behind this fix; it's a hack that appears to work
> and may have merely pushed the real bug further underground.  Alas I don't
> have time to do a better job. Hopefully it'll get those of you on linux
> going again.

Eliot, if you remember, I also had crash issues on linux, and pinned the fix
down to removing optimization from <something>heartbeat.c while keeping
gcc3x-cointerp.c on the same optimization flags as the rest of the files.

I will start coding the cmake config for Cog during the next week and will be
able to check my previous observations again.

That would be really great.  The crash Martin and others were seeing was this: in evaluating Process>>priority:, #>= ended up getting sent to the Processor, and in the subsequent attempt to raise a notifier for the doesNotUnderstand: the VM crashed in MethodContext>>tempNames.  I only saw this crash using the Pharo1.1 one click installed in /pub/mkobetic/st/pharo /and/ the VM installed in /pub/mkobetic/st/pharo/coglinux.  This was compiled with gcc 4.1.2, -O2.





Re: new Cog VMs

Yoshiki Ohshima-2
In reply to this post by Eliot Miranda-2
  Hi, Eliot and all,

With Cog VM.r2341 (Cog VM 4.0.0 (release) from Jan 2 2011) and a vanilla
4.2-10779 image on Windows Vista, I have an interesting issue.  File
in the attached change set and put the following line in a workspace:

BooleanArrayUser new loop

and do-it a couple of times.  Then, say, switch to a web browser, browse
for a while, and come back and try the cycle again.  Eventually I get a
mustBeBoolean error in the #loop method.  In the debugger, the array is
often displayed as:

a BooleanArray(true true 1 1 1 1 1 1)

when all of the slots should be printed as 'true'.

It is as if my version of #at: is being bypassed, say, right after a GC or
something like that.

In a real "application", I get the error more consistently.

-- Yoshiki



Attachment: BooleanArrayTest.1.cs (1K)

Re: new Cog VMs

Igor Stasenko
On 4 January 2011 23:46, Yoshiki Ohshima <[hidden email]> wrote:

>  Hi, Eliot and all,
>
> With Cog VM.r2341 (Cog VM 4.0.0 (release) from Jan 2 2011) and vanilla
> 4.2-10779 image on Windows Vista, I have an interesting issue.  File
> in attached change set and put the following line in a workspace:
>
> BooleanArrayUser new loop
>
> and do-it couple of times.  And, say, switch to web browser, surf some
> web and come back and try the cycle again.  I get mustBeBoolean error
> in the #loop method eventually.  In the debugger, the array often is
> displayed as:
>
> a BooleanArray(true true 1 1 1 1 1 1)
>
> when all of slots should be printed as 'true'.
>
> It is as if my version of #at: is bypassed and, say, right after GC or
> something like that.
>
> In a real "application", I get the error more consistently.
>

I have observed similar issues, where references to some stable object(s)
randomly flipped to small integers.
My guess is that something is broken in, or interferes with, the GC mark
phase, because the only place where the GC plays with references is during
marking. Here is a potential place that could do that:

ObjectMemory>>startField
    ....
        parentField := field - BytesPerWord bitOr: 1.
        ^ StartObj


then in the same method, just a little above:

        typeBits = 0 ifTrue: "normal oop, go down"
                [self longAt: field put: parentField.
                parentField := field.
                ^ StartObj].

So the place where it puts the small int is
  self longAt: field put: parentField.

and parentField, formed previously as (anything - something) bitOr: 1,
is a small int.

Somehow the field is not properly restored when the object is fully traced.
In this way, SmallIntegers may appear in place of valid references.
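
To see why a leftover mark-phase link would print as an integer: in the 32-bit Squeak object memory, an oop whose low bit is set is read as a SmallInteger. The Python sketch below shows only that tagging. It assumes 4-byte words and word-aligned field addresses, and the function names and the example address are hypothetical. It says nothing about whether the VM actually fails to restore the field.

```python
BYTES_PER_WORD = 4  # assumption: 32-bit VM, fields are word-aligned


def is_small_integer(oop: int) -> bool:
    """An oop with the low tag bit set is read as a SmallInteger."""
    return (oop & 1) == 1


def small_integer_value(oop: int) -> int:
    """Decode a non-negative tagged SmallInteger (sign handling omitted)."""
    return oop >> 1


def tagged_parent(field: int) -> int:
    """Mirror of `parentField := field - BytesPerWord bitOr: 1`.

    Smalltalk evaluates the binary `-` before the keyword `bitOr:`,
    hence the parentheses here.
    """
    return (field - BYTES_PER_WORD) | 1


field = 0x1000                # hypothetical word-aligned field address
link = tagged_parent(field)   # value the marker would store into the field
print(hex(link), is_small_integer(link), small_integer_value(link))
# prints: 0xffd True 2046
```

A word-aligned pointer such as `field` has a clear low bit and reads as a normal reference. The stored link has the bit set, so anything inspecting that slot before it is restored would see a SmallInteger.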

> -- Yoshiki
>


--
Best regards,
Igor Stasenko AKA sig.


Re: [Pharo-project] new Cog VMs

Eliot Miranda-2
In reply to this post by Igor Stasenko
Hi Igor,  Hi All,

On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[hidden email]> wrote:
On 3 January 2011 00:15, Eliot Miranda <[hidden email]> wrote:
> Hi Martin, Hi All,
>     so find new VMs in VM.r2341/.  The linux crashes (certainly the one you
> suffered from Martin) seem to be caused by an optimization bug (but they
> could be caused by bad code generation, creating something that assumes
> ordering constraints which C doesn't guarantee).  I suspect the former
> because I don't see the crash when running exactly the same VM and image
> from a different directory; provoking the crash requires a particular path
> (go figure; I haven't pinned this down yet).
> So my "fix" is preventing a complex function being inlined into the main
> interpreter loop, removing the sources of some warnings, and lowering the
> optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my build
> environment, CentOS Linux 5.3, uses gcc 4.1.2).  I'm not proud of this
> "fix".  I've violated the Deutsch criterion by not diagnosing the cause of
> the bug so I can't stand behind this fix; it's a hack that appears to work
> and may have merely pushed the real bug further underground.  Alas I don't
> have time to do a better job. Hopefully it'll get those of you on linux
> going again.

Eliot, if you remember, I also had crash issues on linux, and pinned the fix
down to removing optimization from <something>heartbeat.c while keeping
gcc3x-cointerp.c on the same optimization flags as the rest of the files.

Indeed, quite right.  I happened to add a flag to turn off the heartbeat so I could debug the crash Matthew was seeing in starting up Squeak4.2-10856-beta.image (since single-stepping through machine code always gets interrupted by the heartbeat, it being an interval timer) and lo and behold the bug went away.  This is very worrying because it appears to imply that there's a serious bug in the linux kernel/gcc since delivering a software interrupt shouldn't corrupt registers, but it clearly does.  I'll try and pass it by someone who's an expert in this area.
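
The heartbeat described above is a periodic timer signal, and the debugging flag amounts to not arming it. The sketch below is a rough Python analogue, since the real VM does this in C. `run_with_heartbeat`, `spin`, their parameters, and the 10 ms interval are assumptions for illustration. It uses `signal.setitimer` to deliver SIGALRM at a fixed interval, with an `enabled` switch to turn the timer off. It assumes a Unix system and must run in the main thread.

```python
import signal
import time


def run_with_heartbeat(work, interval=0.01, enabled=True):
    """Run work() under a periodic SIGALRM; return (result, tick_count)."""
    ticks = 0

    def on_tick(signum, frame):
        # Runs asynchronously with respect to work(), between bytecodes.
        nonlocal ticks
        ticks += 1

    if not enabled:
        # The "heartbeat off" flag: never arm the timer at all.
        return work(), 0

    old_handler = signal.signal(signal.SIGALRM, on_tick)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        result = work()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)      # disarm the timer first
        signal.signal(signal.SIGALRM, old_handler)   # then restore the handler
    return result, ticks


def spin(seconds=0.2):
    """Busy-loop for a while so the timer has a chance to fire."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count
```

Calling `run_with_heartbeat(spin)` should report some ticks, and `run_with_heartbeat(spin, enabled=False)` reports zero. A computation that gives different results in the two modes is being disturbed by signal delivery, which is the symptom reported above.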

Anyway, now find a new linux VM in VM.r2346/ that seems fine with the interpreter and the cogit compiled at -O2 but the heartbeat compiled at -O2.  Running this VM on CentOS 5.3 under Parallels I get
    2839 run, 2796 passes, 7 expected failures, 24 failures, 11 errors, 0 unexpected passes
for the full test suite in Squeak4.2-10856-beta.image.

So have at it.

best
Eliot




Re: [Pharo-project] new Cog VMs

Igor Stasenko
On 11 January 2011 04:42, Eliot Miranda <[hidden email]> wrote:

> Hi Igor,  Hi All,
>
> On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[hidden email]> wrote:
>>
>> On 3 January 2011 00:15, Eliot Miranda <[hidden email]> wrote:
>> > Hi Martin, Hi All,
>> >     so find new VMs in VM.r2341/.  The linux crashes (certainly the one
>> > you
>> > suffered from Martin) seem to be caused by an optimization bug (but they
>> > could be caused by bad code generation, creating something that assumes
>> > ordering constraints which C doesn't guarantee).  I suspect the former
>> > because I don't see the crash when running exactly the same VM and image
>> > from a different directory; provoking the crash requires a particular
>> > path
>> > (go figure; I haven't pinned this down yet).
>> > So my "fix" is preventing a complex function being inlined into the main
>> > interpreter loop, removing the sources of some warnings, and lowering
>> > the
>> > optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my
>> > build
>> > environment, CentOS Linux 5.3, uses gcc 4.1.2).  I'm not proud of this
>> > "fix".  I've violated the Deutsch criterion by not diagnosing the cause
>> > of
>> > the bug so I can't stand behind this fix; it's a hack that appears to
>> > work
>> > and may have merely pushed the real bug further underground.  Alas I
>> > don't
>> > have time to do a better job. Hopefully it'll get those of you on linux
>> > going again.
>>
>> Eliot, if you remember, i also had crash issues on linux, and pinned down
>> it to
>> removing optimization from <something>heartbeat.c while keeping
>> gcc3x-cointerp.c to use
>>  same optimization flags as for rest of files.
>
> Indeed, quite right.  I happened to add a flag to turn off the heartbeat so
> I could debug the crash Matthew was seeing in starting up
> Squeak4.2-10856-beta.image (since single-stepping through machine code
> always gets interrupted by the heartbeat, it being an interval timer) and lo
> and behold the bug went away.  This is very worrying because it appears to
> imply that there's a serious bug in the linux kernel/gcc since delivering a
> software interrupt shouldn't corrupt registers, but it clearly does.  I'll
> try and pass it by someone who's an expert in this area.
> Anyway, now find a new linux VM in VM.r2346/ that seems fine with the
> interpreter and the cogit compiled at -02 but the heartbeat compiled at -O2.
>  Running this VM on CentOS 5.3 under Parallels I get
>     2839 run, 2796 passes, 7 expected failures, 24 failures, 11 errors, 0
> unexpected passes
> for the full test suite in Squeak4.2-10856-beta.image.
> So have at it.

Great! Nice to see that we can deal with elusive bugs :)



--
Best regards,
Igor Stasenko AKA sig.