Re: [Pharo-project] new Cog VMs

mkobetic

Re: [Pharo-project] new Cog VMs

I'm probably doing something obviously wrong, but I have no luck with the Cog VM. It crashes immediately on startup even with the stock OneClick image. I downloaded the Pharo-1.1.1 OneClick images, fetched the latest coglinux.tgz (r2340), untarred it into the pharo directory, and am just trying to run it from the top-level pharo directory as:

coglinux/squeak Contents/Resources/pharo.image

crashes immediately (with or without the -vm-display-X11 option). This is on latest Fedora 14. The old VM seems to have no problem when I try to run it the same way:

Contents/Linux/squeakvm Contents/Resources/pharo.image

seems to start just fine. I can copy the stack dump from the crash, but since most seem to be running fine, I suspect I'm just not doing it right. What am I missing?

Thanks,

Martin

"Eliot Miranda"<[hidden email]> wrote:
> there are new versions of both the SimpleStackBasedCogit and the
> StackToRegisterMappingCogit Cog VMs in
> VM.r2339/<http://www.mirandabanda.org/files/Cog/VM/VM.r2339/>
> & VM.r2340/ <http://www.mirandabanda.org/files/Cog/VM/VM.r2340/> respectively.
> These contain fixes for rounding bug causing underestimate of openPICSize
> and resultant hard crashes, seen e.g. by trying to recover lost changes in a
> Pharo 1.2 image installed on c:\pharo. If you're trying to reproduce Cog
> crashes please upgrade to one of these two VMs.

Eliot Miranda-2

Re: [Pharo-project] new Cog VMs

Hi Martin,

crashing immediately on startup could be to do with the UUIDPlugin and libuuid. Try removing coglinux/lib/squeak/3.9-7/UUIDPlugin and see if it makes a difference. If it does I believe the fix is to ensure that there's a 32-bit libuuid.so.1 installed. e.g.

[eliot@mcqfes bld]$ ldd coglinux/lib/squeak/3.9-7/UUIDPlugin
        linux-gate.so.1 => (0x00e2e000)
        libuuid.so.1 => /lib/libuuid.so.1 (0x0067e000)
        libc.so.6 => /lib/libc.so.6 (0x008d6000)
        /lib/ld-linux.so.2 (0x0045a000)
[eliot@mcqfes bld]$ file /lib/libuuid.so.1
/lib/libuuid.so.1: symbolic link to `libuuid.so.1.2'
[eliot@mcqfes bld]$ file /lib/libuuid.so.1.2
/lib/libuuid.so.1.2: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), stripped

Search the Squeak, vm-dev and Pharo archives for info on how to install a 32-bit version. I think it was mentioned within the last month.
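For anyone who wants to check the word size of a library without the file utility, the ELF header can be read directly. The following is a minimal Python sketch, not part of the VM or its build scripts; the names elf_class_from_header and elf_class are made up for illustration. It relies only on the ELF format itself: a file starts with the bytes 0x7F 'E' 'L' 'F', and the fifth byte (EI_CLASS) is 1 for a 32-bit object and 2 for a 64-bit one.

```python
ELF_MAGIC = b"\x7fELF"

def elf_class_from_header(header):
    """Return 32 or 64 for the first bytes of an ELF file, else None."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None  # not an ELF file (or too short to tell)
    # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64; anything else is unknown
    return {1: 32, 2: 64}.get(header[4])

def elf_class(path):
    """Open a file (symlinks are followed) and classify its ELF header."""
    with open(path, "rb") as f:
        return elf_class_from_header(f.read(5))
```

Assuming the library lives where the ldd output above says it does, elf_class("/lib/libuuid.so.1") would be expected to return 32 on a machine where the plugin can load it; the path on other distributions may differ.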

If your problem is nothing to do with libuuid then I need info like exact OS version, what directories you've installed things in and a gdb backtrace for the crash.

HTH

Eliot

Eliot Miranda-2

Re: [Pharo-project] new Cog VMs

In reply to this post by mkobetic

Hi Martin, Hi All,

so find new VMs in VM.r2341/. The linux crashes (certainly the one you suffered from Martin) seem to be caused by an optimization bug (but they could be caused by bad code generation, creating something that assumes ordering constraints which C doesn't guarantee). I suspect the former because I don't see the crash when running exactly the same VM and image from a different directory; provoking the crash requires a particular path (go figure; I haven't pinned this down yet).

So my "fix" is preventing a complex function being inlined into the main interpreter loop, removing the sources of some warnings, and lowering the optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my build environment, CentOS Linux 5.3, uses gcc 4.1.2). I'm not proud of this "fix". I've violated the Deutsch criterion by not diagnosing the cause of the bug so I can't stand behind this fix; it's a hack that appears to work and may have merely pushed the real bug further underground. Alas I don't have time to do a better job. Hopefully it'll get those of you on linux going again.

best

Eliot

CdAB63

Re: [Pharo-project] new Cog VMs

On 02-01-2011 21:15, Eliot Miranda wrote:

> Hi Martin, Hi All,

Ok,

Today I downloaded the new version of Cog (svn) and compiled it. The crashes are fixed.

unixbuild fails to install automatically (make install). Somewhere it execs squeak.sh, which fails (no image & no sources), and then the rest of the installation process must be done by hand (make install-plugins; make install-doc; cp inisqueak squeak <cogsrc>/bin etc...).

Regards,

CdAB

Igor Stasenko

Re: [Pharo-project] new Cog VMs

In reply to this post by Eliot Miranda-2

On 3 January 2011 00:15, Eliot Miranda <[hidden email]> wrote:

> Hi Martin, Hi All,
> so find new VMs in VM.r2341/. The linux crashes (certainly the one you
> suffered from Martin) seem to be caused by an optimization bug (but they
> could be caused by bad code generation, creating something that assumes
> ordering constraints which C doesn't guarantee). I suspect the former
> because I don't see the crash when running exactly the same VM and image
> from a different directory; provoking the crash requires a particular path
> (go figure; I haven't pinned this down yet).
> So my "fix" is preventing a complex function being inlined into the main
> interpreter loop, removing the sources of some warnings, and lowering the
> optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my build
> environment, CentOS Linux 5.3, uses gcc 4.1.2). I'm not proud of this
> "fix". I've violated the Deutsch criterion by not diagnosing the cause of
> the bug so I can't stand behind this fix; it's a hack that appears to work
> and may have merely pushed the real bug further underground. Alas I don't
> have time to do a better job. Hopefully it'll get those of you on linux
> going again.

Eliot, if you remember, I also had crash issues on linux, and pinned them
down to <something>heartbeat.c: removing optimization from that file fixed
them, while gcc3x-cointerp.c kept the same optimization flags as the rest
of the files.

I will start coding a cmake config for Cog during next week and will be
able to check my previous observations again.

--
Best regards,
Igor Stasenko AKA sig.

mkobetic

Re: [Pharo-project] new Cog VMs

In reply to this post by mkobetic

"Eliot Miranda"<[hidden email]> wrote:
> so find new VMs in
> VM.r2341/<http://www.mirandabanda.org/files/Cog/VM/VM.r2341/>.

Yup, much better on my end. Thanks!

Martin

Eliot Miranda-2

Re: [Pharo-project] new Cog VMs

In reply to this post by CdAB63

Hi Casimiro,

On Sun, Jan 2, 2011 at 4:23 PM, Casimiro de Almeida Barreto <[hidden email]> wrote:

> On 02-01-2011 21:15, Eliot Miranda wrote:
>> Hi Martin, Hi All,
>
> Ok,
>
> Today I downloaded new version of cog (svn) & compiled. Crashes are fixed.
>
> unixbuild fails to install automatically (make install). Somewhere it execs squeak.sh which fails (no image & no sources) & then the rest of installation process must be done by hand (make install-plugins; make install-doc; cp inisqueak squeak <cogsrc>/bin etc...).

That should be fixed now. I had applied only half of the suggested fix to installing shell files. But the current svn sources contain both halves.

cheers

Eliot

Eliot Miranda-2

Re: [Pharo-project] new Cog VMs

In reply to this post by Igor Stasenko

On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[hidden email]> wrote:

> Eliot, if you remember, I also had crash issues on linux, and pinned them
> down to <something>heartbeat.c: removing optimization from that file fixed
> them, while gcc3x-cointerp.c kept the same optimization flags as the rest
> of the files.
>
> I will start coding a cmake config for Cog during next week and will be
> able to check my previous observations again.

That would be really great. The crash Martin and others were seeing was this: in evaluating Process>>priority:, #>= ended up getting sent to the Processor, and in the subsequent attempt to raise a notifier for the doesNotUnderstand: the VM crashed in MethodContext>>tempNames. I only saw this crash using the Pharo1.1 one click installed in /pub/mkobetic/st/pharo /and/ the VM installed in /pub/mkobetic/st/pharo/coglinux. This was compiled with gcc 4.1.2, -O2.

Yoshiki Ohshima-2

Re: new Cog VMs

In reply to this post by Eliot Miranda-2

Hi, Eliot and all,

With Cog VM.r2341 (Cog VM 4.0.0 (release) from Jan 2 2011) and vanilla
4.2-10779 image on Windows Vista, I have an interesting issue. File
in attached change set and put the following line in a workspace:

BooleanArrayUser new loop

and do-it a couple of times. Then, say, switch to a web browser, surf the
web for a while, come back and try the cycle again. I get a mustBeBoolean
error in the #loop method eventually. In the debugger, the array often is
displayed as:

a BooleanArray(true true 1 1 1 1 1 1)

when all of the slots should be printed as 'true'.

It is as if my version of #at: is bypassed, say, right after a GC or
something like that.

In a real "application", I get the error more consistently.

-- Yoshiki

Attachment: BooleanArrayTest.1.cs (1K)

Igor Stasenko

Re: new Cog VMs

On 4 January 2011 23:46, Yoshiki Ohshima <[hidden email]> wrote:

> Hi, Eliot and all,
>
> With Cog VM.r2341 (Cog VM 4.0.0 (release) from Jan 2 2011) and vanilla
> 4.2-10779 image on Windows Vista, I have an interesting issue. File
> in attached change set and put the following line in a workspace:
>
> BooleanArrayUser new loop
>
> and do-it couple of times. And, say, switch to web browser, surf some
> web and come back and try the cycle again. I get mustBeBoolean error
> in the #loop method eventually. In the debugger, the array often is
> displayed as:
>
> a BooleanArray(true true 1 1 1 1 1 1)
>
> when all of slots should be printed as 'true'.
>
> It is as if my version of #at: is bypassed and, say, right after GC or
> something like that.
>
> In a real "application", I get the error more consistently.
>

I observed similar issues, where references to some stable object(s)
randomly flipped to small integers.
My guess is that something is broken in, or interferes with, the GC mark
phase, because the only place where the GC plays with references is during
marking. Here is a potential place which could do that:

    ObjectMemory>>startField
        ....
        parentField := field - BytesPerWord bitOr: 1.
        ^ StartObj

then, in the same method, just a little above:

    typeBits = 0 ifTrue: "normal oop, go down"
        [self longAt: field put: parentField.
        parentField := field.
        ^ StartObj].

So the place where it puts the small int is

    self longAt: field put: parentField.

and parentField, formed previously as (anything - something) bitOr: 1,
is a small int.

Somehow the field is not properly restored when the object is fully traced.
In this way, SmallIntegers may appear in place of valid references.
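To make the suspicion above concrete, here is a small Python model of the tagging involved. It is only an illustration, not the VM's code: it assumes 32-bit words (BytesPerWord = 4) and the classic Squeak encoding in which an oop with its low bit set is a SmallInteger whose value is the remaining bits shifted right by one. All function names are invented for this sketch.

```python
BYTES_PER_WORD = 4  # assumption: 32-bit VM

def is_small_integer(oop):
    """An oop with the low tag bit set is treated as a SmallInteger."""
    return oop & 1 == 1

def small_integer_value(oop):
    """Decode a tagged oop (non-negative values only, for simplicity)."""
    return oop >> 1

def reversed_parent_pointer(field):
    """Mirrors: parentField := field - BytesPerWord bitOr: 1"""
    return (field - BYTES_PER_WORD) | 1

def describe(oop):
    """Show how a slot holding this word would be interpreted."""
    if is_small_integer(oop):
        return "SmallInteger %d" % small_integer_value(oop)
    return "object at 0x%08x" % oop

# A slot that should hold a pointer but still holds the reversed link:
stale = reversed_parent_pointer(0x1000)
print(describe(0x1000))  # object at 0x00001000
print(describe(stale))   # SmallInteger 2046
```

The point is only that a word left behind by the pointer-reversal step is indistinguishable from a tagged integer, so a slot that is never restored would read back as a SmallInteger rather than as an object reference.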

> -- Yoshiki
>

--
Best regards,
Igor Stasenko AKA sig.

Eliot Miranda-2

Re: [Pharo-project] new Cog VMs

In reply to this post by Igor Stasenko

Hi Igor, Hi All,

On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[hidden email]> wrote:

> Eliot, if you remember, I also had crash issues on linux, and pinned them
> down to <something>heartbeat.c: removing optimization from that file fixed
> them, while gcc3x-cointerp.c kept the same optimization flags as the rest
> of the files.

Indeed, quite right. I happened to add a flag to turn off the heartbeat so I could debug the crash Matthew was seeing in starting up Squeak4.2-10856-beta.image (since single-stepping through machine code always gets interrupted by the heartbeat, it being an interval timer) and lo and behold the bug went away. This is very worrying because it appears to imply that there's a serious bug in the linux kernel/gcc since delivering a software interrupt shouldn't corrupt registers, but it clearly does. I'll try and pass it by someone who's an expert in this area.
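For readers unfamiliar with the mechanism: the VM's heartbeat is C code, but the idea of an interval timer that periodically interrupts whatever the main thread is doing can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the VM's implementation: it needs a Unix system, must run in the main thread, and the function name run_heartbeat and the 10 ms interval are arbitrary choices for illustration.

```python
import signal
import time

def run_heartbeat(interval_s=0.01, wanted_ticks=3, timeout_s=2.0):
    """Arm a repeating real-time timer and count SIGALRM deliveries."""
    ticks = 0

    def on_tick(signum, frame):
        # Runs asynchronously with respect to the loop below.
        nonlocal ticks
        ticks += 1

    previous = signal.signal(signal.SIGALRM, on_tick)
    # First expiry after interval_s, then every interval_s thereafter.
    signal.setitimer(signal.ITIMER_REAL, interval_s, interval_s)
    deadline = time.monotonic() + timeout_s
    try:
        while ticks < wanted_ticks and time.monotonic() < deadline:
            time.sleep(0.001)  # stand-in for "real work" being interrupted
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, previous)     # restore old handler
    return ticks
```

Calling run_heartbeat() returns once the handler has fired the requested number of times (or the timeout passes). It also shows why single-stepping is awkward with such a timer armed: the handler can run between any two steps of the interrupted code.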

Anyway, now find a new linux VM in VM.r2346/ that seems fine with the interpreter and the cogit compiled at -O2 but the heartbeat compiled at -O2. Running this VM on CentOS 5.3 under Parallels I get

2839 run, 2796 passes, 7 expected failures, 24 failures, 11 errors, 0 unexpected passes

for the full test suite in Squeak4.2-10856-beta.image.

So have at it.

best

Eliot

Igor Stasenko

Re: [Pharo-project] new Cog VMs

On 11 January 2011 04:42, Eliot Miranda <[hidden email]> wrote:

> Hi Igor, Hi All,
>
> Indeed, quite right. I happened to add a flag to turn off the heartbeat so
> I could debug the crash Matthew was seeing in starting up
> Squeak4.2-10856-beta.image (since single-stepping through machine code
> always gets interrupted by the heartbeat, it being an interval timer) and lo
> and behold the bug went away. This is very worrying because it appears to
> imply that there's a serious bug in the linux kernel/gcc since delivering a
> software interrupt shouldn't corrupt registers, but it clearly does. I'll
> try and pass it by someone who's an expert in this area.
> Anyway, now find a new linux VM in VM.r2346/ that seems fine with the
> interpreter and the cogit compiled at -O2 but the heartbeat compiled at -O2.
> Running this VM on CentOS 5.3 under Parallels I get
> 2839 run, 2796 passes, 7 expected failures, 24 failures, 11 errors, 0
> unexpected passes
> for the full test suite in Squeak4.2-10856-beta.image.
> So have at it.

Great! Nice to see that we can deal with elusive bugs :)

--
Best regards,
Igor Stasenko AKA sig.