About Cog on linux

16 messages

Igor Stasenko
 
It is really weird..
I'm using the same sources for everything, except that in one case I use the
usual build steps, as in the HowToBuild file, while in the other I use the cmake configs.

So the difference is in the compiler flags etc., not in the sources.
As a result, the one built using the 'standard' steps runs well, while the one
built with the cmake configs dies after a few bytecodes have been interpreted:

 ./Cog ../generator.image

Segmentation fault



Smalltalk stack dump:
0xbfd61804 I [] in Debugger class>openContext:label:contents: 2005660708: a(n) Debugger class
0xbfd61830 I [] in StandardToolSet class>debugContext:label:contents: 2004064772: a(n) StandardToolSet class
0xbfd6185c I [] in ToolSet class>debugContext:label:contents: 2004049368: a(n) ToolSet class
0xbfd61888 I [] in MethodContext>cannotReturn: 2032713528: a(n) MethodContext
0xbfd618ac I [] in MessageNotUnderstood>message: 2032713492: a(n) MessageNotUnderstood
0xbfd618dc I [] in UndefinedObject(Object)>doesNotUnderstand: 2003124228: a(n) UndefinedObject
0xbfd61904 I [] in UndefinedObject(Object)>mustBeBooleanIn: 2003124228: a(n) UndefinedObject
0xbfd61928 I UndefinedObject(Object)>mustBeBoolean 2003124228: a(n) UndefinedObject
0xbfd61954 I [] in SmalltalkImage>snapshot:andQuit:embedded: 2005393456: a(n) SmalltalkImage
2032108624 s SmalltalkImage>snapshot:andQuit:
2032108320 s SmalltalkImage>saveImageInFileNamed:
2032108228 s SmalltalkImage>saveAs:
2016984776 s UndefinedObject>?
2016978028 s Compiler>evaluate:in:to:notifying:ifFail:logged:
2016977456 s Compiler class>evaluate:for:notifying:logged:
2016977364 s Compiler class>evaluate:for:logged:
2016977236 s Compiler class>evaluate:logged:
2016977144 s [] in RWBinaryOrTextStream(PositionableStream)>fileInAnnouncing:
2016977052 s BlockClosure>on:do:


As you can see, it doesn't even leave #snapshot:andQuit:embedded:.

Eliot, have you seen this before? Any ideas?
I really don't like having a VM whose stability depends on some
little flag(s).. and I guess you don't either.


--
Best regards,
Igor Stasenko AKA sig.

Re: About Cog on linux

Eliot Miranda-2
 


On Wed, Feb 9, 2011 at 10:48 AM, Igor Stasenko <[hidden email]> wrote:

> as you can see it even don't leaves the #snapshot:andQuit:embedded:
>
> Eliot, has you seen such before?

That's essentially what I see, but the variability isn't between cmake and configure but between different runs of configure.  For example, you'll see that I released 2259 (SimpleStackBasedCogit) and 2361 (StackToRegisterMappingCogit) at the weekend.  That's because 2360, which had -O2 for gcc3x-cointerp.c, crashed on startup on my test case (Squeak4.2-10856-beta.image) in one of the early performs, as classes are sent startUp: on startup.  So I lowered the optimization, checked in 2361, built it, checked it didn't crash and released it.  However, now that I try to rebuild exactly the same sources using -O2 for gcc3x-cointerp.c, I can't get it to crash.

This is exactly analogous to a few weeks back, when I was convinced that the optimization level of the heartbeat caused it to crash at -O2.  When Andreas asked me to reproduce it on the internal Teleplace build, I couldn't get it to repeat.  So something is very odd indeed, sensitive perhaps to the timestamp in the executable or some such.  However, now at least I know what I'm looking for, and the next time I build something that crashes on the test case I will attempt to debug it.



 
> Any ideas?

If you can get cmake to run with the same flags as configure (!making sure to use -save-temps so we can look at generated assembler and object files!) and you can get one or other to crash on start-up then one can compare the two and hopefully find the elusive bug.

 
> I am really don't like having VM which stability depends on some
> little flag(s).. and i guess you too.

Damn right!!  Except it /doesn't/ depend on the optimization.  It is more subtle than that.  For example there was one build which Martin Kobetic found would crash on startup if the image path was something like /st/squeak/cog/myimage.image but not if it was /some/network/drive/and/hence/much/longer/myimage.image.
 

confused, bewildered and disquieted,
Eliot



Re: About Cog on linux

Igor Stasenko

On 9 February 2011 20:04, Eliot Miranda <[hidden email]> wrote:
> [...] However, now at least I know what I'm looking for, and the next time I build something that crashes on the test case I will attempt to debug it.

I tried today with debug info enabled. All source files are compiled with:

compilerFlags

        ^ '-g3 -O1 -msse2 -D_GNU_SOURCE -DDEBUG -DITIMER_HEARTBEAT=1
        -DNO_VM_PROFILE=1 -DCOGMTVM=0 -DDEBUGVM=1'


sig@sig-VirtualBox:~/vmbuild/build$ ./results/Cog  ./generator.image
ioFindExternalFunctionIn(display_X11, 0x676918):
  ./results/Cog: undefined symbol: display_X11
ioFindExternalFunctionIn(sound_OSS, 0x676918):
  ./results/Cog: undefined symbol: sound_OSS
ioFindExternalFunctionIn(sound_MacOSX, 0x676918):
  ./results/Cog: undefined symbol: sound_MacOSX
ioFindExternalFunctionIn(sound_Sun, 0x676918):
  ./results/Cog: undefined symbol: sound_Sun
ioFindExternalFunctionIn(sound_pulse, 0x676918):
  ./results/Cog: undefined symbol: sound_pulse
ioFindExternalFunctionIn(sound_ALSA, 0x676918):
  ./results/Cog: undefined symbol: sound_ALSA
ioLoadModule(/home/sig/vmbuild/build/results/vm-sound-ALSA):
  /home/sig/vmbuild/build/results/vm-sound-ALSA: undefined symbol: snd_mixer_selem_has_playback_volume
ioFindExternalFunctionIn(sound_null, 0x676918):
  ./results/Cog: undefined symbol: sound_null
uxAllocateMemory: pageSize 0x1000 (4096), mask 0xfffff000
uxAllocateMemory: /dev/zero descriptor -1
uxAllocateMemory: min heap 39664660, desired 58539092
uxAllocateMemory: mapping 0x40000000 bytes (1024 Mbytes)

(frameNumArgs((page->baseFP))) == numArgs 19611

((((aPage->baseFP)) + (frameStackedReceiverOffset((aPage->baseFP)))) + (2 * BytesPerWord)) == ((aPage->baseAddress)) 43641

validStackPageBaseFrames() 19637

(frameNumArgs(localFP)) == GIV(argumentCount) 4985

!(frameIsBlockActivation(localFP)) 4986

(frameNumArgs(localFP)) == GIV(argumentCount) 4985

!(frameIsBlockActivation(localFP)) 4986

(frameNumArgs(localFP)) == GIV(argumentCount) 4985

!(frameIsBlockActivation(localFP)) 4986

Segmentation fault



Smalltalk stack dump:
0xbfeb4f6c I [] in MessageNotUnderstood>message: 2032401548: a(n) MessageNotUnderstood
0xbfeb4f9c I [] in UndefinedObject(Object)>doesNotUnderstand: 2004271108: a(n) UndefinedObject
0xbfeb4fc4 I [] in UndefinedObject(Object)>mustBeBooleanIn: 2004271108: a(n) UndefinedObject
0xbfeb4fe8 I UndefinedObject(Object)>mustBeBoolean 2004271108: a(n) UndefinedObject
0xbfeb5014 I [] in SmalltalkImage>snapshot:andQuit:embedded: 2006539444: a(n) SmalltalkImage
2032008288 s SmalltalkImage>snapshot:andQuit:
2032008176 s WorldState class>saveAndQuit
2032008064 s [] in ToggleMenuItemMorph(MenuItemMorph)>invokeWithEvent:
2032007972 s BlockClosure>ensure:
2032007880 s CursorWithMask(Cursor)>showWhile:
2032007676 s ToggleMenuItemMorph(MenuItemMorph)>invokeWithEvent:
2032007584 s ToggleMenuItemMorph(MenuItemMorph)>mouseUp:
2032007492 s ToggleMenuItemMorph(MenuItemMorph)>handleMouseUp:
2032007356 s MouseButtonEvent>sentTo:
2032007264 s ToggleMenuItemMorph(Morph)>handleEvent:
2032007172 s MorphicEventDispatcher>dispatchDefault:with:
2032007080 s MorphicEventDispatcher>dispatchEvent:with:
2032006988 s ToggleMenuItemMorph(Morph)>processEvent:using:
2032006896 s MorphicEventDispatcher>dispatchDefault:with:
2032006788 s MorphicEventDispatcher>dispatchEvent:with:
2032006652 s MenuMorph(Morph)>processEvent:using:
2032006560 s MenuMorph(Morph)>processEvent:
2032006468 s MenuMorph>handleFocusEvent:
2032006376 s [] in HandMorph>sendFocusEvent:to:clear:
2032006284 s [] in PasteUpMorph>becomeActiveDuring:
2032006116 s BlockClosure>on:do:
2032005932 s PasteUpMorph>becomeActiveDuring:
2032005840 s HandMorph>sendFocusEvent:to:clear:
2032005748 s HandMorph>sendEvent:focus:clear:
2032005656 s HandMorph>sendMouseEvent:
2032005564 s HandMorph>handleEvent:
2032005288 s HandMorph>processEvents
2032005196 s [] in WorldState>doOneCycleNowFor:
2032005104 s Array(SequenceableCollection)>do:
2032005012 s WorldState>handsDo:
2032004920 s WorldState>doOneCycleNowFor:
2032004792 s WorldState>doOneCycleFor:
2032004700 s PasteUpMorph>doOneCycle
2006915672 s [] in Project class>?
2006915544 s [] in BlockClosure>?

Most recent primitives
basicNew
Aborted


So it looks like something is not initialized correctly..
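With -DDEBUGVM=1 the VM's internal consistency checks are compiled in, and lines in the output above such as "validStackPageBaseFrames() 19637" look like failed checks: the C expression that was false, followed by its line number in the generated interpreter source, with execution carrying on until the eventual segfault. A minimal sketch of a non-fatal assertion macro that reports failures in that shape, assuming that reading of the log; vm_assert, vm_warning and check_frame_args are hypothetical names, not the VM's actual definitions:

```c
#include <assert.h>
#include <stdio.h>

/* Assumption for this sketch: checks are on, as with -DDEBUGVM=1. */
#ifndef DEBUGVM
# define DEBUGVM 1
#endif

/* Counts failed checks so callers can tell that something went wrong. */
static int vm_assert_failures = 0;

/* Print the failure text preceded by a blank line, the shape seen in the log. */
static void vm_warning(const char *message)
{
    vm_assert_failures++;
    fprintf(stderr, "\n%s\n", message);
}

#define VM_STRINGIFY_(x) #x
#define VM_STRINGIFY(x) VM_STRINGIFY_(x)

#if DEBUGVM
/* Non-fatal check: report "<expression> <line>" and keep running. */
# define vm_assert(expr) \
    ((expr) ? (void)0 : vm_warning(#expr " " VM_STRINGIFY(__LINE__)))
#else
/* Production builds: the check disappears entirely. */
# define vm_assert(expr) ((void)0)
#endif

/* Example check mirroring one of the assertions in the log (illustrative). */
static int check_frame_args(int frameNumArgs, int argumentCount)
{
    vm_assert(frameNumArgs == argumentCount);
    return frameNumArgs == argumentCount;
}
```

Because such a check does not abort, one run can print several failures, as in the log, before the real crash; the first failure printed is usually the most informative place to start.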

>
>>
>> Any ideas?
>
> If you can get cmake to run with the same flags as configure (!making sure to use -save-temps so we can look at generated assembler and object files!) and you can get one or other to crash on start-up then one can compare the two and hopefully find the elusive bug.
>

I just uploaded a fix for the config, so it can now build the Cog VM:

CMakeVMMaker-IgorStasenko.24

Load it into your image and evaluate

CogUnixConfig generate

or

CogDebugUnixConfig generate

It will generate the cmake files in the ../build directory (relative to the
image's current directory). You can also use #generateWithSources to generate
the VMMaker sources along with the build config.

cd to that directory and run

cmake . && make

and you will find the built artifacts in the results subdirectory.


>>
>> I am really don't like having VM which stability depends on some
>> little flag(s).. and i guess you too.
>
> Damn right!!  Except it /doesn't/ depend on the optimization.  It is more subtle than that.  For example there was one build which Martin Kobetic found would crash on startup if the image path was something like /st/squeak/cog/myimage.image but not if it was /some/network/drive/and/hence/much/longer/myimage.image.



--
Best regards,
Igor Stasenko AKA sig.

Re: About Cog on linux

Igor Stasenko
 
OK, I synced up with your last update and managed to build the Stack VM..
same problem:


 ./Cog -version
3.9-7 #1 Wed Feb  9 18:28:30 CET 2011 gcc 4.4.5
Croquet Closure Stack VM [StackInterpreter VMMaker-oscog-IgorStasenko.Stasenko.49]
Linux sig-VirtualBox 2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:34:50 UTC 2010 i686 GNU/Linux
plugin path: /home/sig/vmbuild/build/results/ [default: /home/sig/vmbuild/build/results/]

~/vmbuild/build/results$ ./Cog ../generator.image

Segmentation fault

C stack backtrace:
./Cog(error+0x50)[0x8081840]
[0xc61400]
./Cog[0x8072bf8]
./Cog(main+0x2a3)[0x80822c3]
/lib/libc.so.6(__libc_start_main+0xe7)[0x4b4ce7]
./Cog[0x805cc71]


Smalltalk stack dump:
0xbff23ea4 [] in MethodContext(ContextPart)>resume: 2031729932: a(n) MethodContext
0xbff23ed0 [] in UndefinedObject(Object)>doesNotUnderstand: 2003599364: a(n) UndefinedObject
0xbff23ef4 [] in UndefinedObject(Object)>mustBeBooleanIn: 2003599364: a(n) UndefinedObject
0xbff23f14 UndefinedObject(Object)>mustBeBoolean 2003599364: a(n) UndefinedObject
0xbff23f3c [] in SmalltalkImage>snapshot:andQuit:embedded: 2005867700: a(n) SmalltalkImage
2031336544 s SmalltalkImage>snapshot:andQuit:
2031336432 s WorldState class>saveAndQuit
2031336320 s [] in ToggleMenuItemMorph(MenuItemMorph)>invokeWithEvent:
2031336228 s BlockClosure>ensure:
2031336136 s CursorWithMask(Cursor)>showWhile:
2031335932 s ToggleMenuItemMorph(MenuItemMorph)>invokeWithEvent:
2031335840 s ToggleMenuItemMorph(MenuItemMorph)>mouseUp:
2031335748 s ToggleMenuItemMorph(MenuItemMorph)>handleMouseUp:
2031335612 s MouseButtonEvent>sentTo:
2031335520 s ToggleMenuItemMorph(Morph)>handleEvent:
2031335428 s MorphicEventDispatcher>dispatchDefault:with:
2031335336 s MorphicEventDispatcher>dispatchEvent:with:
2031335244 s ToggleMenuItemMorph(Morph)>processEvent:using:
2031335152 s MorphicEventDispatcher>dispatchDefault:with:
2031335044 s MorphicEventDispatcher>dispatchEvent:with:
2031334908 s MenuMorph(Morph)>processEvent:using:
2031334816 s MenuMorph(Morph)>processEvent:
2031334724 s MenuMorph>handleFocusEvent:
2031334632 s [] in HandMorph>sendFocusEvent:to:clear:
2031334540 s [] in PasteUpMorph>becomeActiveDuring:
2031334372 s BlockClosure>on:do:
2031334188 s PasteUpMorph>becomeActiveDuring:
2031334096 s HandMorph>sendFocusEvent:to:clear:
2031334004 s HandMorph>sendEvent:focus:clear:
2031333912 s HandMorph>sendMouseEvent:
2031333820 s HandMorph>handleEvent:
2031333544 s HandMorph>processEvents
2031333452 s [] in WorldState>doOneCycleNowFor:
2031333360 s Array(SequenceableCollection)>do:
2031333268 s WorldState>handsDo:
2031333176 s WorldState>doOneCycleNowFor:
2031333048 s WorldState>doOneCycleFor:
2031332956 s PasteUpMorph>doOneCycle
2006243928 s [] in Project class>?
2006243800 s [] in BlockClosure>?

Most recent primitives
Aborted

-----------

I will upload the config shortly.
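The "C stack backtrace" section of the crash output above has the binary(symbol+offset)[address] shape produced by glibc's backtrace_symbols_fd(). A minimal sketch of a SIGSEGV handler that prints such a backtrace, assuming Linux with glibc's <execinfo.h>; crash_handler, install_crash_handler and current_stack_depth are hypothetical names, and this is not the VM's own crash handler:

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* SIGSEGV handler: dump the C frames to stderr, then die with the same
   signal so the exit status still reports a segmentation fault. */
static void crash_handler(int sig)
{
    static const char header[] = "\nC stack backtrace:\n";
    void *frames[MAX_FRAMES];
    int depth = backtrace(frames, MAX_FRAMES);
    ssize_t ignored = write(STDERR_FILENO, header, sizeof header - 1);

    (void)ignored;
    /* Writes one "binary(symbol+offset)[address]" line per frame straight
       to the descriptor, without calling malloc(). */
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);

    signal(sig, SIG_DFL);   /* restore the default action ... */
    raise(sig);             /* ... and terminate with the original signal */
}

/* Returns 0 on success, -1 if sigaction() fails. */
static int install_crash_handler(void)
{
    struct sigaction sa;
    void *warmup[1];

    /* The first backtrace() call may dynamically load libgcc, which is not
       safe inside a signal handler, so trigger that load here instead. */
    (void)backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Number of frames currently on the C stack, as seen by backtrace(). */
static int current_stack_depth(void)
{
    void *frames[MAX_FRAMES];
    return backtrace(frames, MAX_FRAMES);
}
```

Linking with -rdynamic puts the executable's global functions into the dynamic symbol table, so frames show names such as error+0x50; functions that are not exported (static ones, for instance) still appear as bare addresses like ./Cog[0x8072bf8].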

--
Best regards,
Igor Stasenko AKA sig.

Re: About Cog on linux

Igor Stasenko
 
Okay.. I started a debug session with the StackVM Debug build..

Program received signal SIGSEGV, Segmentation fault.
0x0805dc1e in isContext (oop=0) at /home/sig/vmbuild/src/vm/gcc3x-interp.c:14374
Line number 14374 out of range; /home/sig/vmbuild/src/vm/gcc3x-interp.c has 1 lines.
(gdb) bt
#0  0x0805dc1e in isContext (oop=0) at /home/sig/vmbuild/src/vm/gcc3x-interp.c:14374
#1  0x08064dd9 in findMethodWithPrimitiveFromContextUpToContext (primitive=<value optimized out>, senderContext=2040211164, homeContext=<value optimized out>)
    at /home/sig/vmbuild/src/vm/gcc3x-interp.c:11733
#2  0x08064f4c in findMethodWithPrimitiveFromFPUpToContext (primitive=<value optimized out>, startFP=<value optimized out>, homeContext=<value optimized out>)
    at /home/sig/vmbuild/src/vm/gcc3x-interp.c:11808
#3  0x08074a4e in L0writeBackHeadFramePointers () at /home/sig/vmbuild/src/vm/gcc3x-interp.c:3763
#4  0x080783d6 in initStackPagesAndInterpret () at /home/sig/vmbuild/src/vm/gcc3x-interp.c:13895
#5  0x08072f99 in interpret () at /home/sig/vmbuild/src/vm/gcc3x-interp.c:1692
#6  0x080836e4 in main (argc=2, argv=0xbffff464, envp=0xbffff470) at /home/sig/vmbuild/platforms/unix/vm/sqUnixMain.c:1659


oop=0 looks quite strange..
I suspect that some proper initialization is missing, because the crash is
triggered so fast, almost at the very start of code execution.
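The gdb backtrace shows isContext() being handed oop=0 while findMethodWithPrimitiveFromContextUpToContext walks from senderContext towards homeContext. A toy model of such a walk makes the "missing initialization" suspicion concrete: a well-formed sender chain ends at the home context or at nil, which is a real object, whereas 0 can only come from a slot that was never written. The object layout, the names and the primitive number 42 are all hypothetical and far simpler than the VM's object memory; the guard just reports what the real code would have dereferenced:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy object memory: an oop is an index into toy_heap, and 0 is never a
   valid object, so it marks a slot that was never initialized. */
typedef uintptr_t oop_t;

enum { KIND_FREE = 0, KIND_NIL = 1, KIND_CONTEXT = 2 };
enum { WALK_NOT_FOUND = 0, WALK_FOUND = 1, WALK_BAD_OOP = -1 };

typedef struct {
    int   kind;       /* KIND_FREE, KIND_NIL or KIND_CONTEXT */
    oop_t sender;     /* next context up the chain, or NIL_OOP */
    int   primitive;  /* primitive number of the context's method, 0 = none */
} toy_object;

#define TOY_HEAP_SIZE 16
#define NIL_OOP ((oop_t)1)   /* slot 1 holds the nil object */

static toy_object toy_heap[TOY_HEAP_SIZE];

static void toy_init(void)
{
    toy_heap[NIL_OOP].kind = KIND_NIL;
}

static void toy_set_context(oop_t oop, oop_t sender, int primitive)
{
    toy_heap[oop].kind = KIND_CONTEXT;
    toy_heap[oop].sender = sender;
    toy_heap[oop].primitive = primitive;
}

/* Walk sender links from start towards home, looking for a method with the
   given primitive. Reaching oop 0 means an uninitialized slot; a real
   isContext() would read that object's header and fault. */
static int find_context_with_primitive(oop_t start, oop_t home,
                                       int primitive, oop_t *found)
{
    oop_t ctx = start;

    while (ctx != home && ctx != NIL_OOP) {
        if (ctx == 0 || ctx >= TOY_HEAP_SIZE || toy_heap[ctx].kind != KIND_CONTEXT)
            return WALK_BAD_OOP;
        if (toy_heap[ctx].primitive == primitive) {
            *found = ctx;
            return WALK_FOUND;
        }
        ctx = toy_heap[ctx].sender;
    }
    return WALK_NOT_FOUND;
}
```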


And since the error is triggered in #snapshot:andQuit:, I suspect there is
something odd with image resuming. The image expects the primitive that
resumes it to push either true (meaning "resuming") or false onto the stack,
but instead it pushes nil. And since there is a branch that expects a boolean,
it's no wonder that it sends #mustBeBoolean, which then triggers the error...
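This reasoning can be modelled with a toy stack machine. Following the description above, the primitive's result should be true in an image that is being resumed and false in the image that has just been saved, and the conditional jump that follows only knows how to branch on those two objects. Anything else, such as nil, cannot be branched on, so the interpreter falls back to sending #mustBeBoolean, which matches the UndefinedObject(Object)>mustBeBoolean frames in the dumps above. This is a hypothetical model of the control flow, not Cog's bytecode set or its actual primitive:

```c
#include <assert.h>

/* Toy object identities. */
typedef enum { OBJ_NIL, OBJ_FALSE, OBJ_TRUE } toy_obj;

/* What the conditional jump ended up doing. */
typedef enum { BRANCH_TAKEN, BRANCH_NOT_TAKEN, SENT_MUST_BE_BOOLEAN } branch_outcome;

#define TOY_STACK_DEPTH 8

typedef struct {
    toy_obj stack[TOY_STACK_DEPTH];
    int sp;                 /* index of the next free slot */
} toy_frame;

static void push(toy_frame *f, toy_obj o)
{
    assert(f->sp < TOY_STACK_DEPTH);
    f->stack[f->sp++] = o;
}

static toy_obj pop(toy_frame *f)
{
    assert(f->sp > 0);
    return f->stack[--f->sp];
}

/* Result of the snapshot primitive: true when resuming, false right after
   saving. 'broken' models the suspected bug where nil ends up there. */
static void prim_snapshot_result(toy_frame *f, int resuming, int broken)
{
    push(f, broken ? OBJ_NIL : (resuming ? OBJ_TRUE : OBJ_FALSE));
}

/* jumpIfFalse: pops the condition; a non-boolean is pushed back as the
   receiver of a #mustBeBoolean send instead of being branched on. */
static branch_outcome jump_if_false(toy_frame *f)
{
    toy_obj cond = pop(f);

    if (cond == OBJ_FALSE)
        return BRANCH_TAKEN;
    if (cond == OBJ_TRUE)
        return BRANCH_NOT_TAKEN;
    push(f, cond);
    return SENT_MUST_BE_BOOLEAN;
}
```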

Alas... I was thinking that Pharo's recent modification to
#snapshot:andQuit: was causing the error. But then I tried a Squeak
image and got similar results:

Smalltalk stack dump:
0xbf8854c4 [] in ByteString(Object)>doesNotUnderstand: 2016701060: a(n) ByteString
0xbf8854ec [] in SmalltalkImage>snapshot:andQuit:embedded: 2009189024: a(n) SmalltalkImage
2016700076 s SmalltalkImage>snapshot:andQuit:
2016699984 s TheWorldMenu>saveAndQuit
2016699868 s TheWorldMenu>doMenuItem:with:


It looks like the interpreter either eats some object on the stack, or
forgets to push it there, or maybe resumes from the wrong instruction pointer.

Anyway, something goes awfully wrong at the very beginning of the
interpret cycle, which means it has a good chance of being caught :)
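The first two candidates can be told apart by checking the stack depth around each primitive call: a primitive called with N arguments should replace the receiver and its N arguments with exactly one result. A minimal sketch of that check on a toy operand stack; the types and names are hypothetical, Cog's real stack pages and frames are considerably more involved, and resuming at a wrong instruction pointer would need a different kind of check:

```c
#include <assert.h>
#include <stdio.h>

typedef long toy_oop;

/* Toy operand stack growing upwards; sp counts the live slots. */
typedef struct {
    toy_oop slots[32];
    int sp;
} toy_stack;

typedef void (*toy_primitive)(toy_stack *s);

/* Run a primitive taking num_args arguments and check that it replaced
   receiver + arguments with exactly one result. Returns actual minus
   expected depth: 0 = balanced, negative = the primitive ate slots,
   positive = it left extra slots behind. */
static int call_primitive_checked(toy_stack *s, toy_primitive prim, int num_args)
{
    int expected = s->sp - (num_args + 1) + 1;
    int delta;

    prim(s);
    delta = s->sp - expected;
    if (delta != 0)
        fprintf(stderr, "primitive stack imbalance: %+d slot(s)\n", delta);
    return delta;
}

/* Well-behaved primitive: receiver + 1 argument -> their sum. */
static void prim_add(toy_stack *s)
{
    toy_oop arg = s->slots[--s->sp];
    toy_oop rcvr = s->slots[--s->sp];
    s->slots[s->sp++] = rcvr + arg;
}

/* Buggy primitive: pops receiver and argument but never pushes a result. */
static void prim_add_forgets_push(toy_stack *s)
{
    s->sp -= 2;
}
```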

--
Best regards,
Igor Stasenko AKA sig.

Re: About Cog on linux

Eliot Miranda-2
In reply to this post by Igor Stasenko
 
Hi Igor,

On Thu, Feb 10, 2011 at 1:03 AM, Igor Stasenko <[hidden email]> wrote:

> I tried today with debug info enabled (all source files are compiled with:
>
> compilerFlags
>
>        ^ '-g3 -O1 -msse2 -D_GNU_SOURCE -DDEBUG -DITIMER_HEARTBEAT=1
>        -DNO_VM_PROFILE=1 -DCOGMTVM=0 -DDEBUGVM=1'
>
> )


Please change that to include -save-temps.  We can then see what the generated assembly and object files are, and that will really help the analysis.  Also, can you somehow freeze this source so that we can repeat the compilation exactly?  i.e. avoid generating a different version.c with a different date in it.  We must try to repeat the compilation exactly, with no temporal or path-derived artifacts.
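The point about version.c is that a build stamp taken from the clock (and from the compiler version) changes on every build, so two compilations of identical sources never produce identical binaries to compare. A minimal sketch of one way to make such a stamp freezable, assuming a hypothetical FROZEN_BUILD_STAMP macro supplied on the compiler command line; the "3.9-7 #1" prefix is copied from the -version output in this thread, and this is not the VM's actual version.c:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A normal build embeds the compile time and compiler version, so every
   rebuild of identical sources differs. Defining FROZEN_BUILD_STAMP
   (hypothetical name) pins the stamp for debugging builds. */
#ifdef FROZEN_BUILD_STAMP
# define VM_BUILD_STAMP FROZEN_BUILD_STAMP
#else
# define VM_BUILD_STAMP __DATE__ " " __TIME__ " gcc " __VERSION__
#endif

/* Format the first line of a -version style report into buf. */
static const char *vm_version_line(char *buf, size_t size)
{
    snprintf(buf, size, "3.9-7 #1 %s", VM_BUILD_STAMP);
    return buf;
}
```

For example, compiling with -DFROZEN_BUILD_STAMP='"frozen"' makes every build report 3.9-7 #1 frozen. Newer GCC releases (7 and later) also honour the SOURCE_DATE_EPOCH environment variable for __DATE__ and __TIME__, but the gcc 4.4.5 used in this thread predates that.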

 


Re: About Cog on linux

Igor Stasenko

On 10 February 2011 18:34, Eliot Miranda <[hidden email]> wrote:

>
> Hi Igor,
>
> On Thu, Feb 10, 2011 at 1:03 AM, Igor Stasenko <[hidden email]> wrote:
>>
>> I tried today with debug info enabled (all source files are compiled with:
>>
>> compilerFlags
>>
>>        ^ '-g3 -O1 -msse2 -D_GNU_SOURCE -DDEBUG -DITIMER_HEARTBEAT=1
>>        -DNO_VM_PROFILE=1 -DCOGMTVM=0 -DDEBUGVM=1'
>>
>> )
>>
>
> please change that to include -save-temps.  We can then see what the generated assembly and object files are and that will really help analyse.  Also, can you somehow freeze this source so that we can repeat the compilation exactly?  i.e. avoid generating a different version.c with a different date in it.  We must try and repeat the compilation exactly with no temporal or path-derived artifacts.
>

OK, I made such a config:

....results/Cog -version
3.9-7 #1 <HERE IS SUPPOSED TO BE THE DATE> <HERE IS SUPPOSED TO BE gcc VERSION>
Croquet Closure Stack VM [StackInterpreter VMMaker-oscog-IgorStasenko.Stasenko.49]
<FAKE FROZEN VERSION FOR DEBUGGING PURPOSES>
plugin path: /home/sig/vmbuild/build/results/ [default: /home/sig/vmbuild/build/results/]

It also produces a lot of .i and .s files around the build dir..

You can try building it by loading the CMakeVMMaker-IgorStasenko.27
package and evaluating:

FixedVerSIDebugUnixConfig generateWithSources

or tell me what to do next :)

--
Best regards,
Igor Stasenko AKA sig.

Re: About Cog on linux

Eliot Miranda-2
 


On Thu, Feb 10, 2011 at 9:55 AM, Igor Stasenko <[hidden email]> wrote:

> it also produces a lot of .i and .s files around build dir..

Right.  That's what -save-temps does.  And that's useful data, especially in seeing what code gcc produces for different -O levels.
 

> You can try building it by loading CMakeVMMaker-IgorStasenko.27
> package and doing:
>
> FixedVerSIDebugUnixConfig generateWithSources
>
> or tell me what to do next :)

You need to try and produce one build that crashes and one build that doesn't (based e.g. on -O level).  When you have that compare the two up to the point of failure.
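Once one build crashes and another doesn't, comparing the two up to the point of failure can be made mechanical if both builds log the same kind of trace, for example one line per send or per bytecode executed. A minimal sketch that finds the first entry where two such traces diverge, assuming the traces have already been read into arrays of strings; the trace format and first_divergence are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the first entry where the two traces differ.
   If one trace is a prefix of the other, return the length of the shorter
   one (where that build stopped, e.g. because it crashed).
   Return -1 if the traces are identical. */
static long first_divergence(const char *const *good, size_t good_len,
                             const char *const *bad, size_t bad_len)
{
    size_t n = good_len < bad_len ? good_len : bad_len;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(good[i], bad[i]) != 0)
            return (long)i;
    }
    return good_len == bad_len ? -1 : (long)n;
}
```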



Re: About Cog on linux

Igor Stasenko

On 10 February 2011 19:38, Eliot Miranda <[hidden email]> wrote:

>
>
>
> On Thu, Feb 10, 2011 at 9:55 AM, Igor Stasenko <[hidden email]> wrote:
>>
>> On 10 February 2011 18:34, Eliot Miranda <[hidden email]> wrote:
>> >
>> > Hi Igor,
>> >
>> > On Thu, Feb 10, 2011 at 1:03 AM, Igor Stasenko <[hidden email]> wrote:
>> >>
>> >> On 9 February 2011 20:04, Eliot Miranda <[hidden email]> wrote:
>> >> >
>> >> >
>> >> > That's essentially what I see but the variability isn't between cmake and configure but between different runs of configure.  For example, you'll see that I released 2259 (SimpleStackBasedCogit) and 2361 (StackToRegisterMappingCogit) at the weekend.  That's because 2360 which had -O2 for gcc3x-cointerp.c crashed on startup on my test case (Squeak4.2-10856-beta.image) in one of the early performs as classes are sent startUp: on startup.  So I lowered optimization, checked-in 2361, built, checked it didn't crash and released.  However, now I try and rebuild exactly the same sources but using -O2 for gcc3x-cointerp.c I can't get it to crash.  This is exactly analogous to a few weeks back when I was convinced that the optimization level of the heartbeat caused it to crash if at -O2.  When Andreas asked me to reproduce on the internal Teleplace build I couldn't get it to repeat.  So something is very odd indeed, sensitive perhaps to the timestamp in the executable or some such.  However, now at least I know what I'm looking for and the next tie I build somethign that crashes on the test case I will attempt to debug.
>> >> >
>> >> >
>> >>
>> >> I tried today with debug info enabled (all source files are compiled with:
>> >>
>> >> compilerFlags
>> >>
>> >>        ^ '-g3 -O1 -msse2 -D_GNU_SOURCE -DDEBUG -DITIMER_HEARTBEAT=1
>> >>        -DNO_VM_PROFILE=1 -DCOGMTVM=0 -DDEBUGVM=1'
>> >>
>> >> )
>> >>
>> >
>> > Please change that to include -save-temps.  We can then see what the generated assembly and object files are, and that will really help the analysis.  Also, can you somehow freeze this source so that we can repeat the compilation exactly?  I.e., avoid generating a different version.c with a different date in it.  We must try and repeat the compilation exactly, with no temporal or path-derived artifacts.
>> >
>>
>> OK, I made such a config:
>>
>> ....results/Cog -version
>> 3.9-7 #1 <HERE IS SUPPOSED TO BE THE DATE> <HERE IS SUPPOSED TO BE gcc VERSION>
>> Croquet Closure Stack VM [StackInterpreter VMMaker-oscog-IgorStasenko.Stasenko.49]
>> <FAKE FROZEN VERSION FOR DEBUGGING PURPOSES>
>> plugin path: /home/sig/vmbuild/build/results/ [default: /home/sig/vmbuild/build/results/]
>>
>>
>> It also produces a lot of .i and .s files around the build dir.
>
> Right.  That's what -save-temps does.  And that's useful data, especially in seeing what code gcc produces for different -O levels.
>
>>
>> You can try building it by loading CMakeVMMaker-IgorStasenko.27
>> package and doing:
>>
>> FixedVerSIDebugUnixConfig generateWithSources
>>
>> or tell me what to do next :)
>
> You need to try and produce one build that crashes and one build that doesn't (based e.g. on -O level).  When you have that compare the two up to the point of failure.

I can do that, but only on two different architectures.
On Linux it crashes no matter what I do (as you can see, even the stack-based
VM crashes).


--
Best regards,
Igor Stasenko AKA sig.

Re: About Cog on linux

Eliot Miranda-2
 


On Thu, Feb 10, 2011 at 11:47 AM, Igor Stasenko <[hidden email]> wrote:

>> You need to try and produce one build that crashes and one build that doesn't (based e.g. on -O level).  When you have that compare the two up to the point of failure.

> I can do that, but only on two different architectures.

I mean, of course, two different compilations of the same source on the same architecture: one that crashes and one that doesn't.

> On Linux it crashes no matter what I do (as you can see, even the stack-based
> VM crashes).

Sometimes my Linux builds work and sometimes they don't, and I see no rhyme or reason why.  That's what we're trying to work out.  So we need to look at Linux, and compare builds that work against those that don't.  So the first requirement is to obtain reproducible builds that work and that don't.
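Reproducible builds mean keeping time-dependent and path-dependent strings out of the object code. Below is a minimal sketch of a frozen version stamp, in the spirit of the "<FAKE FROZEN VERSION FOR DEBUGGING PURPOSES>" build above. The `FROZEN_VERSION` switch and the two helper functions are hypothetical names. They are not the VM's actual version.c.

```c
#include <string.h>

/* Hypothetical switch: while FROZEN_VERSION is nonzero (the default here),
   no __DATE__/__TIME__ is compiled in, so rebuilding the same sources with
   the same flags does not change the binary through the version string. */
#ifndef FROZEN_VERSION
# define FROZEN_VERSION 1   /* build with -DFROZEN_VERSION=0 for a timestamped build */
#endif

#if FROZEN_VERSION
# define VM_BUILD_STAMP "<FAKE FROZEN VERSION FOR DEBUGGING PURPOSES>"
#else
# define VM_BUILD_STAMP __DATE__ " " __TIME__
#endif

/* The stamp a -version option would print. */
static const char *vm_build_stamp(void)
{
    return VM_BUILD_STAMP;
}

/* Two builds meant to be byte-identical should at least agree on this. */
static int same_build_stamp(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

`__FILE__` can leak build paths in the same way. Recent GCC releases also take `__DATE__` and `__TIME__` from the `SOURCE_DATE_EPOCH` environment variable when it is set.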




Re: About Cog on linux

Igor Stasenko

On 11 February 2011 01:12, Eliot Miranda <[hidden email]> wrote:

> Sometimes my Linux builds work and sometimes they don't, and I see no rhyme or reason why.  That's what we're trying to work out.  So we need to look at Linux, and compare builds that work against those that don't.  So the first requirement is to obtain reproducible builds that work and that don't.

Well, so far I have a 100% reproducible crash, in all combinations:
JIT/Stack, release/debug :)
I built the VM using this config on two different Linux systems: one is
Ubuntu in my VirtualBox, and the other is (I think) CentOS, on the machine
used as a Hudson slave.

Tomorrow I will check the build flags for the interp.c file. I think it's
built using the same optimization flag (-O1), because
it is set separately.

I will also try to build the VM purely from your sources, to rule out any
impact of my changes. But I checked that before, without
much difference.

--
Best regards,
Igor Stasenko AKA sig.

Re: About Cog on linux

Igor Stasenko

On 11 February 2011 02:26, Igor Stasenko <[hidden email]> wrote:

> Tomorrow I will check the build flags for the interp.c file. I think it's
> built using the same optimization flag (-O1), because
> it is set separately.
>

Ah, no. The flags were set for cogit.c:

set_source_files_properties( ${srcVMDir}/cogit.c PROPERTIES
                COMPILE_FLAGS "-O1 -fno-omit-frame-pointer -momit-leaf-frame-pointer -mno-rtd -mno-accumulate-outgoing-args")

but apparently, when I build the stack-based VM, they are not applied to gcc3x-interp.c.
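A quick way to confirm which flags a particular file actually received is to let that file report them. Below is a minimal sketch, assuming GCC or Clang. `__OPTIMIZE__` is predefined for any optimization level other than -O0, and `__OPTIMIZE_SIZE__` is predefined for -Os. The function names are hypothetical. No predefined macro carries the exact -O level. To get the full command line, GCC's -frecord-gcc-switches option records it in the object file.

```c
/* Hypothetical diagnostic to paste into one translation unit (for example
   gcc3x-interp.c): it reports the optimization state the compiler saw for
   that file only, which shows whether per-file COMPILE_FLAGS were applied. */
static const char *translation_unit_optimization(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimized for size";   /* -Os (also defines __OPTIMIZE__) */
#elif defined(__OPTIMIZE__)
    return "optimized";            /* any -O level other than -O0 */
#else
    return "not optimized";        /* -O0 */
#endif
}

/* 1 if this file was compiled with optimization enabled, 0 otherwise. */
static int translation_unit_is_optimized(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```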

Here is the command line used to compile it:

/usr/bin/gcc -D_GNU_SOURCE -DDEBUG -DITIMER_HEARTBEAT=1 \
  -DNO_VM_PROFILE=1 -DCOGMTVM=0 -DDEBUGVM=1 \
  -I/home/sig/cog-blessed/platforms/unix/plugins/B3DAcceleratorPlugin \
  -I/home/sig/cog-blessed/platforms/Cross/vm \
  -I/home/sig/cog-blessed/src/vm \
  -I/home/sig/cog-blessed/platforms/unix/vm \
  -I/home/sig/cog-blessed/build -g3 -O1 -msse2 -save-temps \
  -o CMakeFiles/Cog.dir/home/sig/cog-blessed/src/vm/gcc3x-interp.c.o -c


> I will also try to build the VM purely from your sources, to rule out any
> impact of my changes. But I checked that before, without
> much difference.
>

Confirmed: I built the VM purely from your sources, untainted by my
changes, and that didn't change anything:

cd build
cmake . && make
cd results

sig@sig-VirtualBox:~/cog-blessed/build/results$ ./Cog -version
3.9-7 #1 <HERE IS SUPPOSED TO BE THE DATE> <HERE IS SUPPOSED TO BE gcc VERSION>
Croquet Closure Stack VM [StackInterpreter VMMaker-oscog.47]
<FAKE FROZEN VERSION FOR DEBUGGING PURPOSES>
plugin path: /home/sig/cog-blessed/build/results/ [default: /home/sig/cog-blessed/build/results/]

sig@sig-VirtualBox:~/cog-blessed/build/results$ ./Cog ../../image/VMMaker-Squeak4.1.image
.....
Segmentation fault

C stack backtrace:
./Cog(error+0x50)[0x80836b5]
./Cog[0x8083753]
[0x327400]
./Cog[0x8078326]
./Cog(interpret+0x1a)[0x8072ee9]
./Cog(main+0x408)[0x8083624]
/lib/libc.so.6(__libc_start_main+0xe7)[0x82bce7]
./Cog[0x805cca1]


Smalltalk stack dump:
0xbf9fd524 [] in ByteString(Object)>doesNotUnderstand: 2016719100: a(n) ByteString
0xbf9fd54c [] in SmalltalkImage>snapshot:andQuit:embedded: 2006102020: a(n) SmalltalkImage
2016951156 s SmalltalkImage>snapshot:andQuit:
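The "C stack backtrace" section above uses the `binary(symbol+offset)[address]` layout that glibc's `backtrace_symbols_fd()` writes. Below is a minimal sketch of a SIGSEGV handler that produces such a dump. It is not the VM's actual handler, and `print_c_backtrace()` and `install_segv_handler()` are hypothetical names. Symbol names for functions inside the executable itself generally require linking with -rdynamic. `backtrace()` is not formally async-signal-safe, so treat this as a best-effort diagnostic.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Writes the current C call stack to fd, one "binary(symbol+offset)[address]"
   line per frame. Returns the number of frames captured. */
static int print_c_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, fd);   /* writes directly, no malloc() */
    return n;
}

static void segv_handler(int sig)
{
    static const char msg[] = "Segmentation fault\n\nC stack backtrace:\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    print_c_backtrace(STDERR_FILENO);
    signal(sig, SIG_DFL);   /* restore the default action ... */
    raise(sig);             /* ... and re-raise, so the process still dies from SIGSEGV */
}

static void install_segv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
}
```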


As a last measure I also tried using the non-gnuified interp.c instead of
gcc3x-interp.c. Same result:


Smalltalk stack dump:
0xbfe8de34 [] in ByteString(Object)>doesNotUnderstand: 2017259772: a(n) ByteString
0xbfe8de5c [] in SmalltalkImage>snapshot:andQuit:embedded: 2006642692: a(n) SmalltalkImage
2017491828 s SmalltalkImage>snapshot:andQuit:
2017491920 s TheWorldMenu>saveAndQuit
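Some context on "non-gnuified": my understanding is that the gnuify step rewrites the bytecode `switch` in interp.c into GCC computed-goto dispatch ("labels as values"), which produces the gcc3x-interp.c variant. The toy sketch below contrasts the two dispatch styles on an invented three-opcode bytecode set, not Squeak's. The same crash with both variants suggests that the dispatch mechanism itself is not the culprit.

```c
/* Toy bytecodes for this sketch only. */
enum { OP_PUSH1 = 0, OP_ADD = 1, OP_HALT = 2 };

/* Portable dispatch: one switch inside a loop, as in a plain interpreter. */
static int run_switch(const unsigned char *code)
{
    int stack[16], sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH1: stack[sp++] = 1; break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_HALT:  return stack[sp - 1];
        default:       return -1;   /* unknown opcode */
        }
    }
}

/* GCC/Clang "labels as values": each handler jumps straight to the next one.
   Unlike the switch version, an out-of-range opcode is not checked here. */
static int run_threaded(const unsigned char *code)
{
    static void *const dispatch[] = { &&push1, &&add, &&halt };
    int stack[16], sp = 0;
#define NEXT goto *dispatch[*code++]
    NEXT;
push1: stack[sp++] = 1; NEXT;
add:   sp--; stack[sp - 1] += stack[sp]; NEXT;
halt:  return stack[sp - 1];
#undef NEXT
}
```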



So I have a strong feeling that this is not related to compiler
peculiarities, but is some bug in the code.
I will try to debug it a little and see if I can find something.


--
Best regards,
Igor Stasenko AKA sig.

Re: About Cog on linux

Igor Stasenko
 
OK, I traced the interpret loop up to the point where execution takes a
different (wrong) route.

Given the method:


snapshot: save andQuit: quit embedded: embeddedFlag
        "Mark the changes file and close all files as part of #processShutdownList.
        If save is true, save the current state of this Smalltalk in the image file.
        If quit is true, then exit to the outer OS shell.
        The latter part of this method runs when resuming a previously saved
        image. This resume logic checks for a document file to process when
        starting up."

        | snapshotResult resuming msg |
        Object flushDependents.
        Object flushEvents.
        (SourceFiles at: 2)
                ifNotNil: [
                        msg := String
                                streamContents: [ :s |
                                        s
                                                nextPutAll: '----';
                                                nextPutAll:
                                                                (save
                                                                                ifTrue: [
                                                                                        quit
                                                                                                ifTrue: [ 'QUIT' ]
                                                                                                ifFalse: [ 'SNAPSHOT' ] ]
                                                                                ifFalse: [
                                                                                        quit
                                                                                                ifTrue: [ 'QUIT/NOSAVE' ]
                                                                                                ifFalse: [ 'NOP' ] ]);
                                                nextPutAll: '----';
                                                print: Date dateAndTimeNow;
                                                space;
                                                nextPutAll: (FileDirectory default localNameFor: self imageName);
                                                nextPutAll: ' priorSource: ';
                                                print: LastQuitLogPosition ].
                        self assureStartupStampLogged.
                        save
                                ifTrue: [
                                        LastQuitLogPosition := (SourceFiles at: 2)
                                                setToEnd;
                                                position ].
                        self logChange: msg.
                        Transcript
                                cr;
                                show: msg ].
        self processShutDownList: quit.
        Cursor write show.
        save
                ifTrue: [
                        snapshotResult := embeddedFlag
                                ifTrue: [ self snapshotEmbeddedPrimitive ]
                                ifFalse: [ self snapshotPrimitive ]. "<-- PC frozen here on image file"
                        resuming := snapshotResult == true.
                        snapshotResult == false
                                ifTrue: [
                                        "Time to reclaim segment files is immediately after a save"
                                        Smalltalk globals at: #ImageSegment ifPresent: [ :theClass |
                                                theClass reclaimObsoleteSegmentFiles ] ] "guard against failure" ]
                ifFalse: [ resuming := false ].
        (quit and: [ resuming not ])
                ifTrue: [ self quitPrimitive ].
        Cursor normal show.
        self setGCParameters.
        resuming
                ifTrue: [ Smalltalk clearExternalObjects ].
        self processStartUpList: resuming.
        resuming
                ifTrue: [ self recordStartupStamp ].
        UIManager default onSnapshot: resuming. "Now it's time to raise an error"
        snapshotResult == nil
                ifTrue: [ self error: 'Failed to write image file (disk full?)' ].
               
        ^ resuming
------------

And its bytecode:

-------------

217 <41> pushLit: Object
218 <D0> send: flushDependents
219 <87> pop
220 <41> pushLit: Object
221 <D2> send: flushEvents
222 <87> pop
223 <56> pushLit: SourceFiles
224 <77> pushConstant: 2
225 <C0> send: at:
226 <73> pushConstant: nil
227 <C6> send: ==
228 <A8 5B> jumpTrue: 321
230 <44> pushLit: String
231 <10> pushTemp: 0
232 <11> pushTemp: 1
233 <8F 21 00 32> closureNumCopied: 2 numArgs: 1 bytes 237 to 286
237 <10> pushTemp: 0
238 <88> dup
239 <26> pushConstant: '----'
240 <E5> send: nextPutAll:
241 <87> pop
242 <88> dup
243 <11> pushTemp: 1
244 <9D> jumpFalse: 251
245 <12> pushTemp: 2
246 <99> jumpFalse: 249
247 <2A> pushConstant: 'QUIT'
248 <90> jumpTo: 250
249 <29> pushConstant: 'SNAPSHOT'
250 <94> jumpTo: 256
251 <12> pushTemp: 2
252 <99> jumpFalse: 255
253 <28> pushConstant: 'QUIT/NOSAVE'
254 <90> jumpTo: 256
255 <27> pushConstant: 'NOP'
256 <E5> send: nextPutAll:
257 <87> pop
258 <88> dup
259 <26> pushConstant: '----'
260 <E5> send: nextPutAll:
261 <87> pop
262 <88> dup
263 <4D> pushLit: Date
264 <DC> send: dateAndTimeNow
265 <EB> send: print:
266 <87> pop
267 <88> dup
268 <DE> send: space
269 <87> pop
270 <88> dup
271 <51> pushLit: FileDirectory
272 <83 10> send: default
274 <70> self
275 <83 12> send: imageName
277 <EF> send: localNameFor:
278 <E5> send: nextPutAll:
279 <87> pop
280 <88> dup
281 <33> pushConstant: ' priorSource: '
282 <E5> send: nextPutAll:
283 <87> pop
284 <54> pushLit: LastQuitLogPosition
285 <EB> send: print:
286 <7D> blockReturn
287 <E3> send: streamContents:
288 <6D> popIntoTemp: 5
289 <70> self
290 <83 15> send: assureStartupStampLogged
292 <87> pop
293 <10> pushTemp: 0
294 <AC 0B> jumpFalse: 307
296 <56> pushLit: SourceFiles
297 <77> pushConstant: 2
298 <C0> send: at:
299 <88> dup
300 <83 17> send: setToEnd
302 <87> pop
303 <83 18> send: position
305 <82 D4> popIntoLit: LastQuitLogPosition
307 <70> self
308 <15> pushTemp: 5
309 <83 39> send: logChange:
311 <87> pop
312 <5A> pushLit: Transcript
313 <88> dup
314 <83 1B> send: cr
316 <87> pop
317 <15> pushTemp: 5
318 <83 3C> send: show:
320 <87> pop
321 <70> self
322 <11> pushTemp: 1
323 <83 3D> send: processShutDownList:
325 <87> pop
326 <80 E0> pushLit: Cursor
328 <83 1F> send: write
330 <83 1E> send: show
332 <87> pop
333 <10> pushTemp: 0
334 <AC 26> jumpFalse: 374
336 <12> pushTemp: 2
337 <9B> jumpFalse: 342
338 <70> self
339 <86 22> send: snapshotEmbeddedPrimitive
341 <92> jumpTo: 345
342 <70> self
343 <86 21> send: snapshotPrimitive
345 <6B> popIntoTemp: 3
346 <13> pushTemp: 3
347 <71> pushConstant: true
348 <C6> send: ==
349 <6C> popIntoTemp: 4
350 <13> pushTemp: 3
351 <72> pushConstant: false
352 <C6> send: ==
353 <AC 11> jumpFalse: 372
355 <80 E5> pushLit: Smalltalk
357 <86 24> send: globals
359 <80 A6> pushConstant: #ImageSegment
361 <8F 01 00 04> closureNumCopied: 0 numArgs: 1 bytes 365 to 368
365 <10> pushTemp: 0
366 <86 27> send: reclaimObsoleteSegmentFiles
368 <7D> blockReturn
369 <86 A3> send: at:ifPresent:
371 <90> jumpTo: 373
372 <73> pushConstant: nil
373 <92> jumpTo: 377
374 <72> pushConstant: false
375 <81 44> storeIntoTemp: 4
377 <87> pop
378 <11> pushTemp: 1
379 <9B> jumpFalse: 384
380 <14> pushTemp: 4
381 <86 29> send: not
383 <90> jumpTo: 385
384 <72> pushConstant: false
385 <9B> jumpFalse: 390
386 <70> self
387 <86 28> send: quitPrimitive
389 <87> pop
390 <80 E0> pushLit: Cursor
392 <86 2A> send: normal
394 <83 1E> send: show
396 <87> pop
397 <70> self
398 <86 2B> send: setGCParameters
400 <87> pop
401 <14> pushTemp: 4
402 <9C> jumpFalse: 408
403 <80 E5> pushLit: Smalltalk
405 <86 2C> send: clearExternalObjects
407 <87> pop
408 <70> self
409 <14> pushTemp: 4
410 <86 6D> send: processStartUpList:
412 <87> pop
413 <14> pushTemp: 4
414 <9B> jumpFalse: 419
415 <70> self
416 <86 2E> send: recordStartupStamp
418 <87> pop
419 <80 F0> pushLit: UIManager
421 <83 10> send: default
423 <14> pushTemp: 4
424 <86 6F> send: onSnapshot:
426 <87> pop
427 <13> pushTemp: 3
428 <73> pushConstant: nil
429 <C6> send: ==
430 <9D> jumpFalse: 437
431 <70> self
432 <80 B2> pushConstant: 'Failed to write image file (disk full?)'
434 <86 71> send: error:
436 <87> pop
437 <14> pushTemp: 4
438 <7C> returnTop

-----

It diverges from the simulator at bytecode
378 <11> pushTemp: 1

which, instead of pushing quit = true, pushes nil.
Then, of course, the next bytecode (jumpFalse:) fails and the VM sends
#mustBeBoolean to the nil object.
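A nil pushed at 378 ends in #mustBeBoolean because the conditional jump at 379 only handles the two boolean objects. Below is a minimal sketch of that decision. The oop values are the ones the simulator dump shows for this image (nil = 16r4, false = 16rC, true = 16r14). They are illustrative, not VM-wide constants, and the names are hypothetical.

```c
#include <stdint.h>

typedef uintptr_t oop;

/* Oop values as shown in the simulator dump for this particular image. */
#define NIL_OOP   ((oop)0x4)
#define FALSE_OOP ((oop)0xC)
#define TRUE_OOP  ((oop)0x14)

typedef enum { FALL_THROUGH, TAKE_JUMP, SEND_MUST_BE_BOOLEAN } BranchOutcome;

/* What a jumpFalse: bytecode does with the value popped from the stack. */
static BranchOutcome jump_false_outcome(oop top)
{
    if (top == FALSE_OOP) return TAKE_JUMP;      /* condition false: branch */
    if (top == TRUE_OOP)  return FALL_THROUGH;   /* condition true: next bytecode */
    return SEND_MUST_BE_BOOLEAN;                 /* anything else, e.g. nil */
}
```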

Apparently either the stack is imbalanced (so it points to the wrong stack
frame, which has nil at the temp 1 offset),
or something overwrites this argument with nil, for some unknown reason.

Now the question is where this nil came from and who wrote it there. :)


Here is the stack frame after loading this image in the simulator, before
starting any interpretation:


  -16r27C14 SmalltalkImage>snapshot:andQuit:embedded: 2268340: a(n) SmalltalkImage
  -16r27C00/512:   rcvr/clsr:   16r229CB4 a SmalltalkImage
  -16r27C04/511:         arg:       16r14 true
  -16r27C08/510:         arg:       16r14 true
  -16r27C0C/509:         arg:        16rC false
  -16r27C10/508:cllr ip/ctxt:  16r1A8F5E0=27850208
  -16r27C14/507:    saved fp:        16r0
  -16r27C18/506:      method:   16r918D24 a CompiledMethod
  -16r27C1C/505:       flags:    16r10301=66305  numArgs: 3  hasContext: true  isBlock: false
  -16r27C20/504:     context:  16r1A8F63C=27850300
  -16r27C24/503:    receiver:   16r229CB4 a SmalltalkImage
  -16r27C28/502:   temp/stck:        16r4 nil
  -16r27C2C/501:   temp/stck:        16r4 nil
  -16r27C30/500:   temp/stck:  16r1A8FEF8 '----QUIT----an Array(10 February 2011 7:56:30 pm) generator.image priorSource: 24702567'
  -16r27C34/499:   temp/stck:       16r14 true

And evaluating:

(self temporary: 1 in: localFP) == objectMemory trueObject

in the simulator yields true, which seems correct.

Here is the stack at the point of pushing the quit flag (temp 1):

  -16r27C14 SmalltalkImage>snapshot:andQuit:embedded: 2268340: a(n) SmalltalkImage
  -16r27C00/512:   rcvr/clsr:   16r229CB4 a SmalltalkImage
  -16r27C04/511:         arg:       16r14 true
  -16r27C08/510:         arg:       16r14 true
  -16r27C0C/509:         arg:        16rC false
  -16r27C10/508:cllr ip/ctxt:  16r1A8F5E0=27850208
  -16r27C14/507:    saved fp:        16r0
  -16r27C18/506:      method:   16r918D24 a CompiledMethod
  -16r27C1C/505:       flags:    16r10301=66305  numArgs: 3  hasContext: true  isBlock: false
  -16r27C20/504:     context:  16r1A8F63C=27850300
  -16r27C24/503:    receiver:   16r229CB4 a SmalltalkImage
  -16r27C28/502:   temp/stck:       16r14 true
  -16r27C2C/501:   temp/stck:       16r14 true
  -16r27C30/500:   temp/stck:  16r1A8FEF8 '----QUIT----an Array(10 February 2011 7:56:30 pm) generator.image priorSource: 24702567'
378 11 pushTemporaryVariableBytecode (12)

and it reads the value from
 -16r27C08/510:         arg:       16r14 true
when performing #pushTemporaryVariableBytecode.

So in the simulator it looks fine :)
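You can check by hand which slot the simulator reads. The sketch below is reconstructed from the two dumps, not taken from the VM sources. It assumes 4-byte words. Arguments sit above the frame pointer, below the receiver/closure slot. The remaining temps sit below the in-frame receiver slot. The constant and function names are hypothetical.

```c
/* Byte offsets inferred from the simulator dumps (32-bit words, stack
   growing toward lower addresses). */
enum {
    BYTES_PER_WORD   = 4,
    CALLER_IP_OFFSET = 1 * BYTES_PER_WORD,   /* "cllr ip/ctxt" slot, one word above fp */
    RECEIVER_OFFSET  = -4 * BYTES_PER_WORD   /* in-frame "receiver" slot, four words below fp */
};

/* Byte offset from the frame pointer of temporary `index` (0-based) in a
   method frame with `num_args` arguments. Arguments are pushed left to
   right by the caller, so argument 0 is farthest above fp. Non-argument
   temps follow the receiver slot downwards. */
static long temp_offset_from_fp(int num_args, int index)
{
    if (index < num_args)
        return CALLER_IP_OFFSET + (long)(num_args - index) * BYTES_PER_WORD;
    return RECEIVER_OFFSET - BYTES_PER_WORD + (long)(num_args - index) * BYTES_PER_WORD;
}
```

With fp = -16r27C14 and three arguments, temp 1 (quit) comes out at fp + 12 = -16r27C08, slot 510, which is the slot the simulator reads. Temp 5 (msg) comes out at fp - 28 = -16r27C30, slot 500.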


--
Best regards,
Igor Stasenko AKA sig.

Re: About Cog on linux

Igor Stasenko
 
OK, it seems I found the cause of this:

(gdb) print VMBIGENDIAN
$70 = 1

weird.. ;)

--
Best regards,
Igor Stasenko AKA sig.

Re: About Cog on linux

Igor Stasenko
 
On 12 February 2011 21:21, Igor Stasenko <[hidden email]> wrote:
> Ok it seems like i found the cause of this:
>
> (gdb) print VMBIGENDIAN
> $70 = 1
>
> weird.. ;)
>

Okay, I managed to build a Stack VM on Unix which 'works'.
I spent too much time on this elusive LSB_FIRST=1 macro, which was not
defined in the cmake configs...
Damn... that's why I HATE C.
It is always like that: not the code itself, but some hideous
flag/macro. And of course, there's no clue what it's for,
what it does, or how it can be used.
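Below is a minimal sketch of a start-up sanity check that would have caught this. It assumes, as the gdb session above suggests, that VMBIGENDIAN comes out as 1 when LSB_FIRST is not defined. `host_is_big_endian()` and `vm_byte_order_matches_host()` are hypothetical names. The compile-time guard only fires when VMBIGENDIAN is already defined by the time it is reached. On the CMake side, the TestBigEndian module can detect the byte order at configure time.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the most significant byte of a word
   at the lowest address (big-endian), 0 otherwise. */
static int host_is_big_endian(void)
{
    uint32_t probe = 0x01020304u;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);   /* lowest-addressed byte of probe */
    return first_byte == 0x01;
}

/* Returns 1 if the byte order the VM was configured for (for example the
   value of VMBIGENDIAN) agrees with the machine it is running on. */
static int vm_byte_order_matches_host(int configured_big_endian)
{
    return (configured_big_endian != 0) == host_is_big_endian();
}

/* Compile-time variant: only active when VMBIGENDIAN is already defined and
   the compiler predefines __BYTE_ORDER__ (GCC and Clang do). */
#if defined(VMBIGENDIAN) && defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ && VMBIGENDIAN
#  error "VMBIGENDIAN is 1 on a little-endian target: is LSB_FIRST missing?"
# endif
#endif
```

At start-up, a call such as `if (!vm_byte_order_matches_host(VMBIGENDIAN)) abort();` would stop the VM immediately. Without it, the VM runs on and fails later inside snapshot:andQuit:embedded:.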


P.S. Yeah, compiling sqUnixHeartBeat.c with -O2 still crashes the VM
after a few seconds of running,
so I took the same flags you put in the makefiles.

I'm not sure about interpret.c. Should I also use -O1 for it? You said that
the compiler does something wrong with alloca() calls.
Is there a way to test that, by invoking some primitive or doing something
else, to force the problem to reveal itself?


--
Best regards,
Igor Stasenko AKA sig.

Re: About Cog on linux

ccrraaiigg
 

> Damn... that's why I HATE C.
> It is always like that: not the code itself, but some hideous
> flag/macro. And of course, there's no clue what it's for,
> what it does, or how it can be used.

     Yup.


-C

--
Craig Latta
www.netjam.org/resume
+31  06 2757 7177
+ 1 415  287 3547