corruption of PC in context objects or not (?)


corruption of PC in context objects or not (?)

Andrei Chis
 
Hi,

We often get crashes on our CI when calling `Context>copyTo:` in a GT image with a VM built from https://github.com/feenkcom/opensmalltalk-vm.

To sum up: during `Context>copyTo:`, `Object>>#copy` is called on a context, leading to a segmentation fault. Looking at that context in lldb, the pc looks off: it has the value `0xfffffffffea7f6e1`.

 (lldb) call (void *) printOop(0x1206b6990)
    0x1206b6990: a(n) Context
     0x1206b6a48 0xfffffffffea7f6e1                0x9        0x1146b2e08        0x1206b6b00 
     0x1206b6b28        0x1206b6b50 

Can this indicate some corruption or is it expected to have such values? `CoInterpreter>>ensureContextHasBytecodePC:` has code that also handles negative values for the pc which suggests that this might be expected.
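
To read the dump above by hand, assume the 64-bit Spur format: a SmallInteger is stored as `(value << 3) | 1` (low three tag bits `001`), while a pointer to a heap object has low bits `000`. Assume also Pharo's Context slot order (sender, pc, stackp, method, closureOrNil, receiver, then stack contents). The sketch below classifies each printed slot under those assumptions; the helper names are illustrative, not VM functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 64-bit Spur tagging. An oop whose low three bits are 001
   is a SmallInteger; low bits 000 mean a pointer to a heap object. */
static int is_small_integer(uint64_t oop) {
    return (oop & 7) == 1;
}

/* Recover the signed value of a tagged SmallInteger. Removing the tag
   leaves value * 8, so the division is exact. The cast to int64_t
   assumes two's complement, as on x86-64. */
static int64_t small_integer_value(uint64_t oop) {
    assert(is_small_integer(oop));
    return (int64_t)(oop - 1) / 8;
}

/* Print the Context slots from the lldb printOop output above.
   Slot order assumed: sender, pc, stackp, method, closureOrNil,
   receiver, then the stack contents. */
static void print_context_slots(void) {
    static const char *names[] = {"sender", "pc", "stackp", "method",
                                  "closureOrNil", "receiver", "stack[1]"};
    static const uint64_t slots[] = {0x1206b6a48, 0xfffffffffea7f6e1, 0x9,
                                     0x1146b2e08, 0x1206b6b00, 0x1206b6b28,
                                     0x1206b6b50};
    for (int i = 0; i < 7; i++) {
        if (is_small_integer(slots[i]))
            printf("%-12s SmallInteger %lld\n", names[i],
                   (long long)small_integer_value(slots[i]));
        else
            printf("%-12s object 0x%llx\n", names[i],
                   (unsigned long long)slots[i]);
    }
}
```

Under these assumptions the pc slot decodes to the negative SmallInteger -2818340, consistent with the negative pc values that `ensureContextHasBytecodePC:` handles, and stackp decodes to 1, consistent with the single stack slot printed. The sender slot is an ordinary object pointer: 0x1206b6a48 is the `BlockClosure>ensure:` context in the stack listing below.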

Changing `Context>copyTo:` by adding a `self pc` before calling `self copy` leads to no more crashes. Not sure if there is a reason for that or just plain luck.

A simple reduced stack is below (more details in this issue [1]). The crash always happens with contexts reified as objects (in this case 0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages). 
Could this suggest some kind of issue in the vm when reifying contexts, or just some other problem with memory corruption? 


 0x7ffeefbb4380 M Context(Object)>copy 0x1206b6990: a(n) Context
    0x7ffeefbb43b8 M Context>copyTo: 0x1206b6990: a(n) Context
    0x7ffeefbb4400 M Context>copyTo: 0x1206b5ae0: a(n) Context
  ...
    0x7ffeefba6078 M Context>copyTo: 0x110548b28: a(n) Context
    0x7ffeefba60d0 I Context>copyTo: 0x110548a70: a(n) Context
    0x7ffeefba6118 I MessageNotUnderstood(Exception)>freezeUpTo: 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba6160 I MessageNotUnderstood(Exception)>freeze 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba6190 M [] in GtExampleEvaluator>result 0x110544fb8: a(n) GtExampleEvaluator
    0x7ffeefba61c8 M BlockClosure>cull: 0x110545188: a(n) BlockClosure
    0x7ffeefba6208 M Context>evaluateSignal: 0x110548c98: a(n) Context
    0x7ffeefba6240 M Context>handleSignal: 0x110548c98: a(n) Context
    0x7ffeefba6278 M Context>handleSignal: 0x110548be0: a(n) Context
    0x7ffeefba62b0 M MessageNotUnderstood(Exception)>signal 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba62f0 M GtDummyExamplesWithInheritanceSubclassB(Object)>doesNotUnderstand: exampleH 0x1105487d8: a(n) GtDummyExamplesWithInheritanceSubclassB
    0x7ffeefba6328 M GtExampleEvaluator>primitiveProcessExample:withEvaluationContext: 0x110544fb8: a(n) GtExampleEvaluator
 ...
    0x7ffeefbe64d0 M [] in GtExamplesHDReport class(HDReport class)>runPackages: 0x1145e41c8: a(n) GtExamplesHDReport class
    0x7ffeefbe6520 M [] in Set>collect: 0x1206b5ab0: a(n) Set
    0x7ffeefbe6568 M Array(SequenceableCollection)>do: 0x1206b5c50: a(n) Array
       0x1206b5b98 s Set>collect:
       0x1206b5ae0 s GtExamplesHDReport class(HDReport class)>runPackages:
       0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages
       0x1206b6a48 s BlockClosure>ensure:
       0x1206b6b68 s UIManager class>nonInteractiveDuring:
       0x1206b6c48 s GtExamplesCommandLineHandler>runPackages
       0x1206b6d98 s GtExamplesCommandLineHandler>activate
       0x1206b75d0 s GtExamplesCommandLineHandler class(CommandLineHandler class)>activateWith:
       0x1207d2f00 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
       0x1207e6620 s BlockClosure>on:do:
       0x1207f7ab8 s PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
       0x120809d40 s PharoCommandLineHandler(BasicCommandLineHandler)>handleSubcommand
       0x12082ca60 s PharoCommandLineHandler(BasicCommandLineHandler)>handleArgument:
       0x120789938 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
       0x1207a83e0 s BlockClosure>on:do:
       0x1207b57a0 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
       0x1207bf830 s [] in BlockClosure>newProcess
Cheers,
Andrei




Re: corruption of PC in context objects or not (?)

Eliot Miranda-2
 
Hi Andrei,

On Fri, Sep 11, 2020 at 8:58 AM Andrei Chis <[hidden email]> wrote:
 
Hi,

We often get crashes on our CI when calling `Context>copyTo:` in a GT image with a VM built from https://github.com/feenkcom/opensmalltalk-vm.

To sum up: during `Context>copyTo:`, `Object>>#copy` is called on a context, leading to a segmentation fault. Looking at that context in lldb, the pc looks off: it has the value `0xfffffffffea7f6e1`.

 (lldb) call (void *) printOop(0x1206b6990)
    0x1206b6990: a(n) Context
     0x1206b6a48 0xfffffffffea7f6e1                0x9        0x1146b2e08        0x1206b6b00 
     0x1206b6b28        0x1206b6b50 

Can this indicate some corruption or is it expected to have such values? `CoInterpreter>>ensureContextHasBytecodePC:` has code that also handles negative values for the pc which suggests that this might be expected.

The issue is that that value is expected *inside* the VM. It is the frame pointer for the context. But above the VM this value should be hidden. The VM should intercept all accesses to such fields in contexts and automatically map them back to the appropriate values that the image expects to see. [The same thing is true for CompiledMethods; inside the VM methods may refer to their JITted code, but this is invisible from the image.] Intercepting access to Context state already happens with inst var access in methods, with the shallowCopy primitive, with instVarAt: et al., etc.
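
A toy model of this principle, under heavily simplified and hypothetical assumptions: a context's pc slot may hold a VM-internal encoding (here just a negative number), and every image-visible read goes through an accessor that maps it back to a bytecode pc and, as the name `ensureContextHasBytecodePC:` suggests, stores the mapped value back. A copy path that reads the slot raw leaks the internal value; if an accessor ran first, even a raw copy sees the mapped value. This is one plausible reading of why a preceding `self pc` makes the crash disappear, not a description of the actual Cog code; `map_to_bytecode_pc` and its +1000 offset are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy context with only a pc slot. In this model a negative pc stands for a
   hypothetical VM-internal encoding that the image must never observe. */
typedef struct {
    int64_t pc;
} ToyContext;

/* Hypothetical mapping from the internal encoding to a bytecode pc. */
static int64_t map_to_bytecode_pc(int64_t internal_pc) {
    assert(internal_pc < 0);
    return -internal_pc + 1000;
}

/* Image-visible read (like sending #pc): map hidden state and store the
   result back, so later raw reads of the slot see an image-level value. */
static int64_t context_pc(ToyContext *ctx) {
    if (ctx->pc < 0)
        ctx->pc = map_to_bytecode_pc(ctx->pc);
    return ctx->pc;
}

/* A copy that forgets to intercept contexts: duplicates the raw slot. */
static ToyContext raw_copy(const ToyContext *ctx) {
    ToyContext copy;
    memcpy(&copy, ctx, sizeof copy);
    return copy;
}

/* A copy that intercepts contexts first, then copies. */
static ToyContext intercepting_copy(ToyContext *ctx) {
    context_pc(ctx);
    return raw_copy(ctx);
}
```

With `ToyContext c = {-42}`, `raw_copy(&c).pc` is -42 (the hidden value leaks), whereas after `context_pc(&c)` any copy sees 1042.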

So I expect the issue here is that copyTo: invokes some primitive which does not (yet) check for a context receiver and/or argument, and hence accidentally it reveals the hidden state to the image and a crash results.  What I need to know are the definitions for copyTo: and copy, etc all the way down to primitives.

Changing `Context>copyTo:` by adding a `self pc` before calling `self copy` leads to no more crashes. Not sure if there is a reason for that or just plain luck.

A simple reduced stack is below (more details in this issue [1]). The crash always happens with contexts reified as objects (in this case 0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages). 
Could this suggest some kind of issue in the vm when reifying contexts, or just some other problem with memory corruption?

This looks like an oversight in some primitive. Here for example is the implementation of the shallowCopy primitive, a.k.a. clone, and you can see where it explicitly intercepts access to a context.

primitiveClone
    "Return a shallow copy of the receiver.
     Special-case non-single contexts (because of context-to-stack mapping).
     Can't fail for contexts cuz of image context instantiation code (sigh)."

    | rcvr newCopy |
    rcvr := self stackTop.
    (objectMemory isImmediate: rcvr)
        ifTrue:
            [newCopy := rcvr]
        ifFalse:
            [(objectMemory isContextNonImm: rcvr)
                ifTrue:
                    [newCopy := self cloneContext: rcvr]
                ifFalse:
                    [(argumentCount = 0
                      or: [(objectMemory isForwarded: rcvr) not])
                        ifTrue: [newCopy := objectMemory clone: rcvr]
                        ifFalse: [newCopy := 0]].
            newCopy = 0 ifTrue:
                [^self primitiveFailFor: PrimErrNoMemory]].
    self pop: argumentCount + 1 thenPush: newCopy
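
For readers more at home in C than in Slang, the decision structure of `primitiveClone` can be sketched as below. The `Obj` struct, its flags, and the `clone_context`/`clone_object` stubs are hypothetical stand-ins for the VM's object memory, not its real API; only the order of the checks mirrors the method above, including that the context case takes its own path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an oop: just the properties primitiveClone tests. */
typedef struct Obj {
    bool is_immediate;
    bool is_context;
    bool is_forwarded;
} Obj;

typedef enum { PRIM_OK, PRIM_ERR_NO_MEMORY } PrimResult;

/* Toy allocator: a fixed pool; returns NULL when exhausted ("no memory"). */
static Obj pool[16];
static size_t pool_used;

static Obj *allocate_copy(const Obj *src) {
    if (pool_used == sizeof pool / sizeof pool[0])
        return NULL;
    pool[pool_used] = *src;
    return &pool[pool_used++];
}

/* Stand-in for cloneContext:, the separate path the real primitive takes
   for contexts (see its comment about context-to-stack mapping). */
static Obj *clone_context(const Obj *ctx) { return allocate_copy(ctx); }

/* Stand-in for objectMemory clone:, a plain slot-for-slot copy. */
static Obj *clone_object(const Obj *obj) { return allocate_copy(obj); }

/* Same order of checks as primitiveClone above. */
static PrimResult primitive_clone(Obj *rcvr, int argument_count, Obj **result) {
    assert(rcvr != NULL && result != NULL);
    if (rcvr->is_immediate) {          /* immediates are their own copy */
        *result = rcvr;
        return PRIM_OK;
    }
    Obj *new_copy;
    if (rcvr->is_context)
        new_copy = clone_context(rcvr);
    else if (argument_count == 0 || !rcvr->is_forwarded)
        new_copy = clone_object(rcvr);
    else
        new_copy = NULL;               /* forwarded receiver with an argument */
    if (new_copy == NULL)              /* both failures report PrimErrNoMemory */
        return PRIM_ERR_NO_MEMORY;
    *result = new_copy;
    return PRIM_OK;
}
```

As in the Slang, a forwarded receiver passed with an argument and an allocation failure both end in the same `PrimErrNoMemory` failure.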

But since Squeak doesn't have copyTo: I have no idea what primitive is being used.  I'm guessing 168 primitiveCopyObject, which seems to check for a Context receiver, but not for a CompiledCode receiver.  What does the primitive failure code look like?  Can you post the copyTo: implementations here please?

--
_,,,^..^,,,_
best, Eliot

Re: corruption of PC in context objects or not (?)

Andrei Chis
 
Hi Eliot,

Thanks for the answer. That helps to understand what is going on and it can explain why just adding a call to `self pc` makes the crash disappear. 

What was maybe not obvious in my previous email is that we get this problem more or less randomly. We have tests for verifying that tools work when various extensions raise exceptions (these tests copy the stack). Sometimes they work correctly and sometimes they crash. These crashes happen in various tests, and until now the only common thing we noticed is that the pc of the contexts where the crash happens looks off. Also, the contexts in which this happens are at the beginning of the stack, so they are part of a long computation (and get copied multiple times).

Initially we suspected some memory corruption somewhere due to external calls/memory. However, the fact that calling `self pc` beforehand seems to fix the issue makes that less likely. But who knows.


On Fri, Sep 11, 2020 at 6:36 PM Eliot Miranda <[hidden email]> wrote:
 
Hi Andrei,

On Fri, Sep 11, 2020 at 8:58 AM Andrei Chis <[hidden email]> wrote:
 
Hi,

We often get crashes on our CI when calling `Context>copyTo:` in a GT image with a VM built from https://github.com/feenkcom/opensmalltalk-vm.

To sum up: during `Context>copyTo:`, `Object>>#copy` is called on a context, leading to a segmentation fault. Looking at that context in lldb, the pc looks off: it has the value `0xfffffffffea7f6e1`.

 (lldb) call (void *) printOop(0x1206b6990)
    0x1206b6990: a(n) Context
     0x1206b6a48 0xfffffffffea7f6e1                0x9        0x1146b2e08        0x1206b6b00 
     0x1206b6b28        0x1206b6b50 

Can this indicate some corruption or is it expected to have such values? `CoInterpreter>>ensureContextHasBytecodePC:` has code that also handles negative values for the pc which suggests that this might be expected.

The issue is that that value is expected *inside* the VM. It is the frame pointer for the context. But above the VM this value should be hidden. The VM should intercept all accesses to such fields in contexts and automatically map them back to the appropriate values that the image expects to see. [The same thing is true for CompiledMethods; inside the VM methods may refer to their JITted code, but this is invisible from the image.] Intercepting access to Context state already happens with inst var access in methods, with the shallowCopy primitive, with instVarAt: et al., etc.

So I expect the issue here is that copyTo: invokes some primitive which does not (yet) check for a context receiver and/or argument, and hence accidentally it reveals the hidden state to the image and a crash results.  What I need to know are the definitions for copyTo: and copy, etc all the way down to primitives.

Here is the source code:

Context >> copyTo: aContext 
"Copy self and my sender chain down to, but not including, aContext.  End of copied chain will have nil sender."
    | copy |
    self == aContext ifTrue: [^ nil].
    copy := self copy.
    self sender ifNotNil: [
        copy privSender: (self sender copyTo: aContext)].
    ^ copy

Object>>#copy
     ^self shallowCopy postCopy

Object >> shallowCopy
    | class newObject index |
    <primitive: 148>
    class := self class.
    class isVariable
        ifTrue:
            [index := self basicSize.
            newObject := class basicNew: index.
            [index > 0]
                whileTrue:
                    [newObject basicAt: index put: (self basicAt: index).
                    index := index - 1]]
        ifFalse: [newObject := class basicNew].
    index := class instSize.
    [index > 0]
        whileTrue:
            [newObject instVarAt: index put: (self instVarAt: index).
            index := index - 1].
    ^ newObject

The code of the primitiveClone looks the same [1]


Changing `Context>copyTo:` by adding a `self pc` before calling `self copy` leads to no more crashes. Not sure if there is a reason for that or just plain luck.

A simple reduced stack is below (more details in this issue [1]). The crash always happens with contexts reified as objects (in this case 0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages). 
Could this suggest some kind of issue in the vm when reifying contexts, or just some other problem with memory corruption?

This looks like an oversight in some primitive. Here for example is the implementation of the shallowCopy primitive, a.k.a. clone, and you can see where it explicitly intercepts access to a context.

But since Squeak doesn't have copyTo: I have no idea what primitive is being used.  I'm guessing 168 primitiveCopyObject, which seems to check for a Context receiver, but not for a CompiledCode receiver.  What does the primitive failure code look like?  Can you post the copyTo: implementations here please?

The code is above. I also see that Context>>#copyTo: in Squeak also calls Object>>copy on contexts.

When a crash happens, we don't always get the same error. For example, on Mac we most often get:

Process 35690 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001100b1004
->  0x1100b1004: inl    $0x4c, %eax
    0x1100b1006: leal   -0x5c(%rip), %eax
    0x1100b100c: pushq  %r8
    0x1100b100e: movabsq $0x1109e78e0, %r9         ; imm = 0x1109E78E0 
Target 0: (GlamorousToolkit) stopped.

Process 29929 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=EXC_I386_BPT, subcode=0x0)
    frame #0: 0x00000001100fe7ed
->  0x1100fe7ed: int3   
    0x1100fe7ee: int3   
    0x1100fe7ef: int3   
    0x1100fe7f0: int3   
Target 0: (GlamorousToolkit) stopped.



Cheers,
Andrei
 


Re: corruption of PC in context objects or not (?)

Eliot Miranda-2
 
Hi Andrei,

On Fri, Sep 11, 2020 at 11:48 AM Andrei Chis <[hidden email]> wrote:
 
Hi Eliot,

Thanks for the answer. That helps to understand what is going on and it can explain why just adding a call to `self pc` makes the crash disappear. 

What was maybe not obvious in my previous email is that we get this problem more or less randomly. We have tests for verifying that tools work when various extensions raise exceptions (these tests copy the stack). Sometimes they work correctly and sometimes they crash. These crashes happen in various tests, and until now the only common thing we noticed is that the pc of the contexts where the crash happens looks off. Also, the contexts in which this happens are at the beginning of the stack, so they are part of a long computation (and get copied multiple times).

Initially we suspected some memory corruption somewhere due to external calls/memory. However, the fact that calling `self pc` beforehand seems to fix the issue makes that less likely. But who knows.

Well, it does look like a VM bug.  The VM is somehow failing to intercept some access, perhaps in shallow copy.  Weird.  I shall try and reproduce.   Is there anything special about the process you copy using copyTo: ?

(see below)

On Fri, Sep 11, 2020 at 6:36 PM Eliot Miranda <[hidden email]> wrote:
 
Hi Andrei,

On Fri, Sep 11, 2020 at 8:58 AM Andrei Chis <[hidden email]> wrote:
 
Hi,

We often get crashes on our CI when calling `Context>copyTo:` in a GT image with a VM built from https://github.com/feenkcom/opensmalltalk-vm.

To sum up: during `Context>copyTo:`, `Object>>#copy` is called on a context, leading to a segmentation fault. Looking at that context in lldb, the pc looks off: it has the value `0xfffffffffea7f6e1`.

 (lldb) call (void *) printOop(0x1206b6990)
    0x1206b6990: a(n) Context
     0x1206b6a48 0xfffffffffea7f6e1                0x9        0x1146b2e08        0x1206b6b00 
     0x1206b6b28        0x1206b6b50 

Can this indicate some corruption or is it expected to have such values? `CoInterpreter>>ensureContextHasBytecodePC:` has code that also handles negative values for the pc which suggests that this might be expected.

The issue is that that value is expected *inside* the VM. It is the frame pointer for the context. But above the VM this value should be hidden. The VM should intercept all accesses to such fields in contexts and automatically map them back to the appropriate values that the image expects to see. [The same thing is true for CompiledMethods; inside the VM methods may refer to their JITted code, but this is invisible from the image.] Intercepting access to Context state already happens with inst var access in methods, with the shallowCopy primitive, with instVarAt: et al., etc.

So I expect the issue here is that copyTo: invokes some primitive which does not (yet) check for a context receiver and/or argument, and hence accidentally it reveals the hidden state to the image and a crash results.  What I need to know are the definitions for copyTo: and copy, etc all the way down to primitives.

Here is the source code:

Cool, nothing unusual here.  This should all work perfectly.  Tis a VM bug. However...
 
Context >> copyTo: aContext 
"Copy self and my sender chain down to, but not including, aContext.  End of copied chain will have nil sender."
    | copy |
    self == aContext ifTrue: [^ nil].
    copy := self copy.
    self sender ifNotNil: [
        copy privSender: (self sender copyTo: aContext)].
    ^ copy

Let me suggest

Context >> copyTo: aContext 
   "Copy self and my sender chain down to, but not including, aContext.  End of copied chain will have nil sender."
    | copy |
    self == aContext ifTrue: [^ nil].
    copy := self copy.
    self sender ifNotNil:
        [:mySender| copy privSender: (mySender copyTo: aContext)].
    ^ copy 
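
Both versions copy the receiver and then splice in a copy of its sender chain, stopping just before `aContext`; the suggested one binds the sender once as the `ifNotNil:` block argument instead of sending `sender` twice. As a rough analogue outside Smalltalk, here is the same recursion over a linked list in C, with a hypothetical `Frame` struct standing in for Context. It says nothing about the VM-level problem discussed in this thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Context: an id plus a link to its sender. */
typedef struct Frame {
    int id;
    struct Frame *sender;
} Frame;

/* Copy frame and its sender chain down to, but not including, stop.
   The copied chain ends with a NULL sender; frame == stop yields NULL,
   mirroring `^ nil`. Aborts if allocation fails. */
static Frame *copy_to(const Frame *frame, const Frame *stop) {
    if (frame == stop)
        return NULL;
    assert(frame != NULL);
    Frame *copy = malloc(sizeof *copy);
    if (copy == NULL)
        abort();
    *copy = *frame;                          /* like `self copy` */
    const Frame *my_sender = frame->sender;  /* read the sender once */
    if (my_sender != NULL)
        copy->sender = copy_to(my_sender, stop);
    return copy;
}

/* Release a chain produced by copy_to. */
static void free_chain(Frame *frame) {
    while (frame != NULL) {
        Frame *next = frame->sender;
        free(frame);
        frame = next;
    }
}
```

If `stop` is not in the chain, the whole chain is copied, as in the Smalltalk. This C version recurses once per frame, so a deep chain needs a correspondingly deep C stack.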

Object>>#copy
     ^self shallowCopy postCopy

Object >> shallowCopy
    | class newObject index |
    <primitive: 148>
    class := self class.
    class isVariable
        ifTrue:
            [index := self basicSize.
            newObject := class basicNew: index.
            [index > 0]
                whileTrue:
                    [newObject basicAt: index put: (self basicAt: index).
                    index := index - 1]]
        ifFalse: [newObject := class basicNew].
    index := class instSize.
    [index > 0]
        whileTrue:
            [newObject instVarAt: index put: (self instVarAt: index).
            index := index - 1].
    ^ newObject

--
_,,,^..^,,,_
best, Eliot

Re: corruption of PC in context objects or not (?)

Andrei Chis
 
Hi Eliot,

On 12 Sep 2020, at 01:42, Eliot Miranda <[hidden email]> wrote:

Hi Andrei,

On Fri, Sep 11, 2020 at 11:48 AM Andrei Chis <[hidden email]> wrote:
 
Hi Eliot,

Thanks for the answer. That helps to understand what is going on and it can explain why just adding a call to `self pc` makes the crash disappear. 

What was maybe not obvious in my previous email is that we get this problem more or less randomly. We have tests for verifying that tools work when various extensions raise exceptions (these tests copy the stack). Sometimes they work correctly and sometimes they crash. These crashes happen in various tests, and until now the only common thing we noticed is that the pc of the contexts where the crash happens looks off. Also, the contexts in which this happens are at the beginning of the stack, so they are part of a long computation (and get copied multiple times).

Initially we suspected some memory corruption somewhere due to external calls/memory. However, the fact that calling `self pc` beforehand seems to fix the issue makes that less likely. But who knows.

Well, it does look like a VM bug.  The VM is somehow failing to intercept some access, perhaps in shallow copy.  Weird.  I shall try and reproduce.   Is there anything special about the process you copy using copyTo: ?

I don’t think there is anything special about that process. It is the process that we start to run the tests [1]. The exception happens in that running process, and the crash happens when copying its stack.

Checking some previous logs, we have been getting these kinds of crashes on the CI server for at least two years. So it does not look like a new bug (but who knows).


(see below)

On Fri, Sep 11, 2020 at 6:36 PM Eliot Miranda <[hidden email]> wrote:
 
Hi Andrei,

On Fri, Sep 11, 2020 at 8:58 AM Andrei Chis <[hidden email]> wrote:
 
Hi,

We often get crashes on our CI when calling `Context>copyTo:` in a GT image with a VM built from https://github.com/feenkcom/opensmalltalk-vm.

To sum up: during `Context>copyTo:`, `Object>>#copy` is called on a context, leading to a segmentation fault. Looking at that context in lldb, the pc looks off: it has the value `0xfffffffffea7f6e1`.

 (lldb) call (void *) printOop(0x1206b6990)
    0x1206b6990: a(n) Context
     0x1206b6a48 0xfffffffffea7f6e1                0x9        0x1146b2e08        0x1206b6b00 
     0x1206b6b28        0x1206b6b50 

Can this indicate some corruption, or is it expected to have such values? `CoInterpreter>>ensureContextHasBytecodePC:` has code that also handles negative values for the pc, which suggests that this might be expected.

The issue is that that value is expected *inside* the VM.  It is the frame pointer for the context.  But above the VM this value should be hidden. The VM should intercept all accesses to such fields in contexts and automatically map them back to the values that the image expects to see.  [The same is true for CompiledMethods; inside the VM, methods may refer to their JITted code, but this is invisible from the image.]  Intercepting access to Context state already happens for inst var access in methods, in the shallowCopy primitive, in instVarAt: et al., etc.

So I expect the issue here is that copyTo: invokes some primitive that does not (yet) check for a context receiver and/or argument, and hence accidentally reveals the hidden state to the image, and a crash results.  What I need to know are the definitions of copyTo:, copy, etc., all the way down to the primitives.

Here is the source code:

Cool, nothing unusual here.  This should all work perfectly.  'Tis a VM bug. However...
 
Context >> copyTo: aContext 
"Copy self and my sender chain down to, but not including, aContext.  End of copied chain will have nil sender."
    | copy |
    self == aContext ifTrue: [^ nil].
    copy := self copy.
    self sender ifNotNil: [
        copy privSender: (self sender copyTo: aContext)].
    ^ copy

Let me suggest

Context >> copyTo: aContext 
   "Copy self and my sender chain down to, but not including, aContext.  End of copied chain will have nil sender."
    | copy |
    self == aContext ifTrue: [^ nil].
    copy := self copy.
    self sender ifNotNil:
        [:mySender| copy privSender: (mySender copyTo: aContext)].
    ^ copy 

Nice!

I also tried the non-recursive implementation of Context>>#copyTo: from Squeak and it also crashes.

Not sure if it is related, but in the same image as before I now got a different crash, and printing the stack does not work. This time the error seems to come from handleStackOverflow:

(lldb) call (void *)printCallStack()
invalid frame pointer
invalid frame pointer
invalid frame pointer
error: Execution was interrupted, reason: EXC_BAD_ACCESS (code=EXC_I386_GPFLT).
The process has been returned to the state before expression evaluation.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x121e00000)
  * frame #0: 0x0000000100162258 libGlamorousToolkitVMCore.dylib`marryFrameSP + 584
    frame #1: 0x0000000100172982 libGlamorousToolkitVMCore.dylib`handleStackOverflow + 354
    frame #2: 0x000000010016b025 libGlamorousToolkitVMCore.dylib`ceStackOverflow + 149
    frame #3: 0x00000001100005b3
    frame #4: 0x0000000100174d99 libGlamorousToolkitVMCore.dylib`ptEnterInterpreterFromCallback + 73


Cheers,
Andrei

[1] ./GlamorousToolkit.app/Contents/MacOS/GlamorousToolkit  Pharo.image examples --junit-xml-output 'GToolkit-.*' 'GT4SmaCC-.*' 'DeepTraverser-.*' Brick 'Brick-.*' Bloc 'Bloc-.*' 'Sparta-.*'



Object>>#copy
     ^self shallowCopy postCopy

Object >> shallowCopy
    | class newObject index |
    <primitive: 148>
    class := self class.
    class isVariable
        ifTrue: 
            [index := self basicSize.
            newObject := class basicNew: index.
            [index > 0]
                whileTrue: 
                    [newObject basicAt: index put: (self basicAt: index).
                    index := index - 1]]
        ifFalse: [newObject := class basicNew].
    index := class instSize.
    [index > 0]
        whileTrue: 
            [newObject instVarAt: index put: (self instVarAt: index).
            index := index - 1].
    ^ newObject

The code of primitiveClone looks the same [1].


Changing `Context>copyTo:` by adding a `self pc` before calling `self copy` leads to no more crashes. Not sure if there is a reason for that or just plain luck.

A simple reduced stack is below (more details in this issue [1]). The crash always happens with contexts reified as objects (in this case 0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages). 
Could this suggest some kind of issue in the VM when reifying contexts, or just some other problem with memory corruption?

This looks like an oversight in some primitive.  Here, for example, is the implementation of the shallowCopy primitive, a.k.a. clone, and you can see where it explicitly intercepts access to a context.

primitiveClone
    "Return a shallow copy of the receiver.
     Special-case non-single contexts (because of context-to-stack mapping).
     Can't fail for contexts cuz of image context instantiation code (sigh)."

    | rcvr newCopy |
    rcvr := self stackTop.
    (objectMemory isImmediate: rcvr)
        ifTrue:
            [newCopy := rcvr]
        ifFalse:
            [(objectMemory isContextNonImm: rcvr)
                ifTrue:
                    [newCopy := self cloneContext: rcvr]
                ifFalse:
                    [(argumentCount = 0
                      or: [(objectMemory isForwarded: rcvr) not])
                        ifTrue: [newCopy := objectMemory clone: rcvr]
                        ifFalse: [newCopy := 0]].
            newCopy = 0 ifTrue:
                [^self primitiveFailFor: PrimErrNoMemory]].
    self pop: argumentCount + 1 thenPush: newCopy

But since Squeak doesn't have copyTo: I have no idea what primitive is being used.  I'm guessing 168 primitiveCopyObject, which seems to check for a Context receiver, but not for a CompiledCode receiver.  What does the primitive failure code look like?  Can you post the copyTo: implementations here please?

The code is above. I also see that Context>>#copyTo: in Squeak also calls Object>>copy for contexts.

When a crash happens we don't get the exact same error every time. For example, on Mac we most often get:

Process 35690 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001100b1004
->  0x1100b1004: inl    $0x4c, %eax
    0x1100b1006: leal   -0x5c(%rip), %eax
    0x1100b100c: pushq  %r8
    0x1100b100e: movabsq $0x1109e78e0, %r9         ; imm = 0x1109E78E0 
Target 0: (GlamorousToolkit) stopped.


Process 29929 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=EXC_I386_BPT, subcode=0x0)
    frame #0: 0x00000001100fe7ed
->  0x1100fe7ed: int3   
    0x1100fe7ee: int3   
    0x1100fe7ef: int3   
    0x1100fe7f0: int3   
Target 0: (GlamorousToolkit) stopped.



Cheers,
Andrei
 

 0x7ffeefbb4380 M Context(Object)>copy 0x1206b6990: a(n) Context
    0x7ffeefbb43b8 M Context>copyTo: 0x1206b6990: a(n) Context
    0x7ffeefbb4400 M Context>copyTo: 0x1206b5ae0: a(n) Context
  ...
    0x7ffeefba6078 M Context>copyTo: 0x110548b28: a(n) Context
    0x7ffeefba60d0 I Context>copyTo: 0x110548a70: a(n) Context
    0x7ffeefba6118 I MessageNotUnderstood(Exception)>freezeUpTo: 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba6160 I MessageNotUnderstood(Exception)>freeze 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba6190 M [] in GtExampleEvaluator>result 0x110544fb8: a(n) GtExampleEvaluator
    0x7ffeefba61c8 M BlockClosure>cull: 0x110545188: a(n) BlockClosure
    0x7ffeefba6208 M Context>evaluateSignal: 0x110548c98: a(n) Context
    0x7ffeefba6240 M Context>handleSignal: 0x110548c98: a(n) Context
    0x7ffeefba6278 M Context>handleSignal: 0x110548be0: a(n) Context
    0x7ffeefba62b0 M MessageNotUnderstood(Exception)>signal 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba62f0 M GtDummyExamplesWithInheritanceSubclassB(Object)>doesNotUnderstand: exampleH 0x1105487d8: a(n) GtDummyExamplesWithInheritanceSubclassB
    0x7ffeefba6328 M GtExampleEvaluator>primitiveProcessExample:withEvaluationContext: 0x110544fb8: a(n) GtExampleEvaluator
 ...
    0x7ffeefbe64d0 M [] in GtExamplesHDReport class(HDReport class)>runPackages: 0x1145e41c8: a(n) GtExamplesHDReport class
    0x7ffeefbe6520 M [] in Set>collect: 0x1206b5ab0: a(n) Set
    0x7ffeefbe6568 M Array(SequenceableCollection)>do: 0x1206b5c50: a(n) Array
       0x1206b5b98 s Set>collect:
       0x1206b5ae0 s GtExamplesHDReport class(HDReport class)>runPackages:
       0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages
       0x1206b6a48 s BlockClosure>ensure:
       0x1206b6b68 s UIManager class>nonInteractiveDuring:
       0x1206b6c48 s GtExamplesCommandLineHandler>runPackages
       0x1206b6d98 s GtExamplesCommandLineHandler>activate
       0x1206b75d0 s GtExamplesCommandLineHandler class(CommandLineHandler class)>activateWith:
       0x1207d2f00 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
       0x1207e6620 s BlockClosure>on:do:
       0x1207f7ab8 s PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
       0x120809d40 s PharoCommandLineHandler(BasicCommandLineHandler)>handleSubcommand
       0x12082ca60 s PharoCommandLineHandler(BasicCommandLineHandler)>handleArgument:
       0x120789938 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
       0x1207a83e0 s BlockClosure>on:do:
       0x1207b57a0 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
       0x1207bf830 s [] in BlockClosure>newProcess
Cheers,
Andrei





-- 
_,,,^..^,,,_
best, Eliot




Re: corruption of PC in context objects or not (?)

Eliot Miranda-2
 
Hi Andrei,


On Sep 14, 2020, at 7:15 AM, Andrei Chis <[hidden email]> wrote:

Hi Eliot,

On 12 Sep 2020, at 01:42, Eliot Miranda <[hidden email]> wrote:

Hi Andrei,

On Fri, Sep 11, 2020 at 11:48 AM Andrei Chis <[hidden email]> wrote:
 
Hi Eliot,

Thanks for the answer. That helps me understand what is going on, and it could explain why just adding a call to `self pc` makes the crash disappear. 

What may not have been obvious in my previous email is that we get this problem more or less randomly. We have tests that verify that tools work when various extensions raise exceptions (these tests copy the stack). Sometimes they pass and sometimes they crash. The crashes happen in various tests, and so far the only common thing we noticed is that the pc of the context where the crash happens looks off. Also, the contexts in which this happens are at the beginning of the stack, so they are part of a long computation (and get copied multiple times).

Initially we suspected some memory corruption somewhere due to external calls/memory. The fact that calling `self pc` beforehand seems to fix the issue makes that less likely. But who knows.

Well, it does look like a VM bug.  The VM is somehow failing to intercept some access, perhaps in shallow copy.  Weird.  I shall try and reproduce.   Is there anything special about the process you copy using copyTo: ?

I don’t think there is anything special about that process. It is the process we start to run the tests [1]. The exception happens in the running process, and the crash happens when copying the stack of that running process.

Ok, cool.  What I’d like to do is get a copy of your test setup and run it in an assert VM to try and get more information.  AFAICT the VM code is good, so the bug is not obvious.  An assert VM may give more information before the crash.  Have you tried running the system on an assert VM yet?


Re: corruption of PC in context objects or not (?)

Andrei Chis
 
Hi Eliot,

The setup in GT is a bit customised (some changes in the headless VM, some custom plugins, custom rendering), so I first thought it would be impossible to reproduce the bug in a more standard setup. 
However, it turns out it is possible. Starting from a plain Pharo 8 image and using the following script, I get the crash in lldb after running the tests a few times.

$ ./pharo Pharo.image st --quit

$ lldb ./pharo-vm/Pharo.app/Contents/MacOS/Pharo
(lldb) run --headless Pharo.image examples --junit-xml-output 'GToolkit-.*' 'GT4SmaCC-.*' 'DeepTraverser-.*' Brick 'Brick-.*' Bloc 'Bloc-.*' 'Sparta-.*'


I also tried to compile the VM myself on Mac (Catalina 10.15.6). I built both a normal and an assert VM for https://github.com/OpenSmalltalk/opensmalltalk-vm and for https://github.com/pharo-project/opensmalltalk-vm, from the cog branch.
In both cases I hit an issue related to pixman 0.34.0 [1], but that’s easy to work around. For https://github.com/OpenSmalltalk/opensmalltalk-vm I had an extra problem related to Cairo [2] and had to change libpng from libpng16 to libpng12 to get it to work.

With both the normal VMs I could reproduce the bug and got stacks with the Context>copyTo: messages. 

With the assert VMs, so far I only got a crash with the one from https://github.com/pharo-project/opensmalltalk-vm. However, there is no Context>copyTo: in the stack, and the memory seems quite corrupted. 
I suspect the crash also appears with https://github.com/OpenSmalltalk/opensmalltalk-vm, but it seems much harder to reproduce with the assert VM: I had to run the tests 20 times to get one crash, and a single run of the tests takes 20-30 minutes. 


This is from the only crash so far with the assert VM. Not sure whether it is helpful, or even related to the problem.

validInstructionPointerinFrame(GIV(instructionPointer), GIV(framePointer)) 18471
Pharo was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 73731 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x157800000)
    frame #0: 0x0000000100015837 Pharo`longAtPointerput(ptr="????", val=5513312480) at sqMemoryAccess.h:142:84 [opt]
   139   static inline sqInt intAtPointer(char *ptr) { return (sqInt)(*((int *)ptr)); }
   140   static inline sqInt intAtPointerput(char *ptr, int val) { return (sqInt)(*((int *)ptr)= val); }
   141   static inline sqInt longAtPointer(char *ptr) { return *(sqInt *)ptr; }
-> 142   static inline sqInt longAtPointerput(char *ptr, sqInt val) { return *(sqInt *)ptr= val; }
   143   static inline sqLong long64AtPointer(char *ptr) { return *(sqLong *)ptr; }
   144   static inline sqLong long64AtPointerput(char *ptr, sqLong val) { return *(sqLong *)ptr= val; }
   145   static inline float singleFloatAtPointer(char *ptr) { return *(float *)ptr; }
Target 0: (Pharo) stopped.


(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x157800000)
  * frame #0: 0x0000000100015837 Pharo`longAtPointerput(ptr="????", val=5513312480) at sqMemoryAccess.h:142:84 [opt]
    frame #1: 0x00000001000161cf Pharo`marryFrameSP(theFP=<unavailable>, theSP=0x0000000000000000) at gcc3x-cointerp.c:68120:3 [opt]
    frame #2: 0x000000010001f5ac Pharo`ceContextinstVar(maybeContext=5510359872, slotIndex=0) at gcc3x-cointerp.c:15221:12 [opt]
    frame #3: 0x00000001480017d6
    frame #4: 0x00000001000022be Pharo`interpret at gcc3x-cointerp.c:2755:3 [opt]
    frame #5: 0x00000001000bc244 Pharo`-[sqSqueakMainApplication runSqueak](self=0x0000000101c76dc0, _cmd=<unavailable>) at sqSqueakMainApplication.m:201:2 [opt]
    frame #6: 0x00007fff3326729b Foundation`__NSFirePerformWithOrder + 360
    frame #7: 0x00007fff30ad3335 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 23
    frame #8: 0x00007fff30ad3267 CoreFoundation`__CFRunLoopDoObservers + 457
    frame #9: 0x00007fff30ad2805 CoreFoundation`__CFRunLoopRun + 874
    frame #10: 0x00007fff30ad1e3e CoreFoundation`CFRunLoopRunSpecific + 462
    frame #11: 0x00007fff2f6feabd HIToolbox`RunCurrentEventLoopInMode + 292
    frame #12: 0x00007fff2f6fe6f4 HIToolbox`ReceiveNextEventCommon + 359
    frame #13: 0x00007fff2f6fe579 HIToolbox`_BlockUntilNextEventMatchingListInModeWithFilter + 64
    frame #14: 0x00007fff2dd44039 AppKit`_DPSNextEvent + 883
    frame #15: 0x00007fff2dd42880 AppKit`-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1352
    frame #16: 0x00007fff2dd3458e AppKit`-[NSApplication run] + 658
    frame #17: 0x00007fff2dd06396 AppKit`NSApplicationMain + 777
    frame #18: 0x00007fff6ab3ecc9 libdyld.dylib`start + 1


(lldb) call printCallStack()
    0x7ffeefbe3920 M INVALID RECEIVER>(nil) 0x148716b40: a(n) bad class
    0x7ffeefbe3968 M [] in INVALID RECEIVER>(nil) Context(Object)>>doesNotUnderstand: #bounds
 0x194648118: a(n) bad class
    0x7ffeefbe39a8 M INVALID RECEIVER>(nil) 0x1489fcec0: a(n) bad class
    0x7ffeefbe39e8 M INVALID RECEIVER>(nil)  0x1489fcec0: a(n) bad class
    0x7ffeefbe3a30 I INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe3a80 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe3ab8 M INVALID RECEIVER>(nil)  0x148163cd0: a(n) bad class
    0x7ffeefbe3b08 I INVALID RECEIVER>(nil)  0x148163c18: a(n) bad class
    0x7ffeefbe3b40 M INVALID RECEIVER>(nil)  0x148163c18: a(n) bad class
    0x7ffeefbe3b78 M INVALID RECEIVER>(nil)  0x1481634e0: a(n) bad class
    0x7ffeefbe3bc0 I INVALID RECEIVER>(nil)  0x148716a38: a(n) bad class
    0x7ffeefbe3c10 I INVALID RECEIVER>(nil)  0x14d0338e8: a(n) bad class
    0x7ffeefbe3c40 M INVALID RECEIVER>(nil)  0x14d0338e8: a(n) bad class
    0x7ffeefbe3c78 M INVALID RECEIVER>(nil)  0x14d0338e8: a(n) bad class
    0x7ffeefbe3cc0 M INVALID RECEIVER>(nil)  0x14d0337f0: a(n) bad class
    0x7ffeefbe3d08 M INVALID RECEIVER>(nil)  0x14d033738: a(n) bad class
    0x7ffeefbe3d50 M INVALID RECEIVER>(nil)  0x14d033680: a(n) bad class
    0x7ffeefbe3d98 M INVALID RECEIVER>(nil)  0x1946493f0: a(n) bad class
    0x7ffeefbe3de0 M INVALID RECEIVER>(nil)  0x194649338: a(n) bad class
    0x7ffeefbe3e28 M INVALID RECEIVER>(nil)  0x194649280: a(n) bad class
    0x7ffeefbe3e70 M INVALID RECEIVER>(nil)  0x1946491c8: a(n) bad class
    0x7ffeefbe3eb8 M INVALID RECEIVER>(nil)  0x194649110: a(n) bad class
    0x7ffeefbec768 M INVALID RECEIVER>(nil)  0x194649038: a(n) bad class
    0x7ffeefbec7b0 M INVALID RECEIVER>(nil)  0x194648f60: a(n) bad class
    0x7ffeefbec7f8 M INVALID RECEIVER>(nil)  0x194648e88: a(n) bad class
    0x7ffeefbec840 M INVALID RECEIVER>(nil)  0x194648dd0: a(n) bad class
    0x7ffeefbec888 M INVALID RECEIVER>(nil)  0x194648d18: a(n) bad class
    0x7ffeefbec8d0 M INVALID RECEIVER>(nil)  0x194648c60: a(n) bad class
    0x7ffeefbec918 M INVALID RECEIVER>(nil)  0x194648b88: a(n) bad class
    0x7ffeefbec960 M INVALID RECEIVER>(nil)  0x194648ad0: a(n) bad class
    0x7ffeefbec9a8 M INVALID RECEIVER>(nil)  0x194648a18: a(n) bad class
    0x7ffeefbec9f0 M INVALID RECEIVER>(nil)  0x194648960: a(n) bad class
    0x7ffeefbeca38 M INVALID RECEIVER>(nil)  0x1946488a8: a(n) bad class
    0x7ffeefbeca80 M INVALID RECEIVER>(nil)  0x1946487f0: a(n) bad class
    0x7ffeefbecac8 M INVALID RECEIVER>(nil)  0x194648708: a(n) bad class
    0x7ffeefbecb10 M INVALID RECEIVER>(nil)  0x194648620: a(n) bad class
    0x7ffeefbecb58 M INVALID RECEIVER>(nil)  0x194648508: a(n) bad class
    0x7ffeefbecba0 M INVALID RECEIVER>(nil)  0x194648450: a(n) bad class
    0x7ffeefbecbe8 M INVALID RECEIVER>(nil)  0x1481641a8: a(n) bad class
    0x7ffeefbecc30 M INVALID RECEIVER>(nil)  0x1481640f0: a(n) bad class
    0x7ffeefbecc78 M INVALID RECEIVER>(nil)  0x148164038: a(n) bad class
    0x7ffeefbeccc0 M INVALID RECEIVER>(nil)  0x148163f80: a(n) bad class
    0x7ffeefbecd08 M INVALID RECEIVER>(nil)  0x148163ec8: a(n) bad class
    0x7ffeefbecd50 M INVALID RECEIVER>(nil)  0x148163e10: a(n) bad class
    0x7ffeefbecd98 M INVALID RECEIVER>(nil)  0x148163d28: a(n) bad class
    0x7ffeefbecde0 M INVALID RECEIVER>(nil)  0x148163c18: a(n) bad class
    0x7ffeefbece28 M INVALID RECEIVER>(nil)  0x148163b38: a(n) bad class
    0x7ffeefbece70 M INVALID RECEIVER>(nil)  0x148163a80: a(n) bad class
    0x7ffeefbeceb8 M INVALID RECEIVER>(nil)  0x1481639c8: a(n) bad class
    0x7ffeefbe7758 M INVALID RECEIVER>(nil)  0x1481638e0: a(n) bad class
    0x7ffeefbe77a0 M INVALID RECEIVER>(nil)  0x148163808: a(n) bad class
    0x7ffeefbe77e8 M INVALID RECEIVER>(nil)  0x148163750: a(n) bad class
    0x7ffeefbe7830 M INVALID RECEIVER>(nil)  0x148163698: a(n) bad class
    0x7ffeefbe7878 M INVALID RECEIVER>(nil)  0x1481635c0: a(n) bad class
    0x7ffeefbe78c0 M INVALID RECEIVER>(nil)  0x1481634e0: a(n) bad class
    0x7ffeefbe7908 M INVALID RECEIVER>(nil)  0x148163408: a(n) bad class
    0x7ffeefbe7950 M INVALID RECEIVER>(nil)  0x148163350: a(n) bad class
    0x7ffeefbe7998 M INVALID RECEIVER>(nil)  0x148163298: a(n) bad class
    0x7ffeefbe79e0 M INVALID RECEIVER>(nil)  0x148163188: a(n) bad class
    0x7ffeefbe7a28 M INVALID RECEIVER>(nil)  0x148163098: a(n) bad class
    0x7ffeefbe7a70 M INVALID RECEIVER>(nil)  0x148162fa0: a(n) bad class
    0x7ffeefbe7ab8 M INVALID RECEIVER>(nil)  0x148162ec8: a(n) bad class
    0x7ffeefbe7b00 M INVALID RECEIVER>(nil)  0x148162e10: a(n) bad class
    0x7ffeefbe7b48 M INVALID RECEIVER>(nil)  0x148712a08: a(n) bad class
    0x7ffeefbe7b90 M INVALID RECEIVER>(nil)  0x148712950: a(n) bad class
    0x7ffeefbe7bd8 M INVALID RECEIVER>(nil)  0x148712898: a(n) bad class
    0x7ffeefbe7c20 M INVALID RECEIVER>(nil)  0x148713cc0: a(n) bad class
    0x7ffeefbe7c68 M INVALID RECEIVER>(nil)  0x148713018: a(n) bad class
    0x7ffeefbe7cb0 M INVALID RECEIVER>(nil)  0x148713480: a(n) bad class
    0x7ffeefbe7cf8 M INVALID RECEIVER>(nil)  0x148713140: a(n) bad class
    0x7ffeefbe7d40 M INVALID RECEIVER>(nil)  0x148713928: a(n) bad class
    0x7ffeefbe7d88 M INVALID RECEIVER>(nil)  0x1487133c8: a(n) bad class
    0x7ffeefbe7de0 I INVALID RECEIVER>(nil)  0x148713238: a(n) bad class
    0x7ffeefbe7e28 I INVALID RECEIVER>(nil)  0x1487131f8: a(n) bad class
    0x7ffeefbe7e70 I INVALID RECEIVER>(nil)  0x1487131f8: a(n) bad class
    0x7ffeefbe7eb8 I INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeea68 I INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeeac8 M [] in INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeeb00 M INVALID RECEIVER>(nil)  0x148713108: a(n) bad class
    0x7ffeefbeeb50 I INVALID RECEIVER>(nil)  0x148713480: a(n) bad class
    0x7ffeefbeeb98 I INVALID RECEIVER>(nil)  0x148713480: a(n) bad class
    0x7ffeefbeebe0 I INVALID RECEIVER>(nil)  0x1487131f8: a(n) bad class
    0x7ffeefbeec10 M INVALID RECEIVER>(nil) 0x9=1
    0x7ffeefbeec48 M INVALID RECEIVER>(nil)  0x1487123c8: a(n) bad class
    0x7ffeefbeeca0 M [] in INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeecd0 M INVALID RECEIVER>(nil)  0x1487130d0: a(n) bad class
    0x7ffeefbeed10 M INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeed58 M INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeedb0 I INVALID RECEIVER>(nil)  0x1487123c8: a(n) bad class
    0x7ffeefbeedf0 I INVALID RECEIVER>(nil)  0x1487123c8: a(n) bad class
    0x7ffeefbeee20 M [] in INVALID RECEIVER>(nil)  0x148162ce8: a(n) bad class
    0x7ffeefbeee78 M INVALID RECEIVER>(nil)  0x148162df0: a(n) bad class
    0x7ffeefbeeec0 M INVALID RECEIVER>(nil)  0x148162ce8: a(n) bad class
    0x7ffeefbe5978 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe59d8 M [] in INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5a18 M INVALID RECEIVER>(nil)  0x148163150: a(n) bad class
    0x7ffeefbe5a68 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5aa0 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5ad8 M [] in INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5b08 M INVALID RECEIVER>(nil)  0x1481634c0: a(n) bad class
    0x7ffeefbe5b48 M INVALID RECEIVER>(nil)  0x14c403ca8: a(n) bad class
    0x7ffeefbe5b88 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5bc0 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5bf0 M [] in INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5c20 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5c68 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5c98 M INVALID RECEIVER>(nil)  0x194656468: a(n) bad class
    0x7ffeefbe5cd0 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe5d00 M INVALID RECEIVER>(nil)  0x148163bf0: a(n) bad class
    0x7ffeefbe5d50 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe5d88 M INVALID RECEIVER>(nil)  0x1489fcff0: a(n) bad class
    0x7ffeefbe5dc0 M INVALID RECEIVER>(nil)  0x1489fcff0: a(n) bad class
    0x7ffeefbe5e00 M INVALID RECEIVER>(nil)  0x148163de0: a(n) bad class
    0x7ffeefbe5e38 M INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe5e80 M INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe5eb8 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdf9d8 M INVALID RECEIVER>(nil)  0x194648430: a(n) bad class
    0x7ffeefbdfa10 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfa50 M [] in INVALID RECEIVER>(nil)  0x148a02930: a(n) bad class
    0x7ffeefbdfa90 M INVALID RECEIVER>(nil)  0x1946486d8: a(n) bad class
    0x7ffeefbdfad0 M INVALID RECEIVER>(nil)  0x148a02930: a(n) bad class
    0x7ffeefbdfb10 M INVALID RECEIVER>(nil)  0x1946485e0: a(n) bad class
    0x7ffeefbdfb48 M INVALID RECEIVER>(nil)  0x1489f7da8: a(n) bad class
    0x7ffeefbdfb80 M INVALID RECEIVER>(nil)  0x148a02930: a(n) bad class
    0x7ffeefbdfbb8 M INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfbe8 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfc20 M INVALID RECEIVER>(nil)  0x1489fcff0: a(n) bad class
    0x7ffeefbdfc58 M INVALID RECEIVER>(nil)  0x1489fcff0: a(n) bad class
    0x7ffeefbdfc98 M INVALID RECEIVER>(nil)  0x194648c40: a(n) bad class
    0x7ffeefbdfcc8 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfd08 M INVALID RECEIVER>(nil)  0x194648f40: a(n) bad class
    0x7ffeefbdfd40 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfd70 M INVALID RECEIVER>(nil)  0x14902a730: a(n) bad class
    0x7ffeefbdfdb0 M INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfde8 M INVALID RECEIVER>(nil)  0x14c4033c0: a(n) bad class
    0x7ffeefbdfe20 M [] in INVALID RECEIVER>(nil)  0x14c4033c0: a(n) bad class
    0x7ffeefbdfe70 M [] in INVALID RECEIVER>(nil)  0x14d032f98: a(n) bad class
    0x7ffeefbdfeb8 M INVALID RECEIVER>(nil)  0x14d032fe0: a(n) bad class

(callerContextOrNil == (nilObject())) || (isContext(callerContextOrNil)) 72783
       0x14d033738 is not a context


A more realistic setup would be to run GT with an assert headless vm, but so far I have not figured out how to build an assert vm for the gt-headless branch of https://github.com/feenkcom/opensmalltalk-vm

Cheers,
Andrei



[2] checking for cairo's PNG functions feature... 
configure: WARNING: Could not find libpng in the pkg-config search path
checking whether cairo's PNG functions feature could be enabled... no
configure: error: recommended PNG functions feature could not be enabled

On 14 Sep 2020, at 17:32, Eliot Miranda <[hidden email]> wrote:

Hi Andrei,


On Sep 14, 2020, at 7:15 AM, Andrei Chis <[hidden email]> wrote:

Hi Eliot,

On 12 Sep 2020, at 01:42, Eliot Miranda <[hidden email]> wrote:

Hi Andrei,

On Fri, Sep 11, 2020 at 11:48 AM Andrei Chis <[hidden email]> wrote:
 
Hi Eliot,

Thanks for the answer. That helps to understand what is going on and it can explain why just adding a call to `self pc` makes the crash disappear. 

What was maybe not obvious in my previous email is that we get this problem more or less randomly. We have tests that verify that tools work when various extensions raise exceptions (these tests copy the stack). Sometimes they work correctly and sometimes they crash. These crashes happen in various tests, and so far the only common thing we noticed is that the pc of the contexts where the crash happens looks off. Also, the contexts in which this happens are at the beginning of the stack, so they are part of a long computation (and get copied multiple times).

Initially we suspected that there is some memory corruption somewhere due to external calls/memory. The fact that calling `self pc` beforehand seems to fix the issue makes that less likely. But who knows.

Well, it does look like a VM bug.  The VM is somehow failing to intercept some access, perhaps in shallow copy.  Weird.  I shall try and reproduce.   Is there anything special about the process you copy using copyTo: ?

I don’t think there is something special about that process. It is the process that we start to run tests [1]. The exception happens in the running process and the crash is when copying the stack of that running process.

Ok, cool.  What I’d like to do is get a copy of your test setup and run it in an assert vm to try and get more information.  AFAICT the vm code is good, so the bug is not obvious.  An assert vm may give more information before the crash.  Have you tried running the system on an assert vm yet?

Checking some previous logs, we have been getting these kinds of crashes on the CI server for at least two years. So it does not look like a new bug (but who knows).


(see below)

On Fri, Sep 11, 2020 at 6:36 PM Eliot Miranda <[hidden email]> wrote:
 
Hi Andrei,

On Fri, Sep 11, 2020 at 8:58 AM Andrei Chis <[hidden email]> wrote:
 
Hi,

We are getting often crashes on our CI when calling `Context>copyTo:` in a GT image and a vm build from https://github.com/feenkcom/opensmalltalk-vm.

To sum up during `Context>copyTo:`, `Object>>#copy` is called on a context leading to a segmentation fault crash. Looking at that context in lldb the pc looks off.  It has the value `0xfffffffffea7f6e1`.

 (lldb) call (void *) printOop(0x1206b6990)
    0x1206b6990: a(n) Context
     0x1206b6a48 0xfffffffffea7f6e1                0x9        0x1146b2e08        0x1206b6b00 
     0x1206b6b28        0x1206b6b50 

Can this indicate some corruption or is it expected to have such values? `CoInterpreter>>ensureContextHasBytecodePC:` has code that also handles negative values for the pc which suggests that this might be expected.

The issue is that that value is expected *inside* the VM.  It is the frame pointer for the context.  But above the VM this value should be hidden. The VM should intercept all accesses to such fields in contexts and automatically map them back to the appropriate values that the image expects to see.  [The same thing is true for CompiledMethods; inside the VM methods may refer to their JITted code, but this is invisible from the image].  Intercepting access to Context state already happens with inst var access in methods, with the shallowCopy primitive, with instVarAt: et al, etc.

So I expect the issue here is that copyTo: invokes some primitive which does not (yet) check for a context receiver and/or argument, and hence accidentally it reveals the hidden state to the image and a crash results.  What I need to know are the definitions for copyTo: and copy, etc all the way down to primitives.

Here is the source code:

Cool, nothing unusual here.  This should all work perfectly.  Tis a VM bug. However...
 
Context >> copyTo: aContext 
"Copy self and my sender chain down to, but not including, aContext.  End of copied chain will have nil sender."
    | copy |
    self == aContext ifTrue: [^ nil].
    copy := self copy.
    self sender ifNotNil: [
        copy privSender: (self sender copyTo: aContext)].
    ^ copy

Let me suggest

Context >> copyTo: aContext 
   "Copy self and my sender chain down to, but not including, aContext.  End of copied chain will have nil sender."
    | copy |
    self == aContext ifTrue: [^ nil].
    copy := self copy.
    self sender ifNotNil:
        [:mySender| copy privSender: (mySender copyTo: aContext)].
    ^ copy 

Nice!

I also tried the non-recursive implementation of Context>>#copyTo: from Squeak and it also crashes.

Not sure if it is related, but in the same image as before I now got a different crash, and printing the stack does not work. This time the error seems to come from handleStackOverflow:

(lldb) call (void *)printCallStack()
invalid frame pointer
invalid frame pointer
invalid frame pointer
error: Execution was interrupted, reason: EXC_BAD_ACCESS (code=EXC_I386_GPFLT).
The process has been returned to the state before expression evaluation.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x121e00000)
  * frame #0: 0x0000000100162258 libGlamorousToolkitVMCore.dylib`marryFrameSP + 584
    frame #1: 0x0000000100172982 libGlamorousToolkitVMCore.dylib`handleStackOverflow + 354
    frame #2: 0x000000010016b025 libGlamorousToolkitVMCore.dylib`ceStackOverflow + 149
    frame #3: 0x00000001100005b3
    frame #4: 0x0000000100174d99 libGlamorousToolkitVMCore.dylib`ptEnterInterpreterFromCallback + 73


Cheers,
Andrei

[1] ./GlamorousToolkit.app/Contents/MacOS/GlamorousToolkit  Pharo.image examples --junit-xml-output 'GToolkit-.*' 'GT4SmaCC-.*' 'DeepTraverser-.*' Brick 'Brick-.*' Bloc 'Bloc-.*' 'Sparta-.*'



Object>>#copy
     ^self shallowCopy postCopy

Object >> shallowCopy
    | class newObject index |
    <primitive: 148>
    class := self class.
    class isVariable
        ifTrue: 
            [index := self basicSize.
            newObject := class basicNew: index.
            [index > 0]
                whileTrue: 
                    [newObject basicAt: index put: (self basicAt: index).
                    index := index - 1]]
        ifFalse: [newObject := class basicNew].
    index := class instSize.
    [index > 0]
        whileTrue: 
            [newObject instVarAt: index put: (self instVarAt: index).
            index := index - 1].
    ^ newObject

The code of primitiveClone looks the same [1].


Changing `Context>copyTo:` by adding a `self pc` before calling `self copy` leads to no more crashes. Not sure if there is a reason for that or just plain luck.

A simple reduced stack is below (more details in this issue [1]). The crash happens always with contexts reified as objects (in this case 0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages). 
Could this suggest some kind of issue in the vm when reifying contexts, or just some other problem with memory corruption?

This looks like an oversight in some primitive.  Here, for example, is the implementation of the shallowCopy primitive, a.k.a. clone, and you can see where it explicitly intercepts access to a context.

primitiveClone
    "Return a shallow copy of the receiver.
     Special-case non-single contexts (because of context-to-stack mapping).
     Can't fail for contexts cuz of image context instantiation code (sigh)."

    | rcvr newCopy |
    rcvr := self stackTop.
    (objectMemory isImmediate: rcvr)
        ifTrue:
            [newCopy := rcvr]
        ifFalse:
            [(objectMemory isContextNonImm: rcvr)
                ifTrue:
                    [newCopy := self cloneContext: rcvr]
                ifFalse:
                    [(argumentCount = 0
                      or: [(objectMemory isForwarded: rcvr) not])
                        ifTrue: [newCopy := objectMemory clone: rcvr]
                        ifFalse: [newCopy := 0]].
            newCopy = 0 ifTrue:
                [^self primitiveFailFor: PrimErrNoMemory]].
    self pop: argumentCount + 1 thenPush: newCopy

But since Squeak doesn't have copyTo: I have no idea what primitive is being used.  I'm guessing 168 primitiveCopyObject, which seems to check for a Context receiver, but not for a CompiledCode receiver.  What does the primitive failure code look like?  Can you post the copyTo: implementations here please?

The code is above. I also see that Squeak's Context>>#copyTo: calls Object>>copy on contexts as well.

When a crash happens we don't get the exact same error every time. For example, on Mac we most often get:

Process 35690 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001100b1004
->  0x1100b1004: inl    $0x4c, %eax
    0x1100b1006: leal   -0x5c(%rip), %eax
    0x1100b100c: pushq  %r8
    0x1100b100e: movabsq $0x1109e78e0, %r9         ; imm = 0x1109E78E0 
Target 0: (GlamorousToolkit) stopped.


Process 29929 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=EXC_I386_BPT, subcode=0x0)
    frame #0: 0x00000001100fe7ed
->  0x1100fe7ed: int3   
    0x1100fe7ee: int3   
    0x1100fe7ef: int3   
    0x1100fe7f0: int3   
Target 0: (GlamorousToolkit) stopped.



Cheers,
Andrei
 

 0x7ffeefbb4380 M Context(Object)>copy 0x1206b6990: a(n) Context
    0x7ffeefbb43b8 M Context>copyTo: 0x1206b6990: a(n) Context
    0x7ffeefbb4400 M Context>copyTo: 0x1206b5ae0: a(n) Context
  ...
    0x7ffeefba6078 M Context>copyTo: 0x110548b28: a(n) Context
    0x7ffeefba60d0 I Context>copyTo: 0x110548a70: a(n) Context
    0x7ffeefba6118 I MessageNotUnderstood(Exception)>freezeUpTo: 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba6160 I MessageNotUnderstood(Exception)>freeze 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba6190 M [] in GtExampleEvaluator>result 0x110544fb8: a(n) GtExampleEvaluator
    0x7ffeefba61c8 M BlockClosure>cull: 0x110545188: a(n) BlockClosure
    0x7ffeefba6208 M Context>evaluateSignal: 0x110548c98: a(n) Context
    0x7ffeefba6240 M Context>handleSignal: 0x110548c98: a(n) Context
    0x7ffeefba6278 M Context>handleSignal: 0x110548be0: a(n) Context
    0x7ffeefba62b0 M MessageNotUnderstood(Exception)>signal 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba62f0 M GtDummyExamplesWithInheritanceSubclassB(Object)>doesNotUnderstand: exampleH 0x1105487d8: a(n) GtDummyExamplesWithInheritanceSubclassB
    0x7ffeefba6328 M GtExampleEvaluator>primitiveProcessExample:withEvaluationContext: 0x110544fb8: a(n) GtExampleEvaluator
 ...
    0x7ffeefbe64d0 M [] in GtExamplesHDReport class(HDReport class)>runPackages: 0x1145e41c8: a(n) GtExamplesHDReport class
    0x7ffeefbe6520 M [] in Set>collect: 0x1206b5ab0: a(n) Set
    0x7ffeefbe6568 M Array(SequenceableCollection)>do: 0x1206b5c50: a(n) Array
       0x1206b5b98 s Set>collect:
       0x1206b5ae0 s GtExamplesHDReport class(HDReport class)>runPackages:
       0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages
       0x1206b6a48 s BlockClosure>ensure:
       0x1206b6b68 s UIManager class>nonInteractiveDuring:
       0x1206b6c48 s GtExamplesCommandLineHandler>runPackages
       0x1206b6d98 s GtExamplesCommandLineHandler>activate
       0x1206b75d0 s GtExamplesCommandLineHandler class(CommandLineHandler class)>activateWith:
       0x1207d2f00 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
       0x1207e6620 s BlockClosure>on:do:
       0x1207f7ab8 s PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
       0x120809d40 s PharoCommandLineHandler(BasicCommandLineHandler)>handleSubcommand
       0x12082ca60 s PharoCommandLineHandler(BasicCommandLineHandler)>handleArgument:
       0x120789938 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
       0x1207a83e0 s BlockClosure>on:do:
       0x1207b57a0 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
       0x1207bf830 s [] in BlockClosure>newProcess
Cheers,
Andrei





-- 
_,,,^..^,,,_
best, Eliot


-- 
_,,,^..^,,,_
best, Eliot


Re: corruption of PC in context objects or not (?)

Eliot Miranda-2
 
Hi Andrei,


On Sep 14, 2020, at 3:22 PM, Andrei Chis <[hidden email]> wrote:

Hi Eliot,

The setup in GT is a bit customised (some changes in the headless vm, some custom plugins, custom rendering), so I first thought it would be impossible to reproduce the bug in a more standard setup.
However, it turns out it is possible. Using the following script, I get the crash starting from a plain Pharo 8 image after running the tests a few times in lldb.

$ ./pharo Pharo.image st --quit

$ lldb ./pharo-vm/Pharo.app/Contents/MacOS/Pharo
(lldb) run --headless Pharo.image examples --junit-xml-output 'GToolkit-.*' 'GT4SmaCC-.*' 'DeepTraverser-.*' Brick 'Brick-.*' Bloc 'Bloc-.*' 'Sparta-.*'


I also tried to compile the vm myself on Mac (Catalina 10.15.6). I built a normal and an assert vm from the cog branch of both https://github.com/OpenSmalltalk/opensmalltalk-vm and https://github.com/pharo-project/opensmalltalk-vm.
In both cases I hit an issue related to pixman 0.34.0 [1], but that is easy to work around. For https://github.com/OpenSmalltalk/opensmalltalk-vm I got an extra problem related to Cairo [2] and had to change libpng from libpng16 to libpng12 to get it to work.

With both normal VMs I could reproduce the bug and got stacks with the Context>copyTo: messages. 

With the assert VMs, so far I only got a crash with the assert vm from https://github.com/pharo-project/opensmalltalk-vm. However, there is no Context>copyTo: in the stack, and the memory seems quite corrupted. 
I suspect the crash also appears with https://github.com/OpenSmalltalk/opensmalltalk-vm, but it seems much harder to reproduce with the assert vm: I had to run the tests 20 times to get one crash, and a single run of the tests takes 20-30 minutes. 


This is from the only crash so far with the assert vm. I am not sure whether it is helpful, or actually related to the problem.

validInstructionPointerinFrame(GIV(instructionPointer), GIV(framePointer)) 18471
Pharo was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 73731 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x157800000)
    frame #0: 0x0000000100015837 Pharo`longAtPointerput(ptr="????", val=5513312480) at sqMemoryAccess.h:142:84 [opt]
   139   static inline sqInt intAtPointer(char *ptr) { return (sqInt)(*((int *)ptr)); }
   140   static inline sqInt intAtPointerput(char *ptr, int val) { return (sqInt)(*((int *)ptr)= val); }
   141   static inline sqInt longAtPointer(char *ptr) { return *(sqInt *)ptr; }
-> 142   static inline sqInt longAtPointerput(char *ptr, sqInt val) { return *(sqInt *)ptr= val; }
   143   static inline sqLong long64AtPointer(char *ptr) { return *(sqLong *)ptr; }
   144   static inline sqLong long64AtPointerput(char *ptr, sqLong val) { return *(sqLong *)ptr= val; }
   145   static inline float singleFloatAtPointer(char *ptr) { return *(float *)ptr; }
Target 0: (Pharo) stopped.


(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x157800000)
  * frame #0: 0x0000000100015837 Pharo`longAtPointerput(ptr="????", val=5513312480) at sqMemoryAccess.h:142:84 [opt]
    frame #1: 0x00000001000161cf Pharo`marryFrameSP(theFP=<unavailable>, theSP=0x0000000000000000) at gcc3x-cointerp.c:68120:3 [opt]
    frame #2: 0x000000010001f5ac Pharo`ceContextinstVar(maybeContext=5510359872, slotIndex=0) at gcc3x-cointerp.c:15221:12 [opt]
    frame #3: 0x00000001480017d6
    frame #4: 0x00000001000022be Pharo`interpret at gcc3x-cointerp.c:2755:3 [opt]
    frame #5: 0x00000001000bc244 Pharo`-[sqSqueakMainApplication runSqueak](self=0x0000000101c76dc0, _cmd=<unavailable>) at sqSqueakMainApplication.m:201:2 [opt]
    frame #6: 0x00007fff3326729b Foundation`__NSFirePerformWithOrder + 360
    frame #7: 0x00007fff30ad3335 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 23
    frame #8: 0x00007fff30ad3267 CoreFoundation`__CFRunLoopDoObservers + 457
    frame #9: 0x00007fff30ad2805 CoreFoundation`__CFRunLoopRun + 874
    frame #10: 0x00007fff30ad1e3e CoreFoundation`CFRunLoopRunSpecific + 462
    frame #11: 0x00007fff2f6feabd HIToolbox`RunCurrentEventLoopInMode + 292
    frame #12: 0x00007fff2f6fe6f4 HIToolbox`ReceiveNextEventCommon + 359
    frame #13: 0x00007fff2f6fe579 HIToolbox`_BlockUntilNextEventMatchingListInModeWithFilter + 64
    frame #14: 0x00007fff2dd44039 AppKit`_DPSNextEvent + 883
    frame #15: 0x00007fff2dd42880 AppKit`-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1352
    frame #16: 0x00007fff2dd3458e AppKit`-[NSApplication run] + 658
    frame #17: 0x00007fff2dd06396 AppKit`NSApplicationMain + 777
    frame #18: 0x00007fff6ab3ecc9 libdyld.dylib`start + 1


(lldb) call printCallStack()
    0x7ffeefbe3920 M INVALID RECEIVER>(nil) 0x148716b40: a(n) bad class
    0x7ffeefbe3968 M [] in INVALID RECEIVER>(nil) Context(Object)>>doesNotUnderstand: #bounds
 0x194648118: a(n) bad class
    0x7ffeefbe39a8 M INVALID RECEIVER>(nil) 0x1489fcec0: a(n) bad class
    0x7ffeefbe39e8 M INVALID RECEIVER>(nil)  0x1489fcec0: a(n) bad class
    0x7ffeefbe3a30 I INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe3a80 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe3ab8 M INVALID RECEIVER>(nil)  0x148163cd0: a(n) bad class
    0x7ffeefbe3b08 I INVALID RECEIVER>(nil)  0x148163c18: a(n) bad class
    0x7ffeefbe3b40 M INVALID RECEIVER>(nil)  0x148163c18: a(n) bad class
    0x7ffeefbe3b78 M INVALID RECEIVER>(nil)  0x1481634e0: a(n) bad class
    0x7ffeefbe3bc0 I INVALID RECEIVER>(nil)  0x148716a38: a(n) bad class
    0x7ffeefbe3c10 I INVALID RECEIVER>(nil)  0x14d0338e8: a(n) bad class
    0x7ffeefbe3c40 M INVALID RECEIVER>(nil)  0x14d0338e8: a(n) bad class
    0x7ffeefbe3c78 M INVALID RECEIVER>(nil)  0x14d0338e8: a(n) bad class
    0x7ffeefbe3cc0 M INVALID RECEIVER>(nil)  0x14d0337f0: a(n) bad class
    0x7ffeefbe3d08 M INVALID RECEIVER>(nil)  0x14d033738: a(n) bad class
    0x7ffeefbe3d50 M INVALID RECEIVER>(nil)  0x14d033680: a(n) bad class
    0x7ffeefbe3d98 M INVALID RECEIVER>(nil)  0x1946493f0: a(n) bad class
    0x7ffeefbe3de0 M INVALID RECEIVER>(nil)  0x194649338: a(n) bad class
    0x7ffeefbe3e28 M INVALID RECEIVER>(nil)  0x194649280: a(n) bad class
    0x7ffeefbe3e70 M INVALID RECEIVER>(nil)  0x1946491c8: a(n) bad class
    0x7ffeefbe3eb8 M INVALID RECEIVER>(nil)  0x194649110: a(n) bad class
    0x7ffeefbec768 M INVALID RECEIVER>(nil)  0x194649038: a(n) bad class
    0x7ffeefbec7b0 M INVALID RECEIVER>(nil)  0x194648f60: a(n) bad class
    0x7ffeefbec7f8 M INVALID RECEIVER>(nil)  0x194648e88: a(n) bad class
    0x7ffeefbec840 M INVALID RECEIVER>(nil)  0x194648dd0: a(n) bad class
    0x7ffeefbec888 M INVALID RECEIVER>(nil)  0x194648d18: a(n) bad class
    0x7ffeefbec8d0 M INVALID RECEIVER>(nil)  0x194648c60: a(n) bad class
    0x7ffeefbec918 M INVALID RECEIVER>(nil)  0x194648b88: a(n) bad class
    0x7ffeefbec960 M INVALID RECEIVER>(nil)  0x194648ad0: a(n) bad class
    0x7ffeefbec9a8 M INVALID RECEIVER>(nil)  0x194648a18: a(n) bad class
    0x7ffeefbec9f0 M INVALID RECEIVER>(nil)  0x194648960: a(n) bad class
    0x7ffeefbeca38 M INVALID RECEIVER>(nil)  0x1946488a8: a(n) bad class
    0x7ffeefbeca80 M INVALID RECEIVER>(nil)  0x1946487f0: a(n) bad class
    0x7ffeefbecac8 M INVALID RECEIVER>(nil)  0x194648708: a(n) bad class
    0x7ffeefbecb10 M INVALID RECEIVER>(nil)  0x194648620: a(n) bad class
    0x7ffeefbecb58 M INVALID RECEIVER>(nil)  0x194648508: a(n) bad class
    0x7ffeefbecba0 M INVALID RECEIVER>(nil)  0x194648450: a(n) bad class
    0x7ffeefbecbe8 M INVALID RECEIVER>(nil)  0x1481641a8: a(n) bad class
    0x7ffeefbecc30 M INVALID RECEIVER>(nil)  0x1481640f0: a(n) bad class
    0x7ffeefbecc78 M INVALID RECEIVER>(nil)  0x148164038: a(n) bad class
    0x7ffeefbeccc0 M INVALID RECEIVER>(nil)  0x148163f80: a(n) bad class
    0x7ffeefbecd08 M INVALID RECEIVER>(nil)  0x148163ec8: a(n) bad class
    0x7ffeefbecd50 M INVALID RECEIVER>(nil)  0x148163e10: a(n) bad class
    0x7ffeefbecd98 M INVALID RECEIVER>(nil)  0x148163d28: a(n) bad class
    0x7ffeefbecde0 M INVALID RECEIVER>(nil)  0x148163c18: a(n) bad class
    0x7ffeefbece28 M INVALID RECEIVER>(nil)  0x148163b38: a(n) bad class
    0x7ffeefbece70 M INVALID RECEIVER>(nil)  0x148163a80: a(n) bad class
    0x7ffeefbeceb8 M INVALID RECEIVER>(nil)  0x1481639c8: a(n) bad class
    0x7ffeefbe7758 M INVALID RECEIVER>(nil)  0x1481638e0: a(n) bad class
    0x7ffeefbe77a0 M INVALID RECEIVER>(nil)  0x148163808: a(n) bad class
    0x7ffeefbe77e8 M INVALID RECEIVER>(nil)  0x148163750: a(n) bad class
    0x7ffeefbe7830 M INVALID RECEIVER>(nil)  0x148163698: a(n) bad class
    0x7ffeefbe7878 M INVALID RECEIVER>(nil)  0x1481635c0: a(n) bad class
    0x7ffeefbe78c0 M INVALID RECEIVER>(nil)  0x1481634e0: a(n) bad class
    0x7ffeefbe7908 M INVALID RECEIVER>(nil)  0x148163408: a(n) bad class
    0x7ffeefbe7950 M INVALID RECEIVER>(nil)  0x148163350: a(n) bad class
    0x7ffeefbe7998 M INVALID RECEIVER>(nil)  0x148163298: a(n) bad class
    0x7ffeefbe79e0 M INVALID RECEIVER>(nil)  0x148163188: a(n) bad class
    0x7ffeefbe7a28 M INVALID RECEIVER>(nil)  0x148163098: a(n) bad class
    0x7ffeefbe7a70 M INVALID RECEIVER>(nil)  0x148162fa0: a(n) bad class
    0x7ffeefbe7ab8 M INVALID RECEIVER>(nil)  0x148162ec8: a(n) bad class
    0x7ffeefbe7b00 M INVALID RECEIVER>(nil)  0x148162e10: a(n) bad class
    0x7ffeefbe7b48 M INVALID RECEIVER>(nil)  0x148712a08: a(n) bad class
    0x7ffeefbe7b90 M INVALID RECEIVER>(nil)  0x148712950: a(n) bad class
    0x7ffeefbe7bd8 M INVALID RECEIVER>(nil)  0x148712898: a(n) bad class
    0x7ffeefbe7c20 M INVALID RECEIVER>(nil)  0x148713cc0: a(n) bad class
    0x7ffeefbe7c68 M INVALID RECEIVER>(nil)  0x148713018: a(n) bad class
    0x7ffeefbe7cb0 M INVALID RECEIVER>(nil)  0x148713480: a(n) bad class
    0x7ffeefbe7cf8 M INVALID RECEIVER>(nil)  0x148713140: a(n) bad class
    0x7ffeefbe7d40 M INVALID RECEIVER>(nil)  0x148713928: a(n) bad class
    0x7ffeefbe7d88 M INVALID RECEIVER>(nil)  0x1487133c8: a(n) bad class
    0x7ffeefbe7de0 I INVALID RECEIVER>(nil)  0x148713238: a(n) bad class
    0x7ffeefbe7e28 I INVALID RECEIVER>(nil)  0x1487131f8: a(n) bad class
    0x7ffeefbe7e70 I INVALID RECEIVER>(nil)  0x1487131f8: a(n) bad class
    0x7ffeefbe7eb8 I INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeea68 I INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeeac8 M [] in INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeeb00 M INVALID RECEIVER>(nil)  0x148713108: a(n) bad class
    0x7ffeefbeeb50 I INVALID RECEIVER>(nil)  0x148713480: a(n) bad class
    0x7ffeefbeeb98 I INVALID RECEIVER>(nil)  0x148713480: a(n) bad class
    0x7ffeefbeebe0 I INVALID RECEIVER>(nil)  0x1487131f8: a(n) bad class
    0x7ffeefbeec10 M INVALID RECEIVER>(nil) 0x9=1
    0x7ffeefbeec48 M INVALID RECEIVER>(nil)  0x1487123c8: a(n) bad class
    0x7ffeefbeeca0 M [] in INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeecd0 M INVALID RECEIVER>(nil)  0x1487130d0: a(n) bad class
    0x7ffeefbeed10 M INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeed58 M INVALID RECEIVER>(nil)  0x1487123d8: a(n) bad class
    0x7ffeefbeedb0 I INVALID RECEIVER>(nil)  0x1487123c8: a(n) bad class
    0x7ffeefbeedf0 I INVALID RECEIVER>(nil)  0x1487123c8: a(n) bad class
    0x7ffeefbeee20 M [] in INVALID RECEIVER>(nil)  0x148162ce8: a(n) bad class
    0x7ffeefbeee78 M INVALID RECEIVER>(nil)  0x148162df0: a(n) bad class
    0x7ffeefbeeec0 M INVALID RECEIVER>(nil)  0x148162ce8: a(n) bad class
    0x7ffeefbe5978 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe59d8 M [] in INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5a18 M INVALID RECEIVER>(nil)  0x148163150: a(n) bad class
    0x7ffeefbe5a68 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5aa0 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5ad8 M [] in INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5b08 M INVALID RECEIVER>(nil)  0x1481634c0: a(n) bad class
    0x7ffeefbe5b48 M INVALID RECEIVER>(nil)  0x14c403ca8: a(n) bad class
    0x7ffeefbe5b88 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5bc0 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5bf0 M [] in INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5c20 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5c68 M INVALID RECEIVER>(nil)  0x148162f80: a(n) bad class
    0x7ffeefbe5c98 M INVALID RECEIVER>(nil)  0x194656468: a(n) bad class
    0x7ffeefbe5cd0 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe5d00 M INVALID RECEIVER>(nil)  0x148163bf0: a(n) bad class
    0x7ffeefbe5d50 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe5d88 M INVALID RECEIVER>(nil)  0x1489fcff0: a(n) bad class
    0x7ffeefbe5dc0 M INVALID RECEIVER>(nil)  0x1489fcff0: a(n) bad class
    0x7ffeefbe5e00 M INVALID RECEIVER>(nil)  0x148163de0: a(n) bad class
    0x7ffeefbe5e38 M INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe5e80 M INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbe5eb8 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdf9d8 M INVALID RECEIVER>(nil)  0x194648430: a(n) bad class
    0x7ffeefbdfa10 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfa50 M [] in INVALID RECEIVER>(nil)  0x148a02930: a(n) bad class
    0x7ffeefbdfa90 M INVALID RECEIVER>(nil)  0x1946486d8: a(n) bad class
    0x7ffeefbdfad0 M INVALID RECEIVER>(nil)  0x148a02930: a(n) bad class
    0x7ffeefbdfb10 M INVALID RECEIVER>(nil)  0x1946485e0: a(n) bad class
    0x7ffeefbdfb48 M INVALID RECEIVER>(nil)  0x1489f7da8: a(n) bad class
    0x7ffeefbdfb80 M INVALID RECEIVER>(nil)  0x148a02930: a(n) bad class
    0x7ffeefbdfbb8 M INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfbe8 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfc20 M INVALID RECEIVER>(nil)  0x1489fcff0: a(n) bad class
    0x7ffeefbdfc58 M INVALID RECEIVER>(nil)  0x1489fcff0: a(n) bad class
    0x7ffeefbdfc98 M INVALID RECEIVER>(nil)  0x194648c40: a(n) bad class
    0x7ffeefbdfcc8 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfd08 M INVALID RECEIVER>(nil)  0x194648f40: a(n) bad class
    0x7ffeefbdfd40 M [] in INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfd70 M INVALID RECEIVER>(nil)  0x14902a730: a(n) bad class
    0x7ffeefbdfdb0 M INVALID RECEIVER>(nil)  0x194648118: a(n) bad class
    0x7ffeefbdfde8 M INVALID RECEIVER>(nil)  0x14c4033c0: a(n) bad class
    0x7ffeefbdfe20 M [] in INVALID RECEIVER>(nil)  0x14c4033c0: a(n) bad class
    0x7ffeefbdfe70 M [] in INVALID RECEIVER>(nil)  0x14d032f98: a(n) bad class
    0x7ffeefbdfeb8 M INVALID RECEIVER>(nil)  0x14d032fe0: a(n) bad class

(callerContextOrNil == (nilObject())) || (isContext(callerContextOrNil)) 72783
       0x14d033738 is not a context

OK, interesting. Both the assert failure and the badly corrupted stack trace lead me to believe that the issue happens long before the crash and is probably a stack corruption, either by a primitive cutting back the stack incorrectly or by some other mechanism (for example, are all those nils in INVALID RECEIVER>(nil) real, or an artifact of attempting to print an invalid value?).

So the next step is to run the assert VM with leak checking turned on. Use

myvm --leakcheck 3 to check after every GC

We can add, e.g., leak checking after an FFI call in an afternoon.

A more realistic setup would be to run GT with an assert headless VM, but so far I have not figured out how to build an assert VM for the gt-headless branch from https://github.com/feenkcom/opensmalltalk-vm

Cheers,
Andrei



[2] checking for cairo's PNG functions feature... 
configure: WARNING: Could not find libpng in the pkg-config search path
checking whether cairo's PNG functions feature could be enabled... no
configure: error: recommended PNG functions feature could not be enabled

On 14 Sep 2020, at 17:32, Eliot Miranda <[hidden email]> wrote:

Hi Andrei,


On Sep 14, 2020, at 7:15 AM, Andrei Chis <[hidden email]> wrote:

Hi Eliot,

On 12 Sep 2020, at 01:42, Eliot Miranda <[hidden email]> wrote:

Hi Andrei,

On Fri, Sep 11, 2020 at 11:48 AM Andrei Chis <[hidden email]> wrote:
 
Hi Eliot,

Thanks for the answer. That helps to understand what is going on and it can explain why just adding a call to `self pc` makes the crash disappear. 

What was perhaps not obvious in my previous email is that we hit this problem more or less randomly. We have tests verifying that tools work when various extensions raise exceptions (these tests copy the stack). Sometimes they pass and sometimes they crash. The crashes happen in different tests, and so far the only thing we have noticed in common is that the pc of the context where the crash happens looks off. Also, the contexts in which this happens are at the beginning of the stack, so they are part of a long computation (and get copied multiple times).

Initially we suspected memory corruption somewhere due to external calls/memory. But the fact that calling `self pc` first seems to fix the issue makes that less likely. Who knows, though.

Well, it does look like a VM bug.  The VM is somehow failing to intercept some access, perhaps in shallow copy.  Weird.  I shall try and reproduce.   Is there anything special about the process you copy using copyTo: ?

I don’t think there is something special about that process. It is the process that we start to run tests [1]. The exception happens in the running process and the crash is when copying the stack of that running process.

Ok, cool. What I’d like to do is get a copy of your test setup and run it in an assert VM to try and get more information. AFAICT the VM code is good, so the bug is not obvious. An assert VM may give more information before the crash. Have you tried running the system on an assert VM yet?

I checked some previous logs, and we have been getting these kinds of crashes on the CI server for at least two years. So it does not look like a new bug (but who knows).


(see below)

On Fri, Sep 11, 2020 at 6:36 PM Eliot Miranda <[hidden email]> wrote:
 
Hi Andrei,

On Fri, Sep 11, 2020 at 8:58 AM Andrei Chis <[hidden email]> wrote:
 
Hi,

We are getting often crashes on our CI when calling `Context>copyTo:` in a GT image and a vm build from https://github.com/feenkcom/opensmalltalk-vm.

To sum up during `Context>copyTo:`, `Object>>#copy` is called on a context leading to a segmentation fault crash. Looking at that context in lldb the pc looks off.  It has the value `0xfffffffffea7f6e1`.

 (lldb) call (void *) printOop(0x1206b6990)
    0x1206b6990: a(n) Context
     0x1206b6a48 0xfffffffffea7f6e1                0x9        0x1146b2e08        0x1206b6b00 
     0x1206b6b28        0x1206b6b50 

Can this indicate some corruption or is it expected to have such values? `CoInterpreter>>ensureContextHasBytecodePC:` has code that also handles negative values for the pc which suggests that this might be expected.

The issue is that that value is expected *inside* the VM. It is the frame pointer for the context. But above the VM this value should be hidden. The VM should intercept all accesses to such fields in contexts and automatically map them back to the appropriate values that the image expects to see. [The same thing is true for CompiledMethods; inside the VM methods may refer to their JITted code, but this is invisible from the image.] Intercepting access to Context state already happens with inst var access in methods, with the shallowCopy primitive, with instVarAt: et al., etc.

So I expect the issue here is that copyTo: invokes some primitive which does not (yet) check for a context receiver and/or argument, and hence accidentally reveals the hidden state to the image, and a crash results. What I need to know are the definitions for copyTo: and copy, etc., all the way down to the primitives.
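To make the "hidden state" point concrete, here is a small workspace sketch (not from the thread) that decodes the raw pc slot printed by lldb in the original message. It assumes the 64-bit Spur object format, where a SmallInteger oop carries the tag 2r001 in its low three bits and the value in the remaining bits; the variable names are illustrative only.

```smalltalk
"Workspace sketch: decode the raw pc slot 16rFFFFFFFFFEA7F6E1 from the lldb dump.
 Assumes 64-bit Spur tagging: low 3 bits 2r001 mark a SmallInteger."
| raw signed tag value |
raw := 16rFFFFFFFFFEA7F6E1.
"Reinterpret the unsigned 64-bit word as a two's-complement signed integer."
signed := raw >= (1 bitShift: 63)
    ifTrue: [raw - (1 bitShift: 64)]
    ifFalse: [raw].
"The low three bits are the tag; 1 means SmallInteger."
tag := signed bitAnd: 2r111.
"Arithmetic right shift by 3 strips the tag and keeps the sign."
value := signed bitShift: -3.
Transcript show: 'tag = ', tag printString, ', value = ', value printString; cr.
```

If the tagging assumption holds, this prints `tag = 1, value = -2818340`: the slot holds a well-formed negative SmallInteger rather than an arbitrary bit pattern. That fits the reading that VM-internal context state leaked into the image (negative pcs being the case `ensureContextHasBytecodePC:` handles) rather than random memory corruption, although the sketch says nothing about what the number encodes inside the VM.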

Here is the source code:

Cool, nothing unusual here.  This should all work perfectly.  Tis a VM bug. However...
 
Context >> copyTo: aContext 
    "Copy self and my sender chain down to, but not including, aContext.  End of copied chain will have nil sender."
    | copy |
    self == aContext ifTrue: [^ nil].
    copy := self copy.
    self sender ifNotNil: [
        copy privSender: (self sender copyTo: aContext)].
    ^ copy

Let me suggest

Context >> copyTo: aContext 
   "Copy self and my sender chain down to, but not including, aContext.  End of copied chain will have nil sender."
    | copy |
    self == aContext ifTrue: [^ nil].
    copy := self copy.
    self sender ifNotNil:
        [:mySender| copy privSender: (mySender copyTo: aContext)].
    ^ copy 
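A quick sanity check of the invariant described earlier in the thread, that contexts visible to the image never expose the VM's internal (negative) pc, could look like the workspace sketch below. It assumes a Pharo/GT image where `Context>>copyTo:` and `Context>>pc` behave as posted here; it is not a reproduction, since the crash is intermittent and a passing run proves little.

```smalltalk
"Workspace sketch: copy the whole sender chain of the current context and check
 that every copy exposes either nil (a dead context) or a positive bytecode pc."
| ctx checked |
"copyTo: nil never matches a context, so the entire chain is copied."
ctx := thisContext copyTo: nil.
checked := 0.
[ctx notNil] whileTrue:
    [self assert: (ctx pc isNil or: [ctx pc > 0]).
     checked := checked + 1.
     ctx := ctx sender].
Transcript show: 'checked ', checked printString, ' copied contexts'; cr.
```

Running it in a loop, or from inside a long-running process like the test runner, would be closer to the CI conditions than a single evaluation.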

Nice!

I also tried the non-recursive implementation of Context>>#copyTo: from Squeak and it also crashes.

Not sure if it is related, but in the same image as before I now got a different crash, and printing the stack does not work. This time the error seems to come from handleStackOverflow:

(lldb) call (void *)printCallStack()
invalid frame pointer
invalid frame pointer
invalid frame pointer
error: Execution was interrupted, reason: EXC_BAD_ACCESS (code=EXC_I386_GPFLT).
The process has been returned to the state before expression evaluation.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x121e00000)
  * frame #0: 0x0000000100162258 libGlamorousToolkitVMCore.dylib`marryFrameSP + 584
    frame #1: 0x0000000100172982 libGlamorousToolkitVMCore.dylib`handleStackOverflow + 354
    frame #2: 0x000000010016b025 libGlamorousToolkitVMCore.dylib`ceStackOverflow + 149
    frame #3: 0x00000001100005b3
    frame #4: 0x0000000100174d99 libGlamorousToolkitVMCore.dylib`ptEnterInterpreterFromCallback + 73


Cheers,
Andrei

[1] ./GlamorousToolkit.app/Contents/MacOS/GlamorousToolkit  Pharo.image examples --junit-xml-output 'GToolkit-.*' 'GT4SmaCC-.*' 'DeepTraverser-.*' Brick 'Brick-.*' Bloc 'Bloc-.*' 'Sparta-.*'



Object>>#copy
     ^self shallowCopy postCopy

Object >> shallowCopy
    | class newObject index |
    <primitive: 148>
    class := self class.
    class isVariable
        ifTrue: 
            [index := self basicSize.
            newObject := class basicNew: index.
            [index > 0]
                whileTrue: 
                    [newObject basicAt: index put: (self basicAt: index).
                    index := index - 1]]
        ifFalse: [newObject := class basicNew].
    index := class instSize.
    [index > 0]
        whileTrue: 
            [newObject instVarAt: index put: (self instVarAt: index).
            index := index - 1].
    ^ newObject

The code of primitiveClone looks the same [1].


Changing `Context>copyTo:` by adding a `self pc` before calling `self copy` leads to no more crashes. Not sure if there is a reason for that or just plain luck.

A simple reduced stack is below (more details in this issue [1]). The crash happens always with contexts reified as objects (in this case 0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages). 
Could this suggest some kind of issue in the vm when reifying contexts, or just some other problem with memory corruption?

This looks like an oversight in some primitive. Here, for example, is the implementation of the shallowCopy primitive, a.k.a. clone, and you can see where it explicitly intercepts access to a context.

primitiveClone
    "Return a shallow copy of the receiver.
     Special-case non-single contexts (because of context-to-stack mapping).
     Can't fail for contexts cuz of image context instantiation code (sigh)."

    | rcvr newCopy |
    rcvr := self stackTop.
    (objectMemory isImmediate: rcvr)
        ifTrue:
            [newCopy := rcvr]
        ifFalse:
            [(objectMemory isContextNonImm: rcvr)
                ifTrue:
                    [newCopy := self cloneContext: rcvr]
                ifFalse:
                    [(argumentCount = 0
                      or: [(objectMemory isForwarded: rcvr) not])
                        ifTrue: [newCopy := objectMemory clone: rcvr]
                        ifFalse: [newCopy := 0]].
             newCopy = 0 ifTrue:
                [^self primitiveFailFor: PrimErrNoMemory]].
    self pop: argumentCount + 1 thenPush: newCopy

But since Squeak doesn't have copyTo: I have no idea what primitive is being used.  I'm guessing 168 primitiveCopyObject, which seems to check for a Context receiver, but not for a CompiledCode receiver.  What does the primitive failure code look like?  Can you post the copyTo: implementations here please?

The code is above. I also see that Squeak's Context>>#copyTo: calls Object>>copy for contexts.
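To answer the "which primitive" question from inside the image, a workspace sketch like the following lists, for each selector on the copy path, the method Context actually uses and the primitive number it declares (0 meaning no primitive). It relies only on the standard `Behavior>>lookupSelector:` and `CompiledMethod>>primitive` reflection; it does not assume whether GT overrides any of these selectors for Context.

```smalltalk
"Workspace sketch: for each selector on the copy path, print the method that
 Context actually uses and the primitive number it declares (0 = none)."
#(#copyTo: #copy #shallowCopy) do:
    [:selector | | method |
     method := Context lookupSelector: selector.
     Transcript
        show: selector printString, ' -> ', method printString,
              ' primitive ', method primitive printString;
        cr]
```

If Context inherits Object>>shallowCopy as posted above, the `#shallowCopy` line should report primitive 148, i.e. primitiveClone; any other number would point at a different primitive worth inspecting.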

When a crash happens we don't always get the same error. For example, on Mac we most often get:

Process 35690 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001100b1004
->  0x1100b1004: inl    $0x4c, %eax
    0x1100b1006: leal   -0x5c(%rip), %eax
    0x1100b100c: pushq  %r8
    0x1100b100e: movabsq $0x1109e78e0, %r9         ; imm = 0x1109E78E0 
Target 0: (GlamorousToolkit) stopped.


Process 29929 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=EXC_I386_BPT, subcode=0x0)
    frame #0: 0x00000001100fe7ed
->  0x1100fe7ed: int3   
    0x1100fe7ee: int3   
    0x1100fe7ef: int3   
    0x1100fe7f0: int3   
Target 0: (GlamorousToolkit) stopped.



Cheers,
Andrei
 

 0x7ffeefbb4380 M Context(Object)>copy 0x1206b6990: a(n) Context
    0x7ffeefbb43b8 M Context>copyTo: 0x1206b6990: a(n) Context
    0x7ffeefbb4400 M Context>copyTo: 0x1206b5ae0: a(n) Context
  ...
    0x7ffeefba6078 M Context>copyTo: 0x110548b28: a(n) Context
    0x7ffeefba60d0 I Context>copyTo: 0x110548a70: a(n) Context
    0x7ffeefba6118 I MessageNotUnderstood(Exception)>freezeUpTo: 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba6160 I MessageNotUnderstood(Exception)>freeze 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba6190 M [] in GtExampleEvaluator>result 0x110544fb8: a(n) GtExampleEvaluator
    0x7ffeefba61c8 M BlockClosure>cull: 0x110545188: a(n) BlockClosure
    0x7ffeefba6208 M Context>evaluateSignal: 0x110548c98: a(n) Context
    0x7ffeefba6240 M Context>handleSignal: 0x110548c98: a(n) Context
    0x7ffeefba6278 M Context>handleSignal: 0x110548be0: a(n) Context
    0x7ffeefba62b0 M MessageNotUnderstood(Exception)>signal 0x110548a20: a(n) MessageNotUnderstood
    0x7ffeefba62f0 M GtDummyExamplesWithInheritanceSubclassB(Object)>doesNotUnderstand: exampleH 0x1105487d8: a(n) GtDummyExamplesWithInheritanceSubclassB
    0x7ffeefba6328 M GtExampleEvaluator>primitiveProcessExample:withEvaluationContext: 0x110544fb8: a(n) GtExampleEvaluator
 ...
    0x7ffeefbe64d0 M [] in GtExamplesHDReport class(HDReport class)>runPackages: 0x1145e41c8: a(n) GtExamplesHDReport class
    0x7ffeefbe6520 M [] in Set>collect: 0x1206b5ab0: a(n) Set
    0x7ffeefbe6568 M Array(SequenceableCollection)>do: 0x1206b5c50: a(n) Array
       0x1206b5b98 s Set>collect:
       0x1206b5ae0 s GtExamplesHDReport class(HDReport class)>runPackages:
       0x1206b6990 s [] in GtExamplesCommandLineHandler>runPackages
       0x1206b6a48 s BlockClosure>ensure:
       0x1206b6b68 s UIManager class>nonInteractiveDuring:
       0x1206b6c48 s GtExamplesCommandLineHandler>runPackages
       0x1206b6d98 s GtExamplesCommandLineHandler>activate
       0x1206b75d0 s GtExamplesCommandLineHandler class(CommandLineHandler class)>activateWith:
       0x1207d2f00 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
       0x1207e6620 s BlockClosure>on:do:
       0x1207f7ab8 s PharoCommandLineHandler(BasicCommandLineHandler)>activateSubCommand:
       0x120809d40 s PharoCommandLineHandler(BasicCommandLineHandler)>handleSubcommand
       0x12082ca60 s PharoCommandLineHandler(BasicCommandLineHandler)>handleArgument:
       0x120789938 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
       0x1207a83e0 s BlockClosure>on:do:
       0x1207b57a0 s [] in PharoCommandLineHandler(BasicCommandLineHandler)>activate
       0x1207bf830 s [] in BlockClosure>newProcess
Cheers,
Andrei





-- 
_,,,^..^,,,_
best, Eliot



_,,,^..^,,,_ (phone)