CogVM Execution Flow


CogVM Execution Flow

Ben Coman
 
In trying to understand the flow of execution (and in particular the
jumps in the jitted VM), I made a first rough pass at mapping it in the
attached chart.

I am trying to colourize it to distinguish between paths that can
return to the interpreter, those that circulate in jitted code, and
the transitions.  I'm sure I've missed the mark a bit, but it's a start.
Of course corrections are welcome, even scanned pen sketches.

cheers -ben

Attachment: CogVM-Execution-Flow (2016.06.13c).png (186K)

Re: CogVM Execution Flow

Ben Coman
 
On Mon, Jun 13, 2016 at 9:24 PM, Ben Coman <[hidden email]> wrote:
> In trying to understand the flow of execution (and in particular the
> jumps in the jitted VM, I made a first rough pass to map it in the
> attached chart.
>
> I am trying to colourize it to distinguish between paths that can
> return to the interpreter, those that circulate in jitted code, and
> the transitions.  I'm sure I've missed the mark a bit but its a start.
> Of course corrections welcome, even scanned pen sketches.

If anyone wants to open the original, I used yEd - downloaded and
first used it today.  I found it quite good for the task after trying
Freemind and Inkscape.
Quick tips: drag the background with a right-click.  A left-drag from a
node starts a connector.  To move a node you need to first select it so
the mouse pointer changes to a hand.
http://www.yworks.com/products/yed

cheers -ben

Attachment: CogVM-Execution-Flow (2016.06.13c).graphml (129K)

Re: CogVM Execution Flow

Ben Coman
 
On Mon, Jun 13, 2016 at 9:25 PM, Ben Coman <[hidden email]> wrote:

> If anyone wants to open the original, I used yEd - downloaded and
> first used it today.  I found it quite good for the task after trying
> Freemind and Inkscape.
> Quick Tips: Drag background in right-click.  Left-drag a from a node
> starts a connector.  To move a node you need to first select it so the
> mouse pointer changes to a hand.
> http://www.yworks.com/products/yed
A few updates...
I'm not sure how big a file the mailing list handles.  The PNG is 350k,
so I'll just post the graphml for now.
cheers -ben

Attachment: CogVM-Execution-Flow (2016.06.13d).zip (14K)

Re: CogVM Execution Flow

Eliot Miranda-2
In reply to this post by Ben Coman

Ben,

    this looks fabulous.  I've just skimmed on my phone and will look in more depth later but this is an extremely valuable and illuminating approach, visualizing something I have in my head but that is difficult to describe.  Thanks for doing this!! One thing I would love for you to do (and I can help) is to create two diagrams, one for the simulator and one for the real VM, and relate the two, for example by colour-coding the common activities, and hence identifying simulator-only "distractions".  

_,,,^..^,,,_ (phone)

> On Jun 13, 2016, at 6:24 AM, Ben Coman <[hidden email]> wrote:
>
> In trying to understand the flow of execution (and in particular the
> jumps in the jitted VM, I made a first rough pass to map it in the
> attached chart.
>
> I am trying to colourize it to distinguish between paths that can
> return to the interpreter, those that circulate in jitted code, and
> the transitions.  I'm sure I've missed the mark a bit but its a start.
> Of course corrections welcome, even scanned pen sketches.
>
> cheer -ben
> <CogVM-Execution-Flow (2016.06.13c).png>

Re: CogVM Execution Flow

timrowledge
In reply to this post by Ben Coman


> On 13-06-2016, at 6:24 AM, Ben Coman <[hidden email]> wrote:
>
> In trying to understand the flow of execution (and in particular the
> jumps in the jitted VM, I made a first rough pass to map it in the
> attached chart.

To paraphrase that great philosopher Billious O’Liarly, “code goes in, code goes out - you can’t explain that!”

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- "Body by Fisher -- brains by Mattel."



Re: CogVM Execution Flow

Ben Coman
In reply to this post by Eliot Miranda-2
 
On Tue, Jun 14, 2016 at 12:36 AM, Eliot Miranda <[hidden email]> wrote:

>
> Ben,
>
>     this looks fabulous.  I've just skimmed on my phone and will look in more depth later but this is an extremely valuable and illuminating approach, visualizing something I have in my head but that is difficult to describe.  Thanks for doing this!! One thing I would love for you to do (and I can help) is to create two diagrams, one for the simulator and one for the real VM, and relate the two, for example by colour-coding the common activities, and hence identifying simulator-only "distractions".
>
> _,,,^..^,,,_ (phone)
>
I consolidated a few boxes and aligned common tasks, particularly the
returnToExecutive one.
cheers -ben

Attachment: CogVM-Execution-Flow-2016.06.13f.zip (14K)

Re: CogVM Execution Flow

Eliot Miranda-2
In reply to this post by Ben Coman
 
Hi Ben,

    the diagram below shows the trees, but the wood is arguably more important.  It focusses on the transitions, but doesn't clearly show what is being transitioned between.  I imagine a diagram which shows the structures and has what you have in the yellow boxes as transitions.  So...

The essential structures are six-fold: three execution state structures and three bodies of code, and in fact one of each overlaps.

These are the execution state structures:

1. the C stack.
2. the Smalltalk stack zone.
3. the Smalltalk heap (which includes contexts that overflow the Smalltalk stack zone).

These are the bodies of code:
4. the run-time, the code comprising the VM interpreter, JIT, garbage collector, and primitives
5. the jitted code living in the machine code zone, comprising methods, polymorphic inline caches, and the glue routines (trampolines and enilopmarts) between that machine code and the run-time
6. Smalltalk "source" code, the classes and methods in the Smalltalk heap that constitute the "program" under execution

So 3. and 6. overlap: code is data.  And 2. overflows into 3.: the stack zone is a "cache", keeping the most recent activations in the most efficient form for execution.
Further, 4. (the run-time) executes solely on 1. (the C stack), and 5. (the jitted code) runs only on 2. (the stack zone); also, code in 6., when executed (interpreted) by the interpreter and primitives in 4., runs on 2. (the stack zone).

Your diagram names some of the surface transitions, but not the deeper when and why.  Here they are:

a) execution begins on the C stack as the program is launched.  Once the heap is loaded, swizzling pointers as required, the interpreter is entered.  On first entry it
  a1) allocates space for the stack zone on the C stack
  a2) "marries" the context in the image that invoked the snapshot primitive (a stack frame in the stack zone is built for the context, and the context is changed to become a proxy for that stack frame).
  a3) captures the top of the C stack (CStackPointer & CFramePointer) as interpret is about to be invoked, including creating a "landing pad" for jumping back into the interpreter
    a3 vm) the landing pad is a jmpbuf created via setjmp, and jumped to via longjmp
    a3 sim) the landing pad is an exception handler for the ReenterInterpreter notification
  a4) calls interpret to start interpreting the method that executed the snapshot primitive
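
The landing pad in a3) can be sketched as a toy Python model (all names here are invented for illustration; the real VM does this in C via setjmp/longjmp, while the simulator genuinely uses an exception, the ReenterInterpreter notification):

```python
class ReenterInterpreter(Exception):
    """Models the landing pad: a notification in the simulator, a jmpbuf in the real VM."""

def make_machine_code():
    # Toy stand-in for jitted code: the first invocation hits something it
    # cannot handle and jumps back to the interpreter; the retry completes.
    calls = {"n": 0}
    def machine_code():
        calls["n"] += 1
        if calls["n"] == 1:
            raise ReenterInterpreter()  # stands in for the longjmp
        return "result"
    return machine_code

def interpret(machine_code):
    # a3/a4: the handler installed here is the landing pad.  When machine
    # code "longjmps", everything built above this point is discarded and
    # execution resumes exactly as on initial entry to interpret.
    while True:
        try:
            return machine_code()
        except ReenterInterpreter:
            continue
```

Here `interpret(make_machine_code())` answers 'result' after one bounce through the landing pad.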


Invoking the run-time:
Machine code calls into the run-time for several facilities: adding an object to the remembered table if a store check indicates this must happen, running a primitive in the run-time, or entering the run-time to look up and bind a machine code send, or to handle a linked send that has missed.  To invoke the run-time, the machine code saves the native stack and frame pointers (those of the current Smalltalk stack frame) in stackPointer and framePointer, sets the native stack and frame pointers to CStackPointer and CFramePointer, passes parameters (pushing on x86, loading registers on ARM and x64) and calls the run-time routine.  Simple routines (e.g. adding an element to the remembered set) simply perform the operation and return; the code returned to then switches back to the Smalltalk stack pointers and continues.  Routines that change the Smalltalk frame (send-linking routines, complex primitives such as perform:) reenter via an enilopmart.
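
That pointer switching can be modelled very roughly in Python (all names below are stand-ins for stackPointer/framePointer and CStackPointer/CFramePointer; the real machine code swaps actual processor registers):

```python
class ToyVM:
    def __init__(self):
        self.native_sp = "smalltalk-frame"     # running jitted code on the stack zone
        self.stackPointer = None               # where the Smalltalk SP gets saved
        self.CStackPointer = "c-stack-top"     # captured when interpret was entered

    def call_runtime(self, routine, *args):
        # Machine code saves the Smalltalk pointers, switches to the C stack
        # captured at startup, calls the run-time routine, then switches back.
        self.stackPointer = self.native_sp
        self.native_sp = self.CStackPointer
        try:
            return routine(self, *args)
        finally:
            self.native_sp = self.stackPointer

def add_to_remembered_set(vm, obj):
    # A "simple routine": performs the operation and returns.
    assert vm.native_sp == vm.CStackPointer    # the run-time runs on the C stack
    return ("remembered", obj)
```

After `ToyVM().call_runtime(add_to_remembered_set, obj)` returns, the native pointer is back on the Smalltalk frame, mirroring the switch-back described above.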


Transition to the interpreter:
So any time the machine code wants to transition to the interpreter (not simply call a routine in the run-time, but interpret an as-yet-unjitted/unjittable method, either via send or return), the machine code switches the frame and stack pointers to those captured in a3) and longjmps (raises the ReenterInterpreter exception).  It does this by calling a run-time routine (as in "Invoking the run-time") that actually performs the longjmp.  Any intervening state on the C stack will be discarded, and execution will be in the same state as when the interpret routine was entered immediately after initialising the stack zone.


N.B. Note that if the interpreter merely called the machine-code, and the machine-code merely called the run-time, instead of substituting the stack and frame pointers with CStackPointer and CFramePointer set up on initial invocation of interpret, then the C stack would grow on each transition between machine code execution and interpreter/run-time execution and the C stack would soon overflow.


Call-backs:

The C stack /can/ grow however.  If a call-out calls back then the call-back executes lower down the C stack.  A call out will have been made from some primitive invoked either from the interpreter or machine-code, and that primitive will run on the C stack.  On calling back, the VM saves the current CStackPointer, CFramePointer and "landing-pad" jmpbuf in state associated with the call-back, and then reenters the interpreter, saving new values for the CStackPointer, CFramePointer and "landing-pad" jmpbuf.  Execution now continues in this new part of the C stack below the original.  On the call-back returning (again via a primitive), the CStackPointer, CFramePointer and "landing-pad" jmpbuf are restored before returning to the C code that invoked the call-back.  Once this C code returns, the stack is unwound back to the state before the call-out was invoked.
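
That save/restore discipline can be sketched as follows (hypothetical names; the real saved state is CStackPointer, CFramePointer and the landing-pad jmpbuf):

```python
saved_states = []                                  # one entry per active call-back
current = {"c_sp": "top", "landing_pad": "pad-0"}  # state for the outermost interpret

def enter_callback():
    # Save CStackPointer/CFramePointer and the landing-pad jmpbuf, then
    # reenter the interpreter lower down the C stack with fresh values.
    saved_states.append(dict(current))
    depth = len(saved_states)
    current["c_sp"] = "lower-%d" % depth
    current["landing_pad"] = "pad-%d" % depth

def return_from_callback():
    # Restore the saved state before returning to the C code that
    # invoked the call-back.
    current.update(saved_states.pop())
```

Nesting two call-backs and returning from both restores the original state, matching the unwind described above.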


Transition to machine-code:
The interpreter uses the simple policy of jitting a method if it is found in the first-level method lookup cache, effectively hitting methods that are used more than once.  If the jitted method contains a primitive, that primitive routine will be invoked just as if it were an interpreted method.  If the method doesn't have a primitive, the interpreter will jump into machine code immediately.  It jumps into machine code by pushing any parameters (the state of the machine code registers, such as ReceiverResultReg, and the machine code address to begin execution) onto the top of the Smalltalk stack, and calling an enilopmart that switches from the C to the Smalltalk stack, loads the registers and jumps to the machine code address via a return instruction that pops the entry point off the Smalltalk stack.
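
A toy model of the enilopmart's calling convention (the entry point and register state travel on the Smalltalk stack; here a Python function object stands in for a machine-code address, and all names are invented):

```python
def enilopmart(smalltalk_stack, registers):
    # The interpreter has pushed the register state and, last, the
    # machine-code entry point.  The return instruction pops the entry
    # point; the register loads happen just before it.
    entry_point = smalltalk_stack.pop()        # popped by the "ret"
    registers["ReceiverResultReg"] = smalltalk_stack.pop()
    return entry_point(registers)              # the "jump" into machine code
```

With `stack = ["aReceiver", some_machine_code]`, `enilopmart(stack, {})` starts `some_machine_code` with ReceiverResultReg loaded and the stack emptied of the entry parameters.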


Simulating these transitions in the Simulator:
In the Simulator, the C run-time (4.) consists of Smalltalk objects, and 1., 2., 3., 5., & 6. live in the memory inst var of the object memory, a large ByteArray.  The machine code lives at the bottom of this memory byte array, and has no direct access to the Smalltalk objects.  In the real VM, the correlates of these objects all exist at specific addresses and may be accessed directly from machine code.  In the simulator this is not possible.  Instead, these objects are all assigned out-of-bounds addresses, and a dictionary maps from each specific out-of-bounds address to the specific object being accessed; e.g. stackPointer, an inst var of InterpreterPrimitives, the superclass of StackInterpreter, has an address in simulatedAddresses that maps to a block that does a perform to access stackPointer's value.  See CoInterpreter>>stackPointerAddress.
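
The simulatedAddresses mechanism amounts to a dictionary from out-of-bounds addresses to accessor blocks; a minimal Python sketch (the address value and the accessor shape are made up for illustration):

```python
OUT_OF_BOUNDS = 0x80000000          # hypothetical; beyond the memory ByteArray

class ToyInterpreter:
    def __init__(self):
        self.stackPointer = 0

interp = ToyInterpreter()

# Maps an out-of-bounds address to read/write blocks for one run-time
# variable, like simulatedAddresses mapping to a block that performs
# the accessor.
simulatedAddresses = {
    OUT_OF_BOUNDS: (lambda: interp.stackPointer,
                    lambda v: setattr(interp, "stackPointer", v)),
}

def simulated_read(address):
    getter, _ = simulatedAddresses[address]
    return getter()

def simulated_write(address, value):
    _, setter = simulatedAddresses[address]
    setter(value)
```

A write through the out-of-bounds address lands in the interpreter object, which is exactly how the simulated machine code reaches stackPointer.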

Machine code is executed by one of the processor aliens via the primitiveRunInMemory:minimumAddress:readOnlyBelow: or primitiveSingleStepInMemory:minimumAddress:readOnlyBelow: primitives.  These primitives fail when they encounter an illegal instruction, including an instruction that tried to fetch, store or jump to an out-of-bounds address.  The primitive failure code analyses the instruction that failed (the instruction may actually be illegal, the result of some bug in the system, but is typically an intended access of some run-time object) and, when appropriate, creates an instance of the ProcessorSimulationTrap exception and raises it.  The handler then handles the exception to either fetch, store or invoke Smalltalk objects in the simulation, and once handled execution can continue.
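
The trap-and-resume cycle can be sketched like this (a toy model: a small in-bounds memory, one fabricated out-of-bounds run-time object, and a plain exception standing in for ProcessorSimulationTrap):

```python
class ProcessorSimulationTrap(Exception):
    def __init__(self, address):
        self.address = address

memory = bytearray(16)              # the in-bounds "machine" memory
runtime_objects = {0x80000000: 7}   # out-of-bounds address -> run-time value (made up)

def fetch(address):
    # The processor alien's primitive: fail on an out-of-bounds access
    # by raising the trap with the offending address.
    if address >= len(memory):
        raise ProcessorSimulationTrap(address)
    return memory[address]

def simulate_fetch(address):
    # The handler (cf. Cogit>>simulateCogCodeAt:) maps the trapped address
    # to the run-time object and lets execution continue.
    try:
        return fetch(address)
    except ProcessorSimulationTrap as trap:
        return runtime_objects[trap.address]
```

In-bounds fetches go straight to memory; out-of-bounds fetches are satisfied by the handler, after which simulation continues.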

Hence in the simulator primitiveRunInMemory:minimumAddress:readOnlyBelow: and primitiveSingleStepInMemory:minimumAddress:readOnlyBelow: (actually their wrappers singleStepIn:minimumAddress:readOnlyBelow: & runInMemory:minimumAddress:readOnlyBelow:) are always invoked in the context of Cogit>>simulateCogCodeAt:, which provides the handler and tests for machine code break-points using the breakBlock.

In the same way that the VM must avoid C stack growth when transitioning between machine code and the interpreter/run-time above, so the simulator must avoid uncontrolled stack growth when the simulated machine code invokes Smalltalk code which again invokes simulated machine code.  So the code that invokes the run-time from simulateCogCodeAt: (Cogit>>#handleCallOrJumpSimulationTrap:) includes a handler for the ReenterMachineCode notification.  Whenever the Smalltalk run-time wants to reenter machine code via an enilopmart it sends Cogit>>#simulateEnilopmart:numArgs: which raises the notification before sending Cogit>>simulateCogCodeAt:.  So the first entry into machine code via an enilopmart starts Cogit>>simulateCogCodeAt:, but subsequent ones end up returning to that first Cogit>>simulateCogCodeAt: to continue execution.
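
That reentry discipline can be modelled with an exception that unwinds back to the first simulateCogCodeAt: (toy code; the pc values and the machine_code function are invented):

```python
class ReenterMachineCode(Exception):
    def __init__(self, pc):
        self.pc = pc

def simulate_enilopmart(pc):
    # The Smalltalk run-time reentering machine code: raise rather than
    # recurse into a nested simulateCogCodeAt:.
    raise ReenterMachineCode(pc)

def machine_code(pc):
    # Toy "machine code": one pc calls into the run-time, which decides
    # to reenter machine code at a new pc.
    if pc == "needs-runtime":
        simulate_enilopmart("resumed")
    return ("ran", pc)

def simulate_cog_code_at(pc):
    # The first (outermost) entry owns the execution loop; later reentries
    # unwind to here instead of nesting, so the host stack does not grow.
    while True:
        try:
            return machine_code(pc)
        except ReenterMachineCode as reentry:
            pc = reentry.pc
```

However many times the run-time reenters machine code, it is always the one outer loop that resumes execution.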


Ben, given the above, can you now see how your yellow boxes name specific transitions amongst the structures explained above?  I hope I've encouraged you, not discouraged you, to revise and bifurcate your diagram into two state transition diagrams for the real and simulated VM.  It would be great to have really good diagrammatic representations of the above.

And once we have that, we can build the relatively simple extension that allows the interpreter and machine code to interleave interpreted and machine code frames on the Smalltalk stack (2.), allowing the VM to freely switch between interpreted and jitted code, and to fall back on the interpreter whenever convenient.

On Mon, Jun 13, 2016 at 6:24 AM, Ben Coman <[hidden email]> wrote:
 
In trying to understand the flow of execution (and in particular the
jumps in the jitted VM, I made a first rough pass to map it in the
attached chart.

I am trying to colourize it to distinguish between paths that can
return to the interpreter, those that circulate in jitted code, and
the transitions.  I'm sure I've missed the mark a bit but its a start.
Of course corrections welcome, even scanned pen sketches.

cheer -ben




--
_,,,^..^,,,_
best, Eliot

Re: CogVM Execution Flow

Ben Coman
In reply to this post by Ben Coman

On Tue, Jun 14, 2016 at 2:21 AM, Ben Coman <[hidden email]> wrote:

> I consolidated a few boxes and aligned common tasks, particularly the
> returnToExecutive one.
> cheers -ben

Two initial comments:
* It would be nice if the two conditions "(stackPointer >= stackLimit)" and
"(localSP < stackLimit)" were not inverted relative to each other.
* Is the "siglong: /reenterInterpreter/ jmp: ReturnToInterpreter" at
the end of #interpretMethodFromMachineCode required?  It executes
after activateNewMethod and returnToExecutive:postContextSwitch, which
does a siglong: jump anyway.  And it messes with the diagram :)

cheers -ben

Re: CogVM Execution Flow

Ben Coman
In reply to this post by Eliot Miranda-2

It will take me a while to digest this, but I'll be happy to give it a go.
cheers -ben


Re: CogVM Execution Flow

Ben Coman
In reply to this post by Eliot Miranda-2
 
On Tue, Jun 14, 2016 at 2:41 AM, Eliot Miranda <[hidden email]> wrote:

>
> Hi Ben,
>
>     the diagram below shows the trees, but the wood is arguably more important.  The diagram below is focussing on the transitions, but doesn't clearly show what is being transitioned between.  I imagine a diagram which shows the structures and has what you have in the yellow boxes as transitions.  So...
>
> The essential structures are six-fold, three execution state structures, and three bodies of code, and in fact there is overlap of one of each.
>
> These are the execution state structures:
>
> 1. the C stack.
> 2. the Smalltalk stack zone.
> 3. the Smalltalk heap (which includes contexts that overflow the Smalltalk stack zone).
>
> These are the bodies of code:
> 4. the run-time, the code comprising the VM interpreter, JIT, garbage collector, and primitives
> 5. the jitted code living in the machine code zone, comprising methods, polymorphic in line caches, and the glue routines (trampolines and enilopmarts) between that machine code and the run-time
> 6. Smalltalk "source" code, the classes and methods in the Smalltalk heap that constitute the "program" under execution
>
> So 3. and 6. overlap; code is data, and 2. overflows into 3., the stack zone is a "cache", keeping the most recent activations in the most efficient form for execution.
> Further, 4. (the run-time) executes solely on 1. (the C stack), and 5. (the jitted code) runs only on 2. (the stack zone), and also, code in 6. executed (interpreted) by the interpreter and primitives in 4. runs on 2. (the stack zone)
I don't have the charting tool where I am at the moment, so I knocked up
the above in Excel with a few embellishments that need checking.  I am a
bit confused by "primitives in 4. runs on 2." when "4. (the run-time)
executes solely on 1." and primitives are part of 4.

Also, I'm not clear on what a "linked send" is.

cheers -ben

Attachment: Cog-structure%code.png (26K)

Re: CogVM Execution Flow

Eliot Miranda-2

Hi Ben,

> On Jun 13, 2016, at 6:49 PM, Ben Coman <[hidden email]> wrote:
>
>> On Tue, Jun 14, 2016 at 2:41 AM, Eliot Miranda <[hidden email]> wrote:
>>
>> Hi Ben,
>>
>>    the diagram below shows the trees, but the wood is arguably more important.  The diagram below is focussing on the transitions, but doesn't clearly show what is being transitioned between.  I imagine a diagram which shows the structures and has what you have in the yellow boxes as transitions.  So...
>>
>> [...]
>
> I don't have the charting tool where I am atm, so I knocked up the
> above in Excel with a few embellishments that need checking. I am a
> bit confused by "primitives in 4. runs on 2. "  when "4. (the
> run-time) executes solely on 1."  and primitives are part of 4.

All primitives are implemented either in the interpreter or in plugins, and some of the core primitives (arithmetic, comparison, object access and instantiation, block evaluation and perform) are also implemented by the JIT in machine code versions.  Taking the former first, all primitives in the interpreter and plugins are C functions that get called either from the interpreter (slowPrimitiveResponse) or from a cogged (machine code) method containing a primitive.  When running, these primitives are running on the C stack, even though they take their parameters from the Smalltalk stack.  So they are 4. (part of the run-time) running on 1. (the C stack).

The machine code versions of primitives are compiled into the start of cogged methods that include one of the core primitives the JIT is able to generate machine code for.  This code gets executed directly when a cogged method is invoked.  [Tangent: since 0-, 1- & 2-argument sends use a register-based calling convention, most machine code primitives (the only exception being perform:with:with:) take their arguments from registers and answer their result in a register.  So they're much, much faster: direct access to arguments instead of indirecting through stackPointer, and no stack-switching call/return from the Smalltalk stack to the C stack and back, but they have to be written in the JIT's assembler language.]
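As a toy illustration of the two calling conventions just described (a sketch with invented names, not the VM's actual code), compare a stack-based primitive, which indirects through the stack pointer, with a register-style one, which does not:

```python
class ToyVM:
    def __init__(self):
        self.stack = []              # stands in for the Smalltalk stack

    # Interpreter/plugin style: a C function that takes its arguments from
    # the Smalltalk stack (via stackPointer) while running on the C stack.
    def primitive_add_on_stack(self):
        arg = self.stack.pop()
        rcvr = self.stack.pop()
        self.stack.append(rcvr + arg)

    # JIT style: arguments arrive "in registers" (plain parameters here) and
    # the result is answered "in a register" (the return value), with no
    # indirection through the stack pointer at all.
    @staticmethod
    def primitive_add_in_registers(rcvr_reg, arg_reg):
        return rcvr_reg + arg_reg

vm = ToyVM()
vm.stack.extend([3, 4])              # push receiver, then argument
vm.primitive_add_on_stack()
stack_result = vm.stack[-1]
reg_result = ToyVM.primitive_add_in_registers(3, 4)
```

Both paths compute the same answer; the register path simply skips the stack traffic, which is the speed-up the tangent describes.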

> Also I'm not clear on what a "linked-send" is?

This is another tangent, but key to how the JIT speeds up normal Smalltalk sends.  A linked send is the implementation of an inline per-send-site send cache.  In machine code, sends get compiled as a register load (of a selector) followed by a call (of a trampoline that calls ceSend:super:numArgs:).  When first executed, the send of the selector to the current receiver gets looked up and the instruction sequence gets rewritten into a register load (of the class index of the current receiver) followed by a call (to the method that ceSend:super:numArgs: looked up and jitted).  This latter form is a linked send because it is linked to the entry point of some target method.

That target method's entry code checks that the class index of the current receiver matches that in the register load, and continues executing the method if they match, or calls code to rebind the send to a PIC if they don't.  So once the send is linked, subsequent executions call the target method directly and perform a relatively cheap class check that succeeds in most cases.

If ever it misses, the send site will get bound to a "closed" PIC, a little jump table created specific to that send site, that can hold up to 6 (class-index comparison, jump) pairs (it can dispatch up to 6 classes of receiver).  If there are more than 6, it will get rewritten to an "open" PIC specific to the selector, that probes the first-level method lookup cache.  So in practice all sends except about 1% of megamorphic sends (such as the sends of basicNew and initialize in Behavior>>#new) settle down into calls, with no relinking occurring until either the program changes flow (introducing polymorphism) or the code zone fills up, methods are discarded, and sends are unlinked and later re-executed, which doesn't happen very often.
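The send-site life cycle described above (unlinked send, then linked send, then closed PIC, then open PIC) can be modelled very roughly like this; SendSite, MAX_CPIC_ENTRIES and the toy method dictionary are all invented for illustration, not the Cogit's actual representation:

```python
MAX_CPIC_ENTRIES = 6   # matches the closed-PIC limit described above

class SendSite:
    """One send site: links to a method on first use, grows a closed PIC
    on further class misses, then goes megamorphic (an open PIC)."""
    def __init__(self, selector, lookup):
        self.selector = selector
        self.lookup = lookup          # stands in for ceSend:super:numArgs:
        self.cases = []               # (class index, method) pairs: the cPIC
        self.open_pic = False

    def send(self, receiver_class, *args):
        if self.open_pic:
            # open PIC: probe the first-level lookup cache by selector
            return self.lookup(receiver_class, self.selector)(*args)
        for class_index, method in self.cases:
            if class_index == receiver_class:   # cheap class check: a hit
                return method(*args)
        # miss: full lookup, then extend the cPIC or go megamorphic
        method = self.lookup(receiver_class, self.selector)
        if len(self.cases) < MAX_CPIC_ENTRIES:
            self.cases.append((receiver_class, method))
        else:
            self.open_pic = True
        return method(*args)

# a trivial "method dictionary": class index -> selector -> function
methods = {c: {'double': lambda x: 2 * x} for c in range(10)}
site = SendSite('double', lambda cls, sel: methods[cls][sel])
for cls in range(8):      # 8 receiver classes: overflows the 6-entry cPIC
    site.send(cls, cls)
```

After the loop the site has filled its six closed-PIC entries and fallen back to open-PIC dispatch, mirroring the progression in the prose.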

> cheers -ben
> <Cog-structure%code.png>

_,,,^..^,,,_ (phone)

Re: CogVM Execution Flow

timrowledge


> On 13-06-2016, at 7:21 PM, Eliot Miranda <[hidden email]> wrote:
>  If ever it misses, the send site will get bound to a "closed" PIC, a little jump table created specific to that send site, that can hold up to 6 class index comparison, jump pairs (it can dispatch up to 6 classes of receiver) and if there are more than 6, will get rewritten to an "open" PIC specific to the selector, that probes the first level method lookup cache.

Whilst reading that a strange thought leaped up from somewhere and landed in my head; might it be useful to just replace the final pic-miss jump with a jump instead to the open lookup but keep the other six previously carefully built cases? It’s not like they’ve gone away in any sense. Might need to change the use of the ClassReg there too? After all, the open lookup is really just a pic-miss that doesn’t build a new cpic entry...


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Thinks E=MC^2 is a rap star.



Re: CogVM Execution Flow

Levente Uzonyi
 
That would make sense if the six entries were among the most commonly
used ones.
We had a discussion recently about adaptive PICs, which would reorder the
entries based on some statistics, and the conclusion was that the overhead
would be way higher than the potential benefit.
I did some measurements at that time about what the ideal size of a PIC
would be. On my machine 12 had the same worst(?) case lookup time as the
open PIC. Don't ask me how I measured it. :)

Levente

On Mon, 13 Jun 2016, tim Rowledge wrote:

>
>
>> On 13-06-2016, at 7:21 PM, Eliot Miranda <[hidden email]> wrote:
>> [...]
>
> Whilst reading that a strange thought leaped up from somewhere and landed in my head; might it be useful to just replace the final pic-miss jump with a jump instead to the open lookup but keep the other six previously carefully built cases? It’s not like they’ve gone away in any sense. Might need to change the use of the ClassReg there too? After all, the open lookup is really just a pic-miss that doesn’t build a new cpic entry...
>
>
> tim
> --
> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> Useful random insult:- Thinks E=MC^2 is a rap star.
>
>
>

Re: CogVM Execution Flow

Ben Coman
In reply to this post by Eliot Miranda-2
 
On Tue, Jun 14, 2016 at 2:41 AM, Eliot Miranda <[hidden email]> wrote:

>
> Hi Ben,
>
>     the diagram below shows the trees, but the wood is arguably more important.  The diagram below is focussing on the transitions, but doesn't clearly show what is being transitioned between.  I imagine a diagram which shows the structures and has what you have in the yellow boxes as transitions.  So...
>
> The essential structures are six-fold, three execution state structures, and three bodies of code, and in fact there is overlap of one of each.
>
> These are the execution state structures:
>
> 1. the C stack.
> 2. the Smalltalk stack zone.
> 3. the Smalltalk heap (which includes contexts that overflow the Smalltalk stack zone).
>
> These are the bodies of code:
> 4. the run-time, the code comprising the VM interpreter, JIT, garbage collector, and primitives
> 5. the jitted code living in the machine code zone, comprising methods, polymorphic inline caches, and the glue routines (trampolines and enilopmarts) between that machine code and the run-time
> 6. Smalltalk "source" code, the classes and methods in the Smalltalk heap that constitute the "program" under execution
>
> So 3. and 6. overlap (code is data), and 2. overflows into 3.: the stack zone is a "cache", keeping the most recent activations in the most efficient form for execution.
> Further, 4. (the run-time) executes solely on 1. (the C stack); 5. (the jitted code) runs only on 2. (the stack zone); and code in 6., when executed (interpreted) by the interpreter and primitives in 4., runs on 2. (the stack zone).
>
> Your diagram names some of the surface transitions, but not the deeper when and why.  Here they are:
>
> a) execution begins on the C stack as the program is launched.  Once the heap is loaded, swizzling pointers as required, the interpreter is entered.  On first entry it
>   a1) allocates space for the stack zone on the C stack
>   a2) "marries" the context in the image that invoked the snapshot primitive (a stack frame in the stack zone is built for the context, and the context is changed to become a proxy for that stack frame).
>   a3) captures the top of the C stack (CStackPointer & CFramePointer) as interpret is about to be invoked, including creating a "landing pad" for jumping back into the interpreter
>     a3 vm) the landing pad is a jmpbuf created via setjmp, and jumped to via longjmp
>     a3 sim) the landing pad is an exception handler for the ReenterInterpreter notification
>   a4) calls interpret to start interpreting the method that executed the snapshot primitive
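The a3) landing pad can be sketched simulator-style, using an exception handler in place of the real VM's setjmp/longjmp jmpbuf (as the thread notes, that is exactly how the simulator does it); every name here is illustrative:

```python
class ReenterInterpreter(Exception):
    """The longjmp analogue: unwinds the (Python) stack to the landing pad."""

def interpret(log, reenter):
    log.append('interpreting')
    if reenter:
        # e.g. machine code ran and now needs to re-enter the interpreter
        raise ReenterInterpreter

def start_vm(events, reentries=2):
    events.append('stack zone allocated')            # a1
    events.append('context married')                 # a2
    remaining = reentries
    while True:                                      # a3: the landing pad
        try:
            interpret(events, reenter=remaining > 0)  # a4
        except ReenterInterpreter:
            remaining -= 1       # intervening state is discarded by unwinding
            continue
        break

events = []
start_vm(events)
```

Each re-entry lands back at the same point, just as the prose says execution resumes "in the same state as when the interpret routine was entered".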
>
>
> Invoking the run-time:
> Machine code calls into the run-time for several facilities: adding an object to the remembered table if a store check indicates this must happen, running a primitive in the run-time, or entering the run-time to look up and bind a machine-code send, or to relink a linked send that has missed.  To invoke the run-time, the machine code saves the native stack and frame pointers (those of the current Smalltalk stack frame) in stackPointer and framePointer, sets the native stack and frame pointers to CStackPointer and CFramePointer, passes parameters (pushing on x86; loading registers on ARM and x64) and calls the run-time routine.  Simple routines (e.g. adding an element to the remembered set) simply perform the operation and return; the code returned to then switches back to the Smalltalk stack pointers and continues.  Routines that change the Smalltalk frame (send-linking routines, complex primitives such as perform:) reenter via an enilopmart.
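A minimal sketch of the stack-pointer shuffle just described, with hypothetical names standing in for the hardware registers (this is a toy model, not the generated glue code):

```python
class Pointers:
    """Toy registers: native_sp/native_fp stand in for the hardware stack and
    frame pointers; CStackPointer/CFramePointer were captured at startup."""
    def __init__(self):
        self.c_sp, self.c_fp = 'CStackPointer', 'CFramePointer'
        self.stackPointer = self.framePointer = None   # saved Smalltalk frame
        self.native_sp, self.native_fp = 'st_sp', 'st_fp'

    def call_runtime(self, routine):
        # save the current Smalltalk frame's pointers ...
        self.stackPointer, self.framePointer = self.native_sp, self.native_fp
        # ... point the native registers at the C stack ...
        self.native_sp, self.native_fp = self.c_sp, self.c_fp
        result = routine()          # the run-time routine runs on the C stack
        # ... then switch back and continue in machine code
        self.native_sp, self.native_fp = self.stackPointer, self.framePointer
        return result

p = Pointers()
seen = []
p.call_runtime(lambda: seen.append(p.native_sp))  # observe the switched sp
```

The routine observes the C stack pointers while it runs, and the Smalltalk ones are restored afterwards, matching the "simple routines simply perform the operation and return" case.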
>
>
> Transition to the interpreter:
> So any time the machine code wants to transition to the interpreter (not simply call a routine in the run-time, but interpret an as-yet-unjitted/unjittable method, either via send or return), the machine code switches the frame and stack pointers to those captured in a3) and longjmps (raises the ReenterInterpreter exception).  It does this by calling a run-time routine (as in "Invoking the run-time") that actually performs the longjmp.  Any intervening state on the C stack will be discarded, and execution will be in the same state as when the interpret routine was entered immediately after initialising the stack zone.
>
>
> N.B. if the interpreter merely called the machine-code, and the machine-code merely called the run-time, instead of substituting the stack and frame pointers with the CStackPointer and CFramePointer set up on initial invocation of interpret, then the C stack would grow on each transition between machine code execution and interpreter/run-time execution, and the C stack would soon overflow.
>
>
> Call-backs:
>
> The C stack /can/ grow however.  If a call-out calls back then the call-back executes lower down the C stack.  A call out will have been made from some primitive invoked either from the interpreter or machine-code, and that primitive will run on the C stack.  On calling back, the VM saves the current CStackPointer, CFramePointer and "landing-pad" jmpbuf in state associated with the call-back, and then reenters the interpreter, saving new values for the CStackPointer, CFramePointer and "landing-pad" jmpbuf.  Execution now continues in this new part of the C stack below the original.  On the call-back returning (again via a primitive), the CStackPointer, CFramePointer and "landing-pad" jmpbuf are restored before returning to the C code that invoked the call-back.  Once this C code returns, the stack is unwound back to the state before the call-out was invoked.
>
>
> Transition to machine-code:
> The interpreter uses the simple policy of jitting a method if it is found in the first-level method lookup cache, effectively jitting methods that are used more than once.  If the jitted method contains a primitive, that primitive routine will be invoked just as if it were an interpreted method.  If the method doesn't have a primitive, the interpreter will jump into machine code immediately.  It jumps into machine code by pushing any parameters (the state of the machine code registers, such as ReceiverResultReg, and the machine code address to begin execution) onto the top of the Smalltalk stack, and calling an enilopmart that switches from the C to the Smalltalk stack, loads the registers and jumps to the machine code address via a return instruction that pops the entry point off the Smalltalk stack.
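The enilopmart entry can be mimicked in a toy model where the "machine code address" is a Python callable popped off the Smalltalk stack (names invented for illustration; the real enilopmart uses a return instruction, not a call):

```python
def enilopmart(smalltalk_stack):
    """Switch to the Smalltalk stack, load registers from it, and 'jump' by
    popping the entry point, here a callable rather than a return address."""
    entry_point = smalltalk_stack.pop()          # the return-instruction pop
    receiver_result_reg = smalltalk_stack.pop()  # load ReceiverResultReg
    return entry_point(receiver_result_reg)

def jitted_method(receiver):                     # stands in for machine code
    return ('running machine code for', receiver)

st_stack = []
st_stack.append('aReceiver')        # parameter: ReceiverResultReg's value
st_stack.append(jitted_method)      # parameter: machine-code entry address
outcome = enilopmart(st_stack)
```

Note the ordering: the entry point is pushed last so that it is the first thing popped, exactly as a return instruction would consume it.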
>
>
> Simulating these transitions in the Simulator:
> In the Simulator, the C run-time (4.) is implemented by Smalltalk objects, and 1., 2., 3., 5., & 6. live in the memory inst var of the object memory, a large ByteArray.  The machine code lives in the bottom of this memory byte array, and has no direct access to the Smalltalk objects.  In the real VM, the correlates of these objects all exist at specific addresses and may be accessed directly from machine code.  In the simulator this is not possible.  Instead, these objects are all assigned out-of-bounds addresses, and a dictionary maps from each specific out-of-bounds address to the specific object being accessed; e.g. stackPointer, an inst var of InterpreterPrimitives (the superclass of StackInterpreter), has an address in simulatedAddresses that maps to a block that does a perform to access stackPointer's value.  See CoInterpreter>>stackPointerAddress.
>
> Machine code is executed by one of the processor aliens via the primitiveRunInMemory:minimumAddress:readOnlyBelow: or primitiveSingleStepInMemory:minimumAddress:readOnlyBelow: primitives. These primitives will fail when they encounter an illegal instruction, including an instruction that tried to fetch or store or jump to an out-of-bounds address.  The primitive failure code analyses the instruction that failed and (when appropriate, the instruction may actually be illegal, the result of some bug in the system, but is typically an intended access of some run-time object) creates an instance of the ProcessorSimulationTrap exception and raises it.  The handler then handles the exception to either fetch, store or invoke Smalltalk objects in the simulation, and once handled execution can continue.
>
> Hence in the simulator primitiveRunInMemory:minimumAddress:readOnlyBelow: and primitiveSingleStepInMemory:minimumAddress:readOnlyBelow: (actually their wrappers runInMemory:minimumAddress:readOnlyBelow: & singleStepIn:minimumAddress:readOnlyBelow:) are always invoked in the context of Cogit>>simulateCogCodeAt:, which provides the handler and tests for machine-code breakpoints using the breakBlock.
>
> In the same way that the VM must avoid C stack growth when transitioning between machine code and the interpreter/run-time above, so the simulator must avoid uncontrolled stack growth when the simulated machine code invokes Smalltalk code which again invokes simulated machine code.  So the code that invokes the run-time from simulateCogCodeAt: (Cogit>>#handleCallOrJumpSimulationTrap:) includes a handler for the ReenterMachineCode notification.  Whenever the Smalltalk run-time wants to reenter machine code via an enilopmart it sends Cogit>>#simulateEnilopmart:numArgs: which raises the notification before sending Cogit>>simulateCogCodeAt:.  So the first entry into machine code via an enilopmart starts Cogit>>simulateCogCodeAt:, but subsequent ones end up returning to that first Cogit>>simulateCogCodeAt: to continue execution.
>
>
> Ben, given the above, can you now see how your yellow boxes name specific transitions amongst the structures explained below?  I hope I've encouraged you, not discouraged you, to revise and bifurcate your diagram into two state transition diagrams for the real and simulated VM.
That's cool.  Just another good reason to spend time soaking up details
as goal-directed learning.  What I found easiest was to start from
scratch and convert your prose verbatim into a diagram, rather than
referring to the code as I was before.  I feel the result somewhat
butchered what you said, but at least it provides another perspective
as a point of discussion to build my understanding.  Next I'll try
merging it into the one I was deriving from the code.

I've not considered the simulation part yet; I thought it best to get
some feedback first.  It's been a long time since I've done any formal
diagramming and the symbology is poor.  Can anyone suggest a useful
type of diagram for this circumstance that I could look up and give a try?

Attached: CogVM-Transitions-2016.06.14a.zip

cheers -ben



> It would be great to have really good diagrammatic representations of the above.
>
> And once we have that, we can build the relatively simple extension that allows the interpreter and machine code to interleave interpreted and machine code frames on the Smalltalk stack (2.) that allow the VM to freely switch between interpreted and jitter code, and to fall back on the interpreter whenever convenient.
>
> On Mon, Jun 13, 2016 at 6:24 AM, Ben Coman <[hidden email]> wrote:
>> [...]

CogVM-Transitions-2016.06.14a.zip (221K) Download Attachment

Re: CogVM Execution Flow

Ben Coman
In reply to this post by Eliot Miranda-2

On Tue, Jun 14, 2016 at 2:41 AM, Eliot Miranda <[hidden email]> wrote:
>
> Hi Ben,
>
>     the diagram below shows the trees, but the wood is arguably more important.  The diagram below is focussing on the transitions, but doesn't clearly show what is being transitioned between.  I imagine a diagram which shows the structures and has what you have in the yellow boxes as transitions.

By transitions do you mean graph edges?  That actually turns out a
little difficult because I can't attach multiple edges together like I
can attach multiple edges to a shape - but I'll keep trying.

btw, Are externalizeIPandSP and internalizeIPandSP big hints as to the
transitions between those main structures?  If so, could you spell out
which is which.  Also do stackPointer, framePointer,
instructionPointer, localSP, localFP, etc belong to certain of those
structures?

cheers -ben

> [...]

Re: CogVM Execution Flow

Ben Coman
 
btw, what does the "ce" stand for in all the  methods...
CI>> ceDynamicSuperSend:receiver:
CI>> ceImplicitReceiverSend:receiver:
CI>> ceOuterSend:receiver:
CI>> ceSelfSend:

cheers -ben

Re: CogVM Execution Flow

Ben Coman
 
Some other possible hints of transitions between structures are that
only some methods have pragmas (like <var: #cogMethod type: ...>).

Only some methods are marked for <inline>.

I notice a function... convertToMachineCodeFrame: cogMethod bcpc:

cheers- ben

Re: CogVM Execution Flow

Eliot Miranda-2
In reply to this post by Ben Coman
 
Hi Ben,

On Tuesday, June 14, 2016, Ben Coman <[hidden email]> wrote:

On Tue, Jun 14, 2016 at 2:41 AM, Eliot Miranda <eliot.miranda@...> wrote:
>
> Hi Ben,
>
>     the diagram below shows the trees, but the wood is arguably more important.  The diagram below is focussing on the transitions, but doesn't clearly show what is being transitioned between.  I imagine a diagram which shows the structures and has what you have in the yellow boxes as transitions.

By transitions do you mean graph edges?

Yes.  For example ceSend: is a transition from machine code to the run-time.


That actually turns out a
little difficult because I can't attach multiple edges together like I
can attach multiple edges to a shape - but I'll keep trying.

btw, Are externalizeIPandSP and internalizeIPandSP big hints as to the
transitions between those main structures?

No :-(.  This is actually a milli-optimisation to the interpreter.  If one wants an interpreter to go fast one needs as many of the key interpreter variables (stack pointer, frame pointer, instruction pointer) as possible in registers.  Compilers such as gcc allow global register variables (in part due to my work on BrouHaHa, where I achieved this by nefarious means and then requested the facility from Richard Stallman).  But that's non-portable.  The approach taken in the Squeak VM is to inline much of the interpreter into one function and have localSP, localFP & localIP as local variables, relying on the C compiler's optimiser to put these in registers.  That means they have to be written to stackPointer, framePointer and instructionPointer before calling a primitive (externalize) and read back afterwards (internalize).  Good idea, adds complexity, doesn't add much to the Cog VM, essential to the Stack & Interpreter VMs.
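The externalize/internalize dance described here can be sketched as follows (a toy model with invented names, not the generated VM code); the hot loop touches only the local variable and syncs with the shared stackPointer around the primitive call:

```python
class Interp:
    """Toy interpreter keeping the stack pointer in a local variable."""
    def __init__(self):
        self.stackPointer = 0        # the shared variable primitives see

    def run(self, n_pushes, primitive):
        localSP = self.stackPointer            # internalize on entry
        for _ in range(n_pushes):
            localSP += 1                       # hot path: local only
        self.stackPointer = localSP            # externalize before the call
        primitive(self)                        # primitive may move the SP
        localSP = self.stackPointer            # internalize afterwards
        return localSP

interp = Interp()

def pop_two_push_one(vm):            # e.g. an arithmetic primitive's effect
    vm.stackPointer -= 1             # net: two pops, one push

final_sp = interp.run(3, pop_two_push_one)
```

In C the point of this shape is that localSP can live in a register; the externalize/internalize writes are the price paid whenever control leaves the inlined interpreter loop.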
 

  If so, could you spell out
which is which.  Also do stackPointer, framePointer,
instructionPointer, localSP, localFP, etc belong to certain of those
structures?
 
Yes.  You can see now that they belong to the interpreter and hence to the C runtime. They point to the current frame in the stack zone for the interpreter.  In machine code we use the native sp, fp & pc and so every trampoline does an externalize, a call, and an internalize (which may not be reached if the call doesn't return), and every enilipmart does an internalize, but of the native sp, fp & pc rather than localSP, localFP, etc.

Sorry about the font sizes.  Using the gmail app for the first time and it mucks things up when copy/pasting.


cheers -ben

> So...
> The essential structures are six-fold, three execution state structures, and three bodies of code, and in fact there is overlap of one of each.
>
> These are the execution state structures:
>
> 1. the C stack.
> 2. the Smalltalk stack zone.
> 3. the Smalltalk heap (which includes contexts that overflow the Smalltalk stack zone).
>
> These are the bodies of code:
> 4. the run-time, the code comprising the VM interpreter, JIT, garbage collector, and primitives
> 5. the jitted code living in the machine code zone, comprising methods, polymorphic inline caches, and the glue routines (trampolines and enilopmarts) between that machine code and the run-time
> 6. Smalltalk "source" code, the classes and methods in the Smalltalk heap that constitute the "program" under execution
>
> So 3. and 6. overlap; code is data, and 2. overflows into 3., the stack zone is a "cache", keeping the most recent activations in the most efficient form for execution.
> Further, 4. (the run-time) executes solely on 1. (the C stack), and 5. (the jitted code) runs only on 2. (the stack zone), and also, code in 6. executed (interpreted) by the interpreter and primitives in 4. runs on 2. (the stack zone)
>
> Your diagram names some of the surface transitions, but not the deeper when and why.  Here they are:
>
> a) execution begins on the C stack as the program is launched.  Once the heap is loaded, swizzling pointers as required, the interpreter is entered.  On first entry it
>   a1) allocates space for the stack zone on the C stack
>   a2) "marries" the context in the image that invoked the snapshot primitive (a stack frame in the stack zone is built for the context, and the context is changed to become a proxy for that stack frame).
>   a3) captures the top of the C stack (CStackPointer & CFramePointer) as interpret is about to be invoked, including creating a "landing pad" for jumping back into the interpreter
>     a3 vm) the landing pad is a jmpbuf created via setjmp, and jumped to via longjmp
>     a3 sim) the landing pad is an exception handler for the ReenterInterpreter notification
>   a4) calls interpret to start interpreting the method that executed the snapshot primitive
>
>
> Invoking the run-time:
> Machine code calls into the run-time for several facilities: adding an object to the remembered table if a store check indicates this must happen, running a primitive in the run-time, or entering the run-time to look up and bind a machine-code send, or a linked send that has missed.  To invoke the run-time, the machine code saves the native stack and frame pointers (those of the current Smalltalk stack frame) in stackPointer and framePointer, sets the native stack and frame pointers to CStackPointer and CFramePointer, passes parameters (pushed on x86; loaded into registers on ARM and x64) and calls the run-time routine.  Simple routines (e.g. adding an element to the remembered set) simply perform the operation and return.  The code returned to then switches back to the Smalltalk stack pointers and continues.  Routines that change the Smalltalk frame (send-linking routines, complex primitives such as perform:) reenter via an enilopmart.
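[Editor's note: a toy C model of the trampoline sequence just described: save the Smalltalk frame's pointers, switch to the C stack pointers captured at start-up, call the run-time, switch back.  All names and the "stacks" are illustrative; real trampolines do this with native register moves, not C assignments.]

```c
static long smalltalkStack[32], cStack[32];

static long *nativeSP;              /* the "hardware" stack pointer */
static long *stackPointer;          /* saved Smalltalk sp           */
static long *CStackPointer;         /* captured at start-up         */

static int ranOnCStack = 0;

static void runtimeRoutine(void)
{
    /* the run-time must find itself running on the C stack */
    ranOnCStack = (nativeSP == CStackPointer);
}

int trampolineDemo(void)
{
    CStackPointer = &cStack[31];
    nativeSP = &smalltalkStack[10]; /* "executing" jitted code    */

    stackPointer = nativeSP;        /* save the Smalltalk frame   */
    nativeSP = CStackPointer;       /* switch to the C stack      */
    runtimeRoutine();               /* call into the run-time     */
    nativeSP = stackPointer;        /* switch back and continue   */

    return ranOnCStack && nativeSP == &smalltalkStack[10];
}
```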
>
>
> Transition to the interpreter:
> So any time the machine code wants to transition to the interpreter (not simply call a routine in the run-time, but interpret an as-yet-unjitted/unjittable method, reached either via a send or a return), the machine code switches the frame and stack pointers to those captured in a3) and longjmps (in the simulator, raises the ReenterInterpreter exception).  It does this by calling a run-time routine (as in "Invoking the run-time") that actually performs the longjmp.  Any intervening state on the C stack will be discarded, and execution will be in the same state as when the interpret routine was entered immediately after initialising the stack zone.
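[Editor's note: a minimal sketch of the landing pad from a3) and the longjmp transition above.  ceReturnToInterpreter and machineCodeStandIn are hypothetical names; the real routines live in the Cogit and run-time.]

```c
#include <setjmp.h>

static jmp_buf reenterInterpreter;  /* the landing pad */
static int entries = 0;

static void ceReturnToInterpreter(void)
{
    /* discard all intervening C frames and land back in interpret */
    longjmp(reenterInterpreter, 1);
}

static void machineCodeStandIn(void)
{
    ceReturnToInterpreter();        /* never returns */
}

int interpretDemo(void)
{
    setjmp(reenterInterpreter);     /* create the landing pad */
    entries++;
    if (entries == 1)
        machineCodeStandIn();       /* first pass: "run" machine code */
    /* second pass: reached again via longjmp, C stack unwound */
    return entries;
}
```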
>
>
> N.B. If the interpreter merely called the machine code, and the machine code merely called the run-time, instead of substituting the stack and frame pointers with the CStackPointer and CFramePointer set up on initial invocation of interpret, then the C stack would grow on each transition between machine-code execution and interpreter/run-time execution, and would soon overflow.
>
>
> Call-backs:
>
> The C stack /can/ grow, however.  If a call-out calls back, then the call-back executes lower down the C stack.  A call-out will have been made from some primitive invoked either from the interpreter or machine code, and that primitive will run on the C stack.  On calling back, the VM saves the current CStackPointer, CFramePointer and "landing-pad" jmpbuf in state associated with the call-back, and then reenters the interpreter, saving new values for the CStackPointer, CFramePointer and "landing-pad" jmpbuf.  Execution now continues in this new part of the C stack below the original.  On the call-back returning (again via a primitive), the CStackPointer, CFramePointer and "landing-pad" jmpbuf are restored before returning to the C code that invoked the call-back.  Once this C code returns, the stack is unwound back to the state before the call-out was invoked.
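[Editor's note: a sketch of the call-back bookkeeping above: save CStackPointer, CFramePointer and the landing-pad jmp_buf before reentering the interpreter, and restore them when the call-back returns.  The struct and function names are illustrative, not the VM's actual API.]

```c
#include <setjmp.h>
#include <string.h>

typedef struct {
    void   *cStackPointer;
    void   *cFramePointer;
    jmp_buf landingPad;
} SavedCallbackState;

static void   *CStackPointer;
static void   *CFramePointer;
static jmp_buf landingPad;

static void saveFor(SavedCallbackState *s)
{
    s->cStackPointer = CStackPointer;
    s->cFramePointer = CFramePointer;
    memcpy(s->landingPad, landingPad, sizeof(jmp_buf));
}

static void restoreFrom(const SavedCallbackState *s)
{
    CStackPointer = s->cStackPointer;
    CFramePointer = s->cFramePointer;
    memcpy(landingPad, s->landingPad, sizeof(jmp_buf));
}

int callbackDemo(void)
{
    int outer = 1, inner = 2;
    SavedCallbackState saved;

    CStackPointer = &outer;
    saveFor(&saved);                /* entering the call-back       */
    CStackPointer = &inner;         /* fresh state, lower C stack   */
    /* ... reenter the interpreter, run Smalltalk code ... */
    restoreFrom(&saved);            /* the call-back returns        */
    return CStackPointer == &outer; /* original state is back       */
}
```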
>
>
> Transition to machine-code:
> The interpreter uses the simple policy of jitting a method if it is found in the first-level method lookup cache, effectively jitting methods that are used more than once.  If the jitted method contains a primitive, that primitive routine will be invoked just as if it were an interpreted method.  If the method doesn't have a primitive, the interpreter will jump into machine code immediately.  It jumps into machine code by pushing any parameters (the state of the machine code registers, such as ReceiverResultReg, and the machine code address to begin execution) onto the top of the Smalltalk stack, and calling an enilopmart that switches from the C to the Smalltalk stack, loads the registers, and jumps to the machine code address via a return instruction that pops the entry point off the Smalltalk stack.
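[Editor's note: a sketch of the enilopmart idea above: the interpreter pushes the machine-code entry point onto the (here simulated) Smalltalk stack, and entry happens by popping it and jumping through it.  Real enilopmarts use a ret instruction after switching stacks; all names here are illustrative.]

```c
#include <stdint.h>

typedef void (*EntryPoint)(void);

static uintptr_t smalltalkStack[16];
static int sp = 0;                  /* toy stack grows upward */
static int ran = 0;

static void jittedMethod(void) { ran = 1; }

static void enilopmart(void)
{
    /* pop the entry point off the Smalltalk stack and "return" to it */
    EntryPoint entry = (EntryPoint)smalltalkStack[--sp];
    entry();
}

int enterMachineCodeDemo(void)
{
    smalltalkStack[sp++] = (uintptr_t)jittedMethod;  /* push entry */
    enilopmart();
    return ran;
}
```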
>
>
> Simulating these transitions in the Simulator:
> In the Simulator, the C run-time (4.) consists of Smalltalk objects, and 1., 2., 3., 5., & 6. live in the memory inst var of the object memory, a large ByteArray.  The machine code lives at the bottom of this memory byte array and has no direct access to the Smalltalk objects.  In the real VM the correlates of these objects all exist at specific addresses and may be accessed directly from machine code.  In the simulator this is not possible.  Instead, these objects are all assigned out-of-bounds addresses, and a dictionary maps from each specific out-of-bounds address to the specific object being accessed; e.g. stackPointer, an inst var of InterpreterPrimitives (the superclass of StackInterpreter), has an address in simulatedAddresses that maps to a block that does a perform to access stackPointer's value.  See CoInterpreter>>stackPointerAddress.
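[Editor's note: a C sketch of the out-of-bounds address scheme just described (compare CoInterpreter>>stackPointerAddress): run-time variables get fake addresses outside the memory byte array, and a table maps each fake address to accessor functions.  The addresses, names and table shape are illustrative.]

```c
#include <stddef.h>

static long stackPointerValue = 42; /* stands in for the inst var */

static long readStackPointer(void)    { return stackPointerValue; }
static void writeStackPointer(long v) { stackPointerValue = v; }

typedef struct {
    size_t address;                 /* out-of-bounds simulated address */
    long (*read)(void);
    void (*write)(long);
} SimulatedVariable;

static const SimulatedVariable simulatedAddresses[] = {
    { 0xFFFF0000u, readStackPointer, writeStackPointer },
};

/* called when simulated machine code touches an out-of-bounds address */
long simulatedRead(size_t address, int *found)
{
    size_t n = sizeof simulatedAddresses / sizeof simulatedAddresses[0];
    for (size_t i = 0; i < n; i++)
        if (simulatedAddresses[i].address == address) {
            *found = 1;
            return simulatedAddresses[i].read();
        }
    *found = 0;                     /* a genuinely illegal access */
    return 0;
}
```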
>
> Machine code is executed by one of the processor aliens via the primitiveRunInMemory:minimumAddress:readOnlyBelow: or primitiveSingleStepInMemory:minimumAddress:readOnlyBelow: primitives.  These primitives fail when they encounter an illegal instruction, including an instruction that tries to fetch from, store to, or jump to an out-of-bounds address.  The primitive failure code analyses the instruction that failed and (when appropriate; the instruction may actually be illegal, the result of some bug in the system, but is typically an intended access of some run-time object) creates an instance of the ProcessorSimulationTrap exception and raises it.  The exception is then handled by fetching from, storing to, or invoking Smalltalk objects in the simulation, and once handled execution can continue.
>
> Hence in the simulator primitiveRunInMemory:minimumAddress:readOnlyBelow: and primitiveSingleStepInMemory:minimumAddress:readOnlyBelow: (actually their wrappers runInMemory:minimumAddress:readOnlyBelow: & singleStepIn:minimumAddress:readOnlyBelow:) are always invoked in the context of Cogit>>simulateCogCodeAt:, which provides the handler and tests for machine-code break-points using the breakBlock.
>
> In the same way that the VM must avoid C stack growth when transitioning between machine code and the interpreter/run-time above, so the simulator must avoid uncontrolled stack growth when the simulated machine code invokes Smalltalk code which again invokes simulated machine code.  So the code that invokes the run-time from simulateCogCodeAt: (Cogit>>#handleCallOrJumpSimulationTrap:) includes a handler for the ReenterMachineCode notification.  Whenever the Smalltalk run-time wants to reenter machine code via an enilopmart it sends Cogit>>#simulateEnilopmart:numArgs: which raises the notification before sending Cogit>>simulateCogCodeAt:.  So the first entry into machine code via an enilopmart starts Cogit>>simulateCogCodeAt:, but subsequent ones end up returning to that first Cogit>>simulateCogCodeAt: to continue execution.
>
>
> Ben, given the above, can you now see how your yellow boxes name specific transitions amongst the structures explained below?  I hope I've encouraged you, not discouraged you, to revise and bifurcate your diagram into two state transition diagrams for the real and simulated VM.  It would be great to have really good diagrammatic representations of the above.
>
> And once we have that, we can build the relatively simple extension that allows the interpreter and machine code to interleave interpreted and machine-code frames on the Smalltalk stack (2.), allowing the VM to freely switch between interpreted and jitted code, and to fall back on the interpreter whenever convenient.
>
> On Mon, Jun 13, 2016 at 6:24 AM, Ben Coman <btc@...> wrote:
>>
>>
>> In trying to understand the flow of execution (and in particular the
>> jumps in the jitted VM), I made a first rough pass to map it in the
>> attached chart.
>>
>> I am trying to colourize it to distinguish between paths that can
>> return to the interpreter, those that circulate in jitted code, and
>> the transitions.  I'm sure I've missed the mark a bit but it's a start.
>> Of course corrections welcome, even scanned pen sketches.
>>
>> cheers -ben
>>
>
>
>
> --
> _,,,^..^,,,_
> best, Eliot
>


--
_,,,^..^,,,_
best, Eliot


Re: CogVM Execution Flow

Eliot Miranda-2
In reply to this post by Ben Coman
 


On Tuesday, June 14, 2016, Ben Coman <[hidden email]> wrote:

btw, what does the "ce" stand for in all the  methods...
CI>> ceDynamicSuperSend:receiver:
CI>> ceImplicitReceiverSend:receiver:
CI>> ceOuterSend:receiver:
CI>> ceSelfSend:

Cog entry.  A weak term, but it groups together all the code that machine code calls.


cheers -ben


--
_,,,^..^,,,_
best, Eliot


Re: CogVM Execution Flow

timrowledge
In reply to this post by Levente Uzonyi

Well, overflowing a CPIC to an openPIC is rare enough that it probably doesn’t matter much anyway.

> On 14-06-2016, at 4:26 AM, Levente Uzonyi <[hidden email]> wrote:
>
> That would make sense if the six entries were among the most commonly used ones.

That would be a good trick to manage! Clement might be able to provide some input from Sista to help us get closer I guess? But the important bit is that whilst they may very well not be the most commonly used methods, they are the most recently called ones, which means they have some importance in the current context. Making use of them isn’t likely to be a bad idea so far as I can see.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Computer Science: solving today's problems tomorrow.

