Advice Required - Live Typing - Saving local vars types

Hernan Wilkinson-3
 
Hi all!
 I need some advice to solve a problem with the Live Typing functionality I'm working on. (Live Typing, previously called Dynamic Type Information, saves the class of an object every time it is assigned to a variable, used as a return value, etc. That info is accessible from the image and helps build better tools. More info at: https://github.com/hernanwilkinson/Cuis-Smalltalk-DynamicTypeInformation
For the sake of simplicity I use class and type interchangeably.)

 The problem is related to saving the type of parameters and temporaries of closures (BlockClosure). Parameters and temporaries (locals from now on) of methods work well (with some minor issues not important now) 
 To save the classes of method locals I added an array in AdditionalMethodState whose size equals the number of locals. Each element of that array points to another array that holds the classes of the objects assigned to a variable (the local var index is used as the index into the first array).
 Saving the types of a closure's locals is not that simple, but I solved the "structural" part by adding a new indirection (as usual). So now AdditionalMethodState has an array whose size equals "1 + the number of closures the method has", that is, one element for the method itself plus one per closure. Each element of that array points to the array that in turn points to the arrays of types per variable. I think an example will help:

m1: p1
   | t1 |

   t1 := 0.   "<-- it will save SmallInteger in (method additionalState contextTypesAt: 1) at: 2."
   [ | t2 | t2 := 'hello' ] value. "<-- it will save String in (method additionalState contextTypesAt: 2) at: 1"

   [ | t3 | t3 := 3.14 ] value. "<-- it will save Float in (method additionalState contextTypesAt: 3) at: 1"

As we can see, index 1 is used for the method's context when 0 is assigned to t1. Because it is the second local (the first one is p1), the index 2 is used to save the type of 0 (SmallInteger). 
 Index 2 is used when 'hello' is assigned to t2 because t2 is defined in the first closure. Because it is the first block local, index 1 is used to access t2's types array, and so on.
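
As a rough illustration of the structure described above, here is a workspace sketch for the m1: example. It is not the actual Live Typing code: plain Arrays stand in for the state kept in AdditionalMethodState, a Set per variable is an assumption about how the classes are collected, and the record block is a hypothetical helper.

```smalltalk
"Workspace sketch only. contextTypes models the outer array: one element for the
 method itself plus one per closure. Each element holds one collection per local."
| contextTypes record |
contextTypes := Array new: 3.              "1 + the number of closures in m1:"
contextTypes at: 1 put: (Array new: 2).    "method context: p1, t1"
contextTypes at: 2 put: (Array new: 1).    "first closure: t2"
contextTypes at: 3 put: (Array new: 1).    "second closure: t3"
contextTypes do: [:perContext |
    1 to: perContext size do: [:i | perContext at: i put: Set new]].

"record is a hypothetical helper: it stores the class of a value for one local."
record := [:contextIndex :localIndex :value |
    ((contextTypes at: contextIndex) at: localIndex) add: value class].

record value: 1 value: 2 value: 0.         "t1 := 0"
record value: 2 value: 1 value: 'hello'.   "t2 := 'hello'"
record value: 3 value: 1 value: 3.14.      "t3 := 3.14"

(contextTypes at: 1) at: 2.                "inspect it: the Set now includes SmallInteger"
```

The concrete classes recorded for the string and float literals depend on the dialect and VM, so only the SmallInteger case is spelled out here.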

 I did not mention it, but the problem lies in the fact that the same bytecode is used to assign an object to a var, no matter whether it is inside a closure or not.
 So the problem I have to solve is how the bytecode's implementation can know in which context's types array to save the assigned object's class. That is, for the same bytecode, for example "<69> popIntoTemp: 1", I have to decide which array to use.
 I hope I've been clear; it is difficult to explain...

 After considering a lot of ideas and possibilities, I found that the method/closure start pc could be used to decide the array's index to use (based on something similar to what CompiledMethod>>#startpcsToBlockExtents returns). 
 So, every time a new activation context is created (for example in StackInterpreter>>#activateNewClosure:outer:method:numArgs:mayContextSwitch:, StackInterpreter>>#internalActivateNewMethod and so on) I can use the start pc to calculate the array's index.
 So, let's say I have that solved too; now the problem is how to access that index from the bytecode's implementation. I have the following ideas:
1) Add an inst. var. to MethodContext that will have the index (or even better, the local's types array). So every time a new context is created, I calculate the index based on the PC and set that inst. var. 
2) Do the same as in 1) but adding an inst. var. to BlockClosure (better than 1 because the closure is created once while the method context could be created more than once for the same closure)
3) Push the calculated index on the stack (as the IP, SP, etc. are pushed). Based on the num. args + num. temps., calculate the position in the stack of that index every time a type has to be saved.
4) Have an interpreter variable like 'method' but called, let's say, 'contextVarsTypes' that is set every time a new activation is created. The previous contextVarsTypes value is pushed on the stack and restored from it when exiting a context.

The problem with 1) and 2) is that MethodContext and BlockClosure cannot be modified (at least not easily; a new image format would be needed, etc.), but the advantage is that I don't have to worry about that value when a context is left. 
Between 3) and 4), I think 4) is faster, but I'm not sure whether, if a GC runs and the array is moved (let's say contextVarsTypes points directly to the types array), contextVarsTypes will point to the array's new position in memory (will that happen? how is 'method' updated when a GC runs?)
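
To make option 4) a bit more concrete, the save/restore discipline can be modeled at the image level with a workspace sketch. This is only a model under stated assumptions: the real change would live in the VM's activation and return code, the start pc values are made up, the symbols are placeholders for the per-context types arrays, and the activate and leave blocks are hypothetical names. It says nothing about the GC question.

```smalltalk
"Image-level model of option 4); every name and number here is hypothetical."
| startpcToIndex contextTypes savedTypes contextVarsTypes activate leave |
startpcToIndex := Dictionary new.
startpcToIndex at: 25 put: 1; at: 37 put: 2; at: 52 put: 3.   "made-up start pcs"

"Placeholders for the per-context types arrays."
contextTypes := Array with: #method with: #closure1 with: #closure2.

savedTypes := OrderedCollection new.   "stands in for the slot pushed on the stack"
contextVarsTypes := nil.               "stands in for the interpreter variable"

"On activation: save the previous value, then select the array by start pc."
activate := [:startpc |
    savedTypes addLast: contextVarsTypes.
    contextVarsTypes := contextTypes at: (startpcToIndex at: startpc)].

"On exit: restore the value that was current before the activation."
leave := [contextVarsTypes := savedTypes removeLast].

activate value: 25.   "enter m1:; contextVarsTypes is #method"
activate value: 37.   "enter the first closure; contextVarsTypes is #closure1"
leave value.          "leave the closure; contextVarsTypes is #method again"
```

With this discipline the bytecode that stores into a temp only needs the current value of contextVarsTypes plus the temp index, at the cost of one extra stack slot per activation.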

Which one do you think is better/faster/possible?
Any advice/comment on this matter will be appreciated. 
If there is an easier/different way to solve this problem, please help me :-)

Thanks!
Hernan 


--
Hernán Wilkinson
Agile Software Development, Teaching & Coaching
Phone: +54-011-4893-2057
Twitter: @HernanWilkinson
Address: Alem 896, Floor 6, Buenos Aires, Argentina
Re: Advice Required - Live Typing - Saving local vars types

Eliot Miranda-2
 
Hi Hernan,

On Tue, Jan 1, 2019 at 6:30 PM Hernan Wilkinson <[hidden email]> wrote:
 
> The problem is related to saving the type of parameters and temporaries of closures (BlockClosure). Parameters and temporaries (locals from now on) of methods work well (with some minor issues not important now)

Hence the solution to your problem is to use FullBlocks.  The SistaV1 bytecode set supports full blocks, which means it is possible to use a separate CompiledBlock for every block in a method. In Squeak one has to set the Preferred bytecode set encoder class preference to EncoderForSistaV1 and recompile the system (the system supports up to two bytecode sets).

The support is in Squeak and fully tested there (my work image uses the SistaV1 bytecode set).  Some of the support is in Pharo 7, but there is no preference and so no easy way to force the use of SistaV1 and Full Blocks.  I don't know if the support has been ported to Cuis from Squeak, but it should not be difficult to do.  
 
> I did not mention it, but the problem lies in the fact that the same bytecode is used to assign an object to a var, no matter whether it is inside a closure or not.
> So the problem I have to solve is how the bytecode's implementation can know in which context's types array to save the assigned object's class. That is, for the same bytecode, for example "<69> popIntoTemp: 1", I have to decide which array to use.
> I hope I've been clear; it is difficult to explain...

Well it is difficult to explain.  But a key insight is that the Decompiler solves exactly this problem in mapping from bytecodes back to source, and that the Debugger solves this problem in displaying the temporary variables that are in scope.  So if you have a look at the code in DebuggerMethodMap you should find an API that provides what you want.  For example, these Context methods all use the relevant API for linearizing temps:

Context methods for debugger access

namedTempAt: index
    "Answer the value of the temp at index in the receiver's sequence of tempNames."
    ^self debuggerMap namedTempAt: index in: self

namedTempAt: index put: aValue
    "Set the value of the temp at index in the receiver's sequence of tempNames.
    (Note that if the value is a copied value it is also set out along the lexical chain,
     but alas not in along the lexical chain.)."
    ^self debuggerMap namedTempAt: index put: aValue in: self

tempNames
    "Answer a SequenceableCollection of the names of the receiver's temporary 
    variables, which are strings."

    ^ self debuggerMap tempNamesForContext: self

tempsAndValues
    "Return a string of the temporary variables and their current values"
    ^self debuggerMap tempsAndValuesForContext: self

So for example if I evaluate this:

    (1 to: 2) inject: 0 into: [:a :b| ^thisContext sender tempsAndValues]

I get this:

'thisValue: 0
binaryBlock: [closure] in Context class>>DoIt
nextValue: 0
each: 1
'

So the simplified API is provided by DebuggerMethodMap.

But you want to do this in the VM.  That's a lot harder, and it makes the VM implementation very dependent on the specific compiler implementation.  If you go with Full Blocks things may not be too bad.  But moving things up to the image level will make everything a lot easier, faster to develop, and extensible.

--
_,,,^..^,,,_
best, Eliot
Re: Advice Required - Live Typing - Saving local vars types

Hernan Wilkinson-3
 
Hi Eliot!


> Hence the solution to your problem is to use FullBlocks.  The SistaV1 bytecode set supports full blocks, which means it is possible to use a separate CompiledBlock for every block in a method. In Squeak one has to set the Preferred bytecode set encoder class preference to EncoderForSistaV1 and recompile the system (the system supports up to two bytecode sets).

What are FullBlocks? Could you comment on that or point me to some reading?
In the meantime I'll take a look at the code.
Also, could you send me a link to read about Sista? I don't remember the details of it.


> The support is in Squeak and fully tested there (my work image uses the SistaV1 bytecode set).  Some of the support is in Pharo 7, but there is no preference and so no easy way to force the use of SistaV1 and Full Blocks.  I don't know if the support has been ported to Cuis from Squeak, but it should not be difficult to do.

No, it has not... I'll ask Juan what his plans are for it.
 
 

> Well it is difficult to explain.  But a key insight is that the Decompiler solves exactly this problem in mapping from bytecodes back to source, and that the Debugger solves this problem in displaying the temporary variables that are in scope.  So if you have a look at the code in DebuggerMethodMap you should find an API that provides what you want.

Yes, yes, I have read it, and that gives me all I need from the image's point of view, but the problem is in the VM.
 
> But you want to do this in the VM.  That's a lot harder, and it makes the VM implementation very dependent on the specific compiler implementation.  If you go with Full Blocks things may not be too bad.  But moving things up to the image level will make everything a lot easier, faster to develop, and extensible.

I try to do as little as possible in the VM; in fact the changes I made are very simple and independent of the compiler, etc. Doing it on the image side (using meta-links in Pharo, for example) is too slow and not really usable. What I did is usable and simple; I'm working with that VM/image all the time and it works great.
I'll look into FullBlocks as you suggest and see how they can help... in the meantime I think I'll go with option 4) from my first message. 
I'll keep you posted.

Thanks!
Hernan.

