Questions about Cog internals


Questions about Cog internals

Mariano Martinez Peck
 
Hi Eliot. I am really trying (with all my lack of knowledge) to understand a little about how Cog works internally. I am also reading your posts, and I have a couple of (probably newbie) questions. If any of them are answered in the blog, please point me to the relevant posts (I haven't been able to read all of them yet):

1) Suppose you have a CompiledMethod XXX that you JIT, and you get a CogMethod YYY. While doing the GC (#lastPointersOf:, #lastPointerWhileForwarding:, etc), you need to check whether XXX is a CogMethodReference because if so, you need to fetch XXX's header from YYY. Perfect. To keep the GC from looking inside the CogMethod "objects" you give them a header with the special format for empty objects, hence the GC doesn't follow the non-existent "instVars" of CogMethod. Perfect. CogMethod has a pointer to its original CM (back-pointer), called 'methodObject'. In this case, YYY has a pointer to XXX. So my question is: during a GC compaction or a #become, where the address of XXX changes, how do you update YYY so that it points to the new address of XXX? If you flag YYY as an empty object, the GC doesn't update it.

2) As far as I understand, CogMethod doesn't "store/duplicate" the literals of the CompiledMethod. Hence, even when you have a jitted method, when you need a special literal, you ask the CM for it, using the back-pointer 'methodObject'. Is this correct?

3) This is the most stupid question, but I don't see WHERE the machine code is kept. When I jit a method, I get a CogMethod structure, perfect. But where is the generated machine code? Where is it kept? How can I know from a CogMethod which machine code is associated with it?

4) I guess that my thought in 2) is not correct, because otherwise I don't understand why you need CoInterpreter>>markAndTraceOrFreeMachineCode:. The comment says "Deal with a fullGC's effects on machine code. Either mark and trace oops in machine code or free machine-code methods that refer to freed oops. The stack pages have already been traced so any methods of live stack activations have already been marked and traced."

Which oops do you mean by "oops in machine code"? Literals? The back-pointer to the CM?
 
And by "free machine-code methods that refer to freed oops", what do you mean? Literals, or oops such as the back-pointer? I think you mean the back-pointer, since the original CM could have been garbage collected and since you flag the CogMethod as empty...

5) This is not a question; rather, I would like to know whether I understood correctly. You JIT a method the second time it is used, that is, when you find it in the cache. To know how to generate the machine code for a particular bytecode, you check the table that you generate with #initializeBytecodeTableForClosureV3, where you basically map bytecodes to the methods that generate the machine code for each bytecode. If it is a primitive you use #compilePrimitive instead, which checks a similar table, but for primitives, which is set up in #initializePrimitiveTableForSqueakV3.

Now, I have a compiled method XXX (selector xxx) which sends #foo. XXX was jitted to CogMethod YYY (selector yyy). When xxx is executed, YYY is executed. When YYY was jitted, the table defined in #initializeBytecodeTableForClosureV3 mapped the send bytecode to a specific generator method, which in the end, for normal messages, is #genSend:numArgs:. The machine code that method generates includes a call to the "trampoline" (which is looked up in 'sendTrampolines'; #generateSendTrampolines shows how you map from one to the other), which invokes the associated routine, in this case #ceSend:super:to:numArgs:. So the #foo send will finally be handled in ceSend:super:to:numArgs:. This is ONLY true if the send is "unlinked". If #foo was in fact also jitted, then you try to link it (to avoid searching the cache next time???). Suppose you could link the two, so the next time YYY is executed it will call the CogMethod of #foo DIRECTLY. In this case, the method executed in the VM is #executeCogMethodFromLinkedSend:withReceiver: instead of #ceSend:super:to:numArgs:

So, am I delirious, or is that more or less correct?

Thanks a lot in advance,

--
Mariano
http://marianopeck.wordpress.com


Re: Questions about Cog internals

Eliot Miranda-2
 


On Tue, May 3, 2011 at 4:49 AM, Mariano Martinez Peck <[hidden email]> wrote:
 
Hi Eliot. I am really trying (with all my lack of knowledge) to understand a little about how Cog works internally. I am also reading your posts, and I have a couple of (probably newbie) questions. If any of them are answered in the blog, please point me to the relevant posts (I haven't been able to read all of them yet):

1) Suppose you have a CompiledMethod XXX that you JIT, and you get a CogMethod YYY. While doing the GC (#lastPointersOf:, #lastPointerWhileForwarding:, etc), you need to check whether XXX is a CogMethodReference because if so, you need to fetch XXX's header from YYY. Perfect. To keep the GC from looking inside the CogMethod "objects" you give them a header with the special format for empty objects, hence the GC doesn't follow the non-existent "instVars" of CogMethod. Perfect. CogMethod has a pointer to its original CM (back-pointer), called 'methodObject'. In this case, YYY has a pointer to XXX. So my question is: during a GC compaction or a #become, where the address of XXX changes, how do you update YYY so that it points to the new address of XXX? If you flag YYY as an empty object, the GC doesn't update it.

The garbage collector uses NewObjectMemory>>#mapPointersInObjectsFrom:to: to update pointers for compactions and becomes.  This always invokes CoInterpreter>>mapInterpreterOops which always invokes CoInterpreter>>mapMachineCode, which always invokes Cogit>>mapObjectReferencesInMachineCode:.  That splits into either Cogit>>mapObjectReferencesInMachineCodeForFullGC or Cogit>>mapObjectReferencesInMachineCodeForIncrementalGC, depending on this being an incremental GC or not.  The CogMethodZone maintains a list of Cog methods containing young references so in an incremental GC only these methods are scanned.
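A toy sketch of the scanning policy described above, not Cog's actual code: after objects move, every oop held by a Cog method (the back-pointer and any embedded references) is rewritten through a forwarding table, and an incremental GC visits only the methods on the young-referrers list. The names ToyCogMethod, ToyMethodZone and the dict-based forwarding table are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCogMethod:
    method_object: int  # address of the bytecoded method (the back-pointer)
    embedded_oops: list = field(default_factory=list)  # oops held in the machine code


class ToyMethodZone:
    """Hypothetical stand-in for the method zone and its young-referrers list."""

    def __init__(self):
        self.methods = []
        self.young_referrers = []

    def add(self, cog_method, refers_to_young=False):
        self.methods.append(cog_method)
        if refers_to_young:
            self.young_referrers.append(cog_method)

    def map_object_references(self, forwarding, full_gc):
        # A full GC scans every method; an incremental GC scans only the
        # methods known to contain young references.
        targets = self.methods if full_gc else self.young_referrers
        for m in targets:
            m.method_object = forwarding.get(m.method_object, m.method_object)
            m.embedded_oops = [forwarding.get(o, o) for o in m.embedded_oops]
        return len(targets)


zone = ToyMethodZone()
old = ToyCogMethod(0x1000, [0x2000])
young = ToyCogMethod(0x3000, [0x4000])
zone.add(old)
zone.add(young, refers_to_young=True)

# Incremental GC: only a young object moved, only one method is scanned.
scanned_incremental = zone.map_object_references({0x4000: 0x4800}, full_gc=False)
# Full GC: an old method object moved, every method is scanned.
scanned_full = zone.map_object_references({0x1000: 0x1100}, full_gc=True)
```

The point of the second list is cost: a scavenge does not have to walk the whole code zone.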


2) As far as I understand, CogMethod doesn't "store/duplicate" the literals of the CompiledMethod. Hence, even when you have a jitted method, when you need a special literal, you ask the CM for it, using the back-pointer 'methodObject'. Is this correct?

That's not correct.  Literals are embedded in machine code, both in inline caches (selectors and classes) and in literal references.  See Cogit>>annotate:objRef:. 
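To make "literals are embedded in machine code" concrete, here is a minimal sketch, assuming a 32-bit little-endian oop written straight into the instruction bytes and a side list of offsets standing in for the annotations. ToyMachineCode and its methods are hypothetical; the real annotation encoding in the Cogit is different.

```python
import struct


class ToyMachineCode:
    """Instruction bytes plus a record of where object references live."""

    def __init__(self):
        self.code = bytearray()
        self.obj_ref_offsets = []  # hypothetical stand-in for objRef annotations

    def emit(self, raw):
        self.code += raw  # opaque instruction bytes

    def emit_literal(self, oop):
        # Remember the offset so the GC can find the oop later.
        self.obj_ref_offsets.append(len(self.code))
        self.code += struct.pack("<I", oop)

    def literals(self):
        return [struct.unpack_from("<I", self.code, off)[0]
                for off in self.obj_ref_offsets]

    def remap(self, forwarding):
        # Rewrite each embedded oop in place after objects have moved.
        for off in self.obj_ref_offsets:
            (oop,) = struct.unpack_from("<I", self.code, off)
            struct.pack_into("<I", self.code, off, forwarding.get(oop, oop))


mc = ToyMachineCode()
mc.emit(b"\xb8")          # an opcode byte that takes a 4-byte operand
mc.emit_literal(0x2000)   # the literal's oop is the operand
mc.emit(b"\xc3")
before = mc.literals()
mc.remap({0x2000: 0x2400})
after = mc.literals()
```

Without the offsets the collector would have no way to tell an oop from any other four bytes of code, which is why each reference has to be annotated when it is generated.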

3) This is the most stupid question, but I don't see WHERE the machine code is kept. When I jit a method, I get a CogMethod structure, perfect. But where is the generated machine code? Where is it kept? How can I know from a CogMethod which machine code is associated with it?

Look at CoInterpreter>>readImageFromFile:HeapSize:StartingAt: (for the real VM) and CogVMSimulator>>openOn:extraMemory: (for the simulator).  These set up the memory via the variable memory (in the real VM) or 0 (in the simulator, where the heap starts at address 0), and cogCodeSize.  Then see Cogit>>initializeCodeZoneFrom:upTo: for initialization.  The CogMethodZone is at the start of the heap.
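A toy picture of that layout, under stated assumptions: one block of memory whose first cog_code_size bytes are the code zone, with the object heap after it, and each method stored as a header followed immediately by its machine code. The 8-byte header, the bump allocator and all names here are invented for illustration, not taken from the VM.

```python
HEADER_SIZE = 8  # hypothetical; the real CogMethod header has a different size


class ToyMemory:
    def __init__(self, cog_code_size, heap_size):
        self.memory = bytearray(cog_code_size + heap_size)
        self.code_zone_limit = cog_code_size  # code zone is [0, cog_code_size)
        self.heap_start = cog_code_size       # objects live from here on
        self.next_free = 0                    # bump pointer inside the code zone

    def allocate_method(self, machine_code):
        size = HEADER_SIZE + len(machine_code)
        if self.next_free + size > self.code_zone_limit:
            return None  # code zone full
        address = self.next_free
        start = address + HEADER_SIZE
        self.memory[start:start + len(machine_code)] = machine_code
        self.next_free += size
        return address

    def machine_code_of(self, address, length):
        # In this sketch the code sits right after the method's header.
        start = address + HEADER_SIZE
        return bytes(self.memory[start:start + length])

    def is_in_code_zone(self, address):
        return 0 <= address < self.code_zone_limit


mem = ToyMemory(cog_code_size=64, heap_size=256)
first = mem.allocate_method(b"\x90\xc3")
second = mem.allocate_method(b"\xc3")
too_big = mem.allocate_method(bytes(100))
```

Under this assumption, going from a method to its machine code is address arithmetic, and a plain range check tells a machine-code address from an object address.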

4) I guess that my thought in 2) is not correct, because otherwise I don't understand why you need CoInterpreter>>markAndTraceOrFreeMachineCode:. The comment says "Deal with a fullGC's effects on machine code. Either mark and trace oops in machine code or free machine-code methods that refer to freed oops. The stack pages have already been traced so any methods of live stack activations have already been marked and traced."

Which oops do you mean by "oops in machine code"? Literals? The back-pointer to the CM?

Both, and oops in inline caches.
 
And by "free machine-code methods that refer to freed oops", what do you mean? Literals, or oops such as the back-pointer? I think you mean the back-pointer, since the original CM could have been garbage collected and since you flag the CogMethod as empty...

This is the tracing step that marks live objects.  It must identify all object references in a Cog method.  But if the Cog method's bytecoded method isn't marked it frees the Cog method.  See Cogit>>markAndTraceOrFreeCogMethod:firstVisit:.
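The rule in that paragraph can be sketched as a single pass, assuming the marked set already holds everything reached from the roots and the stack pages. The dict-based method records and the function name are made up for illustration, and the sketch ignores ordering subtleties a real collector has to handle.

```python
def mark_and_trace_or_free(cog_methods, marked):
    """Keep methods whose bytecoded method is marked; drop the rest.

    cog_methods: list of dicts with 'method_object' and 'oops' (toy layout).
    marked: set of oops already known to be live; extended in place.
    """
    survivors = []
    for m in cog_methods:
        if m["method_object"] in marked:
            # Live method: every oop it refers to must be kept alive too.
            marked.update(m["oops"])
            survivors.append(m)
        # Otherwise the Cog method is freed: its oops are not traced.
    return survivors


marked = {0x1000}
methods = [
    {"method_object": 0x1000, "oops": [0x2000, 0x2008]},
    {"method_object": 0x3000, "oops": [0x4000]},
]
survivors = mark_and_trace_or_free(methods, marked)
```

Here the second method is dropped because its bytecoded method was never marked, so the object at 0x4000 is not kept alive on its behalf.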


5) This is not a question; rather, I would like to know whether I understood correctly. You JIT a method the second time it is used, that is, when you find it in the cache. To know how to generate the machine code for a particular bytecode, you check the table that you generate with #initializeBytecodeTableForClosureV3, where you basically map bytecodes to the methods that generate the machine code for each bytecode. If it is a primitive you use #compilePrimitive instead, which checks a similar table, but for primitives, which is set up in #initializePrimitiveTableForSqueakV3.

Methods are jitted either when found in the cache, or when a block is invoked in the same method twice in a row (on the second block invocation) or on the Nth backward jump in a loop or when a method is evaluated via withArgs:executeMethod: (a doit).  Look for transitive senders of Cogit>>cog:selector:.
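A small sketch of the first of those triggers only, "jit when found in the cache": the first send of a selector misses the cache and does a full lookup, and the second send hits the cache and is taken as the cue to compile. ToyVM, its dict-based cache and the jitted set are hypothetical; the block, loop and doit triggers are not modelled.

```python
class ToyVM:
    def __init__(self, methods):
        self.methods = methods  # selector -> method (any object)
        self.cache = {}         # toy method lookup cache
        self.jitted = set()     # selectors that have been compiled
        self.lookups = 0

    def send(self, selector):
        if selector in self.cache:
            # Cache hit: the method has been used before, so compile it.
            self.jitted.add(selector)
        else:
            # Cache miss: full lookup, then remember the result.
            self.lookups += 1
            self.cache[selector] = self.methods[selector]
        return self.cache[selector]


vm = ToyVM({"foo": "<method foo>"})
vm.send("foo")
jitted_after_first = "foo" in vm.jitted
vm.send("foo")
jitted_after_second = "foo" in vm.jitted
```

The effect is that methods run only once never pay the compilation cost.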


Now, I have a compiled method XXX (selector xxx) which sends #foo. XXX was jitted to CogMethod YYY (selector yyy). When xxx is executed, YYY is executed. When YYY was jitted, the table defined in #initializeBytecodeTableForClosureV3 mapped the send bytecode to a specific generator method, which in the end, for normal messages, is #genSend:numArgs:. The machine code that method generates includes a call to the "trampoline" (which is looked up in 'sendTrampolines'; #generateSendTrampolines shows how you map from one to the other), which invokes the associated routine, in this case #ceSend:super:to:numArgs:. So the #foo send will finally be handled in ceSend:super:to:numArgs:. This is ONLY true if the send is "unlinked". If #foo was in fact also jitted, then you try to link it (to avoid searching the cache next time???). Suppose you could link the two, so the next time YYY is executed it will call the CogMethod of #foo DIRECTLY. In this case, the method executed in the VM is #executeCogMethodFromLinkedSend:withReceiver: instead of #ceSend:super:to:numArgs:

So, am I delirious, or is that more or less correct?

More or less.  Yes.  Have you read http://www.mirandabanda.org/cogblog/2011/03/01/build-me-a-jit-as-fast-as-you-can/?  It covers ceSend:... in detail.
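The unlinked-versus-linked distinction in the question can be sketched like this, assuming a monomorphic inline cache that remembers one receiver class and one target per send site. SendSite, ToyRuntime and the Python-class check are illustrative assumptions; the blog post is the reference for what ceSend:... really does.

```python
class SendSite:
    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None  # receiver class the site is linked for
        self.target = None        # jitted method the site calls directly


class ToyRuntime:
    def __init__(self, jitted):
        self.jitted = jitted      # (class, selector) -> callable
        self.trampoline_calls = 0

    def ce_send(self, site, receiver):
        # Slow path, reached via the trampoline: look up, link, then run.
        self.trampoline_calls += 1
        target = self.jitted[(type(receiver), site.selector)]
        site.cached_class, site.target = type(receiver), target
        return target(receiver)

    def execute(self, site, receiver):
        # Fast path: a linked site calls its target directly as long as
        # the receiver's class matches the one it was linked for.
        if site.target is not None and type(receiver) is site.cached_class:
            return site.target(receiver)
        return self.ce_send(site, receiver)


rt = ToyRuntime({
    (int, "foo"): lambda r: r + 1,
    (str, "foo"): lambda r: r + "!",
})
site = SendSite("foo")
r1 = rt.execute(site, 1)    # unlinked: goes through the trampoline
r2 = rt.execute(site, 5)    # linked: direct call
calls_after_two = rt.trampoline_calls
r3 = rt.execute(site, "a")  # different class: back to the slow path
```

So linking does avoid the cache search on later sends, but only while the receiver's class stays the same.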
 

Thanks a lot in advance,

you're welcome.
 

--
Mariano
http://marianopeck.wordpress.com