Cross-posted to:
[hidden email]
[hidden email]
[hidden email]
[hidden email]
comp.lang.smalltalk

Hello lists. This is a draft of an article I am writing for an upcoming
series at the Weekly Squeak. It is about optimizing context creation. I
would like comments on its technical correctness, as well as suggestions
for related content I should include. I would especially like to know
whether this article covers the way this is handled in systems I did not
specifically research, such as Strongtalk and Exupery, and non-Smalltalk
systems like Common Lisp, Erlang, Haskell, and OCaml, since this topic is
not really specific to object-oriented systems.

This will not be the first article of the series; however, the first two
paragraphs are the preliminary series introduction. I put them in so that
you can get an idea of what I am writing. Sections enclosed in <i> ... </i>
tags are parts I am especially unsure of; I would appreciate comments
(I used i tags because my editor, vim, highlights the enclosed text when
in HTML mode).

Look forward to this appearing in the Weekly Squeak:

http://news.squeak.org/

Thank you for your help, and enjoy the article!


=== Speeding Up Smalltalk Series, part 42: Context Optimizations ===

Ever since object-oriented systems became popular, they have had the image
of being very easy to program but very slow to execute. This image, and
the reality behind it, has plagued every popular object-oriented system,
starting with Smalltalk in the 80's and continuing with Java and Ruby
today. However, it does not have to be this way. Early-bound languages
have become much faster thanks to a great deal of research on compilers
and computer architecture. Similarly, much research has been done on
making late-bound object-oriented systems fast, both in software and in
hardware. That research has not seen as much exposure as processor
research, and so is probably less familiar to you. This series, "Speeding
Up Smalltalk", attempts to summarize a lot of that research and present it
for your enjoyment.

Computation in Smalltalk and Self is done by sending messages to objects.
This seemingly simple operation of sending a message is performed so often
and is so flexible that it is <b>always</b> the execution bottleneck in
any implementation of Smalltalk or a similar object-oriented system. Why
is that?

---- Description of Context Objects ----

Every time a message is sent, four things happen:

- A search is done through the message receiver for the method named by
  the message. This was the topic of the previous article and will not be
  discussed here.
- A context object is allocated on the heap.
- The context object is initialized and filled with:
  - local objects, including:
    - pointers to the message arguments
    - pointers for each local variable
    - pointers for temporary storage needed for intermediate results
  - a way to reference variables defined in other scopes. Blocks have
    this property; methods do not.
    - In Smalltalk, this is done with a pointer to the "home context".
    - In Self, it is done with delegate slots to the enclosing context(s).
  - a return address at which to resume execution.
    - Squeak does not need this, since every context has its own PC; when
      returning, the PC of the returned-to context is already in the
      right place. This is equivalent to a return address, since a
      child's return address is its parent's PC.
  - a return context that will become current upon return.
- The newly created context is made the "current" context for this
  thread/process.

In Squeak, MethodContexts and BlockContexts are subclasses of ContextPart.
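Before going through their instance variables, here is a small sketch you
can do in a Workspace of a stock Squeak image (with a Transcript open) to
look at a live context; it relies only on thisContext and the standard
ContextPart accessors:

  | ctx |
  ctx := thisContext.   "the MethodContext running this doIt"
  Transcript
      cr; show: 'context:  ', ctx printString;
      cr; show: 'sender:   ', ctx sender printString;   "resumed when this context returns"
      cr; show: 'method:   ', ctx method printString;   "the CompiledMethod being executed"
      cr; show: 'receiver: ', ctx receiver printString  "self (nil for a plain doIt)"

Evaluating a block inside the doIt, for example <code>[thisContext] value</code>,
answers a BlockContext instead, whose home is the doIt's MethodContext.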
ContextPart has three instance variables:

- sender: the context that sent the message being executed. This context
  will become current again when the current context returns.
- pc: a pointer to the currently executing instruction.
- stackp: the pointer to the top of the execution stack. The Smalltalk
  virtual machine is a stack machine; local variables and intermediate
  results are stored in the indexed fields of the context, and stackp
  keeps track of which OOP is currently at the top of the stack.

MethodContext extends this with three instance variables and a
variable-sized area for holding the temporaries:

- method: the CompiledMethod holding the code being executed
- receiverMap: <i> Something to do with the names of the variables. Not
  really sure </i>
- receiver: a pointer to "self", the receiver of the message

---- Squeak optimization: Static Contexts ----

In Squeak, BlockContexts are partial closures (sometimes called static
blocks/closures [3]). Rather than holding their own code and
environment, BlockContexts piggyback off another context/code pair;
home is the environment that the block piggybacks off of. This usually
works, but there are cases where this scheme breaks down:

- <i> like what? Why not? I don't understand why this doesn't always
  work. Something to do with re-entrancy or circular references. Not
  sure why either is a problem. </i>

A BlockContext has three instance variables of its own:

- nargs: the number of arguments that the block must be called with
- startpc: the index into <code>self home method</code> where the code
  for the block starts
- home: the home environment

Squeak's NewCompiler has a different representation of blocks. These are
full closures, storing their own code and environment. They are sometimes
called dynamic closures to distinguish them from the above "static
closures" [3]. They contain:

- method: the code of the block (a CompiledMethod2)
- environment: the captured temporaries (a ClosureEnvironment)
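You can see the classic (static) representation from inside the image.
The following is a sketch that assumes an image still using the original
compiler, where a block literal evaluates to a BlockContext (an image
built with the NewCompiler's full closures will answer different
classes); print it in a Workspace:

  | b |
  b := [:x | x * 2].
  Array
      with: b class     "BlockContext under the classic compiler"
      with: b numArgs   "1 -- the nargs slot"
      with: b home      "the MethodContext of the doIt that created the block"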
---- Squeak optimization: Context Recycling ----

To avoid having to create a new MethodContext on every message send,
Squeak keeps a pool of pre-allocated MethodContexts and simply fills one
in when a new one is needed. To support this optimization, MethodContexts
are only created in two sizes, rather than any size they could in
principle support. The small size has room for <i> a few </i>
temporaries, and the large size has room for <i> a bunch </i>. [4]

---- VisualWorks optimization: Context Caches ----

The most natural way to store contexts is as a linked list of objects,
but that does not map well onto the hardware of many processors. Many
processors (including all the Intel varieties) have <i> several </i>
built-in instructions to mechanize the creation of stack frames, which
are similar to context objects but have several important differences.
VisualWorks uses these native stack frames for most operations, meaning
that in many cases context creation is identical to a C function call.
Thus, the hardware stack acts as a stand-in for the context objects; this
optimization is called a context cache.

However, stack frames cannot handle context responsibilities beyond a
single call and return; when more is needed, a context object must be
allocated on the heap. The context object can either be a context in its
own right (a stable context) or act as a proxy to the hardware stack
frame (a hybrid context). There are several complications to this dual
scheme; curious readers should refer to [5].

Unlike VisualWorks, Squeak does not use a stack of context frames;
instead it uses a linked list of real context objects. This is easier,
but perhaps slower. To save on allocation costs, contexts are cached and
recycled for various method invocations.

<i> Not sure where to incorporate this </i>

There are three types of blocks:
- open/clean blocks (pure functions with no closure)
- copying blocks
- full blocks

---- Or you can cheat... ----

The most popular object-oriented languages (C++, Java) chose to dump the
flexibility of blocks in exchange for runtime speed. These languages use
a native stack frame without providing the escape mechanism of a context
cache, so blocks and other context behavior that does not follow stack
discipline are simply not supported. This tradeoff eliminates a good deal
of flexibility, and so will not be explored further here.

---- Final words ----

Inlining solves many of the same problems as a context cache does: if the
method body is inlined into the caller, no new context need be created.
However, inlining solves many more problems than just context
elimination; I will dedicate an entire article to it.

---- References ----

- Context recycling may be discussed in Tim Rowledge's NuBlue book
  chapter: http://www.rowledge.org/tim/squeak/OE-Tour.html

- Context caches are discussed in section 5.2.3 of
  http://www.kampjes.demon.co.uk/articles/exuperyDesign.pdf

[3] <i> add reference to squeak mailing list or Ian's something to
    discuss the difference between static and dynamic closures </i>

[4] Squeak class comments of <code>InstructionStream</code>,
    <code>ContextPart</code>, <code>MethodContext</code>, and
    <code>BlockContext</code>.

[5] Context Management in VisualWorks 5i:
    http://wiki.cs.uiuc.edu/VisualWorks/DOWNLOAD/papers/oopsla99-contexts.pdf

--
Matthew Fulmer -- http://mtfulmer.wordpress.com/
Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808
Hi,

I've answered your questions below. If I've missed anything please ask
here. Overall it looks good to me.

Matthew Fulmer writes:

> MethodContext extends this with three instance variables and a
> variable-sized area for holding the temporaries:
> - method: the CompiledMethod holding the code being executed
> - receiverMap: <i> Something to do with the names of the variables. Not
>   really sure </i>
> - receiver: a pointer to "self", the receiver of the message
receiverMap isn't used in stock Squeak; it's the unused slot. Exupery
originally used it to hold the compiled program counter, which was neat
because contexts could be either interpreted or compiled. Exupery no
longer does this, as the slot is needed for BlockContexts, but it still
uses the variable as a flag to decide whether to run interpreted or
compiled code. Exupery will stop using the variable soon, to be
compatible with the closure compiler, which uses the slot to hold
environments containing the variables that are shared by multiple
contexts.

> ---- Squeak optimization: Static Contexts ----
>
> In Squeak, BlockContexts are partial closures (sometimes called static
> blocks/closures [3]). Rather than holding their own code and
> environment, BlockContexts piggyback off another context/code pair;
> home is the environment that the block piggybacks off of. This usually
> works, but there are cases where this scheme breaks down:
> - <i> like what? Why not? I don't understand why this doesn't always
>   work. Something to do with re-entrancy or circular references. Not
>   sure why either is a problem. </i>

A block variable only exists once for each method, not once for each
block. So

  aCollection do: [:each | self registerCallBackBlock: [each doSomething]]

would register a call-back block for each element of aCollection, except
that all of those blocks share the single variable "each", so they would
all perform the call-back action on the last element. The problem with
recursion is the same, except that there is code to stop a block from
being entered twice, so this at least produces an error instead of
silently doing the wrong thing.
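For a concrete Workspace illustration of the same problem (an
illustrative sketch using only standard collection protocol): with the
classic compiler all the saved blocks read the one home variable "each",
so they are all expected to answer the last element, while a compiler
with full closures gives each block its own binding:

  | blocks |
  blocks := OrderedCollection new.
  #(1 2 3) do: [:each | blocks add: [each]].  "save a zero-argument block per element"
  blocks collect: [:b | b value]   "classic compiler: 3, 3, 3 -- full closures: 1, 2, 3"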