As I begin to grasp the scope of the optimizations in the interpreter and C-Translator, I am filled with a sense of awe. These tweaks were obviously done with a deep knowledge of Smalltalk, C, the behavior of C compilers, and assembler. At the same time, I'm filled with horror as I realize the people who wrote these must have sold their souls, left kidneys, and first-born offspring in order to pull it off.

The VM, as it exists today, is adequate for conventional Squeak as it is presently used. Unfortunately, the scope and nature of its optimizations make it difficult to modify and extend. The non-obvious things the C-translator does make VM hacking extremely inhospitable to newcomers. For example, I would look at the class definition and think to myself: OK, these "instance" variables will be put into a structure that can be instantiated by appropriate calls to C, and the class variables will probably end up being static variables in the same source file, thereby approximating the behavior of Smalltalk classes.

Instead, class variables are optimized as constants, and while instance variables are put in a structure, there is no way to instantiate the structure or manage such instances should they be created. Much, much worse, however, is that the C translator will behave differently depending on which class it is processing and apply special hacks where it thinks nobody will notice. The hapless newbie probably won't discover these hacks until he has read at least 500 messages...

I'm going to try to fork my own version of the VMMaker, intended to be much more flexible and robust to experimentation though, perhaps, a bit slower... (The current version of my custom VM is half as fast but still usable...)

The current issue that I'm having trouble with is figuring out exactly how the CCode generator interfaces with the system-wide compiler. I am especially curious as to how to implement scope-reduced variables (variables declared within a block instead of a function...).

How exactly are syntax elements mapped onto the translation classes?

In any event, I hope to have an interesting variation on the VM technology someday...

--
Friends don't let friends use GCC 3.4.4
GCC 3.3.6 produces code that's twice as fast on x86!
http://users.rcn.com/alangrimes/
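A minimal C sketch of the layout described in the post above may help readers who have not looked at translated VM source. It is not actual VMMaker output: the names MaxStackDepth, InterpState, state and push_count are hypothetical and chosen only for illustration. It shows a class variable reduced to a compile-time constant, instance variables gathered into a single file-scope structure that is never instantiated a second time, and a "scope-reduced" variable declared inside a block rather than at function level.

```c
#include <assert.h>

/* Hypothetical class variable: emitted as a compile-time constant,
   not as a static variable that could be changed at run time. */
#define MaxStackDepth 64

/* Hypothetical instance variables gathered into one structure. */
struct InterpState {
    int stackPointer;
    int instructionPointer;
};

/* Exactly one file-scope instance exists; there is no constructor
   and no way to create or manage further instances. */
static struct InterpState state;

/* Pushes up to n dummy entries and returns how many actually fit. */
int push_count(int n)
{
    int pushed = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        /* Scope-reduced variable: visible only inside this loop body,
           which leaves the compiler free to keep it in a register. */
        int next = state.stackPointer + 1;
        if (next > MaxStackDepth)
            break;
        state.stackPointer = next;
        pushed++;
    }
    return pushed;
}
```

Calling push_count(100) on a fresh state stops once the constant limit is reached; the limit itself cannot be changed without recompiling, which is the trade-off the post complains about.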
Alan Grimes wrote:
>As I begin to grasp the scope of the optimizations in the interpreter
>and C-Translator, I am filled with a sense of awe. These tweaks were
>obviously done with a deep knowledge of Smalltalk, C, the behavior of C
>compilers, and assembler. At the same time, I'm filled with horror as I
>realize the people who wrote these must of have sold their souls, left
>kidneys, and first born offspring in order to pull it off.
>
>While the VM, as it exists today, is adequate for conventional Squeak as
>it is presently used. Unfortunately, however, the scope and nature of
>its optimizations make it difficult to modify and extend. The
>non-obvious things the C-translator does make VM hacking extremely
>inhospitable to newcomers. For example, I would look at the class
>definition and would think to myself: OK, these "instance" variables
>will be put into a structure that can be instantiated by appropriate
>calls to C, and the class variables will probably end up being static
>variables in the same source file. -- thereby approximating the behavior
>of Smalltalk classes.
>
>Instead, class variables are optimized as constants, and while instance
>variables are put in a structure, there is no way to instantiate the
>structure or manage such instances should they be created. Much much
>worse, however, is that the C translator will behave differently
>depending on which class it is processing and apply special hacks where
>it thinks nobody will notice. -- The hapless newbie probably won't
>discover these hacks until he has read at least 500 messages...
>
>I'm going to try to fork my own version of the vmmaker intended to be
>much more flexible and robust to experimentation though, perhaps, a bit
>slower... (The current version of my custom VM is half as fast but still
>usable...)
>
>The current issue that I'm having trouble with is figuring out exactly
>how the CCode generator interfaces with the system wide compiler. I am
>especially curious as to how to implement scope-reduced variables.
>(variables declared within a block instead of a function...)
>
>How exactly are syntax elements mapped onto the translation classes?
>
>In any event, I hope to have an interesting variation on the VM
>technology someday...
>
>

improve? That'd help the rest of us!

brad
In reply to this post by Alan Grimes
On 1-Nov-05, at 12:41 PM, Alan Grimes wrote:

> Instead, class variables are optimized as constants, and while instance
> variables are put in a structure

The instance variables are put into a structure because on PowerPC, if they are ordinary file-scope variables (static or non-static), it takes an extra memory load to dereference the data storage pointer before each load or store of the variable. Using a structure avoids that extra load; this is why the structure is there. Testing on Intel-based and 68K machines showed there was no impact, so along the way we made it the default, although I think you can choose to turn it off.

a) Usage of "register struct foo * foo = &fum;" ensures that on PowerPC the foo pointer gets into a register if and only if two or more references are made to the structure.

b) Some variables are not in the structure because they require initialization. This could be changed by having a method that actually does the initialization.

c) Over the years, arrays have sometimes gone into or out of the structure on PowerPC, based on compiler behaviour.

d) Technically, on register-happy machines you could tell GCC to let register 42 contain the foo pointer, provided of course that you ensure all plugins are happy with that rule.

e) Inlining has a modification so that if an instance variable is used in multiple routines that are then folded into a single routine, that variable is consolidated into a locally scoped variable. The main user of this logic is the GC, where variables are shared between different methods, making it easy to write the algorithms, but all those methods are folded into a single C procedure. This change made a significant improvement in GC performance on register-happy machines.

f) The interpreter case loop has logic to scope local variable usage to a particular case statement, versus scoping to the entire C procedure. With locals scoped to individual case statements, most compilers are much happier to do register optimizations.

g) Lastly, gnuifying alters the case statement to use jump tables, which is much more efficient.

h) Using the C++ inline keyword and not inlining the VM has, in the past, produced lousy performance.

i) The inliner uses some rules to decide whether small routines can be inlined; otherwise it follows the hint from "self inline: boolean", provided the routine could be inlined and does not fail some rule other than length. In the past there was a patch I did to say "yes, do the inline anyway" for this procedure; not sure if that is still in vmmaker.

j) Compiler optimizations can do ugly things with common code elimination etc., such as dragging part of the common send logic into many of the individual bytecode cases. Testing across (many) GCC versions will show you which is the best compiler for your platform.

--
===========================================================================
John M. McIntosh <[hidden email]> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================
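Points (a), (f) and (g) in the reply above can be illustrated with a toy stack machine. This is only a sketch under stated assumptions, not the Squeak interpreter or the output of the gnuify step: the opcodes OP_PUSH1, OP_ADD and OP_HALT and the functions run_switch and run_jumptable are hypothetical, while foo and fum are borrowed from the reply. The first function uses a portable switch with locals scoped to individual cases; the second uses GCC's labels-as-values extension to dispatch through a jump table, which is the general technique a gnuified case statement relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (not Squeak bytecodes). */
enum { OP_PUSH1, OP_ADD, OP_HALT };

/* Interpreter state kept in one structure, as in point (a). */
struct foo {
    int stack[16];
    int sp;                      /* number of values on the stack */
};
static struct foo fum;

/* Portable version: a switch whose locals are scoped per case (f). */
int run_switch(const unsigned char *code)
{
    register struct foo *foo = &fum;   /* one base pointer for all fields */
    size_t pc = 0;
    foo->sp = 0;
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH1:
            foo->stack[foo->sp++] = 1;
            break;
        case OP_ADD: {
            /* arg and rcvr exist only inside this case */
            int arg, rcvr;
            assert(foo->sp >= 2);
            arg = foo->stack[--foo->sp];
            rcvr = foo->stack[--foo->sp];
            foo->stack[foo->sp++] = rcvr + arg;
            break;
        }
        default:                 /* OP_HALT or unknown opcode */
            return foo->sp > 0 ? foo->stack[foo->sp - 1] : 0;
        }
    }
}

#if defined(__GNUC__)
/* Jump-table version (g): each handler jumps straight to the next one.
   The code array must contain only valid opcodes; there is no default. */
int run_jumptable(const unsigned char *code)
{
    /* GCC extension: &&label yields the address of a label. */
    static void *jumpTable[] = { &&do_push1, &&do_add, &&do_halt };
    register struct foo *foo = &fum;
    size_t pc = 0;
    foo->sp = 0;
#define DISPATCH() goto *jumpTable[code[pc++]]
    DISPATCH();
do_push1:
    foo->stack[foo->sp++] = 1;
    DISPATCH();
do_add: {
        int arg, rcvr;
        assert(foo->sp >= 2);
        arg = foo->stack[--foo->sp];
        rcvr = foo->stack[--foo->sp];
        foo->stack[foo->sp++] = rcvr + arg;
    }
    DISPATCH();
do_halt:
    return foo->sp > 0 ? foo->stack[foo->sp - 1] : 0;
#undef DISPATCH
}
#endif
```

Both functions compute the same result for the same program. Whether the jump-table form is actually faster depends on the compiler version and platform, which is why the reply recommends measuring across GCC releases.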