After adding what new versions I could to the inbox repository, we have:
Packages not in inbox:
  Etoys Monticello MorphicExtras PlusTools Traits

Packages written to inbox:
  Collections CollectionsTests Compiler Graphics Kernel KernelTests
  Morphic Network Protocols SmaCC ST80 System Tools

Packages not updated because of potential conflicts:
  Multilingual

frank

----- Original Message -----
From: "stéphane ducasse" <[hidden email]>
To: "The general-purpose Squeak developers list" <[hidden email]>
Sent: Tuesday, February 14, 2006 5:19 PM
Subject: Re: Posting Fixes (was Re: Use of == for arithmetic equality)

Hi frank

publish in the inbox and we will take them from there.

Stef

On 14 Feb 06, at 16:02, Frank Shearar wrote:

> "Cees De Groot" <[hidden email]> wrote:
>
>> On 2/14/06, Bert Freudenberg <[hidden email]> wrote:
>>> Also, I'd prefer directly publishing to a MC repository rather than
>>> uploading to Mantis.
>>>
>> Absolutely. I'd publish that sort of stuff in my own repository in
>> such a case and add pointers to Mantis.
>
> Er, so should I save the in-image package MCZs to the inbox
> repository? Or will whoever manages the inbox grab the MCZs from
> the Mantis bug report?
>
> frank
|
In reply to this post by stéphane ducasse-2
On 2/14/06, stéphane ducasse <[hidden email]> wrote:
> publish in the inbox and we will take them from there.

Fine, Stef. I take it then you'll do the merge w.r.t. the networking code and file code, as far as they are affected. I'll ask the I/O team to suspend any work until that's done.
|
In reply to this post by Andreas.Raab
On 14 Feb 06, at 16:41, Andreas Raab wrote:

> stéphane ducasse wrote:
>> Sure cees
>> this is working well for simple package oriented fixes.
>> But now it would be good that we get the changes! Look at the Network
>> team for example. Should frank stack that in the team and we get
>> something in the future (instead of now and getting done).
>
> Given that this code does neither have any cross-cutting
> requirements and doesn't even fix anything that's broken I don't
> see why you have the urge to push this in right now. As a matter of
> fact, I'm somewhat concerned about these changes and would like
> them to be reviewed - there are various places where the pattern
> "foo == 0" is absolutely appropriate and where #= should *not* be
> used and I wonder whether these places have been taken into account
> properly.

Indeed you are right. So have a look and let us know. I still think it is important that we find a process to
  - give fast feedback on changes
  - handle cross-cutting changes.

>> And each team can do a merge after. Else we can have endless
>> discussions.
>> See the Fix of TextAnchor of lukas.
>
> Where was this endless discussion?

Aren't endless discussion and no discussion the same thing? I sent a post and got no reaction (not even a complaint :)), so we included it, since this was blocking lukas's enhancement for icon support in the browser, and we did not want the code to rot when the change is this simple.

> Cheers,
>   - Andreas
|
In reply to this post by Cees De Groot
> Fine, Stef. I take it then you'll do the merge w.r.t. the networking
> code and file code, as far as they are affected. I'll ask the I/O
> team to suspend any work until that's done.

I was too fast to react and andreas is right, so we need to get feedback on those changes first.
|
Andreas had said that foo == 0 was sometimes the right thing. I have
difficulty imagining many such cases (and those that I can imagine, I think of as kludges), so I'm curious what he had in mind.

../Dave
|
Dave Mason wrote:
> Andreas had said that foo == 0 was sometimes the right thing. I have
> difficulty imagining many such cases (and those that I can imagine, I
> think of as kludges), so I'm curious what he had in mind.

Here is an example:

    SystemNavigation>>allObjectsDo: aBlock
        "Evaluate the argument, aBlock, for each object in the system
         excluding SmallIntegers."
        | object |
        object _ self someObject.
        [0 == object] whileFalse: [
            aBlock value: object.
            object _ object nextObject]

The reason this is correct is that for proxies you want the operation to be side-effect free, and #== is side-effect free while #= may not be. In addition, I sometimes use #== in critical code to bullet-proof against arbitrarily broken implementations of #= (but that's another rant for another time).

Cheers,
  - Andreas
|
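The proxy argument above can be sketched in Python rather than Smalltalk. This is only an illustration under stated assumptions: `LoggingProxy`, `END` and both walker functions are hypothetical names, and `END` stands in for the 0 that `nextObject` answers at the end of the heap. The identity-based loop never sends a message to the objects it visits, whereas the equality-based loop dispatches to each object's own comparison, which a proxy may forward with observable effects.

```python
class LoggingProxy:
    """Hypothetical proxy: forwards equality to a target and records each forward."""

    def __init__(self, target):
        self.target = target
        self.forwarded = []  # record of comparisons passed on to the target

    def __eq__(self, other):
        # Equality is an ordinary message send, so the proxy forwards it.
        # The forward is an observable side effect (here, a log entry).
        self.forwarded.append("==")
        return self.target == other


END = object()  # assumption: plays the role of the 0 end-of-heap marker


def all_objects_do(objects, block):
    """Walk `objects` using identity, so no visited object is sent a message."""
    it = iter(objects)
    obj = next(it, END)
    while obj is not END:  # identity test: never calls obj.__eq__
        block(obj)
        obj = next(it, END)


def all_objects_do_with_equality(objects, block):
    """Same walk, but the loop test dispatches to each object's __eq__."""
    it = iter(objects)
    obj = next(it, END)
    while not (obj == END):  # equality test: a proxy gets to run its own code
        block(obj)
        obj = next(it, END)
```

Walking a list that contains a `LoggingProxy` with the first function leaves its `forwarded` log empty; the second function adds one entry per visit.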
In reply to this post by stéphane ducasse-2
stéphane ducasse <[hidden email]> wrote...
>this is really nice to see coming back to real life :).
>Can you tell us a bit more about this new VM?
>How does it relate to pepsi?

This one doesn't relate to pepsi (for folks not aware of it, "pepsi" refers to one of Ian Piumarta's designs for a small dynamic kernel), but I hope that it has prepared me to do some related experiments. As you know, I am at Sun now where "curly brace" languages are still the rule, but I want to keep experimenting with dynamic systems, thin clients, and the like.

I got inspired by a project that Helge Horch did, reviving an old St-78 image on top of Java (not the one he showed at ESUG). The thing actually runs very nicely (much faster than the original Notetaker), and fits in a .jar file that is under 200k including both image and interpreter. I think Helge will release this in the not-too-distant future, but he likes things to be really right, and he's currently over-busy, so it may be a few months yet. It is a gem, both for this compactness and performance, and also because, language-wise, it is a living Smalltalk-76.

Helge preserved all the 16-bit oops, object table, and reference-counting aspects of the St-78 VM, but I was inspired by the performance and simplicity of riding on top of Java. It blew my mind that with this modest attachment, you could have Smalltalk live on a web page.

So, in order to learn Java, and the Java development environments, I am writing a Squeak in Java. The image I am using is the Mini2.1 -- the one with browser, debugger and decompiler with temp names, all in 600k. The interpreter runs now (15,000 bytecodes executed), and I am currently working on BitBlt (the story of my life ;-).

The interpreter is not just a transcription of the Squeak reference interpreter. Instead (like Helge's), it uses Java objects for the objects, and thus can let Java take care of all the storage management. I've figured out a way to preserve enumeration and mutation, so it ought to be a pretty compatible implementation when it's done.

I don't know where this will go. I think it would be fun to take that base and strip it down to just the kernel and then hook it up to some modern web-based graphics, network and database APIs. For now my goal is to get the original artifact going. Then, hopefully we could all have fun taking it in other directions.

  - Dan
|
Dan Ingalls wrote:
> I don't know where this will go. I think it would be fun to take

It sounds like very, very interesting work! And I think everybody now is just dying to see Helge's stuff! :-)

> that base and strip it down to just the kernel and then hook it up to
> some modern web-based graphics, network and database APIs. For now
> my goal is to get the original artifact going. Then, hopefully we
> could all have fun taking it in other directions.

Hmm, so we can have sqax*? ;-)

Michael

* instead of ajax
|
In reply to this post by Frank Shearar
Hi Diego! I agree with the spirit of what you're saying; that

> Only a small set of objects can
> answer for equality in
> a senseful way.

and that many designs override = in a nonsensical way that can end up being costly. And not just to the software: it has the psychological effect of causing people to think of the "data" instead of their "objects."

However, I think it's necessary to have "equality" defined by default for all objects based on identity. Dan put it succinctly:

> It is true that a==b implies a=b

There are many cases where logical equality for any object is useful. Any object, for example, should be able to be put into a Collection, and collections need to ask logical equality on their elements to work properly.

Regards,
  Chris
|
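A small Python sketch of the point above: equality that defaults to identity still lets any object live in a collection, while a class may override it with a domain-specific notion. `Token`, `Point` and `index_of` are hypothetical names used only for illustration; Python is used here as a stand-in, where `==` plays the role of Smalltalk's #= and `is` the role of #==.

```python
class Token:
    """Defines no __eq__, so equality falls back to identity."""


class Point:
    """Overrides equality with a domain-specific notion (same coordinates)."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must hash alike, or hashed collections misbehave.
        return hash((self.x, self.y))


def index_of(collection, item):
    """Linear search the way a collection would do it: by equality, not identity."""
    for i, element in enumerate(collection):
        if element == item:
            return i
    return -1
```

With the default, a `Token` is found only when the very same object is in the collection; two distinct `Point` objects with the same coordinates find each other.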
For anyone tempted to think that there's a "right answer(TM)" here(1)(2), I recommend Kent Pitman's classic:

http://www.nhplace.com/kent/PS/EQUAL.html

(1) I'm not accusing anyone of so thinking.

(2) I mean that there's no universally right definition of equality. Of course, I do NOT mean that it isn't possible to write a correct program for a given domain that uses some kind of equality test.
|
In reply to this post by Dan Ingalls
Hi, Stef -
[I'm copying this to Sq-dev because I figured a number of other folks might be interested in this]

>Should I understand that the underlying structures are not C but Java?
>So this is a bit like in Hobbes?
>Does it mean that you do not have to deal with GC?

Yes, yes, and yes.

>You get all the dynamism of Smalltalk because you do not use directly the Java byte-code.
>When you redefine a method you just install the method bytecode in the class ST objects
>and the interpreter will interpret the Smalltalk byte-code

Other than the use of Java objects for the objects, it's really an emulation -- ie, it does not use Java classes for ST classes. To summarize, I have only a few classes...

SqueakObject -- all Sq objects are an instance of this class.
    sqClass - points to another SqueakObject, the class
    format - like Squeak
    hash - like Squeak
    pointers - all pointer fields, named and indexable, if there are any
    bits - all bit-like fields, if there are any
  Clearly this can later be optimized with variants for a number of common Squeak formats (Hobbes does this as well). I'm trying to keep this very simple for now, though.

Integer - I use Java's Integers (boxed raw ints) for SmallIntegers. This allows the pointer space to be uniformly Java Objects. This is what revealed the ==/= problem. The easy fix is integer checks in ==, but I planned all along to do interning of small values to improve performance of all SmallInteger traffic.

SqueakImage - mostly devoted to reading the image, but it also includes a weak object table used for enumeration.

SqueakVM - the central interpreter

SqueakPrimitivehandler - all the primitives

>I guess that you do not generate Java native byte code?
>Gilad was mentioning that they will introduce a new bytecode for dynamic lookup in Java.

Correct; I do not generate Java bytecode (remember this is *simple*). But of course that is a possible approach for adding a JIT. Of more likely value in the short term is to modify the CTranslator to produce Java, and thus effectively offer AOT (ahead-of-time) compilation of primitives and any other performance-critical code.

  - Dan
|
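The object layout described above can be sketched roughly as follows. This is a Python illustration, not Dan's Java code: the field names follow the post, but `ObjectTable`, `Boxed` and `small_int` are hypothetical names, and the weak table and the interning cache are guesses at the general shape of the ideas rather than the actual implementation.

```python
import weakref


class SqueakObject:
    """One uniform representation for every non-integer object."""

    def __init__(self, sq_class, fmt, hash_bits, pointers=None, bits=None):
        self.sq_class = sq_class  # another SqueakObject, the class
        self.format = fmt         # like Squeak
        self.hash = hash_bits     # like Squeak
        self.pointers = pointers  # pointer fields, named and indexable, if any
        self.bits = bits          # bit-like fields, if any


class ObjectTable:
    """Weak table used only for enumeration; it does not keep objects alive."""

    def __init__(self):
        self._refs = []

    def register(self, obj):
        self._refs.append(weakref.ref(obj))
        return obj

    def all_objects(self):
        # Dead references answer None and are skipped.
        return [obj for obj in (ref() for ref in self._refs) if obj is not None]


class Boxed:
    """Stands in for a boxed integer: two boxes of the same value are distinct."""

    def __init__(self, value):
        self.value = value


_SMALL = {}  # interning cache: value -> the one shared box


def small_int(value):
    """Return the shared box for `value`, so identity agrees with equality."""
    box = _SMALL.get(value)
    if box is None:
        box = _SMALL[value] = Boxed(value)
    return box
```

With interning, `small_int(3) is small_int(3)` holds, whereas two separately allocated `Boxed(3)` objects are different under identity, which is the ==/= problem the post mentions.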
In reply to this post by Dan Ingalls
Hi -
I've got my little Squeak in Java running (hope to send out a link soon), and I've been pondering how to make it run faster. In the process, I've thought of two techniques, one of which is new (to me) and the other occurred to me years ago, but I never tried it out.

Since neither would really be all that hard to do in Squeak, I thought I'd mention them here for those folks who delight in such things, and with the further hope that someone might actually try them out.


Lazy Activation

This was the next thing I was going to do for Apple Smalltalk back when I got drafted to the hotel business back in 1987. The essence of the idea is that the purpose of activating a context is to save the execution state in case you have to do a send and, conversely, you don't really need an activation if you never need to do a real send.

I had a lot of fun instrumenting the VM to figure out just how many activations could be avoided in this way, and my recollection is that it was roughly 50%. I believe the statistics were better dynamically than statically, because there are a lot of methods that, in general, need to be activated, but they may begin with a test such as

    position > limit ifTrue: [^ false]

and for every time that this test succeeds, you can get away without ever needing an activation.

But, you say, you still need a pointer to the method bytes and a stack frame, and this is true, but you don't need to allocate and initialize a full context, nor to transfer the arguments. The idea is that, when you hit the send, you do the lookup, find the method, and then jump to a *separate copy* of the interpreter that has a different set of bytecode service routines. For instance, 'loadTemp' will, depending on the argument count, load from the stack of the calling method (which is still the "active" context). 'Push', since there is no allocated stack, pushes into a static array and, eg, 'plus' does the same old add, but it gets its args from the static array, and puts its result back there. And if anything fancy, such as a real send, does occur, then a special routine is called to do a real activation, copy this static state into it appropriately, and retry the bytecode in the normal interpreter.

It's probably worth confirming the results that I remember, but I wouldn't be surprised if one could almost double the speed of Squeak in this manner.


Cloned Activation

This one I just thought of, but I can't believe someone hasn't already tried it, either in Squeak or some similar system. The idea here is to provide a field in the method cache for an extra copy of a properly initialized context for the method (ie, correct frame size, method installed, pc and stack pointer set, etc). Then, when a send occurs, all you have to do is an array copy into blank storage, followed by a short copy of receiver and args from the calling stack.

There's a space penalty for carrying all the extra context templates, of course, but I think it's not unreasonable. Also, one could avoid it for all one-time code by only allocating the extra clone on the second call (ie, first call gets it into the method cache; second call allocates clone for the cache).

I have little sense of how much this might help these days -- I haven't looked in detail at the activation code for quite a while. Obviously the worse it is right now, the more this technique might help.


Mainly I just like to think about this stuff, and it occurred to me that, if someone were looking for a fun experiment or two, it might turn out to have some practical value. I haven't looked at Exupery to know whether these things are already being done, or whether they might fit well with the other techniques there, but I'm sure Bryce could say right off the bat.

  - Dan
|
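The lazy-activation idea above can be sketched as a toy interpreter in Python. Everything here is an assumption made for illustration: the bytecode names, the `Context` and `Interpreter` classes, and the use of a fresh scratch list in place of the static array. It is not Squeak's bytecode set or VM. A send starts without a context, reads its arguments in place from the caller's stack, and allocates a real context only when it reaches a send of its own.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """A full activation: the thing lazy activation tries not to allocate."""
    method: list
    args: list
    stack: list = field(default_factory=list)
    pc: int = 0


def step(op, pc, stack, load_arg):
    """Run one simple bytecode; return (next_pc, finished, result)."""
    name = op[0]
    if name == "push_arg":
        stack.append(load_arg(op[1]))
    elif name == "push_const":
        stack.append(op[1])
    elif name in ("gt", "add"):
        b = stack.pop()
        a = stack.pop()
        stack.append(a > b if name == "gt" else a + b)
    elif name == "jump_if_false":
        if not stack.pop():
            return op[1], False, None
    elif name == "return_top":
        return pc, True, stack.pop()
    else:
        raise ValueError(f"unknown bytecode {name}")
    return pc + 1, False, None


class Interpreter:
    def __init__(self, methods):
        self.methods = methods  # selector -> list of bytecodes
        self.activations = 0    # number of full contexts allocated

    def send(self, selector, caller_stack, nargs):
        """Lazy path: no context yet, arguments stay on the caller's stack."""
        method = self.methods[selector]
        base = len(caller_stack) - nargs
        scratch, pc = [], 0     # scratch plays the role of the static array
        while True:
            op = method[pc]
            if op[0] == "send":
                # Something fancy: do the real activation, copy the state
                # gathered so far, and retry this bytecode normally.
                self.activations += 1
                result = self.run(Context(method, caller_stack[base:], scratch, pc))
                break
            pc, finished, result = step(op, pc, scratch,
                                        lambda i: caller_stack[base + i])
            if finished:
                break
        del caller_stack[base:]  # pop the arguments
        return result

    def run(self, ctx):
        """Normal interpreter: works on an allocated context."""
        while True:
            op = ctx.method[ctx.pc]
            if op[0] == "send":
                ctx.stack.append(self.send(op[1], ctx.stack, op[2]))
                ctx.pc += 1
                continue
            ctx.pc, finished, result = step(op, ctx.pc, ctx.stack,
                                            lambda i: ctx.args[i])
            if finished:
                return result


METHODS = {
    # check: pos limit  ->  pos > limit ifTrue: [^ false]. ^ self bump: pos
    "check": [("push_arg", 0), ("push_arg", 1), ("gt",), ("jump_if_false", 6),
              ("push_const", False), ("return_top",),
              ("push_arg", 0), ("send", "bump", 1), ("return_top",)],
    # bump: n  ->  ^ n + 1   (a leaf method, never needs a context)
    "bump": [("push_arg", 0), ("push_const", 1), ("add",), ("return_top",)],
}
```

In this toy, sending `check` with a position above the limit returns early with `activations` still at zero; only the path that reaches the nested `bump` send allocates a `Context`, and `bump` itself runs without one.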
Dan,
> Lazy Activation

I included a slightly related idea in a 4 bit Smalltalk:

http://www.merlintec.com:8080/Hardware/dietST

This had an "enter" bytecode for explicitly creating a new context and a "grabArg" bytecode for moving stuff from the sender's context to the newly created one. The idea was that the compiler would generate the bytecodes for this as late in a method as possible (in the best cases - never). This was inspired by the Smalltalks that defer the creation of temporary variables until their first assignment. This static solution is not as powerful as your dynamic one, but it does have a few things in common.

This project only got as far as a SmaCC compiler for these bytecodes in Squeak, so I never got any dynamic statistics for this.

> Cloned Activation

This is what I did in NeoLogo:

http://www.merlintec.com/pegasus2000/e_neologo.html

NeoLogo was just a paper design but this feature was also present in the "SuperLogo" which I implemented in 1983 in TI99/4A Extended BASIC. It worked great and actually makes the run time simpler at the cost of slightly complicating the parser. Self actually explains method activation in this way to the users (though the implementation is radically different) because it is easier to understand than the traditional schemes.

-- Jecel
|
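The cloned-activation idea discussed in this thread (a pre-initialized context template kept with the method cache entry, copied at send time) can be sketched in Python as below. `Method`, `Context`, `CacheEntry`, `build_context` and `activate` are hypothetical names, and the "second call allocates the clone" policy follows Dan's description; this is not Squeak, NeoLogo or Self code.

```python
from dataclasses import dataclass, replace


@dataclass
class Method:
    selector: str
    num_args: int
    frame_size: int


@dataclass
class Context:
    method: Method
    pc: int
    sp: int
    frame: list
    receiver: object = None


@dataclass
class CacheEntry:
    method: Method
    calls: int = 0
    template: Context = None  # pre-initialized context, built on the second call


def build_context(method):
    """Slow path: size and initialize a context from scratch."""
    return Context(method, pc=0, sp=method.num_args,
                   frame=[None] * method.frame_size)


def activate(entry, receiver, args):
    entry.calls += 1
    if entry.template is None and entry.calls >= 2:
        # Second call: pay the space cost of a template for this method.
        entry.template = build_context(entry.method)
    if entry.template is None:
        ctx = build_context(entry.method)  # one-time code never gets a template
    else:
        # The "array copy into blank storage": clone the ready-made context.
        ctx = replace(entry.template, frame=list(entry.template.frame))
    ctx.receiver = receiver
    ctx.frame[:len(args)] = args           # short copy of the arguments
    return ctx
```

Each activation gets its own frame list, so the template stays blank and can be cloned again on the next send.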
In reply to this post by Dan Ingalls
>Jecel Assumpcao Jr <[hidden email]> wrote...
Great to hear from you, Jecel.

> > Lazy Activation
>
>I included a slightly related idea in a 4 bit Smalltalk:
>
>http://www.merlintec.com:8080/Hardware/dietST
>
>This had an "enter" bytecode for explicitly creating a new context and a
>"grabArg" bytecode for moving stuff from the sender's context to the
>newly created one. The idea was that the compiler would generate the
>bytecodes for this as late in a method as possible (in the best cases -
>never). This was inspired by the Smalltalks that defer the creation of
>temporary variables until their first assignment. This static solution
>is not as powerful as your dynamic one, but it does have a few things in
>common.

Completely. Yes, it's an interesting case in which the dynamic solution, by virtue of more information, can do somewhat better.

>This project only got as far as a SmaCC compiler for these bytecodes in
>Squeak, so I never got any dynamic statistics for this.
>
>> Cloned Activation
>
>This is what I did in NeoLogo:
>
>http://www.merlintec.com/pegasus2000/e_neologo.html
>
>NeoLogo was just a paper design but this feature was also present in the
>"SuperLogo" which I implemented in 1983 in TI99/4A Extended BASIC. It
>worked great and actually makes the run time simpler at the cost of
>slightly complicating the parser.

Cool.

>Self actually explains method
>activation in this way to the users (though the implementation is
>radically different) because it is easier to understand than the
>traditional schemes.

Yes. I've been talking with Dave about Self recently, and it may be that this was tickling my brain cells at the time.

  - Dan
|
In reply to this post by Dan Ingalls
Hi Dan --
Wrt your first optimization ...

The SCHEME guys used similar arguments in one or more of the SCHEME papers to show that there are many cases in which no stack has to be allocated or popped (so a simple goto in the code will do the job). Someone on the Squeak list probably has a reference to the paper or papers I'm talking about. Sounds like a good idea, and should work pretty well.

The second idea sounds like it should work very well also.

Cheers,

Alan
|
In reply to this post by Dan Ingalls
Dan Ingalls wrote on Wed, 22 Mar 2006 07:32:19 -0800
> > > Lazy Activation
>
> Completely. Yes, it's an interesting case in which the dynamic solution,
> by virtue of more information, can do somewhat better.

When you have a method with something like

    ...
    ... ifTrue: [ ....
                  "need to allocate context"
                  coll message
                  .... ].
    ...
    ... myArg ...

In the dynamic system the exact same bytecode for "myArg" can do different things depending on the execution of the conditional expression. The static compiler can only select one of two alternative bytecode sequences for "myArg". The solutions are either to move the context allocation to before the #ifTrue:, so it will always happen even when not needed, or to do a Craig Chambers style "code splitting" and generate two separate versions for the rest of the method after the #ifTrue:. The first option will perform worse than the dynamic system, while the second option, when applied to enough methods, will make the memory footprint of having two (or four, or eight!) interpreters tiny in comparison.

> >> Cloned Activation

I forgot to mention a future "Smalltalk in hardware" project I have (or will have, actually) that is an extreme example of this:

http://www.merlintec.com:8080/hardware/19

Here code is represented by trees of message templates rather than bytecodes. When you are going to execute an expression, you first clone the whole tree. Each individual node waits until all its elements are constants (rather than subtrees) and then it turns itself into a message and flows around the machine. When it meets the receiver object and finds it is associated with some code, then that new tree is cloned with the root pointing to the message (now context). The same object has three different roles through its lifetime, so this is a very eager allocation of contexts.

> Yes. I've been talking with Dave about Self recently, and it may be
> that this was tickling my brain cells at the time.

I envy you both and hope great stuff comes from these discussions!

-- Jecel
|
In reply to this post by Alan Kay
Alan -

Are you referring to this paper:

LAMBDA: The Ultimate GOTO
http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-443.pdf

Or one of the others:

http://library.readscheme.org/page1.html

Sidenote: In googling for this paper, I found that Richard Gabriel was going to publish a collection of Guy Steele's papers in book form--but this doesn't appear to have happened. See http://www.dreamsongspress.com. I, for one, would buy this in an instant.

david
|
In reply to this post by Dan Ingalls
Dan Ingalls writes:
> Mainly I just like to think about this stuff, and it occurred to me
> that, if someone were looking for a fun experiment or two, it might
> turn out to have some practical value. I haven't looked at Exupery
> to know whether these things are already being done, or whether they
> might fit well with the other techniques there, but I'm sure Bryce
> could say right off the bat.

Exupery currently creates contexts using the same code as the interpreter does. It calls a C/Slang helper function to set up the new context. Exupery is about 2.5 times faster than the interpreter for sends, which indicates that most of the time is spent figuring out which method should be executed rather than in creating the context. The speed improvement comes from using polymorphic inline caches, which make sends from compiled code to compiled code dispatch very quickly.

I'd guess that tuning the current system and producing custom machine code for the common case where the new context is recycled would double send performance to about 5 times faster than the interpreter.

My current plan is to introduce dynamic method inlining, based heavily on Urs Holzle's Self work, to Exupery after finishing a 1.0. That will completely remove the context creation costs from the most frequently used sends. Dynamic method inlining has the advantage that it can eliminate the sends from #do: loops as well as from leaf methods.

However, for Exupery, finishing 1.0 is much more important than adding dynamic method inlining. A mere 2.5 times gain in send performance is enough to provide a practical speed improvement. For now my time is better spent first debugging compiled blocks, then fixing minor issues that limit Exupery's current usefulness.

Bryce
|
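For readers unfamiliar with polymorphic inline caches, the general technique can be sketched in Python as below. This is not Exupery's implementation (which generates machine code); `SendSite`, `MAX_ENTRIES` and the example classes are hypothetical, and Python's class dictionaries stand in for method dictionaries. Each send site remembers the (class, method) pairs it has seen, so repeated sends to the same receiver classes skip the full lookup.

```python
class SendSite:
    """One send site with a small polymorphic inline cache (PIC)."""

    MAX_ENTRIES = 4  # assumption: a small fixed bound on cached classes

    def __init__(self, selector):
        self.selector = selector
        self.cache = []   # (class, method) pairs already seen at this site
        self.lookups = 0  # number of full (slow) lookups performed

    def send(self, receiver, *args):
        cls = type(receiver)
        for cached_cls, method in self.cache:
            if cached_cls is cls:  # fast path: a cheap class identity check
                return method(receiver, *args)
        method = self.full_lookup(cls)
        if len(self.cache) < self.MAX_ENTRIES:
            self.cache.append((cls, method))
        return method(receiver, *args)

    def full_lookup(self, cls):
        """Slow path: walk the class hierarchy looking for the selector."""
        self.lookups += 1
        for klass in cls.__mro__:
            if self.selector in vars(klass):
                return vars(klass)[self.selector]
        raise AttributeError(self.selector)


class Circle:
    def area(self):
        return 3


class Square:
    def area(self):
        return 4
```

Sending `area` through one `SendSite` to alternating `Circle` and `Square` receivers performs only two full lookups, however many sends follow.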
In reply to this post by Dan Ingalls
Hi
I wanted to know how this relates to the way VW treats blocks: clean blocks (e.g. [:each | each zork]), copying blocks and full blocks. Is anybody able to compare?

Stef

On 22 March 06, at 15:11, Dan Ingalls wrote:

> Hi -
>
> I've got my little Squeak in Java running (hope to send out a link soon), and I've been pondering how to make it run faster. In the process, I've thought of two techniques, one of which is new (to me) and the other occurred to me years ago, but I never tried it out.
>
> Since neither would really be all that hard to do in Squeak, I thought I'd mention them here for those folks who delight in such things, and with the further hope that someone might actually try them out.
>
>
> Lazy Activation
>
> This was the next thing I was going to do for Apple Smalltalk when I got drafted into the hotel business back in 1987. The essence of the idea is that the purpose of activating a context is to save the execution state in case you have to do a send and, conversely, you don't really need an activation if you never need to do a real send.
>
> I had a lot of fun instrumenting the VM to figure out just how many activations could be avoided in this way, and my recollection is that it was roughly 50%. I believe the statistics were better dynamically than statically, because there are a lot of methods that, in general, need to be activated, but they may begin with a test such as
>
>     position > limit ifTrue: [^ false]
>
> and for every time that this test succeeds, you can get away without ever needing an activation.
>
> But, you say, you still need a pointer to the method bytes and a stack frame, and this is true, but you don't need to allocate and initialize a full context, nor to transfer the arguments. The idea is that, when you hit the send, you do the lookup, find the method, and then jump to a *separate copy* of the interpreter that has a different set of bytecode service routines. For instance, 'loadTemp' will, depending on the argument count, load from the stack of the calling method (which is still the "active" context). 'Push', since there is no allocated stack, pushes into a static array and, e.g., 'plus' does the same old add, but it gets its args from the static array, and puts its result back there. And if anything fancy, such as a real send, does occur, then a special routine is called to do a real activation, copy this static state into it appropriately, and retry the bytecode in the normal interpreter.
>
> It's probably worth confirming the results that I remember, but I wouldn't be surprised if one could almost double the speed of Squeak in this manner.
>
>
> Cloned Activation
>
> This one I just thought of, but I can't believe someone hasn't already tried it, either in Squeak or some similar system. The idea here is to provide a field in the method cache for an extra copy of a properly initialized context for the method (i.e., correct frame size, method installed, pc and stack pointer set, etc). Then, when a send occurs, all you have to do is an array copy into blank storage, followed by a short copy of receiver and args from the calling stack.
>
> There's a space penalty for carrying all the extra context templates, of course, but I think it's not unreasonable. Also, one could avoid it for all one-time code by only allocating the extra clone on the second call (i.e., first call gets it into the method cache; second call allocates clone for the cache).
>
> I have little sense of how much this might help these days -- I haven't looked in detail at the activation code for quite a while. Obviously the worse it is right now, the more this technique might help.
>
>
> Mainly I just like to think about this stuff, and it occurred to me that, if someone were looking for a fun experiment or two, it might turn out to have some practical value. I haven't looked at Exupery to know whether these things are already being done, or whether they might fit well with the other techniques there, but I'm sure Bryce could say right off the bat.
>
> - Dan
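
For readers who want to experiment, the "Cloned Activation" idea quoted above might be sketched roughly as follows. This is a toy model in Python, not Squeak VM code. The names (`Method`, `CacheEntry`, `MethodCache`, `build_context`) and the flat-list context layout are assumptions made for illustration, and the cache is keyed by selector only, whereas a real method cache would also key on the receiver's class.

```python
RECEIVER = 3     # slot index of the receiver in the assumed layout
FIRST_SLOT = 4   # first argument/temp slot in the assumed layout


class Method:
    def __init__(self, selector, num_args, frame_size):
        self.selector = selector
        self.num_args = num_args
        self.frame_size = frame_size  # number of arg/temp/stack slots


def build_context(method):
    """Slow path: allocate and fully initialize a context.

    Assumed layout: [method, pc, stack pointer, receiver, slots...].
    """
    return [method, 0, method.num_args, None] + [None] * method.frame_size


class CacheEntry:
    def __init__(self, method):
        self.method = method
        self.calls = 0
        self.template = None  # the extra clone; absent until the second call


class MethodCache:
    def __init__(self):
        self.entries = {}
        self.full_inits = 0  # how often the slow path ran

    def activate(self, method, receiver, args):
        entry = self.entries.get(method.selector)
        if entry is None:
            # First call: only enter the method into the cache.
            entry = self.entries[method.selector] = CacheEntry(method)
        entry.calls += 1
        if entry.template is None and entry.calls >= 2:
            # Second call: allocate the template, so one-time code never pays for it.
            entry.template = build_context(method)
            self.full_inits += 1
        if entry.template is not None:
            context = list(entry.template)    # the "array copy into blank storage"
        else:
            context = build_context(method)   # no template yet: full initialization
            self.full_inits += 1
        # The "short copy of receiver and args from the calling stack".
        context[RECEIVER] = receiver
        context[FIRST_SLOT:FIRST_SLOT + len(args)] = args
        return context


cache = MethodCache()
max_method = Method("max:", num_args=1, frame_size=4)
contexts = [cache.activate(max_method, 10, [n]) for n in range(5)]
```

In this model, five sends of the same method run the full initialization twice (once for the first call, once to build the template); every later send is a plain copy plus the receiver and argument stores. Whether that trade pays off in a real VM depends on how expensive activation is there, which is the open question raised in the post.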
In reply to this post by Bryce Kampjes
Bryce Kampjes <[hidden email]> wrote...
> My current plan is to introduce dynamic method inlining, based heavily on Urs Hölzle's Self work, to Exupery after finishing a 1.0. That will completely remove the context creation costs from the most frequently used sends. Dynamic method inlining has the advantage that it can eliminate the sends from #do: loops as well as from leaf methods.

I agree that inlining is the ultimate way to go here. You can see lazy activation as a sort of lazy approach to inlining, but inlining is better because it eliminates the lookup and context switch times completely (when possible, which is typically very often).

> However, for Exupery, finishing 1.0 is much more important than adding dynamic method inlining. A mere 2.5 times gain in send performance is enough to provide a practical speed improvement. For now my time is better spent first debugging compiled blocks, then fixing minor issues that limit Exupery's current usefulness.

I certainly agree that a bird in the hand is worth two in the bush, and I'm especially glad that you feel that way. Let's hear it for completion of 1.0!

- Dan
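
The lazy activation scheme discussed in this thread could be modelled along these lines. It is a minimal sketch in Python under loud assumptions: the bytecode set (`load_arg`, `push_const`, `greater`, `return_if_true`, `send`, `return_top`), the `Context` and `VM` classes, and the use of a Python callable to stand in for a real send are all hypothetical, and none of this is taken from the Squeak VM or from Exupery.

```python
class Context:
    """A real activation record, allocated only when it cannot be avoided."""

    def __init__(self, method, args, stack, pc):
        self.method = method
        self.args = list(args)
        self.stack = list(stack)
        self.pc = pc


class VM:
    def __init__(self):
        self.activations = 0  # number of real contexts created

    def call(self, method, caller_stack, num_args):
        """Lazy interpreter copy: the arguments stay on top of caller_stack."""
        base = len(caller_stack) - num_args
        scratch = []  # stands in for the static array used by 'Push'
        pc = 0
        while True:
            op, operand = method[pc]
            if op == "load_arg":
                scratch.append(caller_stack[base + operand])
            elif op == "push_const":
                scratch.append(operand)
            elif op == "greater":
                right = scratch.pop()
                left = scratch.pop()
                scratch.append(left > right)
            elif op == "return_if_true":
                if scratch.pop():
                    return operand  # returned without ever activating
            elif op == "return_top":
                return scratch.pop()
            else:
                # Anything fancy: do a real activation, copy the static state
                # into it, and retry this bytecode in the normal interpreter.
                self.activations += 1
                context = Context(method, caller_stack[base:], scratch, pc)
                return self.run_full(context)
            pc += 1

    def run_full(self, context):
        """Normal interpreter: works on an allocated context."""
        while True:
            op, operand = context.method[context.pc]
            context.pc += 1
            if op == "load_arg":
                context.stack.append(context.args[operand])
            elif op == "push_const":
                context.stack.append(operand)
            elif op == "greater":
                right = context.stack.pop()
                left = context.stack.pop()
                context.stack.append(left > right)
            elif op == "return_if_true":
                if context.stack.pop():
                    return operand
            elif op == "return_top":
                return context.stack.pop()
            elif op == "send":
                # A Python callable stands in for a real message send.
                context.stack.append(operand(context.stack.pop()))


# Roughly "position > limit ifTrue: [^ false]" followed by one real send.
NEXT_OR_FALSE = [
    ("load_arg", 0),            # position
    ("load_arg", 1),            # limit
    ("greater", None),
    ("return_if_true", False),
    ("load_arg", 0),
    ("send", lambda position: position + 1),
    ("return_top", None),
]

vm = VM()
early = vm.call(NEXT_OR_FALSE, [99, 5, 3], 2)  # guard succeeds, no context built
late = vm.call(NEXT_OR_FALSE, [99, 2, 3], 2)   # guard fails, the send forces activation
```

The two dispatch loops are deliberately duplicated: that duplication is the "separate copy of the interpreter" with its own bytecode service routines. Only the second call above bumps `vm.activations`, which is the saving the guard-clause example in the original post is about.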