You can help me out. I keep getting this wacky idea which does not seem to go away. You can tell me what is wrong with it. OK. Here it is: To get rid of PICs and dynamic method lookups, each Class knows selectors of methods it defines. Each successful method definition is added not only to the method dictionary of the class, but copied down transitively to all subclasses, and their sub-subclasses, who have not defined a method with the same selector. At #DeepRehash time, _each_ method dictionary knows all of its selectors and creates a "perfect hash" function, or something close to it. So lookup is just into the specific method dictionary of the instance's class which yields the proper method (backstop method is DNU). Never a miss. No class hierarchy search for inherited methods. Memory is cheap these days, but selectors and compiled methods are mostly shared and the memory cost of method dictionaries is probably balanced out by elimination of PICS at the call sites. The UI tools need to keep some simple rules (e,g, transitively deleting methods from subclass method dictionaries), but these should be simple and obvious. The above seems simple enough, but nobody seems to be doing it. Why not? Thanks much for insights! -KenD
-KenD
|
Hi Ken, I am by no means a VM expert, but this idea can be enhanced further: Let's generate consecutive numbers for all the symbols used as message selectors then this "perfect hash" degenerates to an array with pointers to the compiled method. Let's also ignore -- for the sake of simplicity -- symbols that are garbage collected; the holes occurring can simply be filled by some sort "free-list-algorithm. Now you have something the C++ programmer knows really well: A virtual function table. I have no clue whether those VFTs will help. Just my 0.01€. Best Regards, Gerald On 2021-03-17 16:13, [hidden email] wrote: > > You can help me out. > > I keep getting this wacky idea which does not seem to go away. You can > tell me what is wrong with it. > > OK. Here it is: > > To get rid of PICs and dynamic method lookups, each Class knows > selectors of methods it defines. Each successful method definition is > added not only to the method dictionary of the class, but copied down > transitively to all subclasses, and their sub-subclasses, who have not > defined a method with the same selector. > > At #DeepRehash time, _each_ method dictionary knows all of its selectors > and creates a "perfect hash" function, or something close to it. > > So lookup is just into the specific method dictionary of the instance's > class which yields the proper method (backstop method is DNU). Never a > miss. No class hierarchy search for inherited methods. > > Memory is cheap these days, but selectors and compiled methods are > mostly shared and the memory cost of method dictionaries is probably > balanced out by elimination of PICS at the call sites. > > The UI tools need to keep some simple rules (e,g, transitively deleting > methods from subclass method dictionaries), but these should be simple > and obvious. > > The above seems simple enough, but nobody seems to be doing it. > > Why not? > > Thanks much for insights! > -KenD |
In reply to this post by KenDickey
Ken, I presented a similar design at OOPSLA 2003. It was for a unique hardware (a low cost terminal for truck drivers) and had several features I would not have normally put in a VM. The main memory was 4MB of SDRAM since that was the lowest cost option at the time while it only had 512KB of Flash. Even after expanding the compressed image from the Flash to the SDRAM there was a lot of memory left over. This was a 16 bit VM with 32K SmallIntegers and only 32K object pointers. I made SmallIntegers do double duty as Symbols. So the number 3 and the symbol #between:and: might be the exact same object. You could use it as one or the other by sending different messages as there is very little overlap. You can find the "symbol string" associated with 3 by looking at the third entry in SymbolTable. The equivalent of Class was the CodeObject. It includes all of the compiled code for the class, both local and inhericated (copied down, as you said). The processor had a bytecode style machine language so the source texts from the compressed image was translated to that when copied to the CodeObject. The code objects had two parts: an initial set of entries with one entry for every possible symbol and one huge chunk of code. Each entry had enough room for 4 instructions, which is enough to implement most methods. For those that need more space you just include a jump to somewhere in the second part of the code object. The point was to eliminate every extra clock cycle possible from a message send. Given a pointer to some object, you could get its corresponding code object by just zeroing the bottom few bits of the pointer (no memory access to get the class from the oop). So if you are trying to send message 3 to oop you just jump to: call (oop & CodeObjectMask) + 3 * EntrySizeInBytes Single clock cycle message sends for both local and inherited methods, at a rather absurd cost in memory. If it were a 32 bit VM the wasted memory would just be too silly. A nice overview of the different ways to compress the class vs selector tables can be found in the ECOOP 95 paper "Message Dispatch on Pipelined Processors" by Karel Driesen, Urs Hölzle and Jan Vitek: > https://www.researchgate.net/publication/221496253_Message_Dispatch_on_Pipelined_Processors -- Jecel |
In reply to this post by CyDefect
HI Gerald, Thanks for the feedback. I am aware of Virtual Function Tables, Sparse Array index compression, lots of strategies for Dynamic Dispatch, many I have used to implement various object systems. One can arrange the hash/ID of selectors, classes, and method tables. I am looking for something which is space compact, simple to explain, runtime efficient, supports dynamic updates, and "fails softly" (an "imperfect hash" still works in a Dictionary). Also, deals with polymorphism and super and does not require an expensive #isSubclassOf: test. My thought with "Copy Down" is that while loading a bunch of code, one can build simple method dictionaries, then apply a #deepRehash after the load. This could be done to affected method dictionaries each time a method is accepted during development as UI's (i.e. users) are slow. The #deepRehash for each method dictionary could use simple trials to generate a compact table/array and set the hash for a method Dictionary so that (<selector> hash) <logical-op> (<mtable> hash>) finds the <selector>'s method without collisions. The #deepRehash only requires local knowledge (all used selectors are known), not a global analysis to renumber classes or selectors. Perfect hash would be something competitive, speed wise, with PICs but does not require cache management or inline code changes. With lookup in only one table/dict one does not need to traverse the class hierarchy on a miss, because lookup never misses (DNU in every table which is answer on not found -- separate lookup if invoker wants nil instead). A bit of work to craft hashfun creation, but seems worthwhile. And an interesting problem to work on. :) The idea seems simple enough that I suspect someone has looked at this strategy. If this is a really bad idea, I don't have to spend any more thing thinking about it or worse, actually doing something. ;^) Thanks again, -KenD On 2021-03-18 07:25, [hidden email] wrote: > Hi Ken, > > I am by no means a VM expert, but this idea can > be enhanced further: > > Let's generate consecutive > numbers for all the symbols used as message selectors then this > "perfect hash" degenerates > to an array with pointers to the compiled > method. Let's also ignore > -- for the sake of simplicity -- > symbols that are garbage collected; > the holes occurring can simply be filled > by some sort "free-list-algorithm. > > Now you have something the C++ programmer knows > really well: A virtual function table. > > I have no clue whether those VFTs will help. > > > Just my 0.01€. > > > Best Regards, > > Gerald
-KenD
|
In reply to this post by KenDickey
Jecel, Thanks much for both your thoughtful reply and the paper reference. This latter was the gold I was looking for! Too many Dispatch papers to search through! Good on ya, -KenD ======== > A nice overview of the different ways to compress the class vs selector tables can be found in the ECOOP 95 paper "Message Dispatch on Pipelined Processors" by Karel Driesen, Urs Hölzle and Jan Vitek: > https://www.researchgate.net/publication/221496253_Message_Dispatch_on_Pipelined_Processors ========
-KenD
|
In reply to this post by KenDickey
Hi Ken, On Wed, Mar 17, 2021 at 8:13 AM <[hidden email]> wrote:
I don't think no one is doing this. I expect for example that the SqueakJS effort would explore this.
Speaking for myself when I did my threaded code version of BrouHaHa in the late '80's/early '90's I added copy down in the bytecode-to-threaded-code JIT so that self sends were statically bound. t was an interesting experiment, and placed some additional constraints on become: if become changes any object's class then all activations must be visited and checked for validity, changing the current method to match the receiver if its class no longer matches the activation's copied down method. This is easy to do with a threaded code JIT because there is a direct mapping between bytecode and threaded code, at least in my design, so that each byte of bytecode produced 8 bytes of threaded code (function and parameter, 32 bit arch). But in general it can be difficult. One wants to be able to take advantage of the static receiver, but if one does, then mapping activations between different versions of a copied down (& possibly optimized) method at any suspension point can become problematic. When I joined Parc Place in '95 I got to know the world's best Smalltalk JIT at that time, Peter Deutsch's, HPS (Schiffman's context-to-stack mapper & Jackson's and Ungar's GC). This VM doesn't do copy-down and it influenced my work on the VM (I improved method caching, co-designed a new bytecode set and closure representation, designed a much improved context-to-stack mapping scheme, improved old space free space management, and did the 64-bit object representation which introduced class indices in headers rather than class pointers. This experience from '95 through '06 heavily influenced my work on Cog and Spur. Indeed the architectural idea for Sista/Scorch was inspired by performance analysis work Peter did, written up in the internal documentation for HPS. Peter had introduced an in-line primitive bytecode so he could hand-generate static code to measure where the time is spent executing Smalltalk. With Sista/Scorch the entire policy for code management is moved up to the image, and is no longer a VM decision. Scorch is freely able to copy-down and if it chooses to mediate become and class schema migration it has far more power to manage more aggressive optimization and code duplication than the VM. So I'm looking forward to exploring copy-down in that context. Of course, now that Clément has gone to Google, this work won't get done unless I can find collaborators. And here I sound like a broken record.
_,,,^..^,,,_ best, Eliot |
On Thu, Mar 18, 2021 at 1:11 PM Eliot Miranda <[hidden email]> wrote:
Nah. SqueakJS is supposed to be compatible with existing images, I'm not interested in changing the image to suit the VM. So this copy-down would have to happen at image load time or dynamically, and it would have to be hidden from the image. I think inline caches are the safer bet, polymorphic or not. Plus, if you squint at them in the right way, PICs are equivalent to copy-down. They "copy" the right method into the cache. That's a late-bound version of copy-down, whereas actual copy-down would be early-bound and require all tools to change to maintain the invariants. Sure it can be done, but it also makes certain use cases a lot more expensive: adding a method to Object is not a big deal today, but with copy-down, you would have to copy it to every single class in the whole system, making it extremely expensive. This sounds more like a deployment mechanism to me, not like something I would use for live coding. - Vanessa - |
In reply to this post by KenDickey
Hi Ken, ... see below ... On 2021-03-18 16:28, [hidden email] wrote: > HI Gerald, > > Thanks for the feedback. > > I am aware of Virtual Function Tables, Sparse Array index compression, > lots of strategies for Dynamic Dispatch, many I have used to implement > various object systems. I suspected that you know better than me. I just explained the train of thought I follow when I muse about that problem. It always ends with: "Now you reinvented the wheel aka. VFTs" > > One can arrange the hash/ID of selectors, classes, and method tables. > > I am looking for something which is space compact, simple to explain, > runtime efficient, supports dynamic updates, and "fails softly" (an > "imperfect hash" still works in a Dictionary). Also, deals with > polymorphism and super and does not require an expensive #isSubclassOf: > test. I have a hunch that something like (in pseudo-smalltalk): receiver class caseOf: { [ Class1 ] -> [<send directly>] [ Class2 ] -> [ ... ] } otherwise: [ <just do the normal thing > ] will translate to byte code that executes quickly on an interpreter an can easily be (translated|jitted) to fast machine code. AFIR branch prediction for the resulting chain of jumps was much better than for indirect calls. I don't know if this is still the case with modern processors. Maybe Elliot can enlighten me here. I also imagine that this can be done easily with Sista. > > My thought with "Copy Down" is that while loading a bunch of code, one > can build simple method dictionaries, then apply a #deepRehash after the > load. This could be done to affected method dictionaries each time a > method is accepted during development as UI's (i.e. users) are slow. > > The #deepRehash for each method dictionary could use simple trials to > generate a compact table/array and set the hash for a method Dictionary > so that (<selector> hash) <logical-op> (<mtable> hash>) finds the > <selector>'s method without collisions. > > The #deepRehash only requires local knowledge (all used selectors are > known), not a global analysis to renumber classes or selectors. > > Perfect hash would be something competitive, speed wise, with PICs but > does not require cache management or inline code changes. With lookup > in only one table/dict one does not need to traverse the class hierarchy > on a miss, because lookup never misses (DNU in every table which is > answer on not found -- separate lookup if invoker wants nil instead). > > A bit of work to craft hashfun creation, but seems worthwhile. And an > interesting problem to work on. :) > > The idea seems simple enough that I suspect someone has looked at this > strategy. > > If this is a really bad idea, I don't have to spend any more thing > thinking about it or worse, actually doing something. ;^) > > Thanks again, > -KenD > > > On 2021-03-18 07:25, [hidden email] wrote: >> Hi Ken, >> >> I am by no means a VM expert, but this idea can >> be enhanced further: >> >> Let's generate consecutive >> numbers for all the symbols used as message selectors then this >> "perfect hash" degenerates >> to an array with pointers to the compiled >> method. Let's also ignore >> -- for the sake of simplicity -- >> symbols that are garbage collected; >> the holes occurring can simply be filled >> by some sort "free-list-algorithm. >> >> Now you have something the C++ programmer knows >> really well: A virtual function table. >> >> I have no clue whether those VFTs will help. >> >> >> Just my 0.01€. >> >> >> Best Regards, >> >> Gerald Best Regards, Gerald |
Free forum by Nabble | Edit this page |