I have been looking at MethodDictionary.
It seems to have very little to do with Methods. It seems to me that a better design would have been to have a class VarDictionary which does not use Associations (similar to MethodDictionary). MethodDictionary would inherit from VarDictionary and add the couple of additional methods it needs. VarDictionary would then be available for others to use. Better still would be to have MethodDictionary not be a variable class at all but simply use an additional variable "values". Again MethodDictionary should really inherit from a class that provides this service. I would use this service in my own code. Ralph Boland |
2009/7/20 Ralph Boland <[hidden email]>:
> I have been looking at MethodDictionary. > It seems to have very little to do with Methods. > It seems to me that a better design would have been > to have a class VarDictionary which does not use > Associations (similar to MethodDictionary). > MethodDictionary would inherit from VarDictionary > and add the couple of additional methods it needs. > VarDictionary would then be available for others to use. > > Better still would be to have MethodDictionary not > be a variable class at all but simply use an additional > variable "values". Again MethodDictionary should > really inherit from a class that provides this service. > I would use this service in my own code. > accessing the methods by VM. Don't forget, that MethodDictionary having such special format only because to satisfy VM. > Ralph Boland > > -- Best regards, Igor Stasenko AKA sig. |
In reply to this post by Ralph Boland
Hi Ralph,
On Sun, Jul 19, 2009 at 10:17 PM, Ralph Boland <[hidden email]> wrote: I have been looking at MethodDictionary. Actually I think the best representation for MethodDictionary is as a flat pair-wise sequence of selector and method, sorted by selector identityHash. Advantages: One saves significant space. There is one less object per method dictionary (no values array) and no empty space in the sequence. We shrank the image by about 8% when we did this for VisualWorks.
Small method dictionaries (say size <= 16 entries) are linearly scanned without needing to fetch the identityHash of the selector. This is faster for small dictionaries because the cost of three memory references in different places (selector's hash, dictionary's selector sequence, dictionary's method sequence) is higher on modern machines than the one scan of the dictionary's contents sequence.
nil looks ot the VM like an empty MethodDictionary so classes with no methods (there are quite a few) don't need a dictionary object at all. Disadvantages: Large dictionaries are looked up using binary search, which can be slower, because one has to fetch the hash of the selector being probed for to comp[are it with the message selector's hash. But this isn't the common case.
Dictionaries must be grown whenever a selector/method par is added or shrunk whenever one is removed, but this is rare (only happens when compiling a new method or removing an old) and can be e.g. batched if creating a new dictionary (e.g. if loading code form some binary stream).
Something we didn't yet do in VisualWorks is to make the identityHash of a Symbol dependent on its contents. If this is done then within images with the same hash width dictionaries will always have the same order because selectors will have the same hashes, which means loading binary code can be faster since the order of the dictionary on disc is the same as in memory.
I plan to implement the above when doing the new image format work later this year. 2¢ best Eliot
|
As measured directly on HPS, you can do linear search in method
dictionaries up to 64 entries in size. Only after 64 entries does binary search pay off. This is consistent with the numbers derived by the Quake folks. The issue with binary search is that each left/right branch behaves at random for random selector lookups, and is thus unpredictable. Also, linear search works great with cache line reads, whereas random reads in the dictionary's key array do not (basically, only one key is used per cache line read mod the overlap, etc). You can get another ~2x or so speedup in lookup speed by making method dictionaries an actual hashed collection with well-chosen prime sizes. Then, you use % in the C method lookup. As far as I could see, it does not seem to matter much that the VM uses % because the advantage is that using % with a prime table size (as opposed to & and a power of two table size) ensures good distribution properties. Consequently, the vast majority of selectors are found using just one probe attempt. IME, the first probe should be outside the scanning loops because some compilers produce better code that way. IIRC, the cost I measured for a biggish VW image was about 400kb. This would be a bit lower than the 8% reported by Eliot (but back then I am sure images were considerably smaller). Andres. PS: I am not exactly sure re: 2x, but I am certain it was very significant with: | worstCase | worstCase := Object withAllSubclasses reversed. worstCase do: [:each | each yourself] The run time was way lower with hashed method dictionaries. If it was not 2x faster, it was 60% faster... that kind of number. The implementation I did used the first slot of the method dictionary to tag which kind of method dictionary it is. If it's the small integer zero, then it's a hashed method dictionary. Otherwise, it's the sorted collection kind. Then you can mix/match method dictionaries all you want on the image side. Converting all the method dictionaries from "compact" to "sparse" using become: took about 0.4s. I used a load factor target of about 0.75, IIRC... Eliot Miranda wrote: > Hi Ralph, > > On Sun, Jul 19, 2009 at 10:17 PM, Ralph Boland <[hidden email]<mailto:[hidden email]>> wrote: > I have been looking at MethodDictionary. > It seems to have very little to do with Methods. > It seems to me that a better design would have been > to have a class VarDictionary which does not use > Associations (similar to MethodDictionary). > MethodDictionary would inherit from VarDictionary > and add the couple of additional methods it needs. > VarDictionary would then be available for others to use. > > Better still would be to have MethodDictionary not > be a variable class at all but simply use an additional > variable "values". Again MethodDictionary should > really inherit from a class that provides this service. > I would use this service in my own code. > > Ralph Boland > > Actually I think the best representation for MethodDictionary is as a flat pair-wise sequence of selector and method, sorted by selector identityHash. > > Advantages: > > One saves significant space. There is one less object per method dictionary (no values array) and no empty space in the sequence. We shrank the image by about 8% when we did this for VisualWorks. > > Small method dictionaries (say size <= 16 entries) are linearly scanned without needing to fetch the identityHash of the selector. This is faster for small dictionaries because the cost of three memory references in different places (selector's hash, dictionary's selector sequence, dictionary's method sequence) is higher on modern machines than the one scan of the dictionary's contents sequence. > > nil looks ot the VM like an empty MethodDictionary so classes with no methods (there are quite a few) don't need a dictionary object at all. > > Disadvantages: > Large dictionaries are looked up using binary search, which can be slower, because one has to fetch the hash of the selector being probed for to comp[are it with the message selector's hash. But this isn't the common case. > > Dictionaries must be grown whenever a selector/method par is added or shrunk whenever one is removed, but this is rare (only happens when compiling a new method or removing an old) and can be e.g. batched if creating a new dictionary (e.g. if loading code form some binary stream). > > > Something we didn't yet do in VisualWorks is to make the identityHash of a Symbol dependent on its contents. If this is done then within images with the same hash width dictionaries will always have the same order because selectors will have the same hashes, which means loading binary code can be faster since the order of the dictionary on disc is the same as in memory. > > I plan to implement the above when doing the new image format work later this year. > > 2¢ > > best > Eliot > > |
Free forum by Nabble | Edit this page |