its a scratch implementation of dictionaries using linked list of associations.
dont care about names, care about numbers: (code to run taken from MaDictionary class comment) magma vs mine: time to add to sd: 161 time to add to md: 1865 time to access to sd: 124 time to access to md: 597 time to replace to sd: 114 time to replace to md: 676 time to remove 300 from sd: 2 time to remove 300 from md: 3568 -- sd - my dictionary md - MaWeakIdentityKeyDictionary smalltalk vs mine: -- time to add to sd: 159 time to add to md: 63178 time to access to sd: 147 time to access to md: 49052 time to replace to sd: 132 time to replace to md: 50736 time to remove 300 from sd: 2 -- sd - my dict md - smalltalk Dictionary. removing keys takes forever, so i stopped it before it printed a result.. |
Hi,
your work sounds interesting to me. Is your dictionary passing the tests given in DictionaryTest? If yes, why not opening a bug report on mantis and try to have it integrated into a future squeak release? Bye 2007/7/19, sig <[hidden email]>: > its a scratch implementation of dictionaries using linked list of associations. > dont care about names, care about numbers: > (code to run taken from MaDictionary class comment) > > magma vs mine: > > time to add to sd: 161 > time to add to md: 1865 > time to access to sd: 124 > time to access to md: 597 > time to replace to sd: 114 > time to replace to md: 676 > time to remove 300 from sd: 2 > time to remove 300 from md: 3568 > -- > sd - my dictionary > md - MaWeakIdentityKeyDictionary > > smalltalk vs mine: > -- > time to add to sd: 159 > time to add to md: 63178 > time to access to sd: 147 > time to access to md: 49052 > time to replace to sd: 132 > time to replace to md: 50736 > time to remove 300 from sd: 2 > -- > sd - my dict > md - smalltalk Dictionary. removing keys takes forever, so i stopped > it before it printed a result.. > > > > > -- Damien Cassou |
as i noted its a scratch.. didn't even tested it yet. :)
i want to adopt it for magma at first place. On 19/07/07, Damien Cassou <[hidden email]> wrote: > Hi, > > your work sounds interesting to me. Is your dictionary passing the > tests given in DictionaryTest? If yes, why not opening a bug report on > mantis and try to have it integrated into a future squeak release? > > Bye > > 2007/7/19, sig <[hidden email]>: > > its a scratch implementation of dictionaries using linked list of associations. > > dont care about names, care about numbers: > > (code to run taken from MaDictionary class comment) > > > > magma vs mine: > > > > time to add to sd: 161 > > time to add to md: 1865 > > time to access to sd: 124 > > time to access to md: 597 > > time to replace to sd: 114 > > time to replace to md: 676 > > time to remove 300 from sd: 2 > > time to remove 300 from md: 3568 > > -- > > sd - my dictionary > > md - MaWeakIdentityKeyDictionary > > > > smalltalk vs mine: > > -- > > time to add to sd: 159 > > time to add to md: 63178 > > time to access to sd: 147 > > time to access to md: 49052 > > time to replace to sd: 132 > > time to replace to md: 50736 > > time to remove 300 from sd: 2 > > -- > > sd - my dict > > md - smalltalk Dictionary. removing keys takes forever, so i stopped > > it before it printed a result.. > > > > > > > > > > > > > -- > Damien Cassou > > |
In reply to this post by Igor Stasenko
On Jul 19, 2007, at 5:10 AM, sig wrote: > its a scratch implementation of dictionaries using linked list of > associations. > dont care about names, care about numbers: > (code to run taken from MaDictionary class comment) Interesting... but another set of numbers that matter are memory usage stats. Can you provide a comparison of the memory used by each implementation after the tests are run? Also, you might want to look at the HashTable package on SqueakSource, which I suspect does something similar. Colin |
Im not really care about memory footprint.
Dictionaries using array of Association-s Association having 2 ivars. My Association add 3rd. So, comparing raw memory footprint of association objects it less than 1.5. But as it shows, that speed increased rougly by 1-2 factors (comparing to magma dicts) and by 3-4 factors comparing squeak current dicts. I think this is the case when speed over size is preferable :) On 19/07/07, Colin Putney <[hidden email]> wrote: > > On Jul 19, 2007, at 5:10 AM, sig wrote: > > > its a scratch implementation of dictionaries using linked list of > > associations. > > dont care about names, care about numbers: > > (code to run taken from MaDictionary class comment) > > Interesting... but another set of numbers that matter are memory > usage stats. Can you provide a comparison of the memory used by each > implementation after the tests are run? > > Also, you might want to look at the HashTable package on > SqueakSource, which I suspect does something similar. > > Colin > > |
In reply to this post by Damien Cassou-3
Damien Cassou wrote:
> your work sounds interesting to me. Is your dictionary passing the > tests given in DictionaryTest? If yes, why not opening a bug report on > mantis and try to have it integrated into a future squeak release? I'd recommend running your own benchmarks first. Claims like here: >> smalltalk vs mine: >> -- >> time to add to sd: 159 >> time to add to md: 63178 >> time to access to sd: 147 >> time to access to md: 49052 >> time to replace to sd: 132 >> time to replace to md: 50736 >> time to remove 300 from sd: 2 >> -- >> sd - my dict >> md - smalltalk Dictionary. removing keys takes forever, so i stopped >> it before it printed a result.. ... where BucketDictionary is supposedly 500 times faster in every regard (adding, accessing, replacing) sound to good to be true. I love 500x speed improvements just like the next guy but in my experience these don't just happen. Cheers, - Andreas |
Wait a bit more. I done with basic BDictionary class, now
adding Identity/Weak subclasses. All 22 tests which i found for Dictionary in my 3.10 image passed. On 19/07/07, Andreas Raab <[hidden email]> wrote: > Damien Cassou wrote: > > your work sounds interesting to me. Is your dictionary passing the > > tests given in DictionaryTest? If yes, why not opening a bug report on > > mantis and try to have it integrated into a future squeak release? > > I'd recommend running your own benchmarks first. Claims like here: > > >> smalltalk vs mine: > >> -- > >> time to add to sd: 159 > >> time to add to md: 63178 > >> time to access to sd: 147 > >> time to access to md: 49052 > >> time to replace to sd: 132 > >> time to replace to md: 50736 > >> time to remove 300 from sd: 2 > >> -- > >> sd - my dict > >> md - smalltalk Dictionary. removing keys takes forever, so i stopped > >> it before it printed a result.. > > ... where BucketDictionary is supposedly 500 times faster in every > regard (adding, accessing, replacing) sound to good to be true. I love > 500x speed improvements just like the next guy but in my experience > these don't just happen. > > Cheers, > - Andreas > > |
In reply to this post by Igor Stasenko
sig wrote:
> its a scratch implementation of dictionaries using linked list of > associations. > dont care about names, care about numbers: > (code to run taken from MaDictionary class comment) What values were stored in the dictionaries when you run the benchmarks? Thanks, Andres. |
On 19/07/07, Andres Valloud <[hidden email]> wrote:
> sig wrote: > > its a scratch implementation of dictionaries using linked list of > > associations. > > dont care about names, care about numbers: > > (code to run taken from MaDictionary class comment) > What values were stored in the dictionaries when you run the benchmarks? > > Thanks, > Andres. > > both dictionaries used in comparison tests populated simply using just: obj := Object new. dict at: obj put: obj. |
sig wrote:
> look at MaDictionary class comment. > > both dictionaries used in comparison tests populated simply using just: > > obj := Object new. > dict at: obj put: obj. I don't have a Squeak image handy right now, but I'd suggest checking these cases out... Time it takes to add all strings in the image to a dictionary. Time it takes to add all symbols in the image to a dictionary. Time it takes to add 100k consecutive integers as keys of a dictionary. Thanks, Andres. |
In reply to this post by Igor Stasenko
On Jul 19, 2007, at 5:10 AM, sig wrote:
> its a scratch implementation of dictionaries using linked list of > associations. I think the two 'at:ifAbsent:' methods neglect to send #value to the errorBlock? Otherwise the lookup performance ain't too shabby for relatively small tallies. (For anyone who can decode this sentence: BucketDict is almost as fast as Pepsi's SlotDictionary when used as the basis for lexical environments in the Jolt compiler. This is my usual 'quasi-realistic' benchmark for a Dictionary.) One classic optimisation would be to use an ArrayedCollection of some kind (instead of a linked list) for the buckets. When #findKeyOrNil: hits and the association is not the first element in the bucket then exchange it with the element immediately preceding it. Associations related to popular keys will gradually migrate to the front of their buckets. Cheers, Ian |
In reply to this post by Andres Valloud-3
Andres Valloud a écrit :
> sig wrote: >> look at MaDictionary class comment. >> >> both dictionaries used in comparison tests populated simply using just: >> >> obj := Object new. >> dict at: obj put: obj. > I don't have a Squeak image handy right now, but I'd suggest checking > these cases out... > > Time it takes to add all strings in the image to a dictionary. > > Time it takes to add all symbols in the image to a dictionary. > > Time it takes to add 100k consecutive integers as keys of a dictionary. > > Thanks, > Andres. > sig's BucketDictionary is optimized in the case of Poorly hashed keys. Strings, Symbols and integers have rather good hash in Squeak, and in this case sig's BucketDictionary is slower than regular Dictionary. Ian Piumarta a écrit : > > One classic optimisation would be to use an ArrayedCollection of some > kind (instead of a linked list) for the buckets. When #findKeyOrNil: > hits and the association is not the first element in the bucket then > exchange it with the element immediately preceding it. Associations > related to popular keys will gradually migrate to the front of their > buckets. > > Cheers, > Ian > Ian, very bright to swap popular toward head. But the Bucket efficiency in adding/removing comes from linked list. How a copy of a bucket array would do better? Maybe you mean avoiding writing something ugly like: BucketDictionary>>findKeyOrNil: key | index bucket prevprev prev next | index := (key hash \\ array size) + 1. bucket := array at: index. prev := prevprev := nil. next := bucket. [next key = key ifTrue: [ prev ifNotNil: [prev next: next next. next next: prev. prevprev ifNil [array at: index put: next] ifNotNil: [prevprev next: next]]. ^next]. prevprev := prev. prev := next. next := next next. next isNil] whileFalse. ^nil Argh! better start learning lisp next time... I didn't test it. If it works, someone pay me a beer! Nicolas |
In reply to this post by Igor Stasenko
did you check the hashtable implementation on squeaksource.
t is better than Squeak dictionary since it does lineary degrades. in your tests did you try to have large dictionaries? Because this has also an impact on the performance. Stef On 19 juil. 07, at 14:10, sig wrote: > its a scratch implementation of dictionaries using linked list of > associations. > dont care about names, care about numbers: > (code to run taken from MaDictionary class comment) > > magma vs mine: > > time to add to sd: 161 > time to add to md: 1865 > time to access to sd: 124 > time to access to md: 597 > time to replace to sd: 114 > time to replace to md: 676 > time to remove 300 from sd: 2 > time to remove 300 from md: 3568 > -- > sd - my dictionary > md - MaWeakIdentityKeyDictionary > > smalltalk vs mine: > -- > time to add to sd: 159 > time to add to md: 63178 > time to access to sd: 147 > time to access to md: 49052 > time to replace to sd: 132 > time to replace to md: 50736 > time to remove 300 from sd: 2 > -- > sd - my dict > md - smalltalk Dictionary. removing keys takes forever, so i stopped > it before it printed a result.. > <BucketDictionary.st> > <DictBucket.st> > |
In reply to this post by Ian Piumarta-2
On 19/07/07, Ian Piumarta <[hidden email]> wrote:
> On Jul 19, 2007, at 5:10 AM, sig wrote: > > > its a scratch implementation of dictionaries using linked list of > > associations. > > I think the two 'at:ifAbsent:' methods neglect to send #value to the > errorBlock? > no they dont :) its just a shortcut. instead of putting: ^ bucket ifNil: [ aBlock value ] i can put: ^ bucket ifNil: aBlock since IfNil expects block as argument, #value is sent to aBlock anyways. No reason to encapsulate it into another block. > Otherwise the lookup performance ain't too shabby for relatively > small tallies. (For anyone who can decode this sentence: BucketDict > is almost as fast as Pepsi's SlotDictionary when used as the basis > for lexical environments in the Jolt compiler. This is my usual > 'quasi-realistic' benchmark for a Dictionary.) > > One classic optimisation would be to use an ArrayedCollection of some > kind (instead of a linked list) for the buckets. When #findKeyOrNil: > hits and the association is not the first element in the bucket then > exchange it with the element immediately preceding it. Associations > related to popular keys will gradually migrate to the front of their > buckets. #findKeyOrNil: key withPrevious: aBlock so, when it finds association with corresponding key, it sends #value: to aBlock with previous association in linked list just before returning a result. Then it can be easily reordered. But i'm not sure if this needed at all. Dicts are very sensitive to rehash frequencies. in given sample: | sd md r | Transcript clear. r := OrderedCollection new. ByteString allInstancesDo: [:each | r add: each ]. sd := BDictionary new. Smalltalk garbageCollect. Transcript cr; show: 'time to add all strings to bd: ', [ r do: [:each | sd at: each put: each ] ] timeToRun printString. sd := Dictionary new. Smalltalk garbageCollect. Transcript cr; show: 'time to add all strings to sd: ', [ r do: [:each | sd at: each put: each ] ] timeToRun printString. i got following results: Making 'tally >= (sz *4)' in #checkForOverflow, i got: time to add all strings to bd: 3184 time to add all strings to sd: 3010 Making 'tally >= (sz )' in #checkForOverflow, i got: time to add all strings to bd: 6286 time to add all strings to sd: 2991 Making tally >= sz * 2 time to add all strings to bd: 3162 time to add all strings to sd: 2985 And please , note i made these dicts mainly for replacing Magma ones, which working with huge sets and poorly hashed keys taken IdentityDictionaries: time to add all strings to bd: 4146 time to add all strings to sd: 7151 PS. In attachment new version.. I will put it into mantis too. Collections-Unordered-FastDicts.st (38K) Download Attachment |
In reply to this post by Nicolas Cellier-3
On Jul 19, 2007, at 1:14 PM, nicolas cellier wrote:
> > exchange it with the element immediately preceding it. Associations > > related to popular keys will gradually migrate to the front of their > > buckets. > But the Bucket efficiency in adding/removing comes from linked list. > How a copy of a bucket array would do better? You're right, of course. An ArrayedCollection only makes sense if the ratio of removal to other operations is small. A doubly-linked list perhaps? ;-) Cheers, Ian |
In reply to this post by Andres Valloud-3
On 19/07/07, Andres Valloud <[hidden email]> wrote:
> sig wrote: > > look at MaDictionary class comment. > > > > both dictionaries used in comparison tests populated simply using just: > > > > obj := Object new. > > dict at: obj put: obj. > I don't have a Squeak image handy right now, but I'd suggest checking > these cases out... > > Time it takes to add all strings in the image to a dictionary. > > Time it takes to add all symbols in the image to a dictionary. for symbols: time to add all strings to bd: 467 time to add all strings to sd: 998 > > Time it takes to add 100k consecutive integers as keys of a dictionary. > | sd md r | Transcript clear. r := OrderedCollection new. 1 to: 100000 do: [:each | r add: each ]. sd := BDictionary new. Smalltalk garbageCollect. Transcript cr; show: 'time to add all strings to bd: ', [ r do: [:each | sd at: each put: each ] ] timeToRun printString. sd := Dictionary new. Smalltalk garbageCollect. Transcript cr; show: 'time to add all strings to sd: ', [ r do: [:each | sd at: each put: each ] ] timeToRun printString. time to add all strings to bd: 719 time to add all strings to sd: 870 | sd md r | Transcript clear. r := OrderedCollection new. 1 to: 100000 do: [:each | r add: each ]. sd := IdentityBDictionary new. Smalltalk garbageCollect. Transcript cr; show: 'time to add all strings to bd: ', [ r do: [:each | sd at: each put: each ] ] timeToRun printString. sd := IdentityDictionary new. Smalltalk garbageCollect. Transcript cr; show: 'time to add all strings to sd: ', [ r do: [:each | sd at: each put: each ] ] timeToRun printString. time to add all strings to bd: 719 time to add all strings to sd: 1580 > Thanks, > Andres. > > > |
In reply to this post by stephane ducasse
stephane ducasse a écrit :
> did you check the hashtable implementation on squeaksource. > t is better than Squeak dictionary since it does lineary degrades. > in your tests did you try to have large dictionaries? Because this has > also an > impact on the performance. > > Stef > sig's work is essentially same as HashTable... Except some details: HashTable has lot of message indirections to ease subclassing (Weak, Identity...) HashTable stores hash code instead of re-computing (memory/speed compromise) HashTable optimize link list enumeration with [... (item := item next) isNil] whileFalse loop rather than using recursive message send Nicolas |
On 20/07/07, nicolas cellier <[hidden email]> wrote:
> stephane ducasse a écrit : > > did you check the hashtable implementation on squeaksource. > > t is better than Squeak dictionary since it does lineary degrades. > > in your tests did you try to have large dictionaries? Because this has > > also an > > impact on the performance. > > > > Stef > > > > sig's work is essentially same as HashTable... > > Except some details: > HashTable has lot of message indirections to ease subclassing (Weak, > Identity...) > HashTable stores hash code instead of re-computing (memory/speed compromise) > HashTable optimize link list enumeration with [... > (item := item next) isNil] whileFalse loop rather than using recursive > message send > Interesting, why its not included to kernel classes? > Nicolas > > > |
>> sig's work is essentially same as HashTable...
>> >> Except some details: >> HashTable has lot of message indirections to ease subclassing (Weak, >> Identity...) >> HashTable stores hash code instead of re-computing (memory/speed >> compromise) >> HashTable optimize link list enumeration with [... >> (item := item next) isNil] whileFalse loop rather than using >> recursive >> message send >> > > Interesting, why its not included to kernel classes? it would be good to have a nice package with the different implementation. I think that speeding up Dictionary would also have an impact on the overall performance of squeak (to be verified). Now we never got the energy to push that. But if someone want to have a look this would be great. Stref |
On Jul 20, 2007, at 2:10 AM, stephane ducasse wrote: >>> sig's work is essentially same as HashTable... >> >> Interesting, why its not included to kernel classes? > > it would be good to have a nice package with the different > implementation. > I think that speeding up Dictionary would also have an impact on > the overall > performance of squeak (to be verified). Now we never got the energy > to push that. > But if someone want to have a look this would be great. That someone should look at memory usage as well as speed. HashTable uses significantly more memory than a standard Dictionary. Sometimes it's a good idea to trade space for speed, but not always. Colin |
Free forum by Nabble | Edit this page |