Marcel Taeumel uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-mt.599.mcz ==================== Summary ==================== Name: Collections-mt.599 Author: mt Time: 19 January 2015, 9:11:48.572 am UUID: 48da0044-2d14-3048-b9f0-90a215344a36 Ancestors: Collections-ul.598 Implemented "First wins"-strategy wrt. to the order of associations. Allowed for simplifying the code. Order array does only grow to 75% to mitigate the greediness of the Dictionary. =============== Diff against Collections-ul.598 =============== Item was removed: - ----- Method: OrderedDictionary>>add: (in category 'adding') ----- - add: anAssociation - - | oldLastIndex oldCapacity | - oldLastIndex := lastIndex. - oldCapacity := self capacity. - - super add: anAssociation. - - (lastIndex = oldLastIndex or: [self capacity > oldCapacity]) ifTrue: [ - | index | - "The association was already present or we grew. We need to update the order." - index := self scanOrderFor: anAssociation key. - index = lastIndex ifFalse: [ - lastIndex := lastIndex + 1. - order at: lastIndex put: (order at: index). - order at: index put: nil. - lastIndex = order size ifTrue: [self fixEmptySlots]]]. - - ^ anAssociation! Item was removed: - ----- Method: OrderedDictionary>>at:put: (in category 'accessing') ----- - at: key put: anObject - - | oldLastIndex oldCapacity | - oldLastIndex := lastIndex. - oldCapacity := self capacity. - - super at: key put: anObject. - - (lastIndex = oldLastIndex or: [self capacity > oldCapacity]) ifTrue: [ - | index | - "The association was already present or we grew. We need to update the order." - index := self scanOrderFor: key. - index = lastIndex ifFalse: [ - lastIndex := lastIndex + 1. - order at: lastIndex put: (order at: index). - order at: index put: nil. - lastIndex = order size ifTrue: [self fixEmptySlots]]]. - - ^ anObject! Item was changed: ----- Method: OrderedDictionary>>atNewIndex:put: (in category 'private') ----- atNewIndex: index put: anObject + lastIndex = order size ifTrue: [ + self fixEmptySlots]. + lastIndex := lastIndex + 1. order at: lastIndex put: anObject. super atNewIndex: index put: anObject.! Item was changed: ----- Method: OrderedDictionary>>fixEmptySlots (in category 'private') ----- fixEmptySlots "Remove all nil slots in the order index to avoid overflow." + lastIndex = tally ifTrue: [^ self]. self fillOrderFrom: order.! Item was changed: ----- Method: OrderedDictionary>>growTo: (in category 'private') ----- growTo: anInteger | oldOrder | super growTo: anInteger. oldOrder := order. + "Grow only to 75%. See #atNewIndex:put: in HashedCollection." + order := self class arrayType new: (anInteger * (3/4)) ceiling. - order := self class arrayType new: anInteger. self fillOrderFrom: oldOrder.! Item was changed: ----- Method: OrderedDictionary>>initialize: (in category 'private') ----- initialize: n super initialize: n. + order := self class arrayType new: (n * (3/4)) ceiling. - order := self class arrayType new: n. lastIndex := 0.! Item was changed: ----- Method: OrderedDictionary>>removeKey:ifAbsent: (in category 'removing') ----- removeKey: key ifAbsent: aBlock + (self scanOrderFor: key) ifNotNil: [:index | + order at: index put: nil]. - order at: (self scanOrderFor: key) put: nil. ^ super removeKey: key ifAbsent: aBlock! Item was changed: ----- Method: OrderedDictionary>>scanOrderFor: (in category 'private') ----- scanOrderFor: anObject 1 to: lastIndex do: [:index | | element | ((element := order at: index) notNil and: [anObject = element key]) ifTrue: [^ index]]. + ^ nil! - lastIndex = order size - ifTrue: [self errorNoFreeSpace] - ifFalse: [^ order at: lastIndex + 1 "nil"].! |
At the moment, I vote against an implementation with linked associations as suggested by Levente [1] because:
- Naming reveals implementation detail (LinkedList resp. LinkedAssociation) and thus a LinkedDictionary could only be abstract like HashedCollection is - LinkedAssociation instances keep references (#next, #previous), which may interfere with automatic garbage collection if used outside the dictionary (see getter #associations) We could add automatic boxing/unboxing of linked elements (like Set does with SetElement) but this would nullify the performance gain of linked structures compared to arrays. Best, Marcel [1] http://leves.web.elte.hu/squeak/LinkedDictionary-ul.1.mcz |
On Mon, 19 Jan 2015, Marcel Taeumel wrote:
> At the moment, I vote against an implementation with linked associations as > suggested by Levente [1] because: > > - Naming reveals implementation detail (LinkedList resp. LinkedAssociation) > and thus a LinkedDictionary could only be abstract like HashedCollection is I used this name to make it loadable into an updated Trunk image. > - LinkedAssociation instances keep references (#next, #previous), which may > interfere with automatic garbage collection if used outside the dictionary > (see getter #associations) > > We could add automatic boxing/unboxing of linked elements (like Set does > with SetElement) but this would nullify the performance gain of linked > structures compared to arrays. That's not true for removal, which still takes O(n) time with the current implementation: | n random array | n := 10000. random := Random seed: 36rSQUEAK. array := Array new: n streamContents: [ :stream | n timesRepeat: [ stream nextPut: (random nextInt: 1073741823) -> 1 ] ]. { OrderedDictionary. LinkedDictionary } collect: [ :class | Smalltalk garbageCollect. [ | instance | instance := class new. instance addAll: array. array do: [ :each | instance removeKey: each key ] ] timeToRun ]. "==> #(892 10)" The current implementation can be modified to take only O(1) time average, by using another array, which maps the index of an association in the array variable to the index of the association in the order variable. Levente > > Best, > Marcel > > [1] http://leves.web.elte.hu/squeak/LinkedDictionary-ul.1.mcz > > > > -- > View this message in context: http://forum.world.st/The-Trunk-Collections-mt-599-mcz-tp4800299p4800301.html > Sent from the Squeak - Dev mailing list archive at Nabble.com. > > |
Good Idea. But how do we decide between time vs. memory? I just reduced "order" to be 25% smaller than "array" to save some memory because array starts growing anyway then.
Best, Marcel |
Woah, managing such a lookup index between "array" and "order" is quite complicated with the current implementation of Dictionary. :D
Best, Marcel |
I never said it was easy.
Another way to do it is to compact eagerly on removal (as it's done in Pharo). This gives better mass removal performance, and some of the code becomes significantly simpler. This results in further performance boost. Removal still takes O(n) time, which means mass removal is in O(n^2) time, but with a significantly smaller constant (~1/10), which improves the usability of the collection quite a bit: | n random array | n := 10000. random := Random seed: 36rSQUEAK. array := Array new: n streamContents: [ :stream | n timesRepeat: [ stream nextPut: (random nextInt: 1073741823) -> 1 ] ]. { OrderedDictionary. LinkedDictionary } collect: [ :class | Smalltalk garbageCollect. [ | instance | instance := class new. instance addAll: array. array do: [ :each | instance removeKey: each key ] ] timeToRun ]. "==> #(78 10)" (If n is 100k, then the result is still pretty bad: #(7055 136).) You can find these changes in the Inbox. Levente On Mon, 19 Jan 2015, Marcel Taeumel wrote: > Woah, managing such a lookup index between "array" and "order" is quite > complicated with the current implementation of Dictionary. :D > > Best, > Marcel > > > > -- > View this message in context: http://forum.world.st/The-Trunk-Collections-mt-599-mcz-tp4800299p4800317.html > Sent from the Squeak - Dev mailing list archive at Nabble.com. > > |
In reply to this post by marcel.taeumel (old)
I never said it was easy.
Another way to do it is to compact eagerly on removal (as it's done in Pharo). This gives better mass removal performance, and some of the code becomes significantly simpler. This results in further performance boost. Removal still takes O(n) time, which means mass removal is in O(n^2), but with a significantly smaller constant (~1/10), which improves the usability of the collection quite a bit: | n random array | n := 10000. random := Random seed: 36rSQUEAK. array := Array new: n streamContents: [ :stream | n timesRepeat: [ stream nextPut: (random nextInt: 1073741823) -> 1 ] ]. { OrderedDictionary. LinkedDictionary } collect: [ :class | Smalltalk garbageCollect. [ | instance | instance := class new. instance addAll: array. array do: [ :each | instance removeKey: each key ] ] timeToRun ]. "==> #(78 10)" (If n is 100k, then the result is still pretty bad: #(7055 136).) You can find these changes in the Inbox. Levente On Mon, 19 Jan 2015, Marcel Taeumel wrote: > Woah, managing such a lookup index between "array" and "order" is quite > complicated with the current implementation of Dictionary. :D > > Best, > Marcel > > > > -- > View this message in context: http://forum.world.st/The-Trunk-Collections-mt-599-mcz-tp4800299p4800317.html > Sent from the Squeak - Dev mailing list archive at Nabble.com. > > |
On Mon, Jan 19, 2015 at 11:32 AM, Levente Uzonyi <[hidden email]> wrote: I never said it was easy. In Spur, self become: self copyEmpty should be cheap. This takes about 18 usecs on my 2.2GHz i7. So at some point the become will be cheaper for removeAll. Levente best,
Eliot |
On Mon, 26 Jan 2015, Eliot Miranda wrote:
> > > On Mon, Jan 19, 2015 at 11:32 AM, Levente Uzonyi <[hidden email]> wrote: > I never said it was easy. > Another way to do it is to compact eagerly on removal (as it's done in > Pharo). This gives better mass removal performance, and some of the code > becomes significantly simpler. This results in further performance boost. > > Removal still takes O(n) time, which means mass removal is in O(n^2), but with a significantly smaller constant (~1/10), which improves the > usability of the collection quite a bit: > > | n random array | > n := 10000. > random := Random seed: 36rSQUEAK. > array := Array new: n streamContents: [ :stream | > n timesRepeat: [ stream nextPut: (random nextInt: 1073741823) -> 1 ] ]. > { OrderedDictionary. LinkedDictionary } collect: [ :class | > Smalltalk garbageCollect. > [ > | instance | > instance := class new. > instance addAll: array. > array do: [ :each | instance removeKey: each key ] ] timeToRun ]. > "==> #(78 10)" > > (If n is 100k, then the result is still pretty bad: #(7055 136).) > > You can find these changes in the Inbox. > > > In Spur, self become: self copyEmpty should be cheap. This takes about 18 usecs on my 2.2GHz i7. So at some point the become will be cheaper for removeAll. using random access, while it takes O(1) time with the linked implementation. For #removeAll, if we want to go to the direction where we allocate new arrays (instead of replacing all the elements with nil - as it is done now), then I think assigning them to the instance variables will always be faster than #become:. 0.018ms sounds great for #become:. Levente > > Levente > > On Mon, 19 Jan 2015, Marcel Taeumel wrote: > > Woah, managing such a lookup index between "array" and "order" is quite > complicated with the current implementation of Dictionary. :D > > Best, > Marcel > > > > -- > View this message in context: http://forum.world.st/The-Trunk-Collections-mt-599-mcz-tp4800299p4800317.html > Sent from the Squeak - Dev mailing list archive at Nabble.com. > > > > > > > -- > best,Eliot > > |
On Mon, Jan 26, 2015 at 11:12 PM, Levente Uzonyi <[hidden email]> wrote: On Mon, 26 Jan 2015, Eliot Miranda wrote: Of course, that's much better. Simply replace the linked list with an empty one.
It's a key part of the design of Spur. Levente best,
Eliot |
Free forum by Nabble | Edit this page |