Marcel Taeumel uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-mt.593.mcz

==================== Summary ====================

Name: Collections-mt.593
Author: mt
Time: 15 January 2015, 5:14:16.888 pm
UUID: 0316ee3c-b9ec-f845-9474-8df35a929d8b
Ancestors: Collections-mt.592

New ordered dictionary added, which keeps track of the insertion order.

=============== Diff against Collections-mt.592 ===============

Item was added:
+ Dictionary subclass: #OrderedDictionary
+ 	instanceVariableNames: 'order lastIndex'
+ 	classVariableNames: ''
+ 	poolDictionaries: ''
+ 	category: 'Collections-Sequenceable'!
+ 
+ !OrderedDictionary commentStamp: '<historical>' prior: 0!
+ I am an ordered dictionary. I have an additional index (called 'order') to keep track of the insertion order of my associations.
+ 
+ The read access is not affected by the additional index.
+ 
+ Storing new data updates the index in O(1) for new keys. Keys that are already present involve actions in O(n) to update the insertion order.
+ 
+ The growth operation compacts the index and takes O(n) additional time.!

Item was added:
+ ----- Method: OrderedDictionary>>add: (in category 'adding') -----
+ add: anAssociation
+ 
+ 	| oldLastIndex oldCapacity |
+ 	oldLastIndex := lastIndex.
+ 	oldCapacity := self capacity.
+ 
+ 	super add: anAssociation.
+ 
+ 	(lastIndex = oldLastIndex or: [self capacity > oldCapacity]) ifTrue: [
+ 		| index |
+ 		"The association was already present or we grew. We need to update the order."
+ 		index := self scanOrderFor: anAssociation key.
+ 		index = lastIndex ifFalse: [
+ 			lastIndex := lastIndex + 1.
+ 			index := self scanOrderFor: anAssociation key.
+ 			order at: lastIndex put: (order at: index).
+ 			order at: index put: nil.
+ 			lastIndex = order size ifTrue: [self fixEmptySlots]]].
+ 
+ 	^ anAssociation!

Item was added:
+ ----- Method: OrderedDictionary>>associationsDo: (in category 'enumerating') -----
+ associationsDo: aBlock
+ 	"Iterate over the order instead of the internal array."
+ 
+ 	lastIndex = 0 ifTrue: [^ self].
+ 	1 to: lastIndex do: [:index |
+ 		(order at: index) ifNotNil: [:element |
+ 			aBlock value: element]].!

Item was added:
+ ----- Method: OrderedDictionary>>at:put: (in category 'accessing') -----
+ at: key put: anObject
+ 
+ 	| oldLastIndex oldCapacity |
+ 	oldLastIndex := lastIndex.
+ 	oldCapacity := self capacity.
+ 
+ 	super at: key put: anObject.
+ 
+ 	(lastIndex = oldLastIndex or: [self capacity > oldCapacity]) ifTrue: [
+ 		| index |
+ 		"The association was already present or we grew. We need to update the order."
+ 		index := self scanOrderFor: key.
+ 		index = lastIndex ifFalse: [
+ 			lastIndex := lastIndex + 1.
+ 			order at: lastIndex put: (order at: index).
+ 			order at: index put: nil.
+ 			lastIndex = order size ifTrue: [self fixEmptySlots]]].
+ 
+ 	^ anObject!

Item was added:
+ ----- Method: OrderedDictionary>>atNewIndex:put: (in category 'private') -----
+ atNewIndex: index put: anObject
+ 
+ 	lastIndex := lastIndex + 1.
+ 	order at: lastIndex put: anObject.
+ 
+ 	super atNewIndex: index put: anObject.!

Item was added:
+ ----- Method: OrderedDictionary>>fillOrderFrom: (in category 'private') -----
+ fillOrderFrom: anArray
+ 
+ 	| arraySize |
+ 	arraySize := lastIndex.
+ 	lastIndex := 0.
+ 	1 to: arraySize do: [:index |
+ 		(anArray at: index) ifNotNil: [:object |
+ 			lastIndex := lastIndex + 1.
+ 			order at: lastIndex put: object]].!

Item was added:
+ ----- Method: OrderedDictionary>>fixEmptySlots (in category 'private') -----
+ fixEmptySlots
+ 	"Remove all nil slots in the order index to avoid overflow."
+ 
+ 	self fillOrderFrom: order.!

Item was added:
+ ----- Method: OrderedDictionary>>growTo: (in category 'private') -----
+ growTo: anInteger
+ 
+ 	| oldOrder |
+ 	super growTo: anInteger.
+ 	oldOrder := order.
+ 	order := self class arrayType new: anInteger.
+ 	self fillOrderFrom: oldOrder.!

Item was added:
+ ----- Method: OrderedDictionary>>initialize: (in category 'private') -----
+ initialize: n
+ 
+ 	super initialize: n.
+ 	order := self class arrayType new: n.
+ 	lastIndex := 0.!

Item was added:
+ ----- Method: OrderedDictionary>>removeKey:ifAbsent: (in category 'removing') -----
+ removeKey: key ifAbsent: aBlock
+ 
+ 	order at: (self scanOrderFor: key) put: nil.
+ 	^ super removeKey: key ifAbsent: aBlock!

Item was added:
+ ----- Method: OrderedDictionary>>scanOrderFor: (in category 'private') -----
+ scanOrderFor: anObject
+ 
+ 	1 to: lastIndex do: [:index |
+ 		| element |
+ 		((element := order at: index) notNil and: [anObject = element key])
+ 			ifTrue: [^ index]].
+ 
+ 	lastIndex = order size
+ 		ifTrue: [self errorNoFreeSpace]
+ 		ifFalse: [^ self order at: lastIndex + 1 "nil"].!
Well, it was just a little programming exercise for me. Let's discuss whether you like it or not. ;) I did need such a container several times in the past. Maybe there is a reason why Squeak did not have it? :)
Best,
Marcel
I find the OrderedDictionary implementation similar to the one in Pharo,
and therefore sub-optimal. The order variable is only used to iterate over the associations in insertion/update order. Such behavior can be achieved with a LinkedDictionary implementation, which provides O(1) average time for all lookup-based operations, and can also support different behaviors (order by last access instead of order by insertion/update) within the same time constraints.

When I implement such linked dictionaries (see LRUCache, or OCompletition for examples), I usually link the values, because that way I can use any dictionary. For a general-purpose linked dictionary, the links can be added to the associations to hide the details from the user.

Other than this, I found the following issues with the code after a quick look:
- #scanOrderFor: sends #order, which is not implemented.
- #add: evaluates [index := self scanOrderFor: anAssociation key.] twice. The second evaluation seems to be unnecessary.
- The class comment doesn't state everywhere that the big O notation applies to time, not space.

Can you give me some examples when/where you needed an ordered dictionary?

Levente

On Thu, 15 Jan 2015, Marcel Taeumel wrote:

> Well, it was just a little programming exercise for me. Let's discuss whether
> you like it or not. ;) I did need such a container several times in the
> past. Maybe there is a reason why Squeak did not have it? :)
>
> Best,
> Marcel
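The value-linking idea, as a rough sketch (the class LinkedValue and the instance variables map, firstLink and lastLink are made up for illustration; this is not Levente's LRUCache code, and only new keys are handled):

at: key put: anObject
	| link |
	link := LinkedValue new.
	link key: key; value: anObject.
	map at: key put: link.	"map can be any ordinary Dictionary"
	firstLink
		ifNil: [firstLink := link]
		ifNotNil: [lastLink nextLink: link].
	lastLink := link.
	^ anObject

keysAndValuesDo: aBlock
	"Walk the value links, not the hash table, so we visit entries in insertion order."
	| link |
	link := firstLink.
	[link isNil] whileFalse:
		[aBlock value: link key value: link value.
		 link := link nextLink]

Lookups stay plain dictionary lookups ((map at: key) value), so they remain O(1) on average.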
Hi Levente,
thank you for the quick code review. I did some fixes. :)

For the LinkedDictionary: When do you establish the desired order (insertion, lexical, ... someBlock) for the associations?

Use cases are similar to OrderedCollection. But instead of accessing items via index (someItems at: 5), you can access them with a more descriptive key (e.g., someItems at: #foobar). I will look for a more concrete one.

To be better comparable with OrderedCollection, there might be an additional protocol to support #atFirst and others... Hmmm...

Here are more references to the concept itself:
https://www.python.org/dev/peps/pep-0372/
https://docs.python.org/3/library/collections.html#collections.OrderedDict

Best,
Marcel
On 16.01.2015, at 09:39, Marcel Taeumel <[hidden email]> wrote:
>
> https://www.python.org/dev/peps/pep-0372/

Interesting. This states that the order is never updated: when an existing element is overwritten, it is not moved to the end. Only new keys are appended. This allows for a more efficient implementation (such as suggested by Levente).

I think an OrderedDictionary would most often be used to preserve the original order of keys. Assigning a new value should not move the item to the end.

- Bert -
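To make the semantic question concrete, this is the behaviour Bert is arguing for (expected results only, sketched in the comments; not verified output of the class posted above):

| d |
d := OrderedDictionary new.
d at: #a put: 1.
d at: #b put: 2.
d at: #a put: 3.	"overwrite an existing key"
d associationsDo: [:each | Transcript show: each printString; cr].
"Under PEP 372 semantics this prints  #a->3  then  #b->2 :
 the overwrite updates the value but does not move #a to the end."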
Hmm... you're right. It seems that in C#, it is just an overwrite, too. Which could imply no update of the order.
http://msdn.microsoft.com/en-us/library/system.collections.specialized.ordereddictionary.add%28v=vs.100%29.aspx

I cannot verify that right now. However, this would really make the implementation simpler and maybe more flexible.

Still, I have to look into this linking of values. Did not get it yet... :)

Best,
Marcel
In reply to this post by marcel.taeumel (old)
On Fri, 16 Jan 2015, Marcel Taeumel wrote:
> Hi Levente,
>
> thank you for the quick code review. I did some fixes. :)
>
> For the LinkedDictionary: When do you establish the desired order
> (insertion, lexical, ... someBlock) for the associations?

I'm not sure if I understand this question right, so I'll try to do my best to give a meaningful answer. If the goal is to keep the insertion order, then #atNewIndex:put: is the right place. If you want it to reflect the order of modifications (insert + update), then #add: and #at:put: have to be modified too. If you want the order to reflect the order of accesses, then all other methods which access the value (or the key?) have to be changed as well.

For the list implementation, I always prefer the circular doubly-linked list with a head element (not sure if this is the right term in English; it means a separate element with no value, just the next and the previous links; this allows the code to be simpler, because the list always has at least one element). This list provides O(1) time removal, and insertion next to the head (or any other known element). This is why any kind of ordering can be implemented with O(1) extra time, as long as the elements are moved to the front.

The only drawback this dictionary implementation has is that the associations added with #add: can't be used, unless they are from the right class (the one which has the two links).

In Java this collection is called LinkedHashMap: http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html
This supports two kinds of ordering: access-order and insertion-order.

Levente

> Use cases are similar to OrderedCollection. But instead of accessing items
> via index (someItems at: 5), you can access them with a more descriptive key
> (e.g., someItems at: #foobar). I will look for a more concrete one.
>
> To be better comparable with OrderedCollection, there might be an additional
> protocol to support #atFirst and others... Hmmm...
>
> Here are more references to the concept itself:
> https://www.python.org/dev/peps/pep-0372/
> https://docs.python.org/3/library/collections.html#collections.OrderedDict
>
> Best,
> Marcel
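In case the head-element trick is unfamiliar, here is a minimal sketch (class, selector and variable names are made up for illustration, not taken from any existing package). Each link has key, value, prev and next accessors; the head carries no value, and because the list is circular, prev and next are never nil, so there are no special cases:

initialize
	head := DictionaryLink new.
	head prev: head.
	head next: head

addLinkLast: aLink
	"Splice aLink in just before the head, i.e. at the end of the order -- O(1)."
	aLink prev: head prev.
	aLink next: head.
	head prev next: aLink.
	head prev: aLink

removeLink: aLink
	"Unlink aLink from anywhere in the ring -- O(1), no nil checks needed."
	aLink prev next: aLink next.
	aLink next prev: aLink prev

linksDo: aBlock
	"Iterate in order, stopping when we are back at the head."
	| link |
	link := head next.
	[link == head] whileFalse:
		[aBlock value: link.
		 link := link next]

Access-ordering (as in an LRU cache, or Java's access-order LinkedHashMap) is then just removeLink: followed by addLinkLast: on every hit, still O(1).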
Ah, okay. I was curious about the performance of iterating over an array vs. the linked list:
a := Array new: 1000.
l := LinkedList new.

1000 timesRepeat: [l add: (StackLink with: nil)].

[a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"

[l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"

I expected the opposite! :-O Why is that so?

Best,
Marcel
In reply to this post by marcel.taeumel (old)
On Fri, 16 Jan 2015, Marcel Taeumel wrote:
> Hmm... you're right. It seems that in C#, it is just an overwrite, too. Which
> could imply no update of the order.
>
> http://msdn.microsoft.com/en-us/library/system.collections.specialized.ordereddictionary.add%28v=vs.100%29.aspx
>
> I cannot verify that right now. However, this would really make the
> implementation simpler and maybe more flexible.
>
> Still, I have to look into this linking of values. Did not get it yet... :)

I've uploaded a quick implementation, which uses the insertion order.

http://leves.web.elte.hu/squeak/LinkedDictionary-ul.1.mcz

Levente

> Best,
> Marcel
In reply to this post by Bert Freudenberg
On 16.01.2015, at 12:13, Bert Freudenberg <[hidden email]> wrote:

> On 16.01.2015, at 09:39, Marcel Taeumel <[hidden email]> wrote:
>>
>> https://www.python.org/dev/peps/pep-0372/
>
> Interesting. This states that the order is never updated: when an existing element is overwritten, it is not moved to the end. Only new keys are appended. This allows for a more efficient implementation (such as suggested by Levente).
>
> I think an OrderedDictionary would most often be used to preserve the original order of keys. Assigning a new value should not move the item to the end.

ACK. See other mail.

Best
	-Tobias

> - Bert -
In reply to this post by Bert Freudenberg
On Jan 16, 2015, at 3:13 AM, Bert Freudenberg <[hidden email]> wrote:
> On 16.01.2015, at 09:39, Marcel Taeumel <[hidden email]> wrote:
>>
>> https://www.python.org/dev/peps/pep-0372/
>
> Interesting. This states that the order is never updated: when an existing element is overwritten, it is not moved to the end. Only new keys are appended. This allows for a more efficient implementation (such as suggested by Levente).
>
> I think an OrderedDictionary would most often be used to preserve the original order of keys. Assigning a new value should not move the item to the end.

+1
In reply to this post by marcel.taeumel (old)
Hi Marcel,
On Jan 16, 2015, at 5:54 AM, Marcel Taeumel <[hidden email]> wrote:

> Ah, okay. I was curious about the performance of iterating over an array vs.
> the linked list:
>
> a := Array new: 1000.
> l := LinkedList new.
>
> 1000 timesRepeat: [l add: (StackLink with: nil)].
>
> [a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
>
> [l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
>
> I expected the opposite! :-O Why is that so?

Look at the implementations of do:.

In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.

In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.

> Best,
> Marcel
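For reference, the do: that Array inherits is SequenceableCollection>>#do:, which (paraphrasing the Squeak source) is essentially:

do: aBlock
	"Evaluate aBlock for each element, fetched via the at: primitive."
	1 to: self size do:
		[:index | aBlock value: (self at: index)]

so every element access goes through the bounds-checked at: that Eliot mentions.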
> On 16.01.2015, at 16:36, Eliot Miranda <[hidden email]> wrote:
>
> Hi Marcel,
>
> On Jan 16, 2015, at 5:54 AM, Marcel Taeumel <[hidden email]> wrote:
>
>> Ah, okay. I was curious about the performance of iterating over an array vs.
>> the linked list:
>>
>> a := Array new: 1000.
>> l := LinkedList new.
>>
>> 1000 timesRepeat: [l add: (StackLink with: nil)].
>>
>> [a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
>>
>> [l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
>>
>> I expected the opposite! :-O Why is that so?
>
> Look at the implementations of do:.
>
> In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.
>
> In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.

Making it all the more surprising that the linked list is twice as fast.

My guess would be that the two inst var reads are way more efficient than the at: primitive.

- Bert -
In reply to this post by Eliot Miranda-2
Hi All,
On Jan 16, 2015, at 7:36 AM, Eliot Miranda <[hidden email]> wrote:

> Hi Marcel,
>
> On Jan 16, 2015, at 5:54 AM, Marcel Taeumel <[hidden email]> wrote:
>
>> Ah, okay. I was curious about the performance of iterating over an array vs.
>> the linked list:
>>
>> a := Array new: 1000.
>> l := LinkedList new.
>>
>> 1000 timesRepeat: [l add: (StackLink with: nil)].
>>
>> [a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
>>
>> [l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
>>
>> I expected the opposite! :-O Why is that so?
>
> Look at the implementations of do:.
>
> In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.
>
> In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.

This is interesting. (Marcel, Chris, forgive me; I'm presuming; please don't take this personally.) Marcel above appears to lack an intuition about the structure of Array vs LinkedList. And in developing a hash algorithm for a 32-bit subset of Floats a few weeks ago, Chris appeared to lack an intuition about Floats being boxed, assuming they were value types, not containers.

As a VM implementer I carry around a clear picture (literally, I am a visual thinker) of objects in my head. Those pictures are key to my approach to design and optimization.

I presume that for someone approaching the system given only a textual description of object structures, through class comments and method source, it is difficult to develop a good picture or mental model. For me, I read the blue book first, which is replete with pictures.

I know that historically visual inspector frameworks such as Jun have been able to auto-generate pictorial representations of specific object graphs. I wonder how useful it would be to provide support for designers to include pictorial representations in class comments.

Again I presume that the text model would have to support inclusion of simple bitmaps (to avoid having to include a full drawing framework in the system) and that the designer would construct a sample graph, generate a diagram using a visual inspector framework such as Jun, render it to a bitmap, and include it in the class comment.

A more elaborate system could of course include the sample graph and render it dynamically, which would allow exploration.

Either approach would make an interesting project, yes?

>> Best,
>> Marcel
On Jan 16, 2015, at 7:55 AM, Eliot Miranda <[hidden email]> wrote:

> Hi All,
>
> On Jan 16, 2015, at 7:36 AM, Eliot Miranda <[hidden email]> wrote:
>
>> Hi Marcel,
>>
>> On Jan 16, 2015, at 5:54 AM, Marcel Taeumel <[hidden email]> wrote:
>>
>>> Ah, okay. I was curious about the performance of iterating over an array vs.
>>> the linked list:
>>>
>>> a := Array new: 1000.
>>> l := LinkedList new.
>>>
>>> 1000 timesRepeat: [l add: (StackLink with: nil)].
>>>
>>> [a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
>>>
>>> [l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
>>>
>>> I expected the opposite! :-O Why is that so?
>>
>> Look at the implementations of do:.
>>
>> In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.
>>
>> In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.
>
> This is interesting. (Marcel, Chris, forgive me; I'm presuming; please don't take this personally.) Marcel above appears to lack an intuition about the structure of Array vs LinkedList. And in developing a hash algorithm for a 32-bit subset of Floats a few weeks ago, Chris appeared to lack an intuition about Floats being boxed, assuming they were value types, not containers.
>
> As a VM implementer I carry around a clear picture (literally, I am a visual thinker) of objects in my head. Those pictures are key to my approach to design and optimization.
>

and lest anyone think I have no problem developing these pictures: elsewhere in the thread Levente discussed the linked hash map organization of OrderedDictionary. It took me the few minutes in which I drafted an erroneous and unsent email to develop the picture that made me realize Levente was right about lookup being O(1), because I didn't have the picture of a link in my head when I started writing the email.

> I presume that for someone approaching the system given only a textual description of object structures, through class comments and method source, it is difficult to develop a good picture or mental model. For me, I read the blue book first, which is replete with pictures.
>
> I know that historically visual inspector frameworks such as Jun have been able to auto-generate pictorial representations of specific object graphs. I wonder how useful it would be to provide support for designers to include pictorial representations in class comments.
>
> Again I presume that the text model would have to support inclusion of simple bitmaps (to avoid having to include a full drawing framework in the system) and that the designer would construct a sample graph, generate a diagram using a visual inspector framework such as Jun, render it to a bitmap, and include it in the class comment.
>
> A more elaborate system could of course include the sample graph and render it dynamically, which would allow exploration.
>
> Either approach would make an interesting project, yes?
>
>>> Best,
>>> Marcel
Ah, I just thought of an array as being some contiguous memory area. Mea culpa. :)
@ Levente: Thanks for the implementation of LinkedDictionary! Now I see what you meant. The links are still an addition to the array of Dictionary. I misunderstood that part.

Best,
Marcel
In reply to this post by Bert Freudenberg
On Jan 16, 2015, at 7:53 AM, Bert Freudenberg <[hidden email]> wrote:

>> On 16.01.2015, at 16:36, Eliot Miranda <[hidden email]> wrote:
>>
>> Hi Marcel,
>>
>> On Jan 16, 2015, at 5:54 AM, Marcel Taeumel <[hidden email]> wrote:
>>
>>> Ah, okay. I was curious about the performance of iterating over an array vs.
>>> the linked list:
>>>
>>> a := Array new: 1000.
>>> l := LinkedList new.
>>>
>>> 1000 timesRepeat: [l add: (StackLink with: nil)].
>>>
>>> [a do: [:ea | 1 + 2]] bench. "'94,600 per second.'"
>>>
>>> [l do: [:ea | 1 + 2]] bench. "'159,000 per second.'"
>>>
>>> I expected the opposite! :-O Why is that so?
>>
>> Look at the implementations of do:.
>>
>> In the Array case we iterate over the indices using SmallIntegers (fast arithmetic with no allocation) and dereference each element using the Object>>at: primitive, which (apart from type and bounds checking) is simple indexed memory access. So one read per element.
>>
>> In the LinkedList case each list element is fetched from the previous element and each value fetched from the element, so two reads per element.
>
> Making it all the more surprising that the linked list is twice as fast.

Ugh. I was blindsided. I saw the numbers as times, not rates. Marcel, I'm sorry.

OK, indeed the difference would be the cost of the bounds check in each array access. Interesting to see at what point the poorer memory locality of the list makes it worse. Difficult to measure, given the locality is decided by history and the GC.

> My guess would be that the two inst var reads are way more efficient than the at: primitive.
>
> - Bert -
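One rough way to probe where that crossover happens (just a sketch; the sizes are arbitrary and, as Eliot says, the result will depend on heap history and the GC):

#(1000 100000 1000000) do: [:n |
	| a l |
	a := Array new: n.
	l := LinkedList new.
	n timesRepeat: [l add: (StackLink with: nil)].
	Transcript
		show: n printString;
		show: '  array: ', [a do: [:ea | 1 + 2]] bench;
		show: '  list: ', [l do: [:ea | 1 + 2]] bench;
		cr]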
Okay, the Array >> #do: comes down to Number's:
to: stop do: aBlock
	"Normally compiled in-line, and therefore not overridable.
	Evaluate aBlock for each element of the interval (self to: stop by: 1)."
	| nextValue |
	nextValue := self.
	[nextValue <= stop]
		whileTrue:
			[aBlock value: nextValue.
			nextValue := nextValue + 1]

And the LinkedList >> #do: is:

do: aBlock
	| aLink |
	aLink := firstLink.
	[aLink == nil] whileFalse:
		[aBlock value: aLink.
		aLink := aLink nextLink]

So Array iteration has the incrementation of 1 in each iteration. That makes the difference? I thought that #to:do: is optimized as it does not appear on the stack... :)

Best,
Marcel
In reply to this post by marcel.taeumel (old)
On Jan 16, 2015, at 8:16 AM, Marcel Taeumel <[hidden email]> wrote:

> Ah, I just thought of an array as being some contiguous memory area. Mea
> culpa. :)

That's exactly what it is. I've made a fool of myself replying to email b4 I've properly woken up :-/

> @ Levente: Thanks for the implementation of LinkedDictionary! Now I see
> what you meant. The links are still an addition to the array of Dictionary.
> I misunderstood that part.
>
> Best,
> Marcel
In reply to this post by marcel.taeumel (old)
Hi Marcel,
On Jan 16, 2015, at 8:28 AM, Marcel Taeumel <[hidden email]> wrote:

> Okay, the Array >> #do: comes down to Number's:
>
> to: stop do: aBlock
> 	"Normally compiled in-line, and therefore not overridable.
> 	Evaluate aBlock for each element of the interval (self to: stop by: 1)."
> 	| nextValue |
> 	nextValue := self.
> 	[nextValue <= stop]
> 		whileTrue:
> 			[aBlock value: nextValue.
> 			nextValue := nextValue + 1]
>
> And the LinkedList >> #do: is:
>
> do: aBlock
> 	| aLink |
> 	aLink := firstLink.
> 	[aLink == nil] whileFalse:
> 		[aBlock value: aLink.
> 		aLink := aLink nextLink]
>
> So Array iteration has the incrementation of 1 in each iteration. That makes
> the difference? I thought that #to:do: is optimized as it does not appear on
> the stack... :)

Ghhh, it's too early :-). Only just now drinking some tea.

The Array implementation is compiled to an inlined while loop, so all the overhead is in each application of at:, which does bounds checking every time. I'll profile this with the VMProfiler in a few minutes. This is great news for Sista, because this is some of the overhead it will eliminate.

> Best,
> Marcel
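For anyone who wants to see the inlining for themselves, one way (assuming a stock Squeak image, where CompiledMethod>>#symbolic prints a method's bytecodes) is:

(SequenceableCollection >> #do:) symbolic
	"Print-it: the listing shows the loop as compare-and-jump bytecodes
	plus a send of #at: per element, but no send of #to:do: --
	the compiler has inlined the loop itself."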