The Trunk: Collections-mt.599.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: Collections-mt.599.mcz

commits-2
Marcel Taeumel uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-mt.599.mcz

==================== Summary ====================

Name: Collections-mt.599
Author: mt
Time: 19 January 2015, 9:11:48.572 am
UUID: 48da0044-2d14-3048-b9f0-90a215344a36
Ancestors: Collections-ul.598

Implemented "First wins"-strategy wrt. to the order of associations. Allowed for simplifying the code.

Order array does only grow to 75% to mitigate the greediness of the Dictionary.

=============== Diff against Collections-ul.598 ===============

Item was removed:
- ----- Method: OrderedDictionary>>add: (in category 'adding') -----
- add: anAssociation
-
- | oldLastIndex oldCapacity |
- oldLastIndex := lastIndex.
- oldCapacity := self capacity.
-
- super add: anAssociation.
-
- (lastIndex = oldLastIndex or: [self capacity > oldCapacity]) ifTrue: [
- | index |
- "The association was already present or we grew. We need to update the order."
- index := self scanOrderFor: anAssociation key.
- index = lastIndex ifFalse: [
- lastIndex := lastIndex + 1.
- order at: lastIndex put: (order at: index).
- order at: index put: nil.
- lastIndex = order size ifTrue: [self fixEmptySlots]]].
-
- ^ anAssociation!

Item was removed:
- ----- Method: OrderedDictionary>>at:put: (in category 'accessing') -----
- at: key put: anObject
-
- | oldLastIndex oldCapacity |
- oldLastIndex := lastIndex.
- oldCapacity := self capacity.
-
- super at: key put: anObject.
-
- (lastIndex = oldLastIndex or: [self capacity > oldCapacity]) ifTrue: [
- | index |
- "The association was already present or we grew. We need to update the order."
- index := self scanOrderFor: key.
- index = lastIndex ifFalse: [
- lastIndex := lastIndex + 1.
- order at: lastIndex put: (order at: index).
- order at: index put: nil.
- lastIndex = order size ifTrue: [self fixEmptySlots]]].
-
- ^ anObject!

Item was changed:
  ----- Method: OrderedDictionary>>atNewIndex:put: (in category 'private') -----
  atNewIndex: index put: anObject
 
+ lastIndex = order size ifTrue: [
+ self fixEmptySlots].
+
  lastIndex := lastIndex + 1.
  order at: lastIndex put: anObject.
 
  super atNewIndex: index put: anObject.!

Item was changed:
  ----- Method: OrderedDictionary>>fixEmptySlots (in category 'private') -----
  fixEmptySlots
  "Remove all nil slots in the order index to avoid overflow."
 
+ lastIndex = tally ifTrue: [^ self].
  self fillOrderFrom: order.!

Item was changed:
  ----- Method: OrderedDictionary>>growTo: (in category 'private') -----
  growTo: anInteger
 
  | oldOrder |
  super growTo: anInteger.
  oldOrder := order.
+ "Grow only to 75%. See #atNewIndex:put: in HashedCollection."
+ order := self class arrayType new: (anInteger * (3/4)) ceiling.
- order := self class arrayType new: anInteger.
  self fillOrderFrom: oldOrder.!

Item was changed:
  ----- Method: OrderedDictionary>>initialize: (in category 'private') -----
  initialize: n
 
  super initialize: n.
+ order := self class arrayType new: (n * (3/4)) ceiling.
- order := self class arrayType new: n.
  lastIndex := 0.!

Item was changed:
  ----- Method: OrderedDictionary>>removeKey:ifAbsent: (in category 'removing') -----
  removeKey: key ifAbsent: aBlock
 
+ (self scanOrderFor: key) ifNotNil: [:index |
+ order at: index put: nil].
- order at: (self scanOrderFor: key) put: nil.
  ^ super removeKey: key ifAbsent: aBlock!

Item was changed:
  ----- Method: OrderedDictionary>>scanOrderFor: (in category 'private') -----
  scanOrderFor: anObject
 
  1 to: lastIndex do: [:index |
  | element |
  ((element := order at: index) notNil and: [anObject = element key])
  ifTrue: [^ index]].
 
+ ^ nil!
- lastIndex = order size
- ifTrue: [self errorNoFreeSpace]
- ifFalse: [^ order at: lastIndex + 1 "nil"].!


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-mt.599.mcz

marcel.taeumel (old)
At the moment, I vote against an implementation with linked associations as suggested by Levente [1] because:

- Naming reveals implementation detail (LinkedList resp. LinkedAssociation) and thus a LinkedDictionary could only be abstract like HashedCollection is
- LinkedAssociation instances keep references (#next, #previous), which may interfere with automatic garbage collection if used outside the dictionary (see getter #associations)

We could add automatic boxing/unboxing of linked elements (like Set does with SetElement) but this would nullify the performance gain of linked structures compared to arrays.

Best,
Marcel

[1] http://leves.web.elte.hu/squeak/LinkedDictionary-ul.1.mcz
Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-mt.599.mcz

Levente Uzonyi-2
On Mon, 19 Jan 2015, Marcel Taeumel wrote:

> At the moment, I vote against an implementation with linked associations as
> suggested by Levente [1] because:
>
> - Naming reveals implementation detail (LinkedList resp. LinkedAssociation)
> and thus a LinkedDictionary could only be abstract like HashedCollection is

I used this name to make it loadable into an updated Trunk image.

> - LinkedAssociation instances keep references (#next, #previous), which may
> interfere with automatic garbage collection if used outside the dictionary
> (see getter #associations)
>
> We could add automatic boxing/unboxing of linked elements (like Set does
> with SetElement) but this would nullify the performance gain of linked
> structures compared to arrays.

That's not true for removal, which still takes O(n) time with the current
implementation:

| n random array |
n := 10000.
random := Random seed: 36rSQUEAK.
array := Array new: n streamContents: [ :stream |
  n timesRepeat: [ stream nextPut: (random nextInt: 1073741823) -> 1 ] ].
{ OrderedDictionary. LinkedDictionary } collect: [ :class |
  Smalltalk garbageCollect.
  [
  | instance |
  instance := class new.
  instance addAll: array.
  array do: [ :each | instance removeKey: each key ] ] timeToRun ].
"==> #(892 10)"

The current implementation can be modified to take only O(1) time average,
by using another array, which maps the index of an association in the
array variable to the index of the association in the order variable.

Levente

>
> Best,
> Marcel
>
> [1] http://leves.web.elte.hu/squeak/LinkedDictionary-ul.1.mcz
>
>
>
> --
> View this message in context: http://forum.world.st/The-Trunk-Collections-mt-599-mcz-tp4800299p4800301.html
> Sent from the Squeak - Dev mailing list archive at Nabble.com.
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-mt.599.mcz

marcel.taeumel (old)
Good Idea. But how do we decide between time vs. memory? I just reduced "order" to be 25% smaller than "array" to save some memory because array starts growing anyway then.

Best,
Marcel
Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-mt.599.mcz

marcel.taeumel (old)
Woah, managing such a lookup index between "array" and "order" is quite complicated with the current implementation of Dictionary. :D

Best,
Marcel
Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-mt.599.mcz

Levente Uzonyi-2
I never said it was easy.
Another way to do it is to compact eagerly on removal (as it's done in
Pharo). This gives better mass removal performance, and some of the code
becomes significantly simpler. This results in further performance boost.

Removal still takes O(n) time, which means mass removal is in O(n^2) time,
but with a significantly smaller constant (~1/10), which improves the
usability of the collection quite a bit:

| n random array |
n := 10000.
random := Random seed: 36rSQUEAK.
array := Array new: n streamContents: [ :stream |
  n timesRepeat: [ stream nextPut: (random nextInt: 1073741823) -> 1 ] ].
{ OrderedDictionary. LinkedDictionary } collect: [ :class |
  Smalltalk garbageCollect.
  [
  | instance |
  instance := class new.
  instance addAll: array.
  array do: [ :each | instance removeKey: each key ] ] timeToRun ].
"==> #(78 10)"

(If n is 100k, then the result is still pretty bad: #(7055 136).)

You can find these changes in the Inbox.

Levente

On Mon, 19 Jan 2015, Marcel Taeumel wrote:

> Woah, managing such a lookup index between "array" and "order" is quite
> complicated with the current implementation of Dictionary. :D
>
> Best,
> Marcel
>
>
>
> --
> View this message in context: http://forum.world.st/The-Trunk-Collections-mt-599-mcz-tp4800299p4800317.html
> Sent from the Squeak - Dev mailing list archive at Nabble.com.
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-mt.599.mcz

Levente Uzonyi-2
In reply to this post by marcel.taeumel (old)
I never said it was easy.
Another way to do it is to compact eagerly on removal (as it's done in
Pharo). This gives better mass removal performance, and some of the code
becomes significantly simpler. This results in further performance boost.

Removal still takes O(n) time, which means mass removal is in O(n^2), but
with a significantly smaller constant (~1/10), which improves the
usability of the collection quite a bit:

| n random array |
n := 10000.
random := Random seed: 36rSQUEAK.
array := Array new: n streamContents: [ :stream |
         n timesRepeat: [ stream nextPut: (random nextInt: 1073741823) -> 1
] ].
{ OrderedDictionary. LinkedDictionary } collect: [ :class |
     Smalltalk garbageCollect.
     [
         | instance |
         instance := class new.
         instance addAll: array.
         array do: [ :each | instance removeKey: each key ] ] timeToRun ].
"==> #(78 10)"

(If n is 100k, then the result is still pretty bad: #(7055 136).)

You can find these changes in the Inbox.

Levente

On Mon, 19 Jan 2015, Marcel Taeumel wrote:

> Woah, managing such a lookup index between "array" and "order" is quite
> complicated with the current implementation of Dictionary. :D
>
> Best,
> Marcel
>
>
>
> --
> View this message in context: http://forum.world.st/The-Trunk-Collections-mt-599-mcz-tp4800299p4800317.html
> Sent from the Squeak - Dev mailing list archive at Nabble.com.
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-mt.599.mcz

Eliot Miranda-2


On Mon, Jan 19, 2015 at 11:32 AM, Levente Uzonyi <[hidden email]> wrote:
I never said it was easy.
Another way to do it is to compact eagerly on removal (as it's done in
Pharo). This gives better mass removal performance, and some of the code
becomes significantly simpler. This results in further performance boost.

Removal still takes O(n) time, which means mass removal is in O(n^2), but with a significantly smaller constant (~1/10), which improves the
usability of the collection quite a bit:

| n random array |
n := 10000.
random := Random seed: 36rSQUEAK.
array := Array new: n streamContents: [ :stream |
        n timesRepeat: [ stream nextPut: (random nextInt: 1073741823) -> 1 ] ].
{ OrderedDictionary. LinkedDictionary } collect: [ :class |
    Smalltalk garbageCollect.
    [
        | instance |
        instance := class new.
        instance addAll: array.
        array do: [ :each | instance removeKey: each key ] ] timeToRun ].
"==> #(78 10)"

(If n is 100k, then the result is still pretty bad: #(7055 136).)

You can find these changes in the Inbox.

In Spur, self become: self copyEmpty should be cheap.  This takes about 18 usecs on my 2.2GHz i7.  So at some point the become will be cheaper for removeAll.

Levente

On Mon, 19 Jan 2015, Marcel Taeumel wrote:

Woah, managing such a lookup index between "array" and "order" is quite
complicated with the current implementation of Dictionary. :D

Best,
Marcel



--
View this message in context: http://forum.world.st/The-Trunk-Collections-mt-599-mcz-tp4800299p4800317.html
Sent from the Squeak - Dev mailing list archive at Nabble.com.






--
best,
Eliot


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-mt.599.mcz

Levente Uzonyi-2
On Mon, 26 Jan 2015, Eliot Miranda wrote:

>
>
> On Mon, Jan 19, 2015 at 11:32 AM, Levente Uzonyi <[hidden email]> wrote:
>       I never said it was easy.
>       Another way to do it is to compact eagerly on removal (as it's done in
>       Pharo). This gives better mass removal performance, and some of the code
>       becomes significantly simpler. This results in further performance boost.
>
>       Removal still takes O(n) time, which means mass removal is in O(n^2), but with a significantly smaller constant (~1/10), which improves the
>       usability of the collection quite a bit:
>
>       | n random array |
>       n := 10000.
>       random := Random seed: 36rSQUEAK.
>       array := Array new: n streamContents: [ :stream |
>               n timesRepeat: [ stream nextPut: (random nextInt: 1073741823) -> 1 ] ].
>       { OrderedDictionary. LinkedDictionary } collect: [ :class |
>           Smalltalk garbageCollect.
>           [
>               | instance |
>               instance := class new.
>               instance addAll: array.
>               array do: [ :each | instance removeKey: each key ] ] timeToRun ].
>       "==> #(78 10)"
>
>       (If n is 100k, then the result is still pretty bad: #(7055 136).)
>
>       You can find these changes in the Inbox.
>
>
> In Spur, self become: self copyEmpty should be cheap.  This takes about 18 usecs on my 2.2GHz i7.  So at some point the become will be cheaper for removeAll.
This benchmark shows that it takes O(n) time to remove a single element
using random access, while it takes O(1) time with the linked implementation.

For #removeAll, if we want to go to the direction where we allocate new
arrays (instead of replacing all the elements with nil - as it is done
now), then I think assigning them to the instance variables will always be
faster than #become:.

0.018ms sounds great for #become:.

Levente

>
>       Levente
>
>       On Mon, 19 Jan 2015, Marcel Taeumel wrote:
>
>             Woah, managing such a lookup index between "array" and "order" is quite
>             complicated with the current implementation of Dictionary. :D
>
>             Best,
>             Marcel
>
>
>
>             --
>             View this message in context: http://forum.world.st/The-Trunk-Collections-mt-599-mcz-tp4800299p4800317.html
>             Sent from the Squeak - Dev mailing list archive at Nabble.com.
>
>
>
>
>
>
> --
> best,Eliot
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-mt.599.mcz

Eliot Miranda-2


On Mon, Jan 26, 2015 at 11:12 PM, Levente Uzonyi <[hidden email]> wrote:
On Mon, 26 Jan 2015, Eliot Miranda wrote:



On Mon, Jan 19, 2015 at 11:32 AM, Levente Uzonyi <[hidden email]> wrote:
      I never said it was easy.
      Another way to do it is to compact eagerly on removal (as it's done in
      Pharo). This gives better mass removal performance, and some of the code
      becomes significantly simpler. This results in further performance boost.

      Removal still takes O(n) time, which means mass removal is in O(n^2), but with a significantly smaller constant (~1/10), which improves the
      usability of the collection quite a bit:

      | n random array |
      n := 10000.
      random := Random seed: 36rSQUEAK.
      array := Array new: n streamContents: [ :stream |
              n timesRepeat: [ stream nextPut: (random nextInt: 1073741823) -> 1 ] ].
      { OrderedDictionary. LinkedDictionary } collect: [ :class |
          Smalltalk garbageCollect.
          [
              | instance |
              instance := class new.
              instance addAll: array.
              array do: [ :each | instance removeKey: each key ] ] timeToRun ].
      "==> #(78 10)"

      (If n is 100k, then the result is still pretty bad: #(7055 136).)

      You can find these changes in the Inbox.


In Spur, self become: self copyEmpty should be cheap.  This takes about 18 usecs on my 2.2GHz i7.  So at some point the become will be cheaper for removeAll.

This benchmark shows that it takes O(n) time to remove a single element using random access, while it takes O(1) time with the linked implementation.

For #removeAll, if we want to go to the direction where we allocate new arrays (instead of replacing all the elements with nil - as it is done now), then I think assigning them to the instance variables will always be faster than #become:.

Of course, that's much better.  Simply replace the linked list with an empty one.  
 

0.018ms sounds great for #become:.

It's a key part of the design of Spur.

Levente



      Levente

      On Mon, 19 Jan 2015, Marcel Taeumel wrote:

            Woah, managing such a lookup index between "array" and "order" is quite
            complicated with the current implementation of Dictionary. :D

            Best,
            Marcel



            --
            View this message in context: http://forum.world.st/The-Trunk-Collections-mt-599-mcz-tp4800299p4800317.html
            Sent from the Squeak - Dev mailing list archive at Nabble.com.






--
best,Eliot







--
best,
Eliot