The Trunk: Collections-ul.915.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: Collections-ul.915.mcz

commits-2
Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.915.mcz

==================== Summary ====================

Name: Collections-ul.915
Author: ul
Time: 5 October 2020, 12:30:33.597525 am
UUID: 93341c2a-ebb2-4d2b-abee-e3f42f509836
Ancestors: Collections-eem.913

HashedCollection changes:
- make #capacity return the actual capacity of the collection instead of the size of the internal array. This change is obviously not backwards compatible.
- improve the performance of #isEmpty when tally is 0

OrderedDictionary changes:
- make it a subclass of PluggableDictionary. This lets one create e.g. an ordered identity dictionary without creating a subclass with duplicated behavior
- simplify #initialize and #growTo: now that #capacity is accurate

=============== Diff against Collections-eem.913 ===============

Item was changed:
  ----- Method: HashedCollection>>capacity (in category 'accessing') -----
  capacity
+ "Answer the current capacity of the receiver - aka the number of elements the receiver can hold without growing."
- "Answer the current capacity of the receiver."
 
+ ^ array size * 3 // 4!
- ^ array size!

Item was changed:
  ----- Method: HashedCollection>>isEmpty (in category 'testing') -----
  isEmpty
  "For non-weak collections, we can use the tally to speed up the empty check. For weak collections, we must use the traditional way because the tally is unreliable. Also see #size vs. #slowSize."
 
+ tally = 0 ifTrue: [ ^true ].
+ ^array class isWeak and: [ super isEmpty ]!
- ^ array class isWeak
- ifFalse: [ tally = 0 ]
- ifTrue: [ super isEmpty ]!

Item was changed:
  ----- Method: HashedCollection>>removeAll (in category 'removing') -----
  removeAll
  "remove all elements from this collection.
  Preserve the capacity"
 
+ self initialize: array size!
- self initialize: self capacity!

Item was changed:
+ PluggableDictionary subclass: #OrderedDictionary
- Dictionary subclass: #OrderedDictionary
  instanceVariableNames: 'order'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Collections-Sequenceable'!
 
  !OrderedDictionary commentStamp: 'mt 1/16/2015 10:42' prior: 0!
  I am an ordered dictionary. I have an additional index (called 'order') to keep track of the insertion order of my associations.
 
  The read access is not affected by the additional index.
 
  The index is updated in O(1) [time] when inserting new keys. For present keys, that insertion involves actions in O(n) to move the respective element to the end of the order.
 
  The growth operation compacts the index and takes O(n) additional time.
 
  NOTE: This is still no instance of SequenceableCollection. Having this, some protocols are missing and may require working on #associations, which is an Array and thus sequenceable.!

Item was changed:
  ----- Method: OrderedDictionary>>growTo: (in category 'private') -----
  growTo: anInteger
 
- | oldOrder |
  super growTo: anInteger.
+ order := order grownBy: self capacity - order size!
- oldOrder := order.
- "Grow only to 75%. See #atNewIndex:put: in HashedCollection."
- order := self arrayType new: anInteger + 1 * 3 // 4.
- order
- replaceFrom: 1
- to: tally
- with: oldOrder
- startingAt: 1!

Item was changed:
  ----- Method: OrderedDictionary>>initialize: (in category 'private') -----
  initialize: n
 
  super initialize: n.
+ order := self arrayType new: self capacity!
- order := self arrayType new: n + 1 * 3 // 4!


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-ul.915.mcz

Christoph Thiede

OrderedDictionary changes:

> - make it a subclass of PluggableDictionary. This lets one create e.g. an ordered identity dictionary without creating a subclass with duplicated behavior

Just a stupid question: Why do we have separate Pluggable/IdentityDictionary subclasses of Dictionary at all and did not directly implement Dictionary with a pluggable block? Is this a historic decision or an optimization thing? :-)

Best,
Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von [hidden email] <[hidden email]>
Gesendet: Montag, 5. Oktober 2020 00:31:44
An: [hidden email]; [hidden email]
Betreff: [squeak-dev] The Trunk: Collections-ul.915.mcz
 
Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.915.mcz

==================== Summary ====================

Name: Collections-ul.915
Author: ul
Time: 5 October 2020, 12:30:33.597525 am
UUID: 93341c2a-ebb2-4d2b-abee-e3f42f509836
Ancestors: Collections-eem.913

HashedCollection changes:
- make #capacity return the actual capacity of the collection instead of the size of the internal array. This change is obviously not backwards compatible.
- improve the performance of #isEmpty when tally is 0

OrderedDictionary changes:
- make it a subclass of PluggableDictionary. This lets one create e.g. an ordered identity dictionary without creating a subclass with duplicated behavior
- simplify #initialize and #growTo: now that #capacity is accurate

=============== Diff against Collections-eem.913 ===============

Item was changed:
  ----- Method: HashedCollection>>capacity (in category 'accessing') -----
  capacity
+        "Answer the current capacity of the receiver - aka the number of elements the receiver can hold without growing."
-        "Answer the current capacity of the receiver."
 
+        ^ array size * 3 // 4!
-        ^ array size!

Item was changed:
  ----- Method: HashedCollection>>isEmpty (in category 'testing') -----
  isEmpty
         "For non-weak collections, we can use the tally to speed up the empty check. For weak collections, we must use the traditional way because the tally is unreliable. Also see #size vs. #slowSize."
 
+        tally = 0 ifTrue: [ ^true ].
+        ^array class isWeak and: [ super isEmpty ]!
-        ^ array class isWeak
-                ifFalse: [ tally = 0 ]
-                ifTrue: [ super isEmpty ]!

Item was changed:
  ----- Method: HashedCollection>>removeAll (in category 'removing') -----
  removeAll
         "remove all elements from this collection.
         Preserve the capacity"
        
+        self initialize: array size!
-        self initialize: self capacity!

Item was changed:
+ PluggableDictionary subclass: #OrderedDictionary
- Dictionary subclass: #OrderedDictionary
         instanceVariableNames: 'order'
         classVariableNames: ''
         poolDictionaries: ''
         category: 'Collections-Sequenceable'!
 
  !OrderedDictionary commentStamp: 'mt 1/16/2015 10:42' prior: 0!
  I am an ordered dictionary. I have an additional index (called 'order') to keep track of the insertion order of my associations.
 
  The read access is not affected by the additional index.
 
  The index is updated in O(1) [time] when inserting new keys. For present keys, that insertion involves actions in O(n) to move the respective element to the end of the order.
 
  The growth operation compacts the index and takes O(n) additional time.
 
  NOTE: This is still no instance of SequenceableCollection. Having this, some protocols are missing and may require working on #associations, which is an Array and thus sequenceable.!

Item was changed:
  ----- Method: OrderedDictionary>>growTo: (in category 'private') -----
  growTo: anInteger
 
-        | oldOrder |
         super growTo: anInteger.
+        order := order grownBy: self capacity - order size!
-        oldOrder := order.
-        "Grow only to 75%. See #atNewIndex:put: in HashedCollection."
-        order := self arrayType new: anInteger + 1 * 3 // 4.
-        order
-                replaceFrom: 1
-                to: tally
-                with: oldOrder
-                startingAt: 1!

Item was changed:
  ----- Method: OrderedDictionary>>initialize: (in category 'private') -----
  initialize: n
 
         super initialize: n.
+        order := self arrayType new: self capacity!
-        order := self arrayType new: n + 1 * 3 // 4!




Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-ul.915.mcz

Levente Uzonyi
Hi Christoph,

On Tue, 6 Oct 2020, Thiede, Christoph wrote:

>
> > OrderedDictionary changes:
>
> > - make it a subclass of PluggableDictionary. This lets one create e.g. an ordered identity dictionary without creating a subclass with duplicated behavior
>
> Just a stupid question: Why do we have separate Pluggable/IdentityDictionary subclasses of Dictionary at all and did not directly implement Dictionary with a pluggable block? Is this a historic decision or an optimization
> thing? :-)

AFAIK PluggableDictionary was not part of Smalltalk-80, but Dictionary and
IdentityDictionary were.

Dictionary and IdentityDictionary are slightly faster than their
PluggableDictionary equivalents.

In theory, we could simplify the HashedCollection hierarchy by making
Set and Dictionary pluggable. But in practice, too many things use
IdentitySet and IdentityDictionary directly. Perhaps we could replace
those classes with stubs having the right "constructor" but we couldn't
remove those classes.
From the users' point of view, it's probably better that we have
separate classes.
(And there are all sorts of interesting questsions raising from making Set
and Dictionary pluggable. E.g. how would that affect other subclasses of
Set/Dictionary?


Levente

>
> Best,
> Christoph
>
> _________________________________________________________________________________________________________________________________________________________________________________________________________________________________
> Von: Squeak-dev <[hidden email]> im Auftrag von [hidden email] <[hidden email]>
> Gesendet: Montag, 5. Oktober 2020 00:31:44
> An: [hidden email]; [hidden email]
> Betreff: [squeak-dev] The Trunk: Collections-ul.915.mcz  
> Levente Uzonyi uploaded a new version of Collections to project The Trunk:
> http://source.squeak.org/trunk/Collections-ul.915.mcz
>
> ==================== Summary ====================
>
> Name: Collections-ul.915
> Author: ul
> Time: 5 October 2020, 12:30:33.597525 am
> UUID: 93341c2a-ebb2-4d2b-abee-e3f42f509836
> Ancestors: Collections-eem.913
>
> HashedCollection changes:
> - make #capacity return the actual capacity of the collection instead of the size of the internal array. This change is obviously not backwards compatible.
> - improve the performance of #isEmpty when tally is 0
>
> OrderedDictionary changes:
> - make it a subclass of PluggableDictionary. This lets one create e.g. an ordered identity dictionary without creating a subclass with duplicated behavior
> - simplify #initialize and #growTo: now that #capacity is accurate
>
> =============== Diff against Collections-eem.913 ===============
>
> Item was changed:
>   ----- Method: HashedCollection>>capacity (in category 'accessing') -----
>   capacity
> +        "Answer the current capacity of the receiver - aka the number of elements the receiver can hold without growing."
> -        "Answer the current capacity of the receiver."
>  
> +        ^ array size * 3 // 4!
> -        ^ array size!
>
> Item was changed:
>   ----- Method: HashedCollection>>isEmpty (in category 'testing') -----
>   isEmpty
>          "For non-weak collections, we can use the tally to speed up the empty check. For weak collections, we must use the traditional way because the tally is unreliable. Also see #size vs. #slowSize."
>  
> +        tally = 0 ifTrue: [ ^true ].
> +        ^array class isWeak and: [ super isEmpty ]!
> -        ^ array class isWeak
> -                ifFalse: [ tally = 0 ]
> -                ifTrue: [ super isEmpty ]!
>
> Item was changed:
>   ----- Method: HashedCollection>>removeAll (in category 'removing') -----
>   removeAll
>          "remove all elements from this collection.
>          Preserve the capacity"
>         
> +        self initialize: array size!
> -        self initialize: self capacity!
>
> Item was changed:
> + PluggableDictionary subclass: #OrderedDictionary
> - Dictionary subclass: #OrderedDictionary
>          instanceVariableNames: 'order'
>          classVariableNames: ''
>          poolDictionaries: ''
>          category: 'Collections-Sequenceable'!
>  
>   !OrderedDictionary commentStamp: 'mt 1/16/2015 10:42' prior: 0!
>   I am an ordered dictionary. I have an additional index (called 'order') to keep track of the insertion order of my associations.
>  
>   The read access is not affected by the additional index.
>  
>   The index is updated in O(1) [time] when inserting new keys. For present keys, that insertion involves actions in O(n) to move the respective element to the end of the order.
>  
>   The growth operation compacts the index and takes O(n) additional time.
>  
>   NOTE: This is still no instance of SequenceableCollection. Having this, some protocols are missing and may require working on #associations, which is an Array and thus sequenceable.!
>
> Item was changed:
>   ----- Method: OrderedDictionary>>growTo: (in category 'private') -----
>   growTo: anInteger
>  
> -        | oldOrder |
>          super growTo: anInteger.
> +        order := order grownBy: self capacity - order size!
> -        oldOrder := order.
> -        "Grow only to 75%. See #atNewIndex:put: in HashedCollection."
> -        order := self arrayType new: anInteger + 1 * 3 // 4.
> -        order
> -                replaceFrom: 1
> -                to: tally
> -                with: oldOrder
> -                startingAt: 1!
>
> Item was changed:
>   ----- Method: OrderedDictionary>>initialize: (in category 'private') -----
>   initialize: n
>  
>          super initialize: n.
> +        order := self arrayType new: self capacity!
> -        order := self arrayType new: n + 1 * 3 // 4!
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-ul.915.mcz

timrowledge


> On 2020-10-06, at 5:17 AM, Levente Uzonyi <[hidden email]> wrote:
>
>
> AFAIK PluggableDictionary was not part of Smalltalk-80, but Dictionary and IdentityDictionary were.

Correct.

>
> Dictionary and IdentityDictionary are slightly faster than their PluggableDictionary equivalents.

Yah; the #scanFor.... methods will be slightly slower because of the test for the relevant block. However, for anything much more complicated than a SmallInteger #= test I suspect that the actual total time will be very close. After all the object being added has to evaluate its #= against whatever is already in the slot and comparing two Wooblies with 100 instvars (and instvar related sub-tests) to try will likely swamp the #scanFor... loop.

I don't think I've had any reason to use a PluggableDictionary before but I can see that having a way to use an application specific equivalence check instead of relying on a system-wide and inevitably limiting definition has its uses. I can imagine cases where one might need two dictionaries of nominally the same contents but where the equivalence test for inclusion needs to be different in order to make comparisons of some interesting state.


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: DC: Divide and Conquer