The Trunk: Collections-ul.874.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

The Trunk: Collections-ul.874.mcz

commits-2
Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.874.mcz

==================== Summary ====================

Name: Collections-ul.874
Author: ul
Time: 17 February 2020, 12:18:14.300566 am
UUID: b9affc2a-3c70-44f8-8e6a-66b7df1a41d0
Ancestors: Collections-topa.873

HashedCollection changes:
- speed up #goodPrimeAtLeast: when its argument is small
- let #sizeFor: return the smallest possible capacity (3)
- use #sizeFor: instead of #goodPrimeAtLeast: in #compact. This fixes the case when a 4 element dictionary is compacted to 80% capacity (array size = 5), which is higher than the advertised maximum: 75%.
- fixed the formula in #sizeFor:, which made the shortcut for small values unnecessary. Removed the shortcut. The fix often yields smaller dictionaries.
- store good primes in a class variable GoodPrimes for faster access and possible customization
- use blocks instead of symbols in #compactAll and #rehashAll
- introduced #slowSize in HashedCollection, which makes it possible to not override #growSize and #compact in weak subclasses

=============== Diff against Collections-topa.873 ===============

Item was changed:
  Collection subclass: #HashedCollection
  instanceVariableNames: 'tally array'
+ classVariableNames: 'GoodPrimes'
- classVariableNames: ''
  poolDictionaries: ''
  category: 'Collections-Abstract'!
 
  !HashedCollection commentStamp: 'ul 4/12/2010 22:37' prior: 0!
  I am an abstract collection of objects that implement hash and equality in a consitent way. This means that whenever two objects are equal, their hashes have to be equal too. If two objects are equal then I can only store one of them. Hashes are expected to be integers (preferably SmallIntegers). I also expect that the objects contained by me do not change their hashes. If that happens, hash invariants have to be re-established, which can be done by #rehash.
 
  Since I'm abstract, no instances of me should exist. My subclasses should implement #scanFor:, #fixCollisionsFrom: and #noCheckNoGrowFillFrom:.
 
  Instance Variables
  array: <ArrayedCollection> (typically Array or WeakArray)
  tally: <Integer> (non-negative)
 
  array
  - An array whose size is a prime number, it's non-nil elements are the elements of the collection, and whose nil elements are empty slots. There is always at least one nil. In fact I try to keep my "load" at 75% or less so that hashing will work well.
 
  tally
  - The number of elements in the collection. The array size is always greater than this.
 
  Implementation details:
  I implement a hash table which uses open addressing with linear probing as the method of collision resolution. Searching for an element or a free slot for an element is done by #scanFor: which should return the index of the slot in array corresponding to it's argument. When an element is removed #fixCollisionsFrom: should rehash all elements in array between the original index of the removed element, wrapping around after the last slot until reaching an empty slot. My maximum load factor (75%) is hardcoded in #atNewIndex:put:, so it can only be changed by overriding that method. When my load factor reaches this limit I replace my array with a larger one (see #grow) ensuring that my load factor will be less than or equal to 50%. The new array is filled by #noCheckNoGrowFillFrom: which should use #scanForEmptySlotFor: instead of #scanFor: for better performance. I do not shrink.
  !

Item was changed:
  ----- Method: HashedCollection class>>compactAll (in category 'initialize-release') -----
  compactAll
  "HashedCollection compactAll"
 
+ self allSubclassesDo: [ :each | each compactAllInstances ]!
- self allSubclassesDo: #compactAllInstances!

Item was changed:
  ----- Method: HashedCollection class>>goodPrimeAtLeast: (in category 'sizing') -----
  goodPrimeAtLeast: lowerLimit
+ "Answer the smallest good prime >= lowerlimit.
+ If lowerLimit is larger than the largest known good prime, just make it odd.
+ Use linear search, and exponential search to speed up cases when lowerLimit is small (<2500 and <100000, respectively).
+ Assume that there are goodPrimes greater than 100000."
- "Answer the next good prime >= lowerlimit.
- If lowerLimit is larger than the largest known good prime,
- just make it odd."
 
+ | highIndex midIndex lowIndex prime |
+ lowerLimit <= 7 ifTrue: [
+ lowerLimit <= 3 ifTrue: [ ^3 ].
+ lowerLimit <= 5 ifTrue: [ ^5 ].
+ ^7 ].
+ GoodPrimes ifNil: [ "migration only"
+ self initializeGoodPrimes ].
+ lowerLimit < 2500 ifTrue: [
+ "Use linear search when the limit is small. The boundary is based on measurements."
+ highIndex := 4. "skip 3 5 and 7"
+ [ (GoodPrimes at: highIndex) < lowerLimit ] whileTrue: [
+ highIndex := highIndex + 1 ].
+ ^GoodPrimes at: highIndex ].
+ lowerLimit < 100000
+ ifTrue: [
+ "Use exponential search when the limit is not too large. The boundary is based on measurements."
+ highIndex := 1.
+ [ (GoodPrimes at: highIndex) < lowerLimit ] whileTrue: [
+ highIndex := highIndex * 2 ].
+ lowIndex := highIndex // 2 + 1. "highIndex // 2 was smaller than lowerLimit" ]
+ ifFalse: [
+ "Regular binary search."
+ lowIndex := 1.
+ highIndex := GoodPrimes size.
+ "Check whether the largest prime would fit"
+ (GoodPrimes at: highIndex) < lowerLimit ifTrue: [
+ ^lowerLimit bitOr: 1 ]. ].
+ [ highIndex - lowIndex <= 1 ] whileFalse: [
+ midIndex := highIndex + lowIndex // 2.
+ prime := GoodPrimes at: midIndex.
- | low mid high prime primes |
- primes := self goodPrimes.
- high := primes size.
- lowerLimit > (primes at: high) ifTrue: [
- ^lowerLimit bitOr: 1 ].
- low := 1.
- [ high - low <= 1 ] whileFalse: [
- mid := high + low // 2.
- prime := primes at: mid.
  lowerLimit < prime
+ ifTrue: [ highIndex := midIndex ]
- ifTrue: [ high := mid ]
  ifFalse: [
  lowerLimit > prime
+ ifTrue: [ lowIndex := midIndex ]
- ifTrue: [ low := mid ]
  ifFalse: [ ^prime ] ] ].
+ (GoodPrimes at: lowIndex) >= lowerLimit ifTrue: [ ^GoodPrimes at: lowIndex ].
+ ^GoodPrimes at: highIndex!
- (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
- ^primes at: high!

Item was changed:
  ----- Method: HashedCollection class>>goodPrimes (in category 'sizing') -----
  goodPrimes
+ "Answer a sorted array of prime numbers less than one billion that make good hash table sizes. See #initializeGoodPrimes."
- "Answer a sorted array of prime numbers less than one billion that make good
- hash table sizes. Should be expanded as needed.  See comments below code"
 
+ ^GoodPrimes ifNil: [
+ self initializeGoodPrimes.
+ GoodPrimes ]!
- ^#(3 5 7 11 13 17 23 31 43 59 79 107 149 199 269 359 479 641 857 1151 1549 2069
- 2237 2423 2617 2797 2999 3167 3359 3539 3727 3911
- 4441 4787 5119 5471 5801 6143 6521 6827 7177 7517 7853
- 8783 9601 10243 10867 11549 12239 12919 13679 14293 15013 15731
- 17569 19051 20443 21767 23159 24611 25847 27397 28571 30047 31397
- 35771 38201 40841 43973 46633 48989 51631 54371 57349 60139 62969
- 70589 76091 80347 85843 90697 95791 101051 106261 111143 115777 120691 126311
- 140863 150523 160969 170557 181243 190717 201653 211891 221251 232591 242873 251443
- 282089 300869 321949 341227 362353 383681 401411 422927 443231 464951 482033 504011
- 562621 605779 647659 681607 723623 763307 808261 844709 886163 926623 967229 1014617
- 1121987 1201469 1268789 1345651 1429531 1492177 1577839 1651547 1722601 1800377 1878623 1942141 2028401
- 2242727 2399581 2559173 2686813 2836357 3005579 3144971 3283993 3460133 3582923 3757093 3903769 4061261
- 4455361 4783837 5068529 5418079 5680243 6000023 6292981 6611497 6884641 7211599 7514189 7798313 8077189
- 9031853 9612721 10226107 10745291 11338417 11939203 12567671 13212697 13816333 14337529 14938571 15595673 16147291
- 17851577 18993941 20180239 21228533 22375079 23450491 24635579 25683871 26850101 27921689 29090911 30153841 31292507 32467307
- 35817611 37983761 40234253 42457253 44750177 46957969 49175831 51442639 53726417 55954637 58126987 60365939 62666977 64826669
- 71582779 76039231 80534381 84995153 89500331 93956777 98470819 102879613 107400389 111856841 116365721 120819287 125246581 129732203
- 143163379 152076289 161031319 169981667 179000669 187913573 196826447 205826729 214748357 223713691 232679021 241591901 250504801 259470131
- 285162679 301939921 318717121 335494331 352271573 369148753 385926017 402603193 419480419 436157621 453034849 469712051 486589307 503366497 520043707
- 570475349 603929813 637584271 671138659 704693081 738247541 771801929 805356457 838910803 872365267 905919671 939574117 973128521 1006682977 1040137411
- 1073741833)
-
- "The above primes past 2069 were chosen carefully so that they do not interact badly with 1664525 (used by hashMultiply), and so that gcd(p, (256^k) +/- a) = 1, for 0<a<=32 and 0<k<=8.  See Knuth's TAOCP for details."
-
- "The above primes also try to map the values of ((0 to: 4095) collect: [ :each | each << 18 \\ prime ]) sort to an equidistant sequence of numbers. This helps to avoid the collision of chains in identity-based hashed collections. To do that  they were chosen to return a low value when the following block is evaluated with them as argument:
-  [ :prime |
- | n slots cost optimalDistance |
- n := 1 bitShift: 22.
- slots := Array new: n + 1.
- 0 to: n - 1 do: [ :ea | slots at: ea + 1 put: (ea bitShift: 8) \\ prime ].
- slots at: n + 1 put: prime.
- slots sort.
- cost := 0.
- optimalDistance := prime // n.
- 2 to: n + 1 do: [ :index |
- | newCost |
- newCost := optimalDistance - ((slots at: index) - (slots at: index - 1)).
- newCost > cost ifTrue: [ cost := newCost ] ].
- result add: prime -> cost ]
-
- The shifts in the block relate to the numer of bits the #identityHash consists of (22) and the number of bits #scaledIdentityHash shifts it (8)"!

Item was added:
+ ----- Method: HashedCollection class>>initialize (in category 'sizing') -----
+ initialize
+
+ self initializeGoodPrimes!

Item was added:
+ ----- Method: HashedCollection class>>initializeGoodPrimes (in category 'sizing') -----
+ initializeGoodPrimes
+ "GoodPrimes is a sorted array of prime numbers less than one billion that make good hash table sizes. Should be expanded as needed. See comments below code."
+
+ GoodPrimes := #(3 5 7 11 13 17 23 31 43 59 79 107 149 199 269 359 479 641 857 1151 1549 2069
+ 2237 2423 2617 2797 2999 3167 3359 3539 3727 3911
+ 4441 4787 5119 5471 5801 6143 6521 6827 7177 7517 7853
+ 8783 9601 10243 10867 11549 12239 12919 13679 14293 15013 15731
+ 17569 19051 20443 21767 23159 24611 25847 27397 28571 30047 31397
+ 35771 38201 40841 43973 46633 48989 51631 54371 57349 60139 62969
+ 70589 76091 80347 85843 90697 95791 101051 106261 111143 115777 120691 126311
+ 140863 150523 160969 170557 181243 190717 201653 211891 221251 232591 242873 251443
+ 282089 300869 321949 341227 362353 383681 401411 422927 443231 464951 482033 504011
+ 562621 605779 647659 681607 723623 763307 808261 844709 886163 926623 967229 1014617
+ 1121987 1201469 1268789 1345651 1429531 1492177 1577839 1651547 1722601 1800377 1878623 1942141 2028401
+ 2242727 2399581 2559173 2686813 2836357 3005579 3144971 3283993 3460133 3582923 3757093 3903769 4061261
+ 4455361 4783837 5068529 5418079 5680243 6000023 6292981 6611497 6884641 7211599 7514189 7798313 8077189
+ 9031853 9612721 10226107 10745291 11338417 11939203 12567671 13212697 13816333 14337529 14938571 15595673 16147291
+ 17851577 18993941 20180239 21228533 22375079 23450491 24635579 25683871 26850101 27921689 29090911 30153841 31292507 32467307
+ 35817611 37983761 40234253 42457253 44750177 46957969 49175831 51442639 53726417 55954637 58126987 60365939 62666977 64826669
+ 71582779 76039231 80534381 84995153 89500331 93956777 98470819 102879613 107400389 111856841 116365721 120819287 125246581 129732203
+ 143163379 152076289 161031319 169981667 179000669 187913573 196826447 205826729 214748357 223713691 232679021 241591901 250504801 259470131
+ 285162679 301939921 318717121 335494331 352271573 369148753 385926017 402603193 419480419 436157621 453034849 469712051 486589307 503366497 520043707
+ 570475349 603929813 637584271 671138659 704693081 738247541 771801929 805356457 838910803 872365267 905919671 939574117 973128521 1006682977 1040137411
+ 1073741833)
+
+ "The above primes past 2069 were chosen carefully so that they do not interact badly with 1664525 (used by hashMultiply), and so that gcd(p, (256^k) +/- a) = 1, for 0<a<=32 and 0<k<=8.  See Knuth's TAOCP for details."
+
+ "The above primes also try to map the values of ((0 to: 4095) collect: [ :each | each << 18 \\ prime ]) sort to an equidistant sequence of numbers. This helps to avoid the collision of chains in identity-based hashed collections. To do that  they were chosen to return a low value when the following block is evaluated with them as argument:
+  [ :prime |
+ | n slots cost optimalDistance |
+ n := 1 bitShift: 22.
+ slots := Array new: n + 1.
+ 0 to: n - 1 do: [ :ea | slots at: ea + 1 put: (ea bitShift: 8) \\ prime ].
+ slots at: n + 1 put: prime.
+ slots sort.
+ cost := 0.
+ optimalDistance := prime // n.
+ 2 to: n + 1 do: [ :index |
+ | newCost |
+ newCost := optimalDistance - ((slots at: index) - (slots at: index - 1)).
+ newCost > cost ifTrue: [ cost := newCost ] ].
+ result add: prime -> cost ]
+
+ The shifts in the block relate to the numer of bits the #identityHash consists of (22) and the number of bits #scaledIdentityHash shifts it (8)"!

Item was changed:
  ----- Method: HashedCollection class>>new (in category 'instance creation') -----
  new
+ "Create a HashedCollection large enough to hold 3 different objects without growing."
+
+ ^self basicNew initialize: 5 "For performance, inline the value 5 which would normally be returned by #sizeFor:."!
- ^ self basicNew initialize: 5!

Item was changed:
  ----- Method: HashedCollection class>>new: (in category 'instance creation') -----
+ new: numberOfElements
+ "Create a HashedCollection large enough to hold numberOfElements different objects without growing."
+
+ ^self basicNew initialize: (numberOfElements <= 3
+ ifFalse: [ self sizeFor: numberOfElements ]
+ ifTrue: [ "Inline values returned by #sizeFor: to ensure that #new: is not significantly slower than #new for small values."
+ numberOfElements < 3
+ ifTrue: [ 3 ]
+ ifFalse: [ 5 ] ])!
- new: nElements
- "Create a Set large enough to hold nElements without growing"
- ^ self basicNew initialize: (self sizeFor: nElements)!

Item was changed:
  ----- Method: HashedCollection class>>rehashAll (in category 'initialize-release') -----
  rehashAll
  "HashedCollection rehashAll"
 
+ self allSubclassesDo: [ :each | each rehashAllInstances ]!
- self allSubclassesDo: #rehashAllInstances!

Item was changed:
  ----- Method: HashedCollection class>>sizeFor: (in category 'sizing') -----
+ sizeFor: numberOfElements
+ "Return a large enough prime (or odd if too large), the size of the internal array to hold numberOfElements with at most 75% load factor."
- sizeFor: nElements
- "Large enough prime (or odd if too large) size to hold nElements with some slop (see fullCheck)"
 
+ ^self goodPrimeAtLeast: numberOfElements * 4 + 2 // 3 "Optimized version of (numberOfElements * 4 / 3) ceiling."!
- nElements < 4 ifTrue: [ ^5 ].
- ^self goodPrimeAtLeast: nElements + 1 * 4 // 3!

Item was changed:
  ----- Method: HashedCollection>>compact (in category 'private') -----
  compact
  "Reduce the size of array so that the load factor will be ~75%."
 
  | newCapacity |
+ newCapacity := self class sizeFor: self slowSize.
- newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
  self growTo: newCapacity!

Item was changed:
  ----- Method: HashedCollection>>growSize (in category 'private') -----
  growSize
  "Answer what my next higher table size should be"
 
+ ^self class sizeFor: self slowSize * 2!
- ^self class goodPrimeAtLeast: array size * 3 // 2 + 2!

Item was added:
+ ----- Method: HashedCollection>>slowSize (in category 'accessing') -----
+ slowSize
+ "Answer an upper bound of the number of elements in this collection. For regular collections, this can simply be the value of tally, but for collections that cannot maintain an exact value, like current weak collections, this has to be calculated on the fly."
+
+ ^tally!

Item was removed:
- ----- Method: WeakIdentityDictionary>>compact (in category 'private') -----
- compact
- "Reduce the size of array so that the load factor will be ~75%."
-
- | newCapacity |
- tally := self slowSize.
- newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
- self growTo: newCapacity!

Item was removed:
- ----- Method: WeakKeyDictionary>>compact (in category 'private') -----
- compact
- "Reduce the size of array so that the load factor will be ~75%."
-
- | newCapacity |
- newCapacity := self class goodPrimeAtLeast: self slowSize * 4 // 3.
- self growTo: newCapacity!

Item was removed:
- ----- Method: WeakKeyDictionary>>growSize (in category 'private') -----
- growSize
- "Answer what my next table size should be.
- Note that, it can be less than the current."
-
- ^self class goodPrimeAtLeast: self slowSize * 2 + 2!

Item was removed:
- ----- Method: WeakSet>>compact (in category 'private') -----
- compact
- "Reduce the size of array so that the load factor will be ~75%."
-
- | newCapacity |
- newCapacity := self class goodPrimeAtLeast: self slowSize * 4 // 3.
- self growTo: newCapacity!

Item was removed:
- ----- Method: WeakSet>>growSize (in category 'private') -----
- growSize
- "Answer what my next table size should be.
- Note that, it can be less than the current."
-
- ^self class goodPrimeAtLeast: self slowSize * 2 + 2!


Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-ul.874.mcz

marcel.taeumel
Hi Levente.

Thanks! :-) +1

Is this change more or less covered with tests?

Best,
Marcel

Am 17.02.2020 00:24:32 schrieb [hidden email] <[hidden email]>:

Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.874.mcz

==================== Summary ====================

Name: Collections-ul.874
Author: ul
Time: 17 February 2020, 12:18:14.300566 am
UUID: b9affc2a-3c70-44f8-8e6a-66b7df1a41d0
Ancestors: Collections-topa.873

HashedCollection changes:
- speed up #goodPrimeAtLeast: when its argument is small
- let #sizeFor: return the smallest possible capacity (3)
- use #sizeFor: instead of #goodPrimeAtLeast: in #compact. This fixes the case when a 4 element dictionary is compacted to 80% capacity (array size = 5), which is higher than the advertised maximum: 75%.
- fixed the formula in #sizeFor:, which made the shortcut for small values unnecessary. Removed the shortcut. The fix often yields smaller dictionaries.
- store good primes in a class variable GoodPrimes for faster access and possible customization
- use blocks instead of symbols in #compactAll and #rehashAll
- introduced #slowSize in HashedCollection, which makes it possible to not override #growSize and #compact in weak subclasses

=============== Diff against Collections-topa.873 ===============

Item was changed:
Collection subclass: #HashedCollection
instanceVariableNames: 'tally array'
+ classVariableNames: 'GoodPrimes'
- classVariableNames: ''
poolDictionaries: ''
category: 'Collections-Abstract'!

!HashedCollection commentStamp: 'ul 4/12/2010 22:37' prior: 0!
I am an abstract collection of objects that implement hash and equality in a consitent way. This means that whenever two objects are equal, their hashes have to be equal too. If two objects are equal then I can only store one of them. Hashes are expected to be integers (preferably SmallIntegers). I also expect that the objects contained by me do not change their hashes. If that happens, hash invariants have to be re-established, which can be done by #rehash.

Since I'm abstract, no instances of me should exist. My subclasses should implement #scanFor:, #fixCollisionsFrom: and #noCheckNoGrowFillFrom:.

Instance Variables
array: (typically Array or WeakArray)
tally: (non-negative)

array
- An array whose size is a prime number, it's non-nil elements are the elements of the collection, and whose nil elements are empty slots. There is always at least one nil. In fact I try to keep my "load" at 75% or less so that hashing will work well.

tally
- The number of elements in the collection. The array size is always greater than this.

Implementation details:
I implement a hash table which uses open addressing with linear probing as the method of collision resolution. Searching for an element or a free slot for an element is done by #scanFor: which should return the index of the slot in array corresponding to it's argument. When an element is removed #fixCollisionsFrom: should rehash all elements in array between the original index of the removed element, wrapping around after the last slot until reaching an empty slot. My maximum load factor (75%) is hardcoded in #atNewIndex:put:, so it can only be changed by overriding that method. When my load factor reaches this limit I replace my array with a larger one (see #grow) ensuring that my load factor will be less than or equal to 50%. The new array is filled by #noCheckNoGrowFillFrom: which should use #scanForEmptySlotFor: instead of #scanFor: for better performance. I do not shrink.
!

Item was changed:
----- Method: HashedCollection class>>compactAll (in category 'initialize-release') -----
compactAll
"HashedCollection compactAll"

+ self allSubclassesDo: [ :each | each compactAllInstances ]!
- self allSubclassesDo: #compactAllInstances!

Item was changed:
----- Method: HashedCollection class>>goodPrimeAtLeast: (in category 'sizing') -----
goodPrimeAtLeast: lowerLimit
+ "Answer the smallest good prime >= lowerlimit.
+ If lowerLimit is larger than the largest known good prime, just make it odd.
+ Use linear search, and exponential search to speed up cases when lowerLimit is small (<2500 and=""><100000,>
+ Assume that there are goodPrimes greater than 100000."
- "Answer the next good prime >= lowerlimit.
- If lowerLimit is larger than the largest known good prime,
- just make it odd."

+ | highIndex midIndex lowIndex prime |
+ lowerLimit <= 7="" iftrue:="">
+ lowerLimit <= 3="" iftrue:="" [="" ^3="">
+ lowerLimit <= 5="" iftrue:="" [="" ^5="">
+ ^7 ].
+ GoodPrimes ifNil: [ "migration only"
+ self initializeGoodPrimes ].
+ lowerLimit < 2500="" iftrue:="">
+ "Use linear search when the limit is small. The boundary is based on measurements."
+ highIndex := 4. "skip 3 5 and 7"
+ [ (GoodPrimes at: highIndex) < lowerlimit="" ]="" whiletrue:="">
+ highIndex := highIndex + 1 ].
+ ^GoodPrimes at: highIndex ].
+ lowerLimit < 100000="">
+ ifTrue: [
+ "Use exponential search when the limit is not too large. The boundary is based on measurements."
+ highIndex := 1.
+ [ (GoodPrimes at: highIndex) < lowerlimit="" ]="" whiletrue:="">
+ highIndex := highIndex * 2 ].
+ lowIndex := highIndex // 2 + 1. "highIndex // 2 was smaller than lowerLimit" ]
+ ifFalse: [
+ "Regular binary search."
+ lowIndex := 1.
+ highIndex := GoodPrimes size.
+ "Check whether the largest prime would fit"
+ (GoodPrimes at: highIndex) < lowerlimit="" iftrue:="">
+ ^lowerLimit bitOr: 1 ]. ].
+ [ highIndex - lowIndex <= 1="" ]="" whilefalse:="">
+ midIndex := highIndex + lowIndex // 2.
+ prime := GoodPrimes at: midIndex.
- | low mid high prime primes |
- primes := self goodPrimes.
- high := primes size.
- lowerLimit > (primes at: high) ifTrue: [
- ^lowerLimit bitOr: 1 ].
- low := 1.
- [ high - low <= 1="" ]="" whilefalse:="">
- mid := high + low // 2.
- prime := primes at: mid.
lowerLimit <>
+ ifTrue: [ highIndex := midIndex ]
- ifTrue: [ high := mid ]
ifFalse: [
lowerLimit > prime
+ ifTrue: [ lowIndex := midIndex ]
- ifTrue: [ low := mid ]
ifFalse: [ ^prime ] ] ].
+ (GoodPrimes at: lowIndex) >= lowerLimit ifTrue: [ ^GoodPrimes at: lowIndex ].
+ ^GoodPrimes at: highIndex!
- (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
- ^primes at: high!

Item was changed:
----- Method: HashedCollection class>>goodPrimes (in category 'sizing') -----
goodPrimes
+ "Answer a sorted array of prime numbers less than one billion that make good hash table sizes. See #initializeGoodPrimes."
- "Answer a sorted array of prime numbers less than one billion that make good
- hash table sizes. Should be expanded as needed. See comments below code"

+ ^GoodPrimes ifNil: [
+ self initializeGoodPrimes.
+ GoodPrimes ]!
- ^#(3 5 7 11 13 17 23 31 43 59 79 107 149 199 269 359 479 641 857 1151 1549 2069
- 2237 2423 2617 2797 2999 3167 3359 3539 3727 3911
- 4441 4787 5119 5471 5801 6143 6521 6827 7177 7517 7853
- 8783 9601 10243 10867 11549 12239 12919 13679 14293 15013 15731
- 17569 19051 20443 21767 23159 24611 25847 27397 28571 30047 31397
- 35771 38201 40841 43973 46633 48989 51631 54371 57349 60139 62969
- 70589 76091 80347 85843 90697 95791 101051 106261 111143 115777 120691 126311
- 140863 150523 160969 170557 181243 190717 201653 211891 221251 232591 242873 251443
- 282089 300869 321949 341227 362353 383681 401411 422927 443231 464951 482033 504011
- 562621 605779 647659 681607 723623 763307 808261 844709 886163 926623 967229 1014617
- 1121987 1201469 1268789 1345651 1429531 1492177 1577839 1651547 1722601 1800377 1878623 1942141 2028401
- 2242727 2399581 2559173 2686813 2836357 3005579 3144971 3283993 3460133 3582923 3757093 3903769 4061261
- 4455361 4783837 5068529 5418079 5680243 6000023 6292981 6611497 6884641 7211599 7514189 7798313 8077189
- 9031853 9612721 10226107 10745291 11338417 11939203 12567671 13212697 13816333 14337529 14938571 15595673 16147291
- 17851577 18993941 20180239 21228533 22375079 23450491 24635579 25683871 26850101 27921689 29090911 30153841 31292507 32467307
- 35817611 37983761 40234253 42457253 44750177 46957969 49175831 51442639 53726417 55954637 58126987 60365939 62666977 64826669
- 71582779 76039231 80534381 84995153 89500331 93956777 98470819 102879613 107400389 111856841 116365721 120819287 125246581 129732203
- 143163379 152076289 161031319 169981667 179000669 187913573 196826447 205826729 214748357 223713691 232679021 241591901 250504801 259470131
- 285162679 301939921 318717121 335494331 352271573 369148753 385926017 402603193 419480419 436157621 453034849 469712051 486589307 503366497 520043707
- 570475349 603929813 637584271 671138659 704693081 738247541 771801929 805356457 838910803 872365267 905919671 939574117 973128521 1006682977 1040137411
- 1073741833)
-
- "The above primes past 2069 were chosen carefully so that they do not interact badly with 1664525 (used by hashMultiply), and so that gcd(p, (256^k) +/- a) = 1, for 0<><=32 and=""><><=8. see="" knuth's="" taocp="" for="">
-
- "The above primes also try to map the values of ((0 to: 4095) collect: [ :each | each < 18="" \\="" prime="" ])="" sort="" to="" an="" equidistant="" sequence="" of="" numbers.="" this="" helps="" to="" avoid="" the="" collision="" of="" chains="" in="" identity-based="" hashed="" collections.="" to="" do="" that="" they="" were="" chosen="" to="" return="" a="" low="" value="" when="" the="" following="" block="" is="" evaluated="" with="" them="" as="">
- [ :prime |
- | n slots cost optimalDistance |
- n := 1 bitShift: 22.
- slots := Array new: n + 1.
- 0 to: n - 1 do: [ :ea | slots at: ea + 1 put: (ea bitShift: 8) \\ prime ].
- slots at: n + 1 put: prime.
- slots sort.
- cost := 0.
- optimalDistance := prime // n.
- 2 to: n + 1 do: [ :index |
- | newCost |
- newCost := optimalDistance - ((slots at: index) - (slots at: index - 1)).
- newCost > cost ifTrue: [ cost := newCost ] ].
- result add: prime -> cost ]
-
- The shifts in the block relate to the numer of bits the #identityHash consists of (22) and the number of bits #scaledIdentityHash shifts it (8)"!

Item was added:
+ ----- Method: HashedCollection class>>initialize (in category 'sizing') -----
+ initialize
+
+ self initializeGoodPrimes!

Item was added:
+ ----- Method: HashedCollection class>>initializeGoodPrimes (in category 'sizing') -----
+ initializeGoodPrimes
+ "GoodPrimes is a sorted array of prime numbers less than one billion that make good hash table sizes. Should be expanded as needed. See comments below code."
+
+ GoodPrimes := #(3 5 7 11 13 17 23 31 43 59 79 107 149 199 269 359 479 641 857 1151 1549 2069
+ 2237 2423 2617 2797 2999 3167 3359 3539 3727 3911
+ 4441 4787 5119 5471 5801 6143 6521 6827 7177 7517 7853
+ 8783 9601 10243 10867 11549 12239 12919 13679 14293 15013 15731
+ 17569 19051 20443 21767 23159 24611 25847 27397 28571 30047 31397
+ 35771 38201 40841 43973 46633 48989 51631 54371 57349 60139 62969
+ 70589 76091 80347 85843 90697 95791 101051 106261 111143 115777 120691 126311
+ 140863 150523 160969 170557 181243 190717 201653 211891 221251 232591 242873 251443
+ 282089 300869 321949 341227 362353 383681 401411 422927 443231 464951 482033 504011
+ 562621 605779 647659 681607 723623 763307 808261 844709 886163 926623 967229 1014617
+ 1121987 1201469 1268789 1345651 1429531 1492177 1577839 1651547 1722601 1800377 1878623 1942141 2028401
+ 2242727 2399581 2559173 2686813 2836357 3005579 3144971 3283993 3460133 3582923 3757093 3903769 4061261
+ 4455361 4783837 5068529 5418079 5680243 6000023 6292981 6611497 6884641 7211599 7514189 7798313 8077189
+ 9031853 9612721 10226107 10745291 11338417 11939203 12567671 13212697 13816333 14337529 14938571 15595673 16147291
+ 17851577 18993941 20180239 21228533 22375079 23450491 24635579 25683871 26850101 27921689 29090911 30153841 31292507 32467307
+ 35817611 37983761 40234253 42457253 44750177 46957969 49175831 51442639 53726417 55954637 58126987 60365939 62666977 64826669
+ 71582779 76039231 80534381 84995153 89500331 93956777 98470819 102879613 107400389 111856841 116365721 120819287 125246581 129732203
+ 143163379 152076289 161031319 169981667 179000669 187913573 196826447 205826729 214748357 223713691 232679021 241591901 250504801 259470131
+ 285162679 301939921 318717121 335494331 352271573 369148753 385926017 402603193 419480419 436157621 453034849 469712051 486589307 503366497 520043707
+ 570475349 603929813 637584271 671138659 704693081 738247541 771801929 805356457 838910803 872365267 905919671 939574117 973128521 1006682977 1040137411
+ 1073741833)
+
+ "The above primes past 2069 were chosen carefully so that they do not interact badly with 1664525 (used by hashMultiply), and so that gcd(p, (256^k) +/- a) = 1, for 0<><=32 and=""><><=8. see="" knuth's="" taocp="" for="">
+
+ "The above primes also try to map the values of ((0 to: 4095) collect: [ :each | each < 18="" \\="" prime="" ])="" sort="" to="" an="" equidistant="" sequence="" of="" numbers.="" this="" helps="" to="" avoid="" the="" collision="" of="" chains="" in="" identity-based="" hashed="" collections.="" to="" do="" that="" they="" were="" chosen="" to="" return="" a="" low="" value="" when="" the="" following="" block="" is="" evaluated="" with="" them="" as="">
+ [ :prime |
+ | n slots cost optimalDistance |
+ n := 1 bitShift: 22.
+ slots := Array new: n + 1.
+ 0 to: n - 1 do: [ :ea | slots at: ea + 1 put: (ea bitShift: 8) \\ prime ].
+ slots at: n + 1 put: prime.
+ slots sort.
+ cost := 0.
+ optimalDistance := prime // n.
+ 2 to: n + 1 do: [ :index |
+ | newCost |
+ newCost := optimalDistance - ((slots at: index) - (slots at: index - 1)).
+ newCost > cost ifTrue: [ cost := newCost ] ].
+ result add: prime -> cost ]
+
+ The shifts in the block relate to the numer of bits the #identityHash consists of (22) and the number of bits #scaledIdentityHash shifts it (8)"!

Item was changed:
----- Method: HashedCollection class>>new (in category 'instance creation') -----
new
+ "Create a HashedCollection large enough to hold 3 different objects without growing."
+
+ ^self basicNew initialize: 5 "For performance, inline the value 5 which would normally be returned by #sizeFor:."!
- ^ self basicNew initialize: 5!

Item was changed:
----- Method: HashedCollection class>>new: (in category 'instance creation') -----
+ new: numberOfElements
+ "Create a HashedCollection large enough to hold numberOfElements different objects without growing."
+
+ ^self basicNew initialize: (numberOfElements <=>
+ ifFalse: [ self sizeFor: numberOfElements ]
+ ifTrue: [ "Inline values returned by #sizeFor: to ensure that #new: is not significantly slower than #new for small values."
+ numberOfElements <>
+ ifTrue: [ 3 ]
+ ifFalse: [ 5 ] ])!
- new: nElements
- "Create a Set large enough to hold nElements without growing"
- ^ self basicNew initialize: (self sizeFor: nElements)!

Item was changed:
----- Method: HashedCollection class>>rehashAll (in category 'initialize-release') -----
rehashAll
"HashedCollection rehashAll"

+ self allSubclassesDo: [ :each | each rehashAllInstances ]!
- self allSubclassesDo: #rehashAllInstances!

Item was changed:
----- Method: HashedCollection class>>sizeFor: (in category 'sizing') -----
+ sizeFor: numberOfElements
+ "Return a large enough prime (or odd if too large), the size of the internal array to hold numberOfElements with at most 75% load factor."
- sizeFor: nElements
- "Large enough prime (or odd if too large) size to hold nElements with some slop (see fullCheck)"

+ ^self goodPrimeAtLeast: numberOfElements * 4 + 2 // 3 "Optimized version of (numberOfElements * 4 / 3) ceiling."!
- nElements < 4="" iftrue:="" [="" ^5="">
- ^self goodPrimeAtLeast: nElements + 1 * 4 // 3!

Item was changed:
----- Method: HashedCollection>>compact (in category 'private') -----
compact
"Reduce the size of array so that the load factor will be ~75%."

| newCapacity |
+ newCapacity := self class sizeFor: self slowSize.
- newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
self growTo: newCapacity!

Item was changed:
----- Method: HashedCollection>>growSize (in category 'private') -----
growSize
"Answer what my next higher table size should be"

+ ^self class sizeFor: self slowSize * 2!
- ^self class goodPrimeAtLeast: array size * 3 // 2 + 2!

Item was added:
+ ----- Method: HashedCollection>>slowSize (in category 'accessing') -----
+ slowSize
+ "Answer an upper bound of the number of elements in this collection. For regular collections, this can simply be the value of tally, but for collections that cannot maintain an exact value, like current weak collections, this has to be calculated on the fly."
+
+ ^tally!

Item was removed:
- ----- Method: WeakIdentityDictionary>>compact (in category 'private') -----
- compact
- "Reduce the size of array so that the load factor will be ~75%."
-
- | newCapacity |
- tally := self slowSize.
- newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
- self growTo: newCapacity!

Item was removed:
- ----- Method: WeakKeyDictionary>>compact (in category 'private') -----
- compact
- "Reduce the size of array so that the load factor will be ~75%."
-
- | newCapacity |
- newCapacity := self class goodPrimeAtLeast: self slowSize * 4 // 3.
- self growTo: newCapacity!

Item was removed:
- ----- Method: WeakKeyDictionary>>growSize (in category 'private') -----
- growSize
- "Answer what my next table size should be.
- Note that, it can be less than the current."
-
- ^self class goodPrimeAtLeast: self slowSize * 2 + 2!

Item was removed:
- ----- Method: WeakSet>>compact (in category 'private') -----
- compact
- "Reduce the size of array so that the load factor will be ~75%."
-
- | newCapacity |
- newCapacity := self class goodPrimeAtLeast: self slowSize * 4 // 3.
- self growTo: newCapacity!

Item was removed:
- ----- Method: WeakSet>>growSize (in category 'private') -----
- growSize
- "Answer what my next table size should be.
- Note that, it can be less than the current."
-
- ^self class goodPrimeAtLeast: self slowSize * 2 + 2!




Reply | Threaded
Open this post in threaded view
|

Re: The Trunk: Collections-ul.874.mcz

Levente Uzonyi
Hi Marcel,

On Mon, 17 Feb 2020, Marcel Taeumel wrote:

> Hi Levente.
> Thanks! :-) +1
>
> Is this change more or less covered with tests?

Most changes are covered by CollectionsTests-ul.333.

Levente

>
> Best,
> Marcel
>
>       Am 17.02.2020 00:24:32 schrieb [hidden email] <[hidden email]>:
>
>       Levente Uzonyi uploaded a new version of Collections to project The Trunk:
>       http://source.squeak.org/trunk/Collections-ul.874.mcz
>
>       ==================== Summary ====================
>
>       Name: Collections-ul.874
>       Author: ul
>       Time: 17 February 2020, 12:18:14.300566 am
>       UUID: b9affc2a-3c70-44f8-8e6a-66b7df1a41d0
>       Ancestors: Collections-topa.873
>
>       HashedCollection changes:
>       - speed up #goodPrimeAtLeast: when its argument is small
>       - let #sizeFor: return the smallest possible capacity (3)
>       - use #sizeFor: instead of #goodPrimeAtLeast: in #compact. This fixes the case when a 4 element dictionary is compacted to 80% capacity (array size = 5), which is higher than the advertised maximum: 75%.
>       - fixed the formula in #sizeFor:, which made the shortcut for small values unnecessary. Removed the shortcut. The fix often yields smaller dictionaries.
>       - store good primes in a class variable GoodPrimes for faster access and possible customization
>       - use blocks instead of symbols in #compactAll and #rehashAll
>       - introduced #slowSize in HashedCollection, which makes it possible to not override #growSize and #compact in weak subclasses
>
>       =============== Diff against Collections-topa.873 ===============
>
>       Item was changed:
>       Collection subclass: #HashedCollection
>       instanceVariableNames: 'tally array'
>       + classVariableNames: 'GoodPrimes'
>       - classVariableNames: ''
>       poolDictionaries: ''
>       category: 'Collections-Abstract'!
>
>       !HashedCollection commentStamp: 'ul 4/12/2010 22:37' prior: 0!
>       I am an abstract collection of objects that implement hash and equality in a consitent way. This means that whenever two objects are equal, their hashes have to be equal too. If two objects are equal then I can
>       only store one of them. Hashes are expected to be integers (preferably SmallIntegers). I also expect that the objects contained by me do not change their hashes. If that happens, hash invariants have to be
>       re-established, which can be done by #rehash.
>
>       Since I'm abstract, no instances of me should exist. My subclasses should implement #scanFor:, #fixCollisionsFrom: and #noCheckNoGrowFillFrom:.
>
>       Instance Variables
>       array: (typically Array or WeakArray)
>       tally: (non-negative)
>
>       array
>       - An array whose size is a prime number, it's non-nil elements are the elements of the collection, and whose nil elements are empty slots. There is always at least one nil. In fact I try to keep my "load" at 75%
>       or less so that hashing will work well.
>
>       tally
>       - The number of elements in the collection. The array size is always greater than this.
>
>       Implementation details:
>       I implement a hash table which uses open addressing with linear probing as the method of collision resolution. Searching for an element or a free slot for an element is done by #scanFor: which should return the
>       index of the slot in array corresponding to it's argument. When an element is removed #fixCollisionsFrom: should rehash all elements in array between the original index of the removed element, wrapping around
>       after the last slot until reaching an empty slot. My maximum load factor (75%) is hardcoded in #atNewIndex:put:, so it can only be changed by overriding that method. When my load factor reaches this limit I
>       replace my array with a larger one (see #grow) ensuring that my load factor will be less than or equal to 50%. The new array is filled by #noCheckNoGrowFillFrom: which should use #scanForEmptySlotFor: instead of
>       #scanFor: for better performance. I do not shrink.
>       !
>
>       Item was changed:
>       ----- Method: HashedCollection class>>compactAll (in category 'initialize-release') -----
>       compactAll
>       "HashedCollection compactAll"
>
>       + self allSubclassesDo: [ :each | each compactAllInstances ]!
>       - self allSubclassesDo: #compactAllInstances!
>
>       Item was changed:
>       ----- Method: HashedCollection class>>goodPrimeAtLeast: (in category 'sizing') -----
>       goodPrimeAtLeast: lowerLimit
>       + "Answer the smallest good prime >= lowerlimit.
>       + If lowerLimit is larger than the largest known good prime, just make it odd.
>       + Use linear search, and exponential search to speed up cases when lowerLimit is small (<2500 and=""><2500><100000,><100000,>
>       + Assume that there are goodPrimes greater than 100000."
>       - "Answer the next good prime >= lowerlimit.
>       - If lowerLimit is larger than the largest known good prime,
>       - just make it odd."
>
>       + | highIndex midIndex lowIndex prime |
>       + lowerLimit <= 7="" iftrue:=""><=>
>       + lowerLimit <= 3="" iftrue:="" [="" ^3=""><=>
>       + lowerLimit <= 5="" iftrue:="" [="" ^5=""><=>
>       + ^7 ].
>       + GoodPrimes ifNil: [ "migration only"
>       + self initializeGoodPrimes ].
>       + lowerLimit < 2500="" iftrue:="">
>       + "Use linear search when the limit is small. The boundary is based on measurements."
>       + highIndex := 4. "skip 3 5 and 7"
>       + [ (GoodPrimes at: highIndex) < lowerlimit="" ]="" whiletrue:="">
>       + highIndex := highIndex + 1 ].
>       + ^GoodPrimes at: highIndex ].
>       + lowerLimit < 100000="">
>       + ifTrue: [
>       + "Use exponential search when the limit is not too large. The boundary is based on measurements."
>       + highIndex := 1.
>       + [ (GoodPrimes at: highIndex) < lowerlimit="" ]="" whiletrue:="">
>       + highIndex := highIndex * 2 ].
>       + lowIndex := highIndex // 2 + 1. "highIndex // 2 was smaller than lowerLimit" ]
>       + ifFalse: [
>       + "Regular binary search."
>       + lowIndex := 1.
>       + highIndex := GoodPrimes size.
>       + "Check whether the largest prime would fit"
>       + (GoodPrimes at: highIndex) < lowerlimit="" iftrue:="">
>       + ^lowerLimit bitOr: 1 ]. ].
>       + [ highIndex - lowIndex <= 1="" ]="" whilefalse:=""><=>
>       + midIndex := highIndex + lowIndex // 2.
>       + prime := GoodPrimes at: midIndex.
>       - | low mid high prime primes |
>       - primes := self goodPrimes.
>       - high := primes size.
>       - lowerLimit > (primes at: high) ifTrue: [
>       - ^lowerLimit bitOr: 1 ].
>       - low := 1.
>       - [ high - low <= 1="" ]="" whilefalse:=""><=>
>       - mid := high + low // 2.
>       - prime := primes at: mid.
>       lowerLimit
>       + ifTrue: [ highIndex := midIndex ]
>       - ifTrue: [ high := mid ]
>       ifFalse: [
>       lowerLimit > prime
>       + ifTrue: [ lowIndex := midIndex ]
>       - ifTrue: [ low := mid ]
>       ifFalse: [ ^prime ] ] ].
>       + (GoodPrimes at: lowIndex) >= lowerLimit ifTrue: [ ^GoodPrimes at: lowIndex ].
>       + ^GoodPrimes at: highIndex!
>       - (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
>       - ^primes at: high!
>
>       Item was changed:
>       ----- Method: HashedCollection class>>goodPrimes (in category 'sizing') -----
>       goodPrimes
>       + "Answer a sorted array of prime numbers less than one billion that make good hash table sizes. See #initializeGoodPrimes."
>       - "Answer a sorted array of prime numbers less than one billion that make good
>       - hash table sizes. Should be expanded as needed. See comments below code"
>
>       + ^GoodPrimes ifNil: [
>       + self initializeGoodPrimes.
>       + GoodPrimes ]!
>       - ^#(3 5 7 11 13 17 23 31 43 59 79 107 149 199 269 359 479 641 857 1151 1549 2069
>       - 2237 2423 2617 2797 2999 3167 3359 3539 3727 3911
>       - 4441 4787 5119 5471 5801 6143 6521 6827 7177 7517 7853
>       - 8783 9601 10243 10867 11549 12239 12919 13679 14293 15013 15731
>       - 17569 19051 20443 21767 23159 24611 25847 27397 28571 30047 31397
>       - 35771 38201 40841 43973 46633 48989 51631 54371 57349 60139 62969
>       - 70589 76091 80347 85843 90697 95791 101051 106261 111143 115777 120691 126311
>       - 140863 150523 160969 170557 181243 190717 201653 211891 221251 232591 242873 251443
>       - 282089 300869 321949 341227 362353 383681 401411 422927 443231 464951 482033 504011
>       - 562621 605779 647659 681607 723623 763307 808261 844709 886163 926623 967229 1014617
>       - 1121987 1201469 1268789 1345651 1429531 1492177 1577839 1651547 1722601 1800377 1878623 1942141 2028401
>       - 2242727 2399581 2559173 2686813 2836357 3005579 3144971 3283993 3460133 3582923 3757093 3903769 4061261
>       - 4455361 4783837 5068529 5418079 5680243 6000023 6292981 6611497 6884641 7211599 7514189 7798313 8077189
>       - 9031853 9612721 10226107 10745291 11338417 11939203 12567671 13212697 13816333 14337529 14938571 15595673 16147291
>       - 17851577 18993941 20180239 21228533 22375079 23450491 24635579 25683871 26850101 27921689 29090911 30153841 31292507 32467307
>       - 35817611 37983761 40234253 42457253 44750177 46957969 49175831 51442639 53726417 55954637 58126987 60365939 62666977 64826669
>       - 71582779 76039231 80534381 84995153 89500331 93956777 98470819 102879613 107400389 111856841 116365721 120819287 125246581 129732203
>       - 143163379 152076289 161031319 169981667 179000669 187913573 196826447 205826729 214748357 223713691 232679021 241591901 250504801 259470131
>       - 285162679 301939921 318717121 335494331 352271573 369148753 385926017 402603193 419480419 436157621 453034849 469712051 486589307 503366497 520043707
>       - 570475349 603929813 637584271 671138659 704693081 738247541 771801929 805356457 838910803 872365267 905919671 939574117 973128521 1006682977 1040137411
>       - 1073741833)
>       -
>       - "The above primes past 2069 were chosen carefully so that they do not interact badly with 1664525 (used by hashMultiply), and so that gcd(p, (256^k) +/- a) = 1, for 0<=32 and=""><=32><=8. see="" knuth's=""
>       taocp="" for=""><=8.>
>       -
>       - "The above primes also try to map the values of ((0 to: 4095) collect: [ :each | each < 18="" \\="" prime="" ])="" sort="" to="" an="" equidistant="" sequence="" of="" numbers.="" this="" helps="" to="" avoid=""
>       the="" collision="" of="" chains="" in="" identity-based="" hashed="" collections.="" to="" do="" that="" they="" were="" chosen="" to="" return="" a="" low="" value="" when="" the="" following="" block="" is=""
>       evaluated="" with="" them="" as="">
>       - [ :prime |
>       - | n slots cost optimalDistance |
>       - n := 1 bitShift: 22.
>       - slots := Array new: n + 1.
>       - 0 to: n - 1 do: [ :ea | slots at: ea + 1 put: (ea bitShift: 8) \\ prime ].
>       - slots at: n + 1 put: prime.
>       - slots sort.
>       - cost := 0.
>       - optimalDistance := prime // n.
>       - 2 to: n + 1 do: [ :index |
>       - | newCost |
>       - newCost := optimalDistance - ((slots at: index) - (slots at: index - 1)).
>       - newCost > cost ifTrue: [ cost := newCost ] ].
>       - result add: prime -> cost ]
>       -
>       - The shifts in the block relate to the numer of bits the #identityHash consists of (22) and the number of bits #scaledIdentityHash shifts it (8)"!
>
>       Item was added:
>       + ----- Method: HashedCollection class>>initialize (in category 'sizing') -----
>       + initialize
>       +
>       + self initializeGoodPrimes!
>
>       Item was added:
>       + ----- Method: HashedCollection class>>initializeGoodPrimes (in category 'sizing') -----
>       + initializeGoodPrimes
>       + "GoodPrimes is a sorted array of prime numbers less than one billion that make good hash table sizes. Should be expanded as needed. See comments below code."
>       +
>       + GoodPrimes := #(3 5 7 11 13 17 23 31 43 59 79 107 149 199 269 359 479 641 857 1151 1549 2069
>       + 2237 2423 2617 2797 2999 3167 3359 3539 3727 3911
>       + 4441 4787 5119 5471 5801 6143 6521 6827 7177 7517 7853
>       + 8783 9601 10243 10867 11549 12239 12919 13679 14293 15013 15731
>       + 17569 19051 20443 21767 23159 24611 25847 27397 28571 30047 31397
>       + 35771 38201 40841 43973 46633 48989 51631 54371 57349 60139 62969
>       + 70589 76091 80347 85843 90697 95791 101051 106261 111143 115777 120691 126311
>       + 140863 150523 160969 170557 181243 190717 201653 211891 221251 232591 242873 251443
>       + 282089 300869 321949 341227 362353 383681 401411 422927 443231 464951 482033 504011
>       + 562621 605779 647659 681607 723623 763307 808261 844709 886163 926623 967229 1014617
>       + 1121987 1201469 1268789 1345651 1429531 1492177 1577839 1651547 1722601 1800377 1878623 1942141 2028401
>       + 2242727 2399581 2559173 2686813 2836357 3005579 3144971 3283993 3460133 3582923 3757093 3903769 4061261
>       + 4455361 4783837 5068529 5418079 5680243 6000023 6292981 6611497 6884641 7211599 7514189 7798313 8077189
>       + 9031853 9612721 10226107 10745291 11338417 11939203 12567671 13212697 13816333 14337529 14938571 15595673 16147291
>       + 17851577 18993941 20180239 21228533 22375079 23450491 24635579 25683871 26850101 27921689 29090911 30153841 31292507 32467307
>       + 35817611 37983761 40234253 42457253 44750177 46957969 49175831 51442639 53726417 55954637 58126987 60365939 62666977 64826669
>       + 71582779 76039231 80534381 84995153 89500331 93956777 98470819 102879613 107400389 111856841 116365721 120819287 125246581 129732203
>       + 143163379 152076289 161031319 169981667 179000669 187913573 196826447 205826729 214748357 223713691 232679021 241591901 250504801 259470131
>       + 285162679 301939921 318717121 335494331 352271573 369148753 385926017 402603193 419480419 436157621 453034849 469712051 486589307 503366497 520043707
>       + 570475349 603929813 637584271 671138659 704693081 738247541 771801929 805356457 838910803 872365267 905919671 939574117 973128521 1006682977 1040137411
>       + 1073741833)
>       +
>       + "The above primes past 2069 were chosen carefully so that they do not interact badly with 1664525 (used by hashMultiply), and so that gcd(p, (256^k) +/- a) = 1, for 0<=32 and=""><=32><=8. see="" knuth's=""
>       taocp="" for=""><=8.>
>       +
>       + "The above primes also try to map the values of ((0 to: 4095) collect: [ :each | each < 18="" \\="" prime="" ])="" sort="" to="" an="" equidistant="" sequence="" of="" numbers.="" this="" helps="" to="" avoid=""
>       the="" collision="" of="" chains="" in="" identity-based="" hashed="" collections.="" to="" do="" that="" they="" were="" chosen="" to="" return="" a="" low="" value="" when="" the="" following="" block="" is=""
>       evaluated="" with="" them="" as="">
>       + [ :prime |
>       + | n slots cost optimalDistance |
>       + n := 1 bitShift: 22.
>       + slots := Array new: n + 1.
>       + 0 to: n - 1 do: [ :ea | slots at: ea + 1 put: (ea bitShift: 8) \\ prime ].
>       + slots at: n + 1 put: prime.
>       + slots sort.
>       + cost := 0.
>       + optimalDistance := prime // n.
>       + 2 to: n + 1 do: [ :index |
>       + | newCost |
>       + newCost := optimalDistance - ((slots at: index) - (slots at: index - 1)).
>       + newCost > cost ifTrue: [ cost := newCost ] ].
>       + result add: prime -> cost ]
>       +
>       + The shifts in the block relate to the numer of bits the #identityHash consists of (22) and the number of bits #scaledIdentityHash shifts it (8)"!
>
>       Item was changed:
>       ----- Method: HashedCollection class>>new (in category 'instance creation') -----
>       new
>       + "Create a HashedCollection large enough to hold 3 different objects without growing."
>       +
>       + ^self basicNew initialize: 5 "For performance, inline the value 5 which would normally be returned by #sizeFor:."!
>       - ^ self basicNew initialize: 5!
>
>       Item was changed:
>       ----- Method: HashedCollection class>>new: (in category 'instance creation') -----
>       + new: numberOfElements
>       + "Create a HashedCollection large enough to hold numberOfElements different objects without growing."
>       +
>       + ^self basicNew initialize: (numberOfElements <=><=>
>       + ifFalse: [ self sizeFor: numberOfElements ]
>       + ifTrue: [ "Inline values returned by #sizeFor: to ensure that #new: is not significantly slower than #new for small values."
>       + numberOfElements
>       + ifTrue: [ 3 ]
>       + ifFalse: [ 5 ] ])!
>       - new: nElements
>       - "Create a Set large enough to hold nElements without growing"
>       - ^ self basicNew initialize: (self sizeFor: nElements)!
>
>       Item was changed:
>       ----- Method: HashedCollection class>>rehashAll (in category 'initialize-release') -----
>       rehashAll
>       "HashedCollection rehashAll"
>
>       + self allSubclassesDo: [ :each | each rehashAllInstances ]!
>       - self allSubclassesDo: #rehashAllInstances!
>
>       Item was changed:
>       ----- Method: HashedCollection class>>sizeFor: (in category 'sizing') -----
>       + sizeFor: numberOfElements
>       + "Return a large enough prime (or odd if too large), the size of the internal array to hold numberOfElements with at most 75% load factor."
>       - sizeFor: nElements
>       - "Large enough prime (or odd if too large) size to hold nElements with some slop (see fullCheck)"
>
>       + ^self goodPrimeAtLeast: numberOfElements * 4 + 2 // 3 "Optimized version of (numberOfElements * 4 / 3) ceiling."!
>       - nElements < 4="" iftrue:="" [="" ^5="">
>       - ^self goodPrimeAtLeast: nElements + 1 * 4 // 3!
>
>       Item was changed:
>       ----- Method: HashedCollection>>compact (in category 'private') -----
>       compact
>       "Reduce the size of array so that the load factor will be ~75%."
>
>       | newCapacity |
>       + newCapacity := self class sizeFor: self slowSize.
>       - newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
>       self growTo: newCapacity!
>
>       Item was changed:
>       ----- Method: HashedCollection>>growSize (in category 'private') -----
>       growSize
>       "Answer what my next higher table size should be"
>
>       + ^self class sizeFor: self slowSize * 2!
>       - ^self class goodPrimeAtLeast: array size * 3 // 2 + 2!
>
>       Item was added:
>       + ----- Method: HashedCollection>>slowSize (in category 'accessing') -----
>       + slowSize
>       + "Answer an upper bound of the number of elements in this collection. For regular collections, this can simply be the value of tally, but for collections that cannot maintain an exact value, like current weak
>       collections, this has to be calculated on the fly."
>       +
>       + ^tally!
>
>       Item was removed:
>       - ----- Method: WeakIdentityDictionary>>compact (in category 'private') -----
>       - compact
>       - "Reduce the size of array so that the load factor will be ~75%."
>       -
>       - | newCapacity |
>       - tally := self slowSize.
>       - newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
>       - self growTo: newCapacity!
>
>       Item was removed:
>       - ----- Method: WeakKeyDictionary>>compact (in category 'private') -----
>       - compact
>       - "Reduce the size of array so that the load factor will be ~75%."
>       -
>       - | newCapacity |
>       - newCapacity := self class goodPrimeAtLeast: self slowSize * 4 // 3.
>       - self growTo: newCapacity!
>
>       Item was removed:
>       - ----- Method: WeakKeyDictionary>>growSize (in category 'private') -----
>       - growSize
>       - "Answer what my next table size should be.
>       - Note that, it can be less than the current."
>       -
>       - ^self class goodPrimeAtLeast: self slowSize * 2 + 2!
>
>       Item was removed:
>       - ----- Method: WeakSet>>compact (in category 'private') -----
>       - compact
>       - "Reduce the size of array so that the load factor will be ~75%."
>       -
>       - | newCapacity |
>       - newCapacity := self class goodPrimeAtLeast: self slowSize * 4 // 3.
>       - self growTo: newCapacity!
>
>       Item was removed:
>       - ----- Method: WeakSet>>growSize (in category 'private') -----
>       - growSize
>       - "Answer what my next table size should be.
>       - Note that, it can be less than the current."
>       -
>       - ^self class goodPrimeAtLeast: self slowSize * 2 + 2!
>
>
>
>