The Inbox: Collections-ul.871.mcz


The Inbox: Collections-ul.871.mcz

commits-2
Levente Uzonyi uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-ul.871.mcz

==================== Summary ====================

Name: Collections-ul.871
Author: ul
Time: 21 January 2020, 2:57:28.300065 am
UUID: 8f6d668d-19ff-434b-827c-345862901105
Ancestors: Collections-nice.870

HashedCollection changes:
- speed up #goodPrimeAtLeast: when its argument is small
- let #sizeFor: return the smallest possible capacity (3)
- use #sizeFor: instead of #goodPrimeAtLeast: in #compact. This fixes the case when a 4-element dictionary is compacted to 80% capacity (array size = 5), which is higher than the advertised maximum: 75%.
- fixed the formula in #sizeFor:, which made the shortcut for small values unnecessary. Removed the shortcut. The fix often yields smaller dictionaries.
- store good primes in a class variable GoodPrimes for faster access and possible customization
- use blocks instead of symbols in #compactAll and #rehashAll
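The #compact fix and the 75% load-factor guarantee can be checked with a small Python model (illustrative arithmetic only, not Squeak code; GOOD_PRIMES here is just a prefix of the real table):

```python
# Python model of the sizing arithmetic above; not Squeak code.
GOOD_PRIMES = [3, 5, 7, 11, 13, 17, 23, 31]  # prefix of the real table

def good_prime_at_least(n):
    return next(p for p in GOOD_PRIMES if p >= n)

def size_for(n):
    # (n * 4 + 2) // 3 is the integer ceiling of n * 4/3, so the
    # resulting load factor n / size is always <= 75%.
    return good_prime_at_least((n * 4 + 2) // 3)

tally = 4
print(good_prime_at_least(tally))  # -> 5: 4/5 = 80% load, over the limit
print(size_for(tally))             # -> 7: 4/7 ~ 57% load
print(size_for(0))                 # -> 3: the smallest possible capacity
```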

=============== Diff against Collections-nice.870 ===============

Item was changed:
  Collection subclass: #HashedCollection
  instanceVariableNames: 'tally array'
+ classVariableNames: 'GoodPrimes'
- classVariableNames: ''
  poolDictionaries: ''
  category: 'Collections-Abstract'!
 
  !HashedCollection commentStamp: 'ul 4/12/2010 22:37' prior: 0!
  I am an abstract collection of objects that implement hash and equality in a consistent way. This means that whenever two objects are equal, their hashes have to be equal too. If two objects are equal then I can only store one of them. Hashes are expected to be integers (preferably SmallIntegers). I also expect that the objects contained by me do not change their hashes. If that happens, hash invariants have to be re-established, which can be done by #rehash.
 
  Since I'm abstract, no instances of me should exist. My subclasses should implement #scanFor:, #fixCollisionsFrom: and #noCheckNoGrowFillFrom:.
 
  Instance Variables
  array: <ArrayedCollection> (typically Array or WeakArray)
  tally: <Integer> (non-negative)
 
  array
  - An array whose size is a prime number, whose non-nil elements are the elements of the collection, and whose nil elements are empty slots. There is always at least one nil. In fact I try to keep my "load" at 75% or less so that hashing will work well.
 
  tally
  - The number of elements in the collection. The array size is always greater than this.
 
  Implementation details:
  I implement a hash table which uses open addressing with linear probing as the method of collision resolution. Searching for an element or a free slot for an element is done by #scanFor:, which should return the index of the slot in array corresponding to its argument. When an element is removed, #fixCollisionsFrom: should rehash all elements in array from the original index of the removed element, wrapping around after the last slot, until reaching an empty slot. My maximum load factor (75%) is hardcoded in #atNewIndex:put:, so it can only be changed by overriding that method. When my load factor reaches this limit I replace my array with a larger one (see #grow) ensuring that my load factor will be less than or equal to 50%. The new array is filled by #noCheckNoGrowFillFrom: which should use #scanForEmptySlotFor: instead of #scanFor: for better performance. I do not shrink.
  !
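The open-addressing scheme described in the comment above can be sketched as a minimal Python model (illustrative only, not the Squeak implementation; None plays the role of nil):

```python
# Minimal model of the probing described above: linear probing with
# wraparound, None marking empty slots. Illustrative only.
def scan_for(array, key):
    """Return the index of key's slot, or of the first empty slot found."""
    start = hash(key) % len(array)
    for offset in range(len(array)):
        index = (start + offset) % len(array)
        slot = array[index]
        if slot is None or slot == key:
            return index
    # The real table grows before this can happen: load never reaches 100%.
    raise RuntimeError('no empty slot')

table = [None] * 5
table[scan_for(table, 'a')] = 'a'
table[scan_for(table, 'b')] = 'b'
assert table[scan_for(table, 'a')] == 'a'  # found again at the same slot
```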

Item was changed:
  ----- Method: HashedCollection class>>compactAll (in category 'initialize-release') -----
  compactAll
  "HashedCollection compactAll"
 
+ self allSubclassesDo: [ :each | each compactAllInstances ]!
- self allSubclassesDo: #compactAllInstances!

Item was changed:
  ----- Method: HashedCollection class>>goodPrimeAtLeast: (in category 'sizing') -----
  goodPrimeAtLeast: lowerLimit
+ "Answer the smallest good prime >= lowerLimit.
+ If lowerLimit is larger than the largest known good prime, just make it odd.
+ Use linear search and exponential search to speed up cases when lowerLimit is small (<2500 and <100000, respectively).
+ Assume that there are good primes greater than 100000."
- "Answer the next good prime >= lowerlimit.
- If lowerLimit is larger than the largest known good prime,
- just make it odd."
 
+ | highIndex midIndex lowIndex prime |
+ lowerLimit <= 7 ifTrue: [
+ lowerLimit <= 3 ifTrue: [ ^3 ].
+ lowerLimit <= 5 ifTrue: [ ^5 ].
+ ^7 ].
+ GoodPrimes ifNil: [ "migration only"
+ self initializeGoodPrimes ].
+ lowerLimit < 2500 ifTrue: [
+ "Use linear search when the limit is small. The boundary is based on measurements."
+ highIndex := 4. "skip 3 5 and 7"
+ [ (GoodPrimes at: highIndex) < lowerLimit ] whileTrue: [
+ highIndex := highIndex + 1 ].
+ ^GoodPrimes at: highIndex ].
+ lowerLimit < 100000
+ ifTrue: [
+ "Use exponential search when the limit is not too large. The boundary is based on measurements."
+ highIndex := 1.
+ [ (GoodPrimes at: highIndex) < lowerLimit ] whileTrue: [
+ highIndex := highIndex * 2 ].
+ lowIndex := highIndex // 2 + 1. "highIndex // 2 was smaller than lowerLimit" ]
+ ifFalse: [
+ "Regular binary search."
+ lowIndex := 1.
+ highIndex := GoodPrimes size.
+ "Check whether the largest prime would fit"
+ (GoodPrimes at: highIndex) < lowerLimit ifTrue: [
+ ^lowerLimit bitOr: 1 ]. ].
+ [ highIndex - lowIndex <= 1 ] whileFalse: [
+ midIndex := highIndex + lowIndex // 2.
+ prime := GoodPrimes at: midIndex.
- | low mid high prime primes |
- primes := self goodPrimes.
- high := primes size.
- lowerLimit > (primes at: high) ifTrue: [
- ^lowerLimit bitOr: 1 ].
- low := 1.
- [ high - low <= 1 ] whileFalse: [
- mid := high + low // 2.
- prime := primes at: mid.
  lowerLimit < prime
+ ifTrue: [ highIndex := midIndex ]
- ifTrue: [ high := mid ]
  ifFalse: [
  lowerLimit > prime
+ ifTrue: [ lowIndex := midIndex ]
- ifTrue: [ low := mid ]
  ifFalse: [ ^prime ] ] ].
+ (GoodPrimes at: lowIndex) >= lowerLimit ifTrue: [ ^GoodPrimes at: lowIndex ].
+ ^GoodPrimes at: highIndex!
- (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
- ^primes at: high!
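The strategy of the rewritten method (constants for tiny limits, exponential search to bracket the answer, binary search inside the bracket, odd fallback past the table) can be modeled in Python. This is an illustrative sketch, not a translation: it uses 0-based indices, the stdlib `bisect` in place of the hand-rolled binary search, and only a prefix of the prime table.

```python
import bisect

# Prefix of the good-primes table; the real one extends past one billion.
GOOD_PRIMES = [3, 5, 7, 11, 13, 17, 23, 31, 43, 59, 79, 107, 149, 199, 269,
               359, 479, 641, 857, 1151, 1549, 2069, 2237, 2423, 2617]

def good_prime_at_least(lower_limit):
    # Constants for tiny limits (the real method also linear-scans below 2500).
    if lower_limit <= 7:
        return 3 if lower_limit <= 3 else 5 if lower_limit <= 5 else 7
    if lower_limit > GOOD_PRIMES[-1]:
        return lower_limit | 1  # beyond the table: just make it odd
    # Exponential search: double the index until it brackets the answer.
    hi = 1
    while GOOD_PRIMES[hi] < lower_limit:
        hi = min(hi * 2, len(GOOD_PRIMES) - 1)
    lo = hi // 2  # the answer's index is known to lie in (lo, hi]
    # Binary search inside the bracket.
    return GOOD_PRIMES[bisect.bisect_left(GOOD_PRIMES, lower_limit, lo, hi + 1)]

print(good_prime_at_least(4))     # -> 5
print(good_prime_at_least(100))   # -> 107
print(good_prime_at_least(2069))  # -> 2069 (exact hit)
print(good_prime_at_least(2618))  # -> 2619 (past the table, made odd)
```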

Item was changed:
  ----- Method: HashedCollection class>>goodPrimes (in category 'sizing') -----
  goodPrimes
+ "Answer a sorted array of prime numbers less than one billion that make good hash table sizes. See #initializeGoodPrimes."
- "Answer a sorted array of prime numbers less than one billion that make good
- hash table sizes. Should be expanded as needed.  See comments below code"
 
+ ^GoodPrimes ifNil: [
+ self initializeGoodPrimes.
+ GoodPrimes ]!
- ^#(3 5 7 11 13 17 23 31 43 59 79 107 149 199 269 359 479 641 857 1151 1549 2069
- 2237 2423 2617 2797 2999 3167 3359 3539 3727 3911
- 4441 4787 5119 5471 5801 6143 6521 6827 7177 7517 7853
- 8783 9601 10243 10867 11549 12239 12919 13679 14293 15013 15731
- 17569 19051 20443 21767 23159 24611 25847 27397 28571 30047 31397
- 35771 38201 40841 43973 46633 48989 51631 54371 57349 60139 62969
- 70589 76091 80347 85843 90697 95791 101051 106261 111143 115777 120691 126311
- 140863 150523 160969 170557 181243 190717 201653 211891 221251 232591 242873 251443
- 282089 300869 321949 341227 362353 383681 401411 422927 443231 464951 482033 504011
- 562621 605779 647659 681607 723623 763307 808261 844709 886163 926623 967229 1014617
- 1121987 1201469 1268789 1345651 1429531 1492177 1577839 1651547 1722601 1800377 1878623 1942141 2028401
- 2242727 2399581 2559173 2686813 2836357 3005579 3144971 3283993 3460133 3582923 3757093 3903769 4061261
- 4455361 4783837 5068529 5418079 5680243 6000023 6292981 6611497 6884641 7211599 7514189 7798313 8077189
- 9031853 9612721 10226107 10745291 11338417 11939203 12567671 13212697 13816333 14337529 14938571 15595673 16147291
- 17851577 18993941 20180239 21228533 22375079 23450491 24635579 25683871 26850101 27921689 29090911 30153841 31292507 32467307
- 35817611 37983761 40234253 42457253 44750177 46957969 49175831 51442639 53726417 55954637 58126987 60365939 62666977 64826669
- 71582779 76039231 80534381 84995153 89500331 93956777 98470819 102879613 107400389 111856841 116365721 120819287 125246581 129732203
- 143163379 152076289 161031319 169981667 179000669 187913573 196826447 205826729 214748357 223713691 232679021 241591901 250504801 259470131
- 285162679 301939921 318717121 335494331 352271573 369148753 385926017 402603193 419480419 436157621 453034849 469712051 486589307 503366497 520043707
- 570475349 603929813 637584271 671138659 704693081 738247541 771801929 805356457 838910803 872365267 905919671 939574117 973128521 1006682977 1040137411
- 1073741833)
-
- "The above primes past 2069 were chosen carefully so that they do not interact badly with 1664525 (used by hashMultiply), and so that gcd(p, (256^k) +/- a) = 1, for 0<a<=32 and 0<k<=8.  See Knuth's TAOCP for details."
-
- "The above primes also try to map the values of ((0 to: 4095) collect: [ :each | each << 18 \\ prime ]) sort to an equidistant sequence of numbers. This helps to avoid the collision of chains in identity-based hashed collections. To do that  they were chosen to return a low value when the following block is evaluated with them as argument:
-  [ :prime |
- | n slots cost optimalDistance |
- n := 1 bitShift: 22.
- slots := Array new: n + 1.
- 0 to: n - 1 do: [ :ea | slots at: ea + 1 put: (ea bitShift: 8) \\ prime ].
- slots at: n + 1 put: prime.
- slots sort.
- cost := 0.
- optimalDistance := prime // n.
- 2 to: n + 1 do: [ :index |
- | newCost |
- newCost := optimalDistance - ((slots at: index) - (slots at: index - 1)).
- newCost > cost ifTrue: [ cost := newCost ] ].
- result add: prime -> cost ]
-
- The shifts in the block relate to the number of bits the #identityHash consists of (22) and the number of bits #scaledIdentityHash shifts it (8)"!

Item was added:
+ ----- Method: HashedCollection class>>initialize (in category 'sizing') -----
+ initialize
+
+ self initializeGoodPrimes!

Item was added:
+ ----- Method: HashedCollection class>>initializeGoodPrimes (in category 'sizing') -----
+ initializeGoodPrimes
+ "GoodPrimes is a sorted array of prime numbers less than one billion that make good hash table sizes. Should be expanded as needed. See comments below code."
+
+ GoodPrimes := #(3 5 7 11 13 17 23 31 43 59 79 107 149 199 269 359 479 641 857 1151 1549 2069
+ 2237 2423 2617 2797 2999 3167 3359 3539 3727 3911
+ 4441 4787 5119 5471 5801 6143 6521 6827 7177 7517 7853
+ 8783 9601 10243 10867 11549 12239 12919 13679 14293 15013 15731
+ 17569 19051 20443 21767 23159 24611 25847 27397 28571 30047 31397
+ 35771 38201 40841 43973 46633 48989 51631 54371 57349 60139 62969
+ 70589 76091 80347 85843 90697 95791 101051 106261 111143 115777 120691 126311
+ 140863 150523 160969 170557 181243 190717 201653 211891 221251 232591 242873 251443
+ 282089 300869 321949 341227 362353 383681 401411 422927 443231 464951 482033 504011
+ 562621 605779 647659 681607 723623 763307 808261 844709 886163 926623 967229 1014617
+ 1121987 1201469 1268789 1345651 1429531 1492177 1577839 1651547 1722601 1800377 1878623 1942141 2028401
+ 2242727 2399581 2559173 2686813 2836357 3005579 3144971 3283993 3460133 3582923 3757093 3903769 4061261
+ 4455361 4783837 5068529 5418079 5680243 6000023 6292981 6611497 6884641 7211599 7514189 7798313 8077189
+ 9031853 9612721 10226107 10745291 11338417 11939203 12567671 13212697 13816333 14337529 14938571 15595673 16147291
+ 17851577 18993941 20180239 21228533 22375079 23450491 24635579 25683871 26850101 27921689 29090911 30153841 31292507 32467307
+ 35817611 37983761 40234253 42457253 44750177 46957969 49175831 51442639 53726417 55954637 58126987 60365939 62666977 64826669
+ 71582779 76039231 80534381 84995153 89500331 93956777 98470819 102879613 107400389 111856841 116365721 120819287 125246581 129732203
+ 143163379 152076289 161031319 169981667 179000669 187913573 196826447 205826729 214748357 223713691 232679021 241591901 250504801 259470131
+ 285162679 301939921 318717121 335494331 352271573 369148753 385926017 402603193 419480419 436157621 453034849 469712051 486589307 503366497 520043707
+ 570475349 603929813 637584271 671138659 704693081 738247541 771801929 805356457 838910803 872365267 905919671 939574117 973128521 1006682977 1040137411
+ 1073741833)
+
+ "The above primes past 2069 were chosen carefully so that they do not interact badly with 1664525 (used by hashMultiply), and so that gcd(p, (256^k) +/- a) = 1, for 0<a<=32 and 0<k<=8.  See Knuth's TAOCP for details."
+
+ "The above primes also try to map the values of ((0 to: 4095) collect: [ :each | each << 18 \\ prime ]) sort to an equidistant sequence of numbers. This helps to avoid the collision of chains in identity-based hashed collections. To do that  they were chosen to return a low value when the following block is evaluated with them as argument:
+  [ :prime |
+ | n slots cost optimalDistance |
+ n := 1 bitShift: 22.
+ slots := Array new: n + 1.
+ 0 to: n - 1 do: [ :ea | slots at: ea + 1 put: (ea bitShift: 8) \\ prime ].
+ slots at: n + 1 put: prime.
+ slots sort.
+ cost := 0.
+ optimalDistance := prime // n.
+ 2 to: n + 1 do: [ :index |
+ | newCost |
+ newCost := optimalDistance - ((slots at: index) - (slots at: index - 1)).
+ newCost > cost ifTrue: [ cost := newCost ] ].
+ result add: prime -> cost ]
+
+ The shifts in the block relate to the number of bits the #identityHash consists of (22) and the number of bits #scaledIdentityHash shifts it (8)"!

Item was changed:
  ----- Method: HashedCollection class>>rehashAll (in category 'initialize-release') -----
  rehashAll
  "HashedCollection rehashAll"
 
+ self allSubclassesDo: [ :each | each rehashAllInstances ]!
- self allSubclassesDo: #rehashAllInstances!

Item was changed:
  ----- Method: HashedCollection class>>sizeFor: (in category 'sizing') -----
+ sizeFor: numberOfElements
+ "Large enough prime (or odd if too large) size to hold numberOfElements with some slop."
- sizeFor: nElements
- "Large enough prime (or odd if too large) size to hold nElements with some slop (see fullCheck)"
 
+ ^self goodPrimeAtLeast: numberOfElements * 4 + 2 // 3 "Optimized version of (numberOfElements * 4 / 3) ceiling."!
- nElements < 4 ifTrue: [ ^5 ].
- ^self goodPrimeAtLeast: nElements + 1 * 4 // 3!
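To see why the fixed formula makes the old small-value shortcut unnecessary and "often yields smaller dictionaries" (per the summary), here is a hedged Python comparison of the two versions (illustrative model; GOOD_PRIMES is a prefix of the real table):

```python
import math

GOOD_PRIMES = [3, 5, 7, 11, 13, 17, 23, 31, 43, 59, 79, 107]  # table prefix

def good_prime_at_least(n):
    return next(p for p in GOOD_PRIMES if p >= n)

def size_for_old(n):  # previous formula, with its small-value shortcut
    return 5 if n < 4 else good_prime_at_least((n + 1) * 4 // 3)

def size_for_new(n):  # fixed formula: integer ceiling of n * 4/3
    return good_prime_at_least((n * 4 + 2) // 3)

for n in range(0, 60):
    # The optimized expression really is the ceiling of n * 4/3 ...
    assert (n * 4 + 2) // 3 == math.ceil(n * 4 / 3)
    # ... which keeps the load factor at or below 75% with no shortcut needed.
    assert n / size_for_new(n) <= 0.75

print(size_for_old(2), size_for_new(2))    # 5 3  -- smaller for tiny sizes
print(size_for_old(17), size_for_new(17))  # 31 23 -- and occasionally for larger ones
```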



Re: The Inbox: Collections-ul.871.mcz

Chris Muller-3
Hi Levente,
 
+       lowerLimit <= 7 ifTrue: [
+               lowerLimit <= 3 ifTrue: [ ^3 ].
+               lowerLimit <= 5 ifTrue: [ ^5 ].
+               ^7 ].

That's better, but it still has that same fundamental problem.  Every time a developer makes a HashedCollection of a known-at-runtime size (e.g., in a variable), they're forced to choose between execution performance pain or code pain.

    {
   '[ Dictionary new ]'->'100% of baseline rate, 27,600,000 per second. 36.2 nanoseconds per run. 11.33547 % GC time.'

    "performance pain?"
    '[ Dictionary new: 1 ]'->'60% of baseline rate, 16,600,000 per second. 60.1 nanoseconds per run. 5.61888 % GC time.'
    '[ Dictionary new: 2 ]'->'61% of baseline rate, 16,900,000 per second. 59.2 nanoseconds per run. 5.67886 % GC time.'
    '[ Dictionary new: 3 ]'->'59% of baseline rate, 16,300,000 per second. 61.5 nanoseconds per run. 6.77864 % GC time.'

    "into #sizeFor:"
    '[ Dictionary new: 4 ]'->'57% of baseline rate, 15,800,000 per second. 63.5 nanoseconds per run. 7.87685 % GC time.'

    "or code pain?"
    '[ sz > 3
            ifTrue: [ Dictionary new: sz ]
            ifFalse: [ Dictionary new ] ]'->'97% of baseline rate, 26,900,000 per second. 37.2 nanoseconds per run. 11.09778 % GC time.'}

For the most part, they won't even know about this anomaly (and shouldn't have to).  No one assumes such a heavy penalty for #new: over #new when the size is <= the default size.

I like every other aspect of your improvements here!  But may we go with the Collections-cmm.873 variation please?  It fixes that issue, check it out!

    {'[ Dictionary new ]'->'100% of baseline rate, 29,100,000 per second. 34.4 nanoseconds per run. 5.9988 % GC time.'
    '[ Dictionary new: 1 ]'->'103% of baseline rate, 29,900,000 per second. 33.4 nanoseconds per run. 4.9 % GC time.'
    '[ Dictionary new: 2 ]'->'103% of baseline rate, 30,100,000 per second. 33.2 nanoseconds per run. 4.88 % GC time.'
    '[ Dictionary new: 3 ]'->'95% of baseline rate, 27,600,000 per second. 36.3 nanoseconds per run. 5.62 % GC time.'}

Best,
  Chris


 
 



Re: The Inbox: Collections-ul.871.mcz

Levente Uzonyi
Hi Chris,

On Tue, 21 Jan 2020, Chris Muller wrote:

> Hi Levente,
>  
>       +       lowerLimit <= 7 ifTrue: [
>       +               lowerLimit <= 3 ifTrue: [ ^3 ].
>       +               lowerLimit <= 5 ifTrue: [ ^5 ].
>       +               ^7 ].
>
>
> That's better, but it still has that same fundamental problem.  Every time a developer makes a HashedCollection of a known-at-runtime size (e.g., in a variable), they're forced to choose between execution performance pain or code pain.
>
>     {
>    '[ Dictionary new ]'->'100% of baseline rate, 27,600,000 per second. 36.2 nanoseconds per run. 11.33547 % GC time.'
>
>     "performance pain?"
>     '[ Dictionary new: 1 ]'->'60% of baseline rate, 16,600,000 per second. 60.1 nanoseconds per run. 5.61888 % GC time.'
>     '[ Dictionary new: 2 ]'->'61% of baseline rate, 16,900,000 per second. 59.2 nanoseconds per run. 5.67886 % GC time.'
>     '[ Dictionary new: 3 ]'->'59% of baseline rate, 16,300,000 per second. 61.5 nanoseconds per run. 6.77864 % GC time.'
Even if there's a performance overhead, you use less memory.

>
>     "into #sizeFor:"
>     '[ Dictionary new: 4 ]'->'57% of baseline rate, 15,800,000 per second. 63.5 nanoseconds per run. 7.87685 % GC time.'

Starting from 4, you also save time by avoiding growing, which is
more significant than what you "lose" during instance creation.

>
>     "or code pain?"
>     '[ sz > 3
>             ifTrue: [ Dictionary new: sz ]
>             ifFalse: [ Dictionary new ] ]'->'97% of baseline rate, 26,900,000 per second. 37.2 nanoseconds per run. 11.09778 % GC time.'}
>
> For the most part, they won't even know about this anomaly (and shouldn't have to).  No one assumes such a heavy penalty for #new: over #new when the size is <= the default size.

We could get rid of the anomaly by changing #new to ^self new: 3.

>
> I like every other aspect of your improvements here!  But may we go with the Collections-cmm.873 variation please?  It fixes that issue, check it out!
>
>     {'[ Dictionary new ]'->'100% of baseline rate, 29,100,000 per second. 34.4 nanoseconds per run. 5.9988 % GC time.'
>     '[ Dictionary new: 1 ]'->'103% of baseline rate, 29,900,000 per second. 33.4 nanoseconds per run. 4.9 % GC time.'
>     '[ Dictionary new: 2 ]'->'103% of baseline rate, 30,100,000 per second. 33.2 nanoseconds per run. 4.88 % GC time.'
>     '[ Dictionary new: 3 ]'->'95% of baseline rate, 27,600,000 per second. 36.3 nanoseconds per run. 5.62 % GC time.'

If we decide to keep #new as it is, then I'm not against using a similar
optimization scheme in #new:. But there should be some tests to verify
that the methods always return valid dictionaries.
And, I'd also prefer to swap the branches in #new: so that < 4 is the
first check and < 3 is the second. There should be a comment as well
about the optimization.


Levente

>
> Best,
>   Chris
>
>
>  
>        
>
>
>


Re: The Inbox: Collections-ul.871.mcz

Chris Muller-4
Hi,

> That's better, but it still has that same fundamental problem.  Every time a developer makes a HashedCollection of a known-at-runtime size (e.g., in a variable), they're forced to choose between execution performance pain or code pain.
>
>     {
>    '[ Dictionary new ]'->'100% of baseline rate, 27,600,000 per second. 36.2 nanoseconds per run. 11.33547 % GC time.'
>
>     "performance pain?"
>     '[ Dictionary new: 1 ]'->'60% of baseline rate, 16,600,000 per second. 60.1 nanoseconds per run. 5.61888 % GC time.'
>     '[ Dictionary new: 2 ]'->'61% of baseline rate, 16,900,000 per second. 59.2 nanoseconds per run. 5.67886 % GC time.'
>     '[ Dictionary new: 3 ]'->'59% of baseline rate, 16,300,000 per second. 61.5 nanoseconds per run. 6.77864 % GC time.'

Even if there's a performance overhead, you use less memory.

But #new: is about optimization along *both* of those dimensions.  Imagine how complicated a "manpage" for #new: would have to be if it weren't.  #new: must _never_ perform significantly worse than #new (for sizes <= the default), because it would either trick or force developers into writing less-performant code, or into acknowledging Squeak's internal Dictionary implementation in their own code.  It feels like an API-design bug.
 
 >     "into #sizeFor:"
>     '[ Dictionary new: 4 ]'->'57% of baseline rate, 15,800,000 per second. 63.5 nanoseconds per run. 7.87685 % GC time.'

Starting from 4, you also save time by avoiding growing, which is
more significant than what you "lose" during instance creation.

Except my Dictionary is never going to grow.

In case it helps bring clarity, my scenario is the GraphQL server.  As a Request comes in, the server will know, depending on the type, how many named arguments to expect (for most "normal" schemas, 0 to 4, but it can define any number it wants).  So it creates a right-sized Dictionary to hold them all, and will never grow beyond that.  I simply don't want the server to have to do extra work when the **vast majority** of requests will have fewer than 4 arguments.

We could get rid of the anomaly by changing #new to ^self new: 3.

Yes, I'd be fine with that solution, too!   For me, it's only about violation of #new:'s contract.

If we decide to keep #new as it is, then I'm not against using a similar
optimization scheme in #new:. But there should be some tests to verify
that the methods always return valid dictionaries.
And, I'd also prefer to swap the branches in #new: so that < 4 is the
first check and < 3 is the second. There should be a comment as well
about the optimization.

Okay, sure thing.  I just wanted to get your initial feedback before embarking on that much work.

But, personally, I think default size of 3 sounds like a fine way to go.  I don't think it'd affect places currently using "Dictionary new" very much, and anywhere it did could be easily fixed...

 - Chris




Re: The Inbox: Collections-ul.871.mcz

Levente Uzonyi
Hi Chris,

On Thu, 23 Jan 2020, Chris Muller wrote:

> Hi,
>
>       > That's better, but it still has that same fundamental problem.  Every time a developer makes a HashedCollection of a known-at-runtime size (e.g., in a variable), they're forced to choose between execution performance pain or code pain.
>       >
>       >     {
>       >    '[ Dictionary new ]'->'100% of baseline rate, 27,600,000 per second. 36.2 nanoseconds per run. 11.33547 % GC time.'
>       >
>       >     "performance pain?"
>       >     '[ Dictionary new: 1 ]'->'60% of baseline rate, 16,600,000 per second. 60.1 nanoseconds per run. 5.61888 % GC time.'
>       >     '[ Dictionary new: 2 ]'->'61% of baseline rate, 16,900,000 per second. 59.2 nanoseconds per run. 5.67886 % GC time.'
>       >     '[ Dictionary new: 3 ]'->'59% of baseline rate, 16,300,000 per second. 61.5 nanoseconds per run. 6.77864 % GC time.'
>
>       Even if there's a performance overhead, you use less memory.
>
>
> But #new: is about optimization along *both* of those dimensions.  Imagine how complicated a "manpage" for #new: would have to be if it weren't.  #new: must _never_ perform significantly worse than #new (for sizes <= the default), because it would either trick or force developers into writing less-performant
> code, or into acknowledging Squeak's internal Dictionary implementation in their own code.  It feels like an API-design bug.
It was okay for quite a long time. :)

>  
>        >     "into #sizeFor:"
>
>       >     '[ Dictionary new: 4 ]'->'57% of baseline rate, 15,800,000 per second. 63.5 nanoseconds per run. 7.87685 % GC time.'
>
>       Starting from 4, you also save time by avoiding growing, which is
>       more significant than what you "lose" during instance creation.
>
>
> Except my Dictionary is never going to grow.
So, we actually say the same thing.

>
> In case it helps bring clarity, my scenario is the GraphQL server.  As a Request comes in, the server will know, depending on the type, how many named arguments to expect (for most "normal" schema's, 0 to 4, but it can define any number it wants).  So it creates right sized Dictionary to hold them all, and will
> never grow beyond that.  I simply don't want the server to have to do extra work when **vast majority** of requests will have fewer than 4 arguments.
>
>       We could get rid of the anomaly by changing #new to ^self new: 3.
>
>
> Yes, I'd be fine with that solution, too!   For me, it's only about violation of #new:'s contract.

I don't see any broken contract here. It may be surprising to see
somewhat worse performance with #new: than with #new (25 nanoseconds per
instance according to your measurements), but both of those methods do
what they should. Just because there was an easy optimization applied to
#new, nothing was broken IMHO.

>
>       If we decide to keep #new as it is, then I'm not against using a similar
>       optimization scheme in #new:. But there should be some tests to verify
>       that the methods always return valid dictionaries.
>       And, I'd also prefer to swap the branches in #new: so that < 4 is the
>       first check and < 3 is the second. There should be a comment as well
>       about the optimization.
>
>
> Okay, sure thing.  I just wanted to get your initial feedback before embarking on that much work.
>
> But, personally, I think default size of 3 sounds like a fine way to go.  I don't think it'd affect places currently using "Dictionary new" very much, and anywhere it did could be easily fixed...
I didn't mean to use 3 as the default capacity. I meant that in #new: only a
single optimization branch should come before #sizeFor: is applied. So,
instead of

  (numberOfElements < 3
  ifTrue: [ 3 ]
  ifFalse:
  [ numberOfElements < 4
  ifTrue: [ 5 ]
  ifFalse: [ self sizeFor: numberOfElements ] ])

the code should be

  (numberOfElements <= 3
  ifFalse: [ self sizeFor: numberOfElements ]
  ifTrue:
  [ numberOfElements < 3
  ifTrue: [ 3 ]
  ifFalse: [ 5 ] ])
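A quick Python sanity check (illustrative; `size_for` mirrors #sizeFor: against a prefix of the prime table) confirms the two branch orders above select the same capacity for every size, so the swap only changes which comparison runs first in the common case:

```python
# Hedged check (Python, not Squeak): both quoted branch orders agree.
GOOD_PRIMES = [3, 5, 7, 11, 13, 17, 23, 31, 43, 59, 79, 107, 149]

def size_for(n):
    return next(p for p in GOOD_PRIMES if p >= (n * 4 + 2) // 3)

def capacity_a(n):  # first version: small cases checked first
    return 3 if n < 3 else (5 if n < 4 else size_for(n))

def capacity_b(n):  # preferred version: the general case checked first
    return size_for(n) if n > 3 else (3 if n < 3 else 5)

assert all(capacity_a(n) == capacity_b(n) for n in range(0, 80))
print(capacity_a(0), capacity_a(3), capacity_a(4))  # 3 5 7
```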


Levente

>
>  - Chris
>
>
>


Re: The Inbox: Collections-ul.871.mcz

Chris Muller-4
Hi Levente,

>       > That's better, but it still has that same fundamental problem.  Every time a developer makes a HashedCollection of a known-at-runtime size (e.g., in a variable), they're forced to choose between execution performance pain or code pain.
>       >
>       >     {
>       >    '[ Dictionary new ]'->'100% of baseline rate, 27,600,000 per second. 36.2 nanoseconds per run. 11.33547 % GC time.'
>       >
>       >     "performance pain?"
>       >     '[ Dictionary new: 1 ]'->'60% of baseline rate, 16,600,000 per second. 60.1 nanoseconds per run. 5.61888 % GC time.'
>       >     '[ Dictionary new: 2 ]'->'61% of baseline rate, 16,900,000 per second. 59.2 nanoseconds per run. 5.67886 % GC time.'
>       >     '[ Dictionary new: 3 ]'->'59% of baseline rate, 16,300,000 per second. 61.5 nanoseconds per run. 6.77864 % GC time.'
>
>       Even if there's a performance overhead, you use less memory.
>
>
> But #new: is about optimization along *both* of those dimensions.  Imagine how complicated a "manpage" for #new: would have to be if it weren't.  #new: must _never_ perform significantly worse than #new (for sizes <= the default), because it would either trick or force developers into writing less-performant
> code, or into acknowledging Squeak's internal Dictionary implementation in their own code.  It feels like an API-design bug.

It was okay for quite a long time. :)

... until this proposal which changed the smallest internal array size from 5 to 3 and introduced the above anomaly.  I assume you made that change because you felt something else wasn't okay (to which I agree!).
 

>  
>        >     "into #sizeFor:"
>
>       >     '[ Dictionary new: 4 ]'->'57% of baseline rate, 15,800,000 per second. 63.5 nanoseconds per run. 7.87685 % GC time.'
>
>       Starting from 4, you also save time by avoiding growing, which is
>       more significant than what you "lose" during instance creation.
>
>
> Except my Dictionary is never going to grow.

So, we actually say the same thing.

Did we?  You were arguing about saving time with growing, which I can never gain.  I only lose during instance creation...
 
> In case it helps bring clarity, my scenario is the GraphQL server.  As a Request comes in, the server will know, depending on the type, how many named arguments to expect (for most "normal" schema's, 0 to 4, but it can define any number it wants).  So it creates right sized Dictionary to hold them all, and will
> never grow beyond that.  I simply don't want the server to have to do extra work when **vast majority** of requests will have fewer than 4 arguments.
>
>       We could get rid of the anomaly by changing #new to ^self new: 3.
>
>
> Yes, I'd be fine with that solution, too!   For me, it's only about violation of #new:'s contract.

I don't see any broken contract here. It may be surprising to see
somewhat worse performance with #new: than with #new (25 nanoseconds per
instance according to your measurements),

a 40% hit...
 
but both of those methods do
what they should. Just because there was an easy optimization applied to
#new, nothing was broken IMHO.

... for using #new: that's supposed to be for optimization!

So if you were in my shoes (which, if this goes to trunk as-is eventually you will be!), would you take a 40% hit in performance-critical code needing to instantiate a Dictionary with:

    Dictionary new: runtimeSize

Or, actually violate encapsulation to preserve performance?

   sz>3 ifTrue: [Dictionary new: sz] ifFalse: [Dictionary new]

?

My past observations have been that you like and appreciate the most-efficient performing code, so I'm truly curious!

Best Regards.
  Chris



Re: The Inbox: Collections-ul.871.mcz

Nicolas Cellier
Hi Chris,
For such small sizes, did you try using SmallDictionary? (see in RB)

It avoids hashing and just iterates through the keys with = until a hit, or until the tally is exhausted.

It also avoids creating Associations, maintaining keys and values in 2 arrays.

It thus has no extra-room requirement.
It seems a perfect fit for your requirements.
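The idea can be sketched in Python (a hypothetical model for illustration only; the class and method names below are invented and are not RB's actual SmallDictionary API):

```python
# Sketch of the SmallDictionary idea: no hashing, parallel key/value
# arrays, linear scan by equality. Names are hypothetical.
class SmallDict:
    def __init__(self):
        self._keys = []
        self._values = []

    def at_put(self, key, value):
        for i, k in enumerate(self._keys):  # scan keys with = until hit
            if k == key:
                self._values[i] = value
                return value
        self._keys.append(key)              # no Associations, no empty slots
        self._values.append(value)
        return value

    def at(self, key, absent=None):
        for i, k in enumerate(self._keys):
            if k == key:
                return self._values[i]
        return absent

d = SmallDict()
d.at_put('name', 'query')
d.at_put('name', 'mutation')  # overwrite in place, no duplicate key
assert d.at('name') == 'mutation'
```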

On Sat, 25 Jan 2020 at 01:47, Chris Muller <[hidden email]> wrote:
Hi Levente,

>       > That's better, but it still has that same fundamental problem.  Every time a developer makes a HashedCollection of a known-at-runtime size (e.g., in a variable), they're forced to choose between execution performance pain or code pain.
>       >
>       >     {
>       >    '[ Dictionary new ]'->'100% of baseline rate, 27,600,000 per second. 36.2 nanoseconds per run. 11.33547 % GC time.'
>       >
>       >     "performance pain?"
>       >     '[ Dictionary new: 1 ]'->'60% of baseline rate, 16,600,000 per second. 60.1 nanoseconds per run. 5.61888 % GC time.'
>       >     '[ Dictionary new: 2 ]'->'61% of baseline rate, 16,900,000 per second. 59.2 nanoseconds per run. 5.67886 % GC time.'
>       >     '[ Dictionary new: 3 ]'->'59% of baseline rate, 16,300,000 per second. 61.5 nanoseconds per run. 6.77864 % GC time.'
>
>       Even if there's a performance overhead, you use less memory.
>
>
> But #new: is about optimization along *both* of those dimensions.  Imagine how complicated a "manpage" for #new: would have to be if it weren't.  #new: must _never_ perform significantly worse than #new (for sizes <= the default), because it would either trick or force developers into writing less-performant
> code, or into acknowledging Squeak's internal Dictionary implementation in their own code.  It feels like an API-design bug.

It was okay for quite a long time. :)

... until this proposal which changed the smallest internal array size from 5 to 3 and introduced the above anomaly.  I assume you made that change because you felt something else wasn't okay (to which I agree!).
 

>  
>        >     "into #sizeFor:"
>
>       >     '[ Dictionary new: 4 ]'->'57% of baseline rate, 15,800,000 per second. 63.5 nanoseconds per run. 7.87685 % GC time.'
>
>       Starting from 4, you also save time by avoiding growing, which is
>       more significant than what you "lose" during instance creation.
>
>
> Except my Dictionary is never going to grow.

So, we actually say the same thing.

Did we?  You were arguing about saving time with growing, which I can never gain.  I only lose during instance creation...
 
> In case it helps bring clarity, my scenario is the GraphQL server.  As a Request comes in, the server will know, depending on the type, how many named arguments to expect (for most "normal" schema's, 0 to 4, but it can define any number it wants).  So it creates right sized Dictionary to hold them all, and will
> never grow beyond that.  I simply don't want the server to have to do extra work when **vast majority** of requests will have fewer than 4 arguments.
>
>       We could get rid of the anomaly by changing #new to ^self new: 3.
>
>
> Yes, I'd be fine with that solution, too!   For me, it's only about violation of #new:'s contract.

I don't see any broken contract here. It may be surprising to see
somewhat worse performance with #new: than with #new (25 nanoseconds per
instance according to your measurements),

a 40% hit...
 
but both of those methods do
what they should. Just because there was an easy optimization applied to
#new, nothing was broken IMHO.

... for using #new: that's supposed to be for optimization!

So if you were in my shoes (which, if this goes to trunk as-is eventually you will be!), would you take a 40% hit in performance-critical code needing to instantiate a Dictionary with:

    Dictionary new: runtimeSize

Or, actually violate encapsulation to preserve performance?

   sz>3 ifTrue: [Dictionary new: sz] ifFalse: [Dictionary new]

?

My past observations have been that you like and appreciate the most-efficient performing code, so I'm truly curious!

Best Regards.
  Chris




Re: The Inbox: Collections-ul.871.mcz

Levente Uzonyi
In reply to this post by Chris Muller-4
Hi Chris,

On Fri, 24 Jan 2020, Chris Muller wrote:

> Hi Levente,
>
>       >       > That's better, but it still has that same fundamental problem.  Every time a developer makes a HashedCollection of a known-at-runtime size (e.g., in a variable), they're forced to choose between execution performance pain or code pain.
>       >       >
>       >       >     {
>       >       >    '[ Dictionary new ]'->'100% of baseline rate, 27,600,000 per second. 36.2 nanoseconds per run. 11.33547 % GC time.'
>       >       >
>       >       >     "performance pain?"
>       >       >     '[ Dictionary new: 1 ]'->'60% of baseline rate, 16,600,000 per second. 60.1 nanoseconds per run. 5.61888 % GC time.'
>       >       >     '[ Dictionary new: 2 ]'->'61% of baseline rate, 16,900,000 per second. 59.2 nanoseconds per run. 5.67886 % GC time.'
>       >       >     '[ Dictionary new: 3 ]'->'59% of baseline rate, 16,300,000 per second. 61.5 nanoseconds per run. 6.77864 % GC time.'
>       >
>       >       Even if there's a performance overhead, you use less memory.
>       >
>       >
>       > But #new: is about optimization along *both* of those dimensions.  Imagine how complicated a "manpage" for #new: would have to be if it weren't.  #new: must _never_ perform significantly worse than #new (for sizes <= the default), because it would either trick or force developers into writing
>       less-performant
>       > code, or into acknowledging Squeak's internal Dictionary implementation in their own code.  It feels like an API-design bug.
>
>       It was okay for quite a long time. :)
>
>
> ... until this proposal which changed the smallest internal array size from 5 to 3 and introduced the above anomaly.  I assume you made that change because you felt something else wasn't okay (to which I agree!).
Capacity of 3 has been available through #compact, but #sizeFor: still
doesn't return it due to a bug. ReleaseBuilder compacts all
HashedCollections, so there are probably many HashedCollections in your
image with capacity 3 right now.
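The behaviour described here can be observed directly (a sketch; #capacity answers the internal array size in Squeak, and the exact number depends on which Collections version is loaded):

```smalltalk
"After #compact, a one-element Dictionary shrinks to the smallest
 capacity the current Collections code allows."
| d |
d := Dictionary new.
d at: #a put: 1.
d compact.
d capacity  "3 with Collections-ul.871 loaded; larger on older versions"
```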

>  
>
>       >  
>       >        >     "into #sizeFor:"
>       >
>       >       >     '[ Dictionary new: 4 ]'->'57% of baseline rate, 15,800,000 per second. 63.5 nanoseconds per run. 7.87685 % GC time.'
>       >
>       >       Starting from 4, you also save time by avoiding growing, which is
>       >       more significant than what you "lose" during instance creation.
>       >
>       >
>       > Except my Dictionary is never going to grow.
>
>       So, we actually say the same thing.
>
>
> Did we?  You were arguing about saving time with growing, which I can never gain.  I only lose during instance creation...
We did. You just quoted me: "save time by _avoiding_ growing".

>  
>       > In case it helps bring clarity, my scenario is the GraphQL server.  As a Request comes in, the server will know, depending on the type, how many named arguments to expect (for most "normal" schema's, 0 to 4, but it can define any number it wants).  So it creates right sized Dictionary to hold them
>       all, and will
>       > never grow beyond that.  I simply don't want the server to have to do extra work when **vast majority** of requests will have fewer than 4 arguments.
>       >
>       >       We could get rid of the anomaly by changing #new to ^self new: 3.
>       >
>       >
>       > Yes, I'd be fine with that solution, too!   For me, it's only about violation of #new:'s contract.
>
>       I don't see any broken contract here. It may be surprising to see
>       somewhat worse performance with #new: than with #new (25 nanoseconds per
>       instance according to your measurements),
>
>
> a 40% hit...
Only if you keep that Dictionary empty. And if you decide to do so, then
why create it at all? :)

>  
>       but both of those methods do
>       what they should. Just because there was an easy optimization applied to
>       #new, nothing was broken IMHO.
>
>
> ... for using #new: that's supposed to be for optimization!
>
> So if you were in my shoes (which, if this goes to trunk as-is eventually you will be!), would you take a 40% hit in performance-critical code needing to instantiate a Dictionary with:
>
>     Dictionary new: runtimeSize
>
> Or, actually violate encapsulation to preserve performance?
>
>    sz>3 ifTrue: [Dictionary new: sz] ifFalse: [Dictionary new]
I don't see the broken encapsulation there. That's just black box
optimization based on measurements as long as you don't look under the
hood, isn't it?

>
> ?
>
> My past observations have been that you like and appreciate the most-efficient performing code, so I'm truly curious!

I like optimizations where the cost can be considered smaller than the
benefits.
For example, in TextDiffBuilder, I even sacrificed legibility because the
extra complexity is local to a single method, #lcsFor:and:, and the
benefits are measurable. The previous implementations are available in
the Versions Browser, which can help in understanding how and why the code
evolved into its current form.
There are plenty of cases in the Collections hierarchy where methods could
be overridden in various subclasses to gain a few percent, but the
changes would be extensive and the code would become very hard to
maintain.
Of course, the best optimizations are those with no cost at all, like
those in Collections-ul.869 in the Inbox.


Levente

>
> Best Regards.
>   Chris
>
>


Re: The Inbox: Collections-ul.871.mcz

Chris Muller-3
>       >       > That's better, but it still has that same fundamental problem.  Every time a developer makes a HashedCollection of a known-at-runtime size (e.g., in a variable), they're forced to choose between execution performance pain or code pain.
>       >       >
>       >       >     {
>       >       >    '[ Dictionary new ]'->'100% of baseline rate, 27,600,000 per second. 36.2 nanoseconds per run. 11.33547 % GC time.'
>       >       >
>       >       >     "performance pain?"
>       >       >     '[ Dictionary new: 1 ]'->'60% of baseline rate, 16,600,000 per second. 60.1 nanoseconds per run. 5.61888 % GC time.'
>       >       >     '[ Dictionary new: 2 ]'->'61% of baseline rate, 16,900,000 per second. 59.2 nanoseconds per run. 5.67886 % GC time.'
>       >       >     '[ Dictionary new: 3 ]'->'59% of baseline rate, 16,300,000 per second. 61.5 nanoseconds per run. 6.77864 % GC time.'
>       >
>       >       Even if there's a performance overhead, you use less memory.
>       >
>       >
>       > But #new: is about optimization along *both* of those dimensions.  Imagine how complicated a "manpage" for #new: would have to be if it weren't.  #new: must _never_ perform significantly worse than #new (for sizes <= the default), because it would either trick or force developers into writing
>       less-performant
>       > code, or into acknowledging Squeak's internal Dictionary implementation in their own code.  It feels like an API-design bug.
>
>       It was okay for quite a long time. :)
>
>
> ... until this proposal which changed the smallest internal array size from 5 to 3 and introduced the above anomaly.  I assume you made that change because you felt something else wasn't okay (to which I agree!).

Capacity of 3 has been available through #compact, but #sizeFor: still
doesn't return it due to a bug.

That bug has been saving us from the anomaly now exposed by your improvements.
 
>       >        >     "into #sizeFor:"
>       >
>       >       >     '[ Dictionary new: 4 ]'->'57% of baseline rate, 15,800,000 per second. 63.5 nanoseconds per run. 7.87685 % GC time.'
>       >
>       >       Starting from 4, you also save time by avoiding growing, which is
>       >       more significant than what you "lose" during instance creation.
>       >
>       >
>       > Except my Dictionary is never going to grow.
>
>       So, we actually say the same thing.
>
>
> Did we?  You were arguing about saving time with growing, which I can never gain.  I only lose during instance creation...

We did. You just quoted me: "save time by _avoiding_ growing".

Oh, okay, but that's not relevant.  Our whole discussion is about fixing when the Request needs to make Dictionaries with 1 or 2 elements (in Collections-ul.871) and 3 (in trunk), not 4.
 
 >       > In case it helps bring clarity, my scenario is the GraphQL server.  As a Request comes in, the server will know, depending on the type, how many named arguments to expect (for most "normal" schema's, 0 to 4, but it can define any number it wants).  So it creates right sized Dictionary to hold them
>       all, and will
>       > never grow beyond that.  I simply don't want the server to have to do extra work when **vast majority** of requests will have fewer than 4 arguments.
>       >
>       >       We could get rid of the anomaly by changing #new to ^self new: 3.
>       >
>       >
>       > Yes, I'd be fine with that solution, too!   For me, it's only about violation of #new:'s contract.
>
>       I don't see any broken contract here. It may be surprising to see
>       somewhat worse performance with #new: than with #new (25 nanoseconds per
>       instance according to your measurements),
>
>
> a 40% hit...

Only if you keep that Dictionary empty.

No, I'm storing one element in (Dictionary new: 1) and two in (Dictionary new: 2).  But those are runtime sizes; those Dictionaries are created from a variable in the code.  I simply write #new: in the code because I know the size up front, so it's (supposed to be!) faster to pre-allocate to the correct size via #new:.  But for the millions of requests with 1 or 2 arguments, it would have been better to skip the #new: "optimization" there and simply use #new.  That's crazy.

I'd like to see a manpage-quality comment explaining this in HashedCollection class>>#new:.  Either that, or we need your fixed version of my fix, please.  :)
 
And if you decide to do so, then
why create it at all? :)

Right, I don't.  0 arguments creates no Dictionary.
 

>  
>       but both of those methods do
>       what they should. Just because there was an easy optimization applied to
>       #new, nothing was broken IMHO.
>
>
> ... for using #new: that's supposed to be for optimization!

       ^^^ this is kind of the key point, what about this?
 
>
> So if you were in my shoes (which, if this goes to trunk as-is eventually you will be!), would you take a 40% hit in performance-critical code needing to instantiate a Dictionary with:
>
>     Dictionary new: runtimeSize
>
> Or, actually violate encapsulation to preserve performance?
>
>    sz>3 ifTrue: [Dictionary new: sz] ifFalse: [Dictionary new]

I don't see the broken encapsulation there.

Here:

       -------> sz>3 <---- ifTrue: [Dictionary new: sz] ifFalse: [Dictionary new]

you would introduce a dependency on the private, internal initial size employed by Dictionary (i.e., breaking its encapsulation).  I want to keep the GraphQL framework as dialect-independent as possible; I can't bring myself to write that, and I hope you wouldn't either.
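If the workaround were unavoidable, one way to contain the leak would be to confine it to a single, well-commented factory method (a hypothetical sketch; the selector and the threshold are assumptions for illustration, not something proposed in this thread):

```smalltalk
newArgumentDictionary: sz
	"Answer a Dictionary pre-sized for sz entries.
	 Workaround: on Squeak trunk (Jan 2020), #new: measured slower than
	 #new for sizes at or below the default capacity, so fall back to
	 #new for small sizes. Revisit if the Collections package changes."
	^ sz > 3
		ifTrue: [Dictionary new: sz]
		ifFalse: [Dictionary new]
```

Localizing the check this way keeps the dependency on Dictionary's internals out of the rest of the framework, though it still hard-codes knowledge of the current default capacity.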
 
That's just black box
optimization based on measurements as long as you don't look under the
hood, isn't it?

Only for as long as what's under Dictionary's hood doesn't change and quietly break you.  It could end up just slow enough to cost you, yet remain under your radar.  The worse code that was once an optimization will have become insidious.

> ?
>
> My past observations have been that you like and appreciate the most-efficient performing code, so I'm truly curious!

I like optimizations where the cost can be considered smaller than the
benefits.
For example, in TextDiffBuilder, I even sacrificed legibility because the
extra complexity is local to a single method, #lcsFor:and:, and the
benefits are measurable. The previous implementations are available in
the Versions Browser, which can help in understanding how and why the code
evolved into its current form.
There are plenty of cases in the Collections hierarchy where methods could
be overridden in various subclasses to gain a few percent, but the
changes would be extensive and the code would become very hard to
maintain.

The issue is about the quality of Squeak's API as it relates to user expectations (#new vs. #new:), not about gaining a few percent.

Best,
  Chris